Active NON-SBIR/STTR RPGS NIH (US)

Novel subsampling and analysis of big data using examples from IRIS® Registry

$2.17M USD

Funder NATIONAL EYE INSTITUTE
Recipient Organization Wills Eye Health System
Country United States
Start Date Sep 01, 2024
End Date Aug 31, 2026
Duration 729 days
Number of Grantees 2
Roles Co-Investigator; Principal Investigator
Data Source NIH (US)
Grant ID 10988038
Grant Description

Abstract Observational studies based on big data from electronic medical records (EMRs) have recently been conducted in many areas of medical research [1, 2]. These studies provide high-impact information on rare events or rare diseases that would otherwise be unavailable from non-EMR studies with much smaller sample sizes. Many EMR studies have also been done in ophthalmology [3], especially using data from the Intelligent Research in Sight (IRIS®) Registry [4]. These studies face several challenges that could affect the validity of their results. First, a common issue is that EMR data may not fully represent the background population, especially minority groups. This may bias disease estimates for underrepresented groups and lead to invalid conclusions if the observed differences are not taken into consideration during data analysis. Second, when estimating the prevalence and incidence of target diseases and their associated risk factors, the entire cohort without the primary disease or a group of healthy individuals with similar sample sizes could be considered as the control group (over 70 million records in IRIS). Optimal sampling methods that adjust for related risk factors, which are currently unavailable, are essential tools for selecting equally informative study groups with much smaller sample sizes and higher computational efficiency. Third, in about half of the published IRIS studies to date, the primary outcomes are rare events. Moreover, when classes of categorical variables are combined, unbalanced subgroups with much smaller sample sizes often appear, which may lead to unreliable estimates with much wider confidence intervals. The results become even less trustworthy when such recombination is applied to a rare disease outcome. Other big data EMR studies are likely to face the same challenges. In addition, these issues may have significant financial consequences: for example, lengthy running times are costly when the EMR is hosted in a secure cloud environment, and the situation becomes even worse when statistical software crashes after long runs without giving any meaningful results.

To address these challenges, in this application, through a collaborative effort between Wills Eye Hospital and the University of Connecticut that combines theoretical and applied statistical expertise, we propose to develop and evaluate novel subsampling and optimal analysis methods which, to the best of our knowledge, do not exist to date. This application proposes to achieve the following aims: 1) derive optimal subsampling probabilities for both rare and non-rare events data with both categorical and numerical covariates, which are also invariant to the measurement scales of numerical covariates; 2) design an effect balancing approach for covariates with rare category combinations to better include underrepresented subgroups, prevent potential disparity in analysis results, and protect against health disparities; and 3) develop optimal sampling strategies to adjust for selection bias in EMR studies. Most importantly, we will create user-friendly software packages on optimal subsampling for practitioners that will be applicable to similar settings in medical research.
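To make the idea of subsampling for rare events concrete, the following is a minimal sketch and not the method proposed in the application. It uses plain uniform subsampling of controls with inverse-probability weights, a standard baseline technique, rather than the optimal subsampling probabilities the aims describe. The outcome data are synthetic, and the function names, the 0.5% event rate, and the 2% keep probability are all illustrative assumptions.

```python
import random


def subsample_controls(outcomes, keep_prob, rng):
    """Keep every case (y == 1); keep each control (y == 0) with probability keep_prob.

    Returns (y, weight) pairs, where weight is the inverse of the inclusion probability.
    """
    sample = []
    for y in outcomes:
        if y == 1:
            sample.append((y, 1.0))               # cases are always retained
        elif rng.random() < keep_prob:
            sample.append((y, 1.0 / keep_prob))   # each kept control stands in for 1/keep_prob controls
    return sample


def weighted_prevalence(sample):
    """Inverse-probability-weighted estimate of the event prevalence."""
    total_weight = sum(w for _, w in sample)
    case_weight = sum(w for y, w in sample if y == 1)
    return case_weight / total_weight


rng = random.Random(0)
# Synthetic outcome vector (assumption): 1 = rare event, 0 = no event, about 0.5% prevalence.
full = [1 if rng.random() < 0.005 else 0 for _ in range(200_000)]

sample = subsample_controls(full, keep_prob=0.02, rng=rng)
full_prev = sum(full) / len(full)
est_prev = weighted_prevalence(sample)

print(f"full data: n={len(full)}, prevalence={full_prev:.5f}")
print(f"subsample: n={len(sample)}, weighted prevalence={est_prev:.5f}")
```

The subsample keeps all rare events while discarding most controls, so later analyses run on a small fraction of the records. The weights keep the prevalence estimate approximately unbiased. Choosing record-specific probabilities that minimize estimator variance, as aim 1 describes, would replace the single `keep_prob` used here.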

All Grantees

Wills Eye Health System
