| Funder | Cancer Research UK |
|---|---|
| Recipient Organization | University of Cambridge |
| Country | United Kingdom |
| Start Date | Jun 01, 2023 |
| End Date | May 31, 2024 |
| Duration | 365 days |
| Data Source | Europe PMC |
| Grant ID | EDDPMA-May22\100062 |
>> Background: Much effort is going into developing risk prediction models for the early detection and diagnosis (EDD) of cancers using designed, in-depth cohort datasets, with the aim of eventual clinical application.
However, models trained and validated on these datasets may not be fully representative of clinical realities, due to selection bias, recording bias and disruptions over time, e.g. changes in medical and recording practices in cancer, including feedback loops from deploying the EDD cancer models, and the COVID pandemic.
These distribution differences between the derivation and target electronic health records (EHRs) could significantly compromise model performance in clinical practice and exacerbate health inequity for underrepresented groups.
The recent availability of population-wide, near real-time EHRs gives us the opportunity to quantify how models developed in cohort data, e.g. UK Biobank, generalise under population and dynamic shifts.

>> Aims: Using population-wide EHRs that capture the highly disrupted COVID-19 pandemic, we will investigate critical but unexplored concerns in the development of EDD cancer models regarding model generalisability and transportability across populations, datasets and time:
- Identify the sources and nature of distribution shifts present in population-wide EHRs.
- Identify population subgroups, characterised by combinations of shared attributes, most negatively impacted by distribution shifts, thereby promoting health equity.

>> Methods:
- Define required risk predictors and outcomes in the England-wide EHRs, which include data from primary and secondary care and death registrations on approximately 57 million individuals.
- Describe and interpret distributions of predictors and cancer diagnoses across populations and time.
- Implement and assess a candidate set of scalable clustering and tree-based algorithms to identify population subgroups, characterised by combinations of shared characteristics, most likely to suffer predictive performance degradation when exposed to distribution shifts.

>> How the results of this research will be used:
- Provide evidence for the distribution shifts that EDD models encounter in real-world settings, elucidating their sources, nature and quantified impact on the predictive performance of contemporary models.
- Highlight critical flaws to be accounted for in future model development and model-based decision making to enable clinical uptake for multiple cancers.
- Catalyse subsequent multidisciplinary methodological development projects for shift detection and correction.
- Validate EDD models on representative data, allowing novel investigations of disparity in EDD model predictive performance for minority groups.
- Create shared, reusable code lists and wrangling and modelling code bases for future EDD projects.
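
As an illustration of the Methods step on describing predictor distributions across populations and time, the sketch below compares one numeric predictor between a derivation sample and a target sample. It uses the population stability index (PSI), a common summary of distribution shift; the abstract does not say which measures the project will use, so this choice is an assumption. The function name, the bin edges and the age values are all hypothetical and invented for illustration only.

```python
import math
from collections import Counter


def population_stability_index(derivation, target, bin_edges, eps=1e-6):
    """Population stability index of one numeric predictor across two samples.

    Both samples are binned with the same ascending `bin_edges`. Empty bins
    are floored at `eps` so that the logarithm stays defined.
    """
    if not derivation or not target:
        raise ValueError("both samples must be non-empty")

    def bin_proportions(values):
        # A value's bin index is the number of edges it meets or exceeds.
        counts = Counter(
            sum(1 for edge in bin_edges if value >= edge) for value in values
        )
        n = len(values)
        return [max(counts[i] / n, eps) for i in range(len(bin_edges) + 1)]

    p = bin_proportions(derivation)
    q = bin_proportions(target)
    # Each term is non-negative, so the index is zero only when p equals q.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical ages, invented purely for illustration (not real EHR data).
cohort_ages = [45, 52, 58, 61, 63, 67, 70, 72]
population_ages = [23, 31, 38, 44, 52, 59, 66, 81]
age_bin_edges = [40, 50, 60, 70]

psi_same = population_stability_index(cohort_ages, cohort_ages, age_bin_edges)
psi_shift = population_stability_index(cohort_ages, population_ages, age_bin_edges)
print(f"PSI, cohort vs itself: {psi_same:.3f}")
print(f"PSI, cohort vs population: {psi_shift:.3f}")
```

A value near zero suggests the two samples have similar distributions for that predictor, and larger values suggest a stronger shift. Repeating the calculation per calendar period or per population subgroup would give a simple first view of where shifts concentrate, ahead of the clustering and tree-based analyses named in the Methods.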