Completed | Early Detection and Diagnosis Committee - Pilot | Europe PMC

Investigating transportability of cancer detection models across datasets and time using population-wide electronic health data

Funder	Cancer Research UK
Recipient Organization	University of Cambridge
Country	United Kingdom
Start Date	Jun 01, 2023
End Date	May 31, 2024
Duration	365 days
Data Source	Europe PMC
Grant ID	`EDDPMA-May22\100062`

Grant Description

>> Background: Much effort is going into developing risk prediction models for the early detection and diagnosis (EDD) of cancers using purpose-designed, in-depth cohort datasets, with the aim of eventual clinical application.

However, models trained and validated on these datasets may not reflect clinical reality, because of selection bias, recording bias and disruptions over time. Examples include changes in medical and recording practices in cancer care, feedback loops from deploying EDD cancer models, and the COVID-19 pandemic.

Such differences in distribution between the derivation and target electronic health records (EHRs) could substantially compromise model performance in clinical practice and exacerbate health inequity for underrepresented groups.

The recent availability of population-wide, near real-time EHRs gives us the opportunity to quantify how models developed in cohort data, e.g. UK Biobank, generalise under population and dynamic shifts.

>> Aims: Using population-wide EHRs that capture the highly disrupted COVID-19 pandemic period, we will investigate critical but unexplored concerns in the development of EDD cancer models, regarding model generalisability and transportability across populations, datasets and time:
- Identify the sources and nature of distribution shifts present in population-wide EHRs.
- Identify population subgroups, characterised by combinations of shared attributes, that are most negatively impacted by distribution shifts, thereby promoting health equity.

>> Methods:
- Define the required risk predictors and outcomes in England-wide EHRs, which include primary and secondary care data and death registrations for approximately 57 million individuals.
- Describe and interpret the distributions of predictors and cancer diagnoses across populations and time.
- Implement and assess a candidate set of scalable clustering and tree-based algorithms to identify population subgroups, characterised by combinations of shared characteristics, that are most likely to suffer degraded predictive performance when exposed to distribution shifts.

>> How the results of this research will be used:
- Provide evidence of the distribution shifts that EDD models encounter in real-world settings, elucidating their sources and nature and quantifying their impact on the predictive performance of contemporary models.
- Highlight critical flaws to be accounted for in future model development and model-based decision making, to enable clinical uptake for multiple cancers.
- Catalyse subsequent multidisciplinary methodological development projects for shift detection and correction.
- Validate EDD models on representative data, enabling novel investigations of disparities in EDD model predictive performance for minority groups.
- Create shared, reusable code lists and data-wrangling and modelling code bases for future EDD projects.
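The description does not specify its subgroup-discovery algorithm beyond "scalable clustering and tree-based algorithms". The Python below is a minimal sketch of the underlying idea under stated assumptions. It assumes that each patient record already carries a risk score from an existing model and a binary cancer outcome, for both a derivation period and a later target period. It enumerates every subgroup defined by combinations of up to `max_order` attribute values and ranks those subgroups by the drop in AUROC between the two periods. This brute-force enumeration is a simple stand-in for illustration only; it is not the clustering or tree-based methods the project proposes. The record layout (`score`, `label` keys), the attribute names (`sex`, `region`) and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); None if a class is missing.

    Quadratic in the number of records: fine for a sketch, not for
    tens of millions of patients.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def subgroup_degradation(source, target, attributes, max_order=2, min_size=20):
    """Rank subgroups (combinations of attribute values) by AUROC drop.

    source, target: lists of dicts holding the attribute keys plus a model
    risk 'score' and a binary outcome 'label' (hypothetical record layout).
    Returns (drop, subgroup, auroc_source, auroc_target) tuples, largest
    drop first.
    """
    results = []
    for order in range(1, max_order + 1):
        for attrs in combinations(attributes, order):
            groups = defaultdict(lambda: {"source": [], "target": []})
            for period, records in (("source", source), ("target", target)):
                for r in records:
                    groups[tuple(r[a] for a in attrs)][period].append(r)
            for key, split in groups.items():
                src, tgt = split["source"], split["target"]
                if len(src) < min_size or len(tgt) < min_size:
                    continue  # too small to estimate AUROC reliably
                a_src = auroc([r["score"] for r in src], [r["label"] for r in src])
                a_tgt = auroc([r["score"] for r in tgt], [r["label"] for r in tgt])
                if a_src is None or a_tgt is None:
                    continue  # subgroup lacks cases or non-cases in a period
                results.append((a_src - a_tgt, dict(zip(attrs, key)), a_src, a_tgt))
    results.sort(key=lambda t: t[0], reverse=True)  # stable: ties keep order
    return results


def make_demo():
    """Synthetic data: the model ranks perfectly in both periods except in a
    hypothetical 'north' region during the target period, where it is inverted."""
    source, target = [], []
    for sex in ("F", "M"):
        for region in ("north", "south"):
            for i in range(10):
                label = i % 2
                good = 0.8 if label else 0.2  # correct ranking
                bad = 0.2 if label else 0.8   # inverted ranking
                source.append({"sex": sex, "region": region,
                               "score": good, "label": label})
                target.append({"sex": sex, "region": region,
                               "score": bad if region == "north" else good,
                               "label": label})
    return source, target


if __name__ == "__main__":
    source, target = make_demo()
    ranked = subgroup_degradation(source, target, ["sex", "region"],
                                  max_order=2, min_size=10)
    for drop, subgroup, a_src, a_tgt in ranked[:3]:
        print(f"{subgroup}: AUROC {a_src:.2f} -> {a_tgt:.2f} (drop {drop:.2f})")
```

In the synthetic demo, the single-attribute subgroup `{"region": "north"}` ranks first. Subgroups that mix affected and unaffected patients, such as `{"sex": "F"}`, show only a partial drop. A tree-based or clustering method would mainly change how candidate subgroups are proposed, which is what makes the approach scale. Scoring each candidate by a before/after performance metric could stay the same. AUROC is just one choice here; calibration metrics would capture a different kind of degradation.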
