| Funder | National Institute for Health and Care Research |
|---|---|
| Recipient Organization | King's College London |
| Country | United Kingdom |
| Start Date | Jan 02, 2025 |
| End Date | Jul 01, 2026 |
| Duration | 545 days |
| Number of Grantees | 3 |
| Roles | Co-Principal Investigator; Principal Investigator; Award Holder |
| Data Source | NIHR Open Data-Funded Portfolio |
| Grant ID | NIHR206858 |
Background
No tools currently exist to estimate the sample size needed to develop prediction models using machine learning or longitudinal data. Tools for regression models underestimate the sample size requirements of data-hungry machine-learning algorithms.
Clinical prediction models are increasingly used across the NHS, accelerated by the digital transformation of healthcare.
Prediction-based tools enable proactive learning healthcare systems (predict and pre-empt) through stratified treatment, preventative screening, and tailored services.
However, most models are developed with small sample sizes, leading to inaccurate risk estimates and unstable models that do not generalise to underrepresented groups. Adequate development samples are crucial to create reliable and ethical models.
Aims and objectives
We aim to address a major gap in current methodologies by developing an accessible, user-friendly tool to estimate minimum sample sizes for machine learning and longitudinal prediction models.
Specific objectives include:
- To work with patients to understand their views about predictive tools, create generalisable synthetic datasets, co-design an accessible project website and reference materials, and learn the best ways of disseminating the tool (WP1, WP5).
- To meet with prediction modelling and machine learning experts to understand their requirements and address barriers to uptake (WP1, WP5).
- To validate our prototype software to ensure consistency with existing tools (pmsampsize) in simple settings and optimise computational efficiency (WP2).
- To release a user-friendly, open-source software package (WP3) and demonstrate the use of the software in a comprehensive benchmarking study (WP4).
- To disseminate our package to patients and the public and encourage uptake among researchers through training and documentation (WP5).
Methods
We have developed a prototype software package that works for simple models.
This project will validate and extend our prototype, focusing on four important additions:
- Sample size calculations for popular machine learning models (penalised regression, random forests, and gradient boosting).
- Sample size calculations for longitudinal prediction models (landmarking and joint modelling).
- Flexible data generators that provide fine-grained control over the simulated datasets.
- Prediction stability as a criterion for determining the minimum sample size.
Timelines for delivery
The project will last 18 months. We will be guided throughout by a patient and researcher advisory group.
In months 0-6, we will establish our advisory group, validate our prototype software, co-create the project website, and meet with local charity and service user groups. In months 4-15, we will implement the planned extensions.
In months 12-18, we will deliver training to researchers across the UK and online, and disseminate reference materials to a wide range of stakeholders to raise awareness, increase uptake, and enable patient advocacy for better methods.
Anticipated impact and dissemination
Our study has the potential for immediate and transformative impact.
Thousands of prediction models are developed annually, but most have inadequate samples. No sample size tools exist for widely used machine learning or longitudinal prediction models.
Our software will revolutionise the toolkit available to applied researchers, enhancing the quality and ethical standards of prediction modelling studies by ensuring adequate sample sizes.
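The Methods text refers to simulated datasets and to a criterion that determines the minimum sample size. The sketch below illustrates that general idea in Python. It is not the project's software and makes several assumptions: the coefficients in `BETA`, the intercept, the 0.05 target, and every function name are hypothetical; the model is plain logistic regression fitted with NumPy; and the criterion is the mean absolute error between estimated and true risks, which is only one of several possible choices.

```python
import numpy as np

# Hypothetical data-generating model (assumed purely for illustration).
BETA = np.array([0.5, 0.3, -0.4, 0.2, 0.1])
INTERCEPT = -1.0


def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def simulate(n, rng):
    """Draw n individuals: standard-normal predictors and a binary outcome."""
    X = rng.standard_normal((n, BETA.size))
    risk = sigmoid(INTERCEPT + X @ BETA)
    return X, rng.binomial(1, risk), risk


def fit_logistic(X, y, n_iter=25, ridge=1e-4):
    """Logistic regression by Newton-Raphson; the small ridge term only
    keeps the Hessian invertible."""
    Xd = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xd @ w)
        grad = Xd.T @ (y - p) - ridge * w
        hess = (Xd * (p * (1 - p))[:, None]).T @ Xd + ridge * np.eye(w.size)
        w = w + np.linalg.solve(hess, grad)
    return w


def mean_abs_risk_error(n, n_reps=50, n_test=5000, seed=0):
    """Average, over repeated development samples of size n, of the mean
    absolute difference between estimated and true risks in a test set."""
    rng = np.random.default_rng(seed)
    X_test, _, true_risk = simulate(n_test, rng)
    Xd_test = np.column_stack([np.ones(n_test), X_test])
    errors = []
    for _ in range(n_reps):
        X, y, _ = simulate(n, rng)
        w = fit_logistic(X, y)
        errors.append(np.mean(np.abs(sigmoid(Xd_test @ w) - true_risk)))
    return float(np.mean(errors))


def minimum_sample_size(candidate_sizes, target=0.05):
    """Smallest candidate size whose average error meets the target,
    or None if no candidate does."""
    for n in sorted(candidate_sizes):
        if mean_abs_risk_error(n) <= target:
            return n
    return None


if __name__ == "__main__":
    print(minimum_sample_size([100, 200, 400, 800, 1600, 3200]))
```

Swapping `fit_logistic` for a more flexible learner, or replacing the error criterion with a measure of how much predictions vary across repeated samples, would follow the same loop; those extensions are left out of this sketch.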