Completed NON-SBIR/STTR RPGS NIH (US)

Domain-Knowledge Informed Deep Learning for Early Detection of Pancreatic Cancer

$1.77M USD

Funder	NATIONAL CANCER INSTITUTE
Recipient Organization	Columbia University Health Sciences
Country	United States
Start Date	Jul 28, 2021
End Date	Jun 30, 2023
Duration	702 days
Number of Grantees	2
Roles	Principal Investigator; Co-Investigator
Data Source	NIH (US)
Grant ID	`10317236`

Grant Description

PROJECT SUMMARY: The goal of this project is to apply deep-learning algorithms to Electronic Health Records (EHRs) to improve early detection of pancreatic ductal adenocarcinoma (PDAC), a malignancy with high mortality and morbidity.

Although numerous risk factors have been identified, PDAC is most often diagnosed at later stages, when effective treatments are not feasible or their survival benefit is limited.

In this R21, we aim to develop novel structured methodologies for systematically incorporating feature-grouping strategies derived from expert domain knowledge into the training procedure of deep-learning algorithms, with the goal of improving PDAC diagnosis.

The overarching hypothesis for this study is that groups of highly correlated variables will combine to form predictors that are superior to, and more interpretable than, individual clinical variables (current proposal).

Furthermore, these new predictors, each representing a group of related data, will be useful for other downstream tasks such as risk factor identification via causal discovery (future research).

The proposed research presents an innovative approach towards unifying human and artificial intelligence, using explainable algorithms to build interpretable prediction models, in contrast to conventional deep-learning algorithms, which are non-traceable by humans due to their black-box nature.

An optimal strategy for creating composite (grouped) variables should maximize both predictive power and human interpretability.

We will thus explore a variety of grouping strategies relying heavily on human-expert knowledge (e.g. clinical workflows) as well as auto-correlation tests.

An effective grouping strategy will allow our prediction model to learn the relative importance of both individual measurements and interpretable groups of measurements in predicting PDAC.
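The idea of weighting both individual measurements and groups of measurements can be illustrated with a small two-level attention sketch. This is only an illustration under stated assumptions, not the project's actual model: the function names, the feature and group names, and the fixed scores are all hypothetical, and in a real network the scores would be learned during training rather than supplied by hand.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_attention(features, groups, feature_scores, group_scores):
    """Combine features in two stages: within each group, then across groups.

    features:       dict of feature name -> numeric value
    groups:         dict of group name -> list of member feature names
    feature_scores: dict of feature name -> raw score (learned in a real model)
    group_scores:   dict of group name -> raw score (learned in a real model)
    Returns (combined_value, group_weights, feature_weights).
    """
    feature_weights = {}
    group_values = {}
    for name, members in groups.items():
        # Stage 1: relative importance of each measurement inside its group.
        weights = softmax([feature_scores[m] for m in members])
        feature_weights[name] = dict(zip(members, weights))
        group_values[name] = sum(w * features[m] for w, m in zip(weights, members))

    # Stage 2: relative importance of each group as a whole.
    names = list(groups)
    group_weights = dict(zip(names, softmax([group_scores[n] for n in names])))
    combined = sum(group_weights[n] * group_values[n] for n in names)
    return combined, group_weights, feature_weights


# Hypothetical example: three measurements arranged in two expert-defined groups.
features = {"lab_a": 1.0, "lab_b": 3.0, "symptom_c": 10.0}
groups = {"lab_panel": ["lab_a", "lab_b"], "symptoms": ["symptom_c"]}
feature_scores = {"lab_a": 0.0, "lab_b": 0.0, "symptom_c": 0.0}
group_scores = {"lab_panel": 0.0, "symptoms": 0.0}

combined, group_weights, feature_weights = grouped_attention(
    features, groups, feature_scores, group_scores
)
print(combined)  # 6.0
```

Because both sets of weights are normalized, they can be read directly as relative importances: `feature_weights` shows how much each measurement contributes within its group, and `group_weights` shows how much each group contributes to the combined value.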

Examples in the literature show that such grouped predictors often have superior predictive power compared to their individual components, which can be attributed to the mutual information shared within the group.

Different types of explainable (attention) neural networks may also be applied depending on the group characteristics to further improve interpretability as well as prediction accuracy.

We believe that similar methodologies applied to predictive modeling in healthcare data have the potential to fundamentally advance clinical decision making with improved model interpretability.

The success of this proposal will be leveraged in a larger ongoing project which aims to establish new causal relationships between various risk factors associated with PDAC. This involves an advanced graph-based approach for building interpretable models.

In future research, our direct application of these causal discoveries will be a program for collecting patient-generated health data (PGHD) for early diagnosis of PDAC.

All Grantees

Columbia University Health Sciences
