Active NON-SBIR/STTR RPGS NIH (US)

A machine-learning platform to illuminate the chemical dark matter in mass spectrometry-based metabolomics

$3.92M USD

Funder	OFFICE OF THE DIRECTOR, NATIONAL INSTITUTES OF HEALTH
Recipient Organization	Princeton University
Country	United States
Start Date	Sep 19, 2024
End Date	Jul 31, 2029
Duration	1,776 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	NIH (US)
Grant ID	`10910517`

Grant Description

PROJECT SUMMARY/ABSTRACT

The human body contains thousands of small molecules and is exposed to thousands more during daily life. This complex chemical ecosystem reflects both the endogenous metabolism of human cells and xenobiotic exposures from our diets, our gut flora, and our natural and built environments. At present, however, the vast majority of these small molecules remain unknown. Remarkably, this gap is not due to a lack of appropriate experimental technology: mass spectrometry-based metabolomics routinely detects thousands of distinct chemical signals in any biological sample. However, only a small fraction of these signals are routinely identified. The remaining profusion of unidentified chemical entities has been dubbed the “dark matter” of the metabolome.

Computational tools to shed light on this chemical dark matter could transform our understanding of disease pathobiology, open new avenues for personalized medicine, and increase the scope and efficiency of any metabolomic study. At the same time, true chemical dark matter must be differentiated from the variety of technical artefacts, contaminants, and redundant forms of the same biomolecules that are also detected by mass spectrometry. This project proposes to establish a suite of computational tools that will dramatically advance our ability to interpret mass spectrometry-based metabolomic datasets, and thereby begin to unlock the dark metabolome. These tools will apply emerging techniques from the field of natural language processing, including the same large language model (LLM) architectures that power tools like ChatGPT, to address two of the most important unmet needs in small molecule mass spectrometry.

In Aim 1, we will develop DecipherMS, a computational tool for de novo annotation of both known and unknown chemical structures from MS/MS spectra. Despite decades of work in computational mass spectrometry, de novo annotation of unknown molecules remains a critical gap, with virtually all existing tools designed to search in a database of known structures. DecipherMS will overcome this gap by using language models to decode unknown chemical structures directly from MS/MS spectra, using a novel data augmentation strategy to learn effectively from limited training data. In Aim 2, we will develop FoundationMS, a foundation model for mass spectrometry-based metabolomics. FoundationMS will standardize the data preprocessing workflows required to identify which mass spectrometric signals should be brought forward for annotation in the first place, by learning from a repository-scale corpus of metabolomic data in a self-supervised manner. The resulting model will be fine-tuned to perform common preprocessing tasks including peak picking, retention time alignment, adduct removal, and chemical formula assignment. Both DecipherMS and FoundationMS will be rigorously benchmarked using appropriate datasets. Implementing these approaches in well-documented, user-friendly, and computationally efficient software will address central gaps in our ability to measure small molecules and shift existing paradigms in metabolomic data analysis.

All Grantees

Princeton University

