Completed RESEARCH GRANT UKRI Gateway to Research

3D-Proteomics: FAIRification of proteomics data for comprehensive integration with structural biology information

£7.02M GBP

Funder	Biotechnology and Biological Sciences Research Council
Recipient Organization	EMBL - European Bioinformatics Institute
Country	United Kingdom
Start Date	Apr 18, 2022
End Date	Apr 17, 2025
Duration	1,095 days
Number of Grantees	2
Roles	Co-Investigator; Principal Investigator
Data Source	UKRI Gateway to Research
Grant ID	`BB/V018779/1`

Grant Description

Proteins are molecules found in all living organisms that provide structure and carry out most of the important functions in a cell, including catalysing (causing or speeding up) chemical reactions and signalling between different cells. Proteomics is the study of the entire set of proteins in a given biological sample, such as a cell or a whole organism like a bacterium, plant or human.

Since proteins are essential for so many crucial functions, proteomics can tell us a lot about how organisms work and also about what happens in illnesses, as well as helping to identify potential treatments. This means that proteomics is used across many areas of beneficial biological and biomedical research.

Currently the primary technology used in proteomics is a technique called mass spectrometry (MS), which works by breaking up a protein into small fragments, sorting them and then reporting their mass. The quantity and identity of the protein can then be determined using different software tools. The structure of a protein is also very important, as the way that a protein is organised via folding will help it to carry out its job.

The structure also determines how a protein is able to interact with other proteins. For example, a protein that transports another protein around a cell needs to have a part that binds to it specifically. Protein structure can be studied using techniques like X-ray crystallography, which makes use of the way that different structures diffract (bend) X-rays.

A more recent development called cross-linking MS (CL-MS) is a powerful tool for visualising how proteins fold and join together. It works by running MS on proteins that are linked by specialised chemical reagents called cross-linkers. Unfortunately, CL-MS does not yet have coordinated, mature open standards, and existing datasets are not well linked to other information about protein structure.

This means that it is difficult to compare and integrate findings between research groups and that important knowledge may be missed.

It is important that proteomics databases follow the FAIR principles of being easy to find (Findable), free and open source (Accessible), easily shared and processed (Interoperable) and Reusable. Our research groups manage two world-leading databases: the PRoteomics IDEntifications database (PRIDE), which is a repository for proteomics data generated using MS, and the Protein Data Bank (PDB), which is home to 3D structural data for large molecules including proteins.

This project will combine these tools with our expertise in CL-MS in order to develop FAIR data standards and software so that proteomics data generated using CL-MS has a common format and processing pipeline, and so that a suite of software tools is made available in order to process and analyse the data freely and easily. PRIDE will be extended to include these standardised CL-MS data formats, and key software tools for data deposition and visualisation will be made available.

Crucially, we will create links between PRIDE and PDB to allow for joined-up examination of structural data, including integration between the PDB and PRIDE submission systems. This will mean that researchers will be able to more easily analyse proteins and identify links between their research and other projects, even if they don't have access to CL-MS equipment themselves.

The tools and standards that will be generated by this project will benefit researchers across a wide range of biological and biomedical fields, and will provide an interface between proteomics and structural biology information that will enhance and connect research findings. The software will ensure that important and novel structural proteomics data are made accessible and findable, and the standards will maintain their interoperability and reusability.

We will make sure that our work is disseminated widely and we will deliver workshops to train and assist researchers in making full use of these valuable resources.

All Grantees

EMBL - European Bioinformatics Institute
