Balancing the data scales: A cost-benefit analysis of low-fidelity synthetic data for data owners and providers

£926.3K GBP

Funder	Economic and Social Research Council
Recipient Organization	University of Essex
Country	United Kingdom
Start Date	Apr 07, 2024
End Date	Apr 06, 2025
Duration	364 days
Number of Grantees	4
Roles	Co-Investigator; Principal Investigator
Data Source	UKRI Gateway to Research
Grant ID	`ES/Z502467/1`

Grant Description

The growing discourse around synthetic data underscores its potential not only in addressing data challenges in a fast-changing landscape but also in fostering innovation and accelerating advances in data analytics and artificial intelligence. From optimising data sharing and utility (James et al., 2021), to sustaining and promoting reproducibility (Burgard et al., 2017), to mitigating disclosure risk (Nikolenko, 2021), synthetic data has emerged as a response to a range of complexities in the data ecosystem.

The project proposes a mixed-methods approach to explore the operational, economic, and efficiency aspects of using low-fidelity synthetic data from the perspective of data owners and Trusted Research Environments (TREs).

The core challenge lies in understanding the tangible and intangible costs of creating and sharing low-fidelity synthetic data, and in measuring its utility and acceptance among data producers, data owners, and TREs. The broader aim of the project is to build a nuanced understanding that could catalyse a shift towards a more efficient and publicly acceptable model of synthetic data dissemination.

This project is centred around three primary goals:

1. to evaluate the full costs incurred by data owners and TREs in creating and maintaining low-fidelity synthetic data, covering both initial production and subsequent ongoing costs;

2. to assess the various models of synthetic data sharing and their implications and efficiencies for data owners and TREs, covering every stage from pre-ingest through curation procedures to metadata sharing and data discoverability; and

3. to measure the efficiency improvements for data owners and TREs when synthetic data is available, analysing the impacts on resources, the usage load on secure environments, and researchers' relative uptake of synthetic and real datasets.

Commencing in March 2024, the project will begin with stakeholder engagement: forming an expert panel and aligning its work with parallel projects. Following a thorough literature review, the project will collect data through a targeted survey of data creators, case studies with data owners and providers of synthetic data, and a focus group with TRE representatives.

The insights collected from these activities will be analysed and synthesised into a comprehensive report setting out the findings, together with practical recommendations for scaling up the production and dissemination of low-fidelity synthetic data where applicable.

The proposed work has a range of potential applications and benefits. The project aims to provide a solid foundation for data owners and TREs to make informed decisions about synthetic data production and sharing. The findings could also influence future policy on data privacy, thereby having a broader impact on the research community and on public perception.

By deepening understanding and establishing a dialogue among key stakeholders, the project seeks to bridge the existing knowledge gap and support more informed and efficient use of synthetic data. Through careful data collection and analysis, it aims to clarify the practical costs and benefits of low-fidelity synthetic data and to lay the groundwork for an efficient, cost-effective, and publicly acceptable framework for synthetic data production and dissemination.

All Grantees

University of Essex; The University of Manchester