Completed STANDARD GRANT National Science Foundation (US)

Data CI Pilot: VariMat Streaming Polystore Integration of Varied Experimental Materials Data

$13.16M USD

Funder National Science Foundation (US)
Recipient Organization Johns Hopkins University
Country United States
Start Date Oct 01, 2021
End Date Sep 30, 2025
Duration 1,460 days
Number of Grantees 5
Roles Principal Investigator; Co-Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2129051
Grant Description

VariMat is a pilot project to integrate experimental data sources with cyberinfrastructure components and bridge the data variety gap in experimental materials research data. The central goal of Materials Science and Engineering is to discover and deploy innovative materials to serve society, but the process is complex and frequently slow. Recent work to accelerate such materials discovery is focused by the Materials Genome Initiative (MGI), which recognizes the critical need for new materials in fields as diverse as energy, transportation, and national security.

The MGI centers on harnessing the data revolution and fueling data-intensive methods including artificial intelligence and machine learning. To reach the potential of the MGI, however, there is a pressing need for robust, high-performance data cyberinfrastructure (CI) that facilitates machine-actionable data and better implementation of FAIR data principles suited to the materials domain.

Among remaining CI gaps, none is more important than the need to integrate experimental data in ways that make it more operable and consonant with the research community's needs. This complex gap is compounded by the transdisciplinary nature of materials science and engineering, where most projects depend on experimental data collected by highly varied techniques, in distributed labs, and by multiple investigators.

The variety and volume of this data layer onto the dispersed nature of materials research to create a data variety gap that impedes both rapid use and valuable reuse of data as required by many data-hungry machine learning methods. The VariMat Data CI is designed to break these barriers with a pilot instantiation in the subdomain of quantum materials and to maximize experimental data value across its whole lifecycle.

The project links teams from the UCSB Quantum Foundry and PARADIM, a Materials Innovation Platform (MIP) – two of the NSF's premier investments in the Quantum Leap. This linkage integrates strong science drivers with infrastructure development while adopting a strategic focus on influential centers of high-quality, high-volume data production that are conduits to user training and workflows.

VariMat will provide investigators with integrated and timely access to the breadth of experimental data needed to enable new discovery pathways and drive novel-materials development that relies on controlled, replicable synthesis; structural and compositional characterization; property determination; and connectability to theory and modeling studies.

The proposed research will establish a new paradigm for an integrated data infrastructure leveraging a streaming layer for real-time ingest to a polystore. VariMat uses an automated streaming layer to link instrumental data to a polystore of heterogeneous data management systems that optimize storage, query, and access for disparate data types. The polystore encompasses multiple data models while unifying the query process for users.
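The streaming-to-polystore idea can be illustrated with a minimal Python sketch. It is not VariMat's implementation: the `InMemoryStore` and `Polystore` classes, the `kind` field, and the record contents are all hypothetical stand-ins, and real backends would be separate database systems rather than in-memory lists. The sketch only shows the pattern of routing each incoming record to a store suited to its data type while giving users a single query entry point.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


class InMemoryStore:
    """Hypothetical stand-in for one backend (relational, document, array, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: List[Record] = []

    def put(self, record: Record) -> None:
        self.records.append(record)

    def find(self, predicate: Callable[[Record], bool]) -> List[Record]:
        return [r for r in self.records if predicate(r)]


class Polystore:
    """Routes records to a backend by data type and fans queries out to all backends."""

    def __init__(self, routes: Dict[str, InMemoryStore], default: InMemoryStore) -> None:
        self.routes = routes      # assumed mapping: record "kind" -> backend
        self.default = default    # backend for kinds with no explicit route

    def ingest(self, stream: Iterable[Record]) -> Dict[str, int]:
        """Consume records one at a time, as a streaming layer would deliver them."""
        counts: Dict[str, int] = defaultdict(int)
        for record in stream:
            store = self.routes.get(record.get("kind"), self.default)
            store.put(record)
            counts[store.name] += 1
        return dict(counts)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        """Single query entry point: ask every distinct backend and merge the results."""
        stores = {id(s): s for s in [*self.routes.values(), self.default]}
        results: List[Record] = []
        for store in stores.values():
            results.extend(store.find(predicate))
        return results


def instrument_stream():
    """Hypothetical instrument output; field names are illustrative only."""
    yield {"kind": "xrd_scan", "sample": "S1", "points": 1200}
    yield {"kind": "micrograph", "sample": "S1", "pixels": (512, 512)}
    yield {"kind": "lab_note", "sample": "S2", "text": "annealed"}


tabular = InMemoryStore("tabular")
images = InMemoryStore("images")
misc = InMemoryStore("misc")
polystore = Polystore({"xrd_scan": tabular, "micrograph": images}, default=misc)

counts = polystore.ingest(instrument_stream())
s1_records = polystore.query(lambda r: r["sample"] == "S1")
```

The user-facing `query` call does not need to know which backend holds a given record, which is the sense in which a polystore unifies the query process across multiple data models.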

The VariMat polystore creates a new option for unified management and query of disparate experimental Big Data created across distributed facilities and expands FAIR data compliance in the materials domain. VariMat implements an innovative stream processing Data Ingress Module for analytics-driven ingest of experimental data. Together, the streaming and polystore layers serve a user-oriented web portal that combines advanced search with data analysis, visualization, and compute resources.

Such integration will be facilitated by a unified semantic standard that is specific to the instantiation and spans the project. VariMat will leverage the PARADIM data model, which describes synthesis and characterization with a directed acyclic graph (DAG), allowing traversal of a material's entire history. VariMat's "loosely coupled" architecture provides operational and managerial independence of subsystems, well suited for geographically distributed systems with ongoing evolution in components, as is typical in mid-scale or larger materials science research.
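As a rough illustration of how a DAG supports traversal of a material's history, the following Python sketch walks from one measurement back to its earliest precursors. It does not reproduce the PARADIM data model: the node names, the `parents` mapping, and the `history` function are hypothetical, chosen only to show the traversal pattern.

```python
from typing import Dict, List, Set

# Hypothetical provenance graph: each node maps to the nodes it was directly
# derived from. Because it is a DAG, following parents always terminates.
parents: Dict[str, List[str]] = {
    "precursor_A": [],
    "precursor_B": [],
    "synthesis_run_1": ["precursor_A", "precursor_B"],
    "crystal_1": ["synthesis_run_1"],
    "xrd_measurement_1": ["crystal_1"],
    "anneal_1": ["crystal_1"],
    "crystal_1_annealed": ["anneal_1"],
    "transport_measurement_1": ["crystal_1_annealed"],
}


def history(node: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return every ancestor of `node`, earliest first, followed by `node` itself."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(current: str) -> None:
        if current in seen:
            return
        seen.add(current)
        for parent in parents.get(current, []):
            visit(parent)
        # A node is appended only after all of its ancestors have been appended.
        ordered.append(current)

    visit(node)
    return ordered


lineage = history("transport_measurement_1", parents)
for step in lineage:
    print(step)
```

The result lists the precursors, the synthesis run, the crystal, the anneal, and the annealed crystal before the transport measurement. The XRD measurement is absent because it sits on a separate branch and is not an ancestor of the queried node.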

While the infrastructure fills a critical, community-identified gap, VariMat components will be readily deployable and have broad applicability in other scientific fields dependent on distributed, operationally independent instrumental laboratories. Automated deployment and open-source components will facilitate ready instantiation in new domains. To maximize impact, concepts and tools developed will be disseminated through freely available, open-source codes, online tutorials and data sets, and trainings that leverage existing schools and workshops at the Quantum Foundry and PARADIM.

This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Materials Research within the NSF Directorate for Mathematical and Physical Sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Johns Hopkins University
