| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of Chicago |
| Country | United States |
| Start Date | Aug 01, 2024 |
| End Date | Jul 31, 2028 |
| Duration | 1,460 days |
| Number of Grantees | 3 |
| Roles | Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2411188 |
Increasing data sizes, greater hardware specialization, faster networks, and larger collaborative teams result in modern research employing ever more distributed cyberinfrastructure (CI). It is now commonplace for data to be produced in multiple locations (e.g., in research laboratories or on supercomputers), analyzed in others (e.g., local, campus, or national CI), and shared, published, or archived in yet others.
This increasingly distributed CI is enabling exciting discoveries in many domains, but it also creates difficulties for researchers who must manage, discover, and act upon large volumes of distributed data. A growing amount of valuable research time is spent on mundane but necessary data management tasks; crucial data are lost; important provenance information cannot be determined; and analyses are repeated.
To tackle these problems, this project will build Globus Search, a new capability integrated into the widely used Globus platform that will let researchers create distributed Globus collections and search within and across them. By allowing researchers to easily discover data regardless of location, group data into “virtual” collections, and act on virtual collections irrespective of where individual files reside, Globus Search will allow even the largest and most distributed teams to organize, navigate, and operate on their data.
Globus has emerged as an essential tool for alleviating the numerous frictions associated with managing, accessing, moving, and sharing data within and across the many distinct data collections that constitute the modern CI experience. However, an implicit assumption has always been that researchers know where data reside: an assumption that becomes increasingly untenable as data and CI grow in complexity.
This project will implement a suite of new capabilities, including methods to crawl parallel and distributed storage systems; capture events (e.g., file creation, modification, deletion) from these storage systems; extract metadata from within diverse scientific file formats; communicate events securely and reliably to the cloud-hosted service; index files and metadata in a secure and accessible manner; and develop new interfaces for navigating distributed data collections, creating virtual collections, and acting on these virtual collections. Leveraging the hybrid cloud/local service deployment approach that has proven so successful for other Globus services, Globus Search will build on powerful, scalable, and robust cloud-hosted search services to deliver a rich search experience to users via the Globus web app, command line interface, and Python and JavaScript libraries.
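The crawl, extract, index, and search steps named above can be illustrated with a toy, single-machine sketch. This is not the Globus Search design or API, which the abstract does not specify. The function names (`crawl`, `build_index`, `search`, `demo`), the record fields, and the filename-token index are all assumptions made for illustration, using only the Python standard library. The list returned by `search` plays the role of a simple "virtual" collection: a set of paths grouped by query rather than by where the files live.

```python
import os
import tempfile
from collections import defaultdict


def crawl(root):
    """Walk a directory tree and yield one metadata record per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            yield {
                "path": path,
                "name": name,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": info.st_mtime,
            }


def build_index(records):
    """Map each lowercase filename token to the set of paths containing it."""
    index = defaultdict(set)
    for record in records:
        stem = os.path.splitext(record["name"])[0].lower()
        # Hypothetical tokenization: split the file stem on "_" and "-".
        for token in stem.replace("-", "_").split("_"):
            if token:
                index[token].add(record["path"])
    return index


def search(index, query):
    """Return the sorted paths whose filename tokens include every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


def demo():
    """Build a small tree with files in two 'locations' and search across both."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("lab/run_001_raw.h5", "hpc/run_001_analysis.csv", "hpc/run_002_raw.h5"):
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as handle:
                handle.write("placeholder")
        index = build_index(crawl(root))
        # The result is a "virtual" collection: files grouped by query, not by directory.
        return [os.path.relpath(p, root) for p in search(index, "run 001")]


if __name__ == "__main__":
    print(demo())
```

A production service of the kind the abstract describes would differ in every layer: event capture instead of repeated full crawls, format-aware metadata extraction, and a secured, cloud-hosted index in place of an in-memory dictionary.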
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.