| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | Tulane University |
| Country | United States |
| Start Date | Oct 01, 2021 |
| End Date | Sep 30, 2023 |
| Duration | 729 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2136744 |
Cutting-edge science relies on scientists’ ability to sift through and access the massive amounts of data that are being produced by the latest research. Much of that data is stored in online databases and is searchable only by using specific, scientific terms, like keywords, tags, or descriptions. If someone doesn’t know exactly the right terms to use, they often can’t access all the data that might be useful for their research.
By using mathematical approaches for information retrieval in a new way, this project will study whether a powerful search tool, called content-based search, can be modified for scientific data. If successful, this project will free data users from needing to know exactly which keywords to use, transforming how scientists are able to access and share data and creating new opportunities for scientists with vastly different expertise to work together.
One particularly promising way to describe the content of scientific data is through a dataset’s topology. Therefore, this project will develop approaches to compute topological similarity that are smaller, faster, and more scalable than previously thought possible, with the goal of creating a method for cross-cutting, content-based search of scientific data.
Specifically, the investigators will develop a learned hash function that converts a dataset’s persistence diagram, the common encoding of its topology, into a simple binary code. The hash will be trained so that the bitwise distance between codes preserves a measure of topological similarity between datasets. This will turn topological comparison from an expensive bottleneck into an operation with nominal processing cost that can scale to large database queries.
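To make the idea of comparing datasets through binary codes concrete, here is a minimal sketch in Python. It is not the project's method: the learned hash is replaced by fixed random hyperplanes (a standard locality-sensitive hashing stand-in), and the featurization `vectorize_diagram`, the helper names, the bin count, and the 32-bit code length are all assumptions made for illustration.

```python
import numpy as np


def vectorize_diagram(diagram, bins=16, max_value=1.0):
    """Hypothetical featurization: a persistence-weighted histogram of
    feature lifetimes (death - birth). A real system would likely use a
    richer representation."""
    points = np.asarray(diagram, dtype=float).reshape(-1, 2)
    lifetimes = points[:, 1] - points[:, 0]
    hist, _ = np.histogram(
        lifetimes, bins=bins, range=(0.0, max_value), weights=lifetimes
    )
    return hist


def make_hasher(dim, n_bits=32, seed=0):
    """Stand-in for a learned hash: random hyperplanes, one bit per plane.
    A trained model would replace `planes` with learned parameters."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))

    def hasher(vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return (planes @ vec > 0).astype(np.uint8)

    return hasher


def hamming(code_a, code_b):
    """Bitwise (Hamming) distance between two equal-length binary codes."""
    return int(np.count_nonzero(code_a != code_b))


# Toy persistence diagrams as (birth, death) pairs; the values are made up.
diagram_a = [(0.0, 0.5), (0.1, 0.9)]
diagram_b = [(0.0, 0.1), (0.2, 0.3), (0.4, 0.45)]

hasher = make_hasher(dim=16)
code_a = hasher(vectorize_diagram(diagram_a))
code_b = hasher(vectorize_diagram(diagram_b))
distance_ab = hamming(code_a, code_b)
```

Once every dataset in a database has a stored code, answering a query only requires hashing the query diagram and computing Hamming distances, which is far cheaper than comparing persistence diagrams directly.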
Initially, this project will focus on binary codes that preserve clusters and neighborhoods, and will ultimately develop codes that are rank-preserving or semi-metric-preserving. The investigators will also explore strategies for training a learned hash function on synthetic data, with the goal of developing a fully domain-oblivious approach to content-based search.
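One way to read "rank-preserving" is that sorting search results by code distance should give the same order as sorting by the original topological distance. The sketch below measures that agreement as the fraction of item pairs that both distance lists order the same way. The function name `rank_agreement` and the choice to skip tied pairs are assumptions for illustration, not criteria stated by the project.

```python
from itertools import combinations


def rank_agreement(reference, hashed):
    """Fraction of item pairs ordered the same way by two distance lists.

    `reference` holds distances from a query under the original (expensive)
    measure; `hashed` holds the matching distances between binary codes.
    Pairs tied in either list are skipped.
    """
    if len(reference) != len(hashed):
        raise ValueError("distance lists must have the same length")
    concordant = 0
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        ref_diff = reference[i] - reference[j]
        hash_diff = hashed[i] - hashed[j]
        if ref_diff == 0 or hash_diff == 0:
            continue
        total += 1
        if (ref_diff > 0) == (hash_diff > 0):
            concordant += 1
    # With no comparable pairs there is nothing to contradict.
    return concordant / total if total else 1.0


# Made-up distances from one query to three datasets.
perfect = rank_agreement([0.1, 0.4, 0.9], [3, 7, 12])
reversed_order = rank_agreement([0.1, 0.4, 0.9], [12, 7, 3])
one_swap = rank_agreement([0.1, 0.4, 0.9], [3, 12, 7])
```

A score of 1.0 means the codes reproduce the reference ranking exactly for that query, while lower scores indicate pairs whose order the hash has swapped.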
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.