Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Louisiana State University |
| Country | United States |
| Start Date | May 01, 2025 |
| End Date | Apr 30, 2027 |
| Duration | 729 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2434261 |
The field of computing is central to scientific pursuit and has touched almost every area of science, technology and humanity. Computing has ushered us into a whole new era of possibilities and discoveries as more and more data and computing resources are available. While throwing computational resources at a problem has been a successful initial approach, eventually the efficiency of resources becomes important.
Theoretical computer science rigorously studies the difficulties of problems based on resource limitations like time, energy, space, compressibility, input/output, etc. For the text data, space resources and compressibility become important factors. In the field of compressed data structures, the goal is to develop data structures which take space nearly optimal with respect to the best compression while at the same time can answer the queries as fast as the best uncompressed data structure can answer.
This project considers parameterized pattern matching problem and constructing compressed index for it. The problem has direct applicability in software plagiarism detection, cloning and versioning systems. Many of the results and components considered in this project are likely good inclusion for course projects and homework problems.
The project will support a graduate student. The results of this work will be widely disseminated and published in conferences and journals.
In parameterized pattern matching, the text consists of parameterized characters and static characters. Two strings are parameterized match if there is a bijection between parameterized alphabets of the two strings, which when applied to the characters of one of the strings it becomes equal to the other. A known encoding ‘prev’ can convert the text (or each suffix of the text) into a canonical form such that two parameterized strings are a match if and only if their ‘prev’ encoding is the same.
In the investigator’s previous work, a Burrows-Wheeler transform (BWT) based approach as well as a Phi-function based approach have been shown to obtain a compact index for the problem. In this project. we aim to: (1) Define entropy-based compression measures for parameterized text and achieve an index with such space occupancies, (2) Explore the construction aspects of parameterized index – including implementation issues like memory bottleneck and external memory construction, and (3) Obtain best space-time tradeoff results based on combination of Phi function and BWT.
Apart from these, the project will also focus on the longest common extension LCE problem and indexing for two-dimensional pattern matching.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Louisiana State University
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant