Active HORIZON European Commission

Compressed Indexes for Regular Languages with Applications to Computational Pan-genomics

€1.39M

Funder European Commission
Recipient Organization Universita Ca' Foscari Venezia
Country Italy
Start Date Sep 01, 2022
End Date Aug 31, 2027
Duration 1,825 days
Number of Grantees 1
Roles Coordinator
Data Source European Commission
Grant ID 101039208
Grant Description

Sorting is, arguably, the most powerful algorithmic primitive when it comes to indexing data. At the same time, the regularities exposed by sorting are precisely those enabling data compression.

In the last two decades, this fascinating duality has led researchers to the design of compressed full-text indexes: data structures supporting fast pattern matching queries over compressed text.

In this project, we revisit the natural generalization of the problem to labeled graphs from a new perspective: we interpret graphs as finite-state automata and investigate the connections between their propensity to be sorted and the languages they recognize.

Our novel language-theoretic approach makes it possible to transfer fundamental results between the mature fields of regular language theory and compressed text indexing. We aim to build this bridge by developing a new theory of compressed regular language indexing.

This project has fundamental applications in the rapidly expanding field of computational pan-genomics, where the goal is to study the variations contained in the genomes of an entire population.

Recent research has shown that representing pan-genomes as labeled graphs is an important step toward reducing reference allele bias.

Existing approaches, however, can index only restricted classes of graphs, thereby limiting the practical applicability of such powerful pan-genome representations.

Our innovative approach, based on sorting regular languages by partial co-lexicographic orders, changes the perspective from which the compressed indexing problem has been tackled in the literature.

This project aims to develop a theory of graph indexing and compression based on the natural interplay between sorting and regular language theory.

We will apply these findings in practical tools for aligning arbitrarily long DNA fragments against compressed pan-genome graphs.

All Grantees

Universita Ca' Foscari Venezia
