Active STANDARD GRANT National Science Foundation (US)

CISE MSI: RPP: IIS: Deep Clustering of Unlabeled Tabular Data for Transfer Learning in Heterogeneous Feature Space

$2M USD

Funder National Science Foundation (US)
Recipient Organization Tennessee State University
Country United States
Start Date Jan 01, 2025
End Date Dec 31, 2026
Duration 729 days
Number of Grantees 3
Roles Principal Investigator; Co-Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2431058
Grant Description

The recent artificial intelligence (AI) revolution has been made possible by the availability of large pre-trained image and language models, which are fine-tuned for various domain applications through knowledge transfer. Such cross-domain transfer learning is feasible because image patterns and text semantics are shared across many domains. In contrast, many business, medical, and scientific data sets are structured in rows and columns as tabular data, which, surprisingly, remain a challenge for modern AI due to the heterogeneity of columns across tables.

This project will investigate theoretical and analytical solutions that enable cross-domain transfer learning on tabular data by combining complementary expertise in theoretical statistics and AI. A cross-domain learning framework will enable the aggregation of knowledge from heterogeneous tabular data sources. By leveraging state-of-the-art data clustering methods, this project will provide new computational frameworks for learning from untapped, unlabeled tabular data sources to advance data-driven health science and informatics.

The project will also lay the foundation for collaboration in data science education and research to train future data scientists from historically underrepresented groups.

The project aims to investigate two open problems in tabular data science. The first problem is hybrid deep representation clustering of unlabeled tabular data. Representation learning from unlabeled data is non-trivial, but it is highly practical when samples in tables are hard to label by visually inspecting a heterogeneous feature space. This project will investigate deep clustering solutions for unlabeled tabular data by integrating multivariate statistical theories into innovative deep representation learning.

The second problem is transferring knowledge across unlabeled data tables. Cross-domain and transfer learning approaches are cornerstones of modern AI applied to image and text data. However, similar approaches are challenging for tabular data due to the heterogeneity of feature spaces and application domains. The project will leverage recent breakthroughs in modeling data distributions and statistics to learn a novel cluster-friendly deep feature space from unlabeled tabular data. This cluster-friendly representation will facilitate subsequent learning and distillation of mutual and complementary information between data tables to enable transfer learning.

The computational frameworks will be evaluated on tabular data sets from real-world Electronic Health Records in patient risk stratification tasks. The project will share new algorithms with source code, exchange knowledge and publications to strengthen collaboration, and train students. Together, these activities are essential for establishing a full-scale tabular data science research program.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Tennessee State University
