Active STANDARD GRANT National Science Foundation (US)

CISE MSI: RCBP: III: Advancing Speech Detection: A Hybrid Approach Using Large Language Models and Graph Neural Networks

$4M USD

Funder	National Science Foundation (US)
Recipient Organization	Texas A&M University Corpus Christi
Country	United States
Start Date	Jan 01, 2025
End Date	Dec 31, 2026
Duration	729 days
Number of Grantees	3
Roles	Principal Investigator; Co-Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2431176`

Grant Description

Online speech that threatens persons, groups, or organizations necessitates sophisticated tools for effective detection and mitigation. This project aims to construct an advanced hybrid machine learning pipeline to enhance the analysis of speech and the detection of speech differences in online environments. The project focuses on three key aspects: detecting the characteristics of individual speech posts, understanding and mitigating the spread of threatening speech, and addressing the lack of comprehensive multilingual datasets, particularly for English- and Spanish-speaking communities.

By combining the analytical capabilities of Large Language Models (LLMs) for content analysis with Graph Neural Networks (GNNs) for understanding social dynamics, this project develops a robust suite of tools adaptable to various speech detection scenarios. Additionally, it creates and publishes new datasets that expand the coverage of speech analysis in English and fill the critical gap in speech research for the Spanish-speaking environment.

This project advances speech detection research through an effective hybrid machine learning pipeline. It focuses on three main objectives: enhancing the accuracy and reliability of threat detection using Large Language Models (LLMs), understanding the dynamics of threat propagation with Graph Neural Networks (GNNs), and creating a comprehensive multilingual dataset suite for threat detection.

The LLMs analyze both explicit and implicit speech across various classes in multilingual contexts, primarily focusing on English and Spanish. The GNNs identify the origins and patterns of attacks in speech, predict their spread and trajectory, and develop strategies to mitigate their effects. The multilingual dataset suite supports diverse speech themes, ensuring balanced diversity and addressing size limitations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Texas A&M University Corpus Christi
