Completed STANDARD GRANT National Science Foundation (US)

SBIR Phase I: Automatic Data Series Extraction from a Text Corpus

$2.56M USD

Funder	National Science Foundation (US)
Recipient Organization	Revelata, Inc.
Country	United States
Start Date	Aug 01, 2021
End Date	May 31, 2023
Duration	668 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2110123`

Grant Description

The broader impact of this Small Business Innovation Research (SBIR) Phase I project will be improved market efficiency and greater competition in financial services and adjacent industries. Currently, these sectors are dominated by large firms that have the resources to create and exploit asymmetric information advantages over smaller firms. A key reason for this asymmetry is that company information – the basis for building high-quality, detailed financial models – can be prohibitively expensive to surface, not because it is unavailable, but because it is reported in non-standardized ways and therefore difficult to extract and make actionable.

Extracting this information systematically and industry-wide requires thousands of person-hours per year of manually sifting through millions of documents, a cost that only the largest firms in the world can bear. Automating the extraction of such information and making it both widely available and easily accessible helps to level the playing field for small firms while simultaneously improving the speed and quality of decision-making, at a lower total cost, for large firms.

This Small Business Innovation Research (SBIR) Phase I project aims to develop a machine learning platform for automatically extracting data from a collection of financial documents. The platform will take advantage of recent advances in natural language processing model architectures, but nonetheless faces the challenges of (a) achieving and maintaining a high level of accuracy, even as document text volume, and thus semantic variation, grows; and (b) generating sufficient labeled training data in a cost-effective way.

This project addresses these dual challenges with a novel framework for continuous model training based on recent meta-learning techniques. Such an approach to supervised learning can substantially accelerate model improvement and simultaneously drive down training costs.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Revelata, Inc.
