| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of Tennessee Knoxville |
| Country | United States |
| Start Date | Oct 01, 2021 |
| End Date | Sep 30, 2025 |
| Duration | 1,460 days |
| Number of Grantees | 2 |
| Roles | Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2120429 |
This project will create public research infrastructure centered on a curated collection of source-code and version control history data approximating the entirety of open-source software (OSS). Real-world data from open-source software development have catalyzed progress in software engineering research over the last two decades. Although OSS version control data are public and detailed (recording the actions of developers and the versions of the source code), their sheer scale and the need for curation (collection, contextualization, correction, augmentation, and integration) make such data unsuitable for research as they stand.
The data are scattered across many platforms and tens of millions of repositories, and are embedded in many tools and formats. Moreover, curating data across the entire OSS ecosystem is beyond the capabilities of individual research groups, which leaves many important research questions unanswered. Individual OSS projects depend on each other and share source code and developers.
This interdependence creates tremendous risks, such as the spread of vulnerable source code and the ripple effects of volunteer maintainers disengaging. The team will create nearly complete, fully curated, and extensively cross-referenced version control data that will enable the research community to measure and understand the dynamics of OSS ecosystems and, thus, help identify and manage risk to OSS in particular and to society in general.
This project will use input from the software engineering community to create a research infrastructure that contains: 1) a regularly updated and cross-referenced source-code and version history resource approximating the entirety of OSS; 2) data curation capabilities, e.g., identity disambiguation and extraction of dependencies; 3) easy-to-use web services and applications to support common research tasks; 4) training (tutorials, mentoring, hackathons, and seminars) to help researchers use the resource effectively and efficiently; 5) a community of researchers, developers, and companies who maintain, guide, enhance, and operate this infrastructure. This will enable answers to an entirely new set of research questions concerning OSS network structure defined by technical dependencies, code sharing, and knowledge flows.
It will also provide accessible means for stratified sampling from the OSS universe of code, improving the generality of research findings.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.