Completed STANDARD GRANT National Science Foundation (US)

CNS Core: Small: Intelligent Fault Injection to Expose and Reproduce Production-Grade Bugs in Cloud Systems

$5M USD

Funder National Science Foundation (US)
Recipient Organization Johns Hopkins University
Country United States
Start Date Dec 15, 2021
End Date Jul 31, 2023
Duration 593 days
Number of Grantees 1
Roles Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2149664
Grant Description

Failures of production distributed systems are costly. Despite extensive efforts to test distributed systems, many bugs remain difficult to find even when a system is tested with appropriate input. This is because these bugs are triggered by faulty events unique to the production environment.

Fault injection has been proposed to simulate faults during testing with the goal of catching such bugs. However, existing solutions treat the target system as a black box and inject only simple faults chosen at random. Production failures are often caused by bugs that require complex, system-specific faults, injected at precise times, to trigger.

This project takes a holistic approach to address the fundamental limitations in current fault injection testing. The project develops special compiler support to enable the injection of system-specific faults at a fine granularity with precise control. To efficiently explore the large fault injection space and expose bugs, this project designs new fault injection decision algorithms and machine learning methods.

A new adaptive method further analyzes production execution traces to quickly reproduce fault-induced failures in an offline environment.

Bugs in production distributed systems have resulted in substantial financial losses to society. The new fault injection techniques developed in this project will help catch a wide range of production-grade bugs in large distributed systems and improve the availability of cloud services. This project will closely engage with developers in the open-source community to improve code quality and testing practices for distributed systems.

The software artifacts this project develops will be open sourced and available at https://github.com/OrderLab. The project results, including paper publications, technical reports, and presentations, will be made available for free download and maintained for at least five years beyond the completion of the project.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Johns Hopkins University
