Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Johns Hopkins University |
| Country | United States |
| Start Date | Dec 15, 2021 |
| End Date | Jul 31, 2023 |
| Duration | 593 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2149664 |
Failures of production distributed systems are costly. Despite extensive efforts on testing distributed systems, many bugs remain difficult to find in testing even when a system is tested with appropriate input. This is because these bugs are triggered by the unique faulty events in the production environment.
Fault injection has been proposed to simulate faults during testing with the goal of catching such bugs. However, existing solutions treat the target systems as a black box and only inject simple faults using random choices. Production failures are often caused by bugs that require complex, system-specific faults at careful timing to trigger.
This project takes a holistic approach to address the fundamental limitations in current fault injection testing. The project develops special compiler support to enable the injection of system-specific faults at a fine granularity with precise control. To efficiently explore the large fault injection space and expose bugs, this project designs new fault injection decision algorithms and machine learning methods.
A new adaptive method further analyzes production execution traces to quickly reproduce fault-induced failures in offline environment.
Bugs in production distributed systems have resulted in substantial financial losses to society. The new fault injection techniques developed in this project will help effectively catch a wide range of production-grade bugs in large distributed systems and improve the availability of cloud services. This project will closely engage with developers in the open-source community to improve the distributed systems code quality and testing practice.
The software artifact this project develops will be open sourced and available at https://github.com/OrderLab. The project results, including paper publications, technical reports, and presentations will be made available for free download and be maintained for at least five years beyond the completion of the project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Johns Hopkins University
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant