| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of California-Berkeley |
| Country | United States |
| Start Date | Jan 01, 2025 |
| End Date | Dec 31, 2029 |
| Duration | 1,825 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2442542 |
Modern applications, whether e-commerce sites, ML serving systems, or medical applications, place increasingly stringent requirements on the cloud storage systems that underpin them. These systems must offer good performance and scalability, as well as robustness against hardware failures and malicious attacks. Consensus systems, specifically, are used as the root of trust to bootstrap application correctness and to ensure that machines agree on a shared state in spite of failures.
The guarantees of consensus systems rely on the trust model (the set of assumptions about reality) accurately describing the conditions under which the system will operate. This means correctly modeling the network, the types of failures that can arise, as well as the total number of possible failures. Unfortunately, existing trust models fail to capture realistic deployment conditions.
The project's novelty lies in developing new trust models and protocols that explicitly recognise the true, uncertain nature of large-scale distributed systems. Its broader significance lies in its ability to significantly improve the performance and robustness of consensus systems and, in turn, of all the systems that depend on them.
Production consensus implementations are deployed over heterogeneous networks that span LAN and WAN links, experience blips, and are subject to attack, misconfiguration, or link failures. Every replica in these systems has some probability of failure, and this failure rate evolves over time. Yet engineers currently have no good way to express these realistic setups precisely, because current abstractions are too coarse-grained.
Engineers must therefore either over-insure or under-insure, leading to poor performance and unnecessarily high replication factors. This project (1) revisits the network model by eschewing the idea that the network is necessarily fully synchronous or fully asynchronous, paying particular attention to how protocols recover from blips, and (2) revisits the failure model by introducing probability-native consensus protocols that view failure rates as dynamically evolving probability distributions.
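To make the over-insure versus under-insure trade-off concrete, the following is a minimal sketch and not the project's method. It assumes, purely for illustration, that replicas fail independently with a single fixed probability `p` and that a cluster of `n = 2f + 1` replicas is lost when more than `f` of them fail. The function names and the `target` parameter are hypothetical. The sketch shows how a fixed failure threshold translates into a probability of exceeding it, which is the quantity a coarse "at most f failures" model hides.

```python
from math import comb
from typing import Optional


def prob_more_than_f_failures(n: int, f: int, p: float) -> float:
    """Return P(X > f) for X ~ Binomial(n, p).

    Assumption (illustrative only): each of the n replicas fails
    independently with the same probability p.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(f + 1, n + 1)
    )


def smallest_cluster(p: float, target: float, max_n: int = 99) -> Optional[int]:
    """Return the smallest odd n = 2f + 1 whose probability of more than
    f failures is at most `target`, or None if no n <= max_n qualifies."""
    for n in range(1, max_n + 1, 2):
        f = (n - 1) // 2
        if prob_more_than_f_failures(n, f, p) <= target:
            return n
    return None


if __name__ == "__main__":
    # Compare cluster sizes for a few hypothetical per-replica failure rates.
    for p in (0.001, 0.01, 0.05):
        print(p, smallest_cluster(p, target=1e-6))
```

Under these simplifying assumptions, a cluster sized for a pessimistic `p` is over-provisioned whenever the real failure rate is lower, and one sized for an optimistic `p` is under-protected when the rate rises. A model in which the failure rate is itself a time-varying distribution, as the abstract proposes, would replace the fixed `p` used here.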
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.