| Field | Value |
|---|---|
| Funder | National Science Foundation (US) |
| Recipient Organization | University of Southern California |
| Country | United States |
| Start Date | Jul 01, 2021 |
| End Date | Jun 30, 2024 |
| Duration | 1,095 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2113500 |
Classification is a popular data-analysis technique in disciplines ranging from the biomedical sciences to information technology. This project will develop theory-backed statistical methods and algorithms to address pressing challenges in the application of classification. These challenges stem from imperfections in training data, which are widespread in high-stakes applications such as disease diagnosis and cybersecurity.
In particular, this project will focus on so-called asymmetric classification problems, in which one class is of greater importance than the others. The methods and algorithms will aim to control the error of missing the most important class in the population, not just in a particular dataset. This property will make the methods and algorithms powerful for medical diagnosis, for which the primary goal is diagnostic accuracy in the population.
Moreover, this project will provide a suite of subprojects, ranging from theory to applications, that are suitable for training graduate and undergraduate students. The interdisciplinary nature of this project is expected to attract students from diverse backgrounds to join the PIs’ efforts.
The PIs will develop a suite of application-driven, theory-backed methods and algorithms to address pressing data challenges including sample size limitations, sampling biases, and ambiguous class labels. The development will be primarily under the Neyman-Pearson (NP) classification paradigm, which was designed to control the population-level false-negative rate (p-FNR) under a desired level while minimizing the population-level false-positive rate (p-FPR).
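To make the p-FNR control concrete, here is a minimal Python sketch of the order-statistic thresholding idea used in the existing NP classification literature. It is not the project's own method. It assumes a held-out sample of scores from the important class (the sample-splitting step mentioned below), that higher scores indicate the important class, and that the scores are i.i.d. and continuous. The function names, `alpha` (target p-FNR level), and `delta` (tolerated probability that the p-FNR exceeds `alpha`) are illustrative choices, not names taken from the award text.

```python
from math import comb


def violation_bound(n, k, alpha):
    """Upper bound on P(p-FNR > alpha) when thresholding at the k-th
    order statistic of n held-out important-class scores:
    sum_{j=k}^{n} C(n, j) * (1 - alpha)^j * alpha^(n - j)."""
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_order_index(n, alpha, delta):
    """Smallest k whose violation bound is at most delta.
    A smaller k gives a less conservative threshold, hence a lower p-FPR."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    raise ValueError("held-out sample too small for this alpha and delta")


def np_threshold(important_scores, alpha, delta):
    """Threshold t such that the rule 'predict important if score >= t'
    keeps the p-FNR below alpha with probability at least 1 - delta
    (under the assumptions stated above)."""
    k = np_order_index(len(important_scores), alpha, delta)
    # The k-th largest held-out score: at most n - k held-out
    # important-class scores fall strictly below it.
    return sorted(important_scores, reverse=True)[k - 1]


if __name__ == "__main__":
    # Hypothetical held-out scores for the important class.
    held_out = [i / 100 for i in range(1, 101)]
    t = np_threshold(held_out, alpha=0.1, delta=0.05)
    print("predict the important class when score >=", t)
```

Under this bound, choosing `alpha = delta = 0.05` requires at least 59 held-out important-class scores, which illustrates why avoiding sample splitting could matter when data are scarce.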
This project will integrate NP classification into cutting-edge statistical learning tasks and enable it to address the aforementioned real-world data challenges. Specifically, this project has four overarching goals. First, the PIs will use random matrix theory to address a long-standing problem in NP classification methodology: whether NP classifiers can be constructed without a sample-splitting step, which would improve data efficiency.
Second, because the NP paradigm has an invariance property to sampling bias, the PIs will develop NP classifiers to address the sampling bias issue in biomedical applications. These classifiers can be trained on biased samples but still achieve the p-FNR control. Third, the PIs will develop a model-free feature ranking framework to incorporate multiple classification paradigms including the NP paradigm and to reflect prediction objectives.
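The invariance to sampling bias mentioned above can be illustrated with a short derivation. The notation is assumed for illustration and does not come from the award text: $Y = 0$ denotes the important class, $\phi$ a classifier, $\pi_y$ the population class proportions, and $P_y$ the distribution of the features $X$ within class $y$. The derivation also assumes that the bias distorts only the class proportions and leaves each within-class distribution unchanged.

$$
R_0(\phi) = \Pr\bigl(\phi(X) \neq 0 \mid Y = 0\bigr) = \int \mathbf{1}\{\phi(x) \neq 0\}\, dP_0(x),
\qquad
R(\phi) = \pi_0 R_0(\phi) + \pi_1 R_1(\phi).
$$

The p-FNR $R_0(\phi)$ depends only on $P_0$, so replacing $(\pi_0, \pi_1)$ with biased proportions $(\tilde\pi_0, \tilde\pi_1)$ leaves it unchanged, whereas the overall risk $R(\phi)$ changes with the proportions. Under these assumptions, a level chosen for $R_0$ on a biased sample carries over to the population.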
Fourth, the PIs will develop the first NP umbrella algorithm under the label noise setting and the first information-theoretic criteria that combine ambiguous classes in multi-class classification. To disseminate the project outcomes, the PIs will give research talks, organize conference sessions, share open-source software packages with tutorials, and reach out to practitioners of classification methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.