| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of California-Riverside |
| Country | United States |
| Start Date | Aug 01, 2022 |
| End Date | Jul 31, 2025 |
| Duration | 1,095 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2210272 |
This project seeks to develop a set of new statistical regression models for irregular data, such as skewed, truncated, heterogeneous, or noisy data with outliers, which are commonly seen in economics, sociology, medicine, and biology. The new regression method, named modal regression, estimates the conditional mode (the most probable value) of a dependent variable given covariates, rather than the conditional mean or quantile that traditional regression models target.
As a complement to existing regression tools, modal regression can reveal data structure that the conditional mean or quantiles may miss. In addition, modal regression is resistant to outliers and measurement errors, and can provide shorter prediction intervals when the data are skewed, as with salaries, prices, and expenditures in economics, or church sizes and symptom indices in sociology.
Furthermore, unlike traditional mean or quantile regression, modal regression can be applied directly to truncated data, which arise when observations are recorded only if the dependent variable falls within a lower or upper limit, such as an economic index measured within some range. This work will benefit scientists and researchers who want to analyze skewed or truncated data in fields that include economics, social sciences, marketing, medical studies, public health, biology, and agriculture.
This project will provide training opportunities for graduate students. Software developed for implementing the new modal regression will be made publicly available.
Parallel to existing regression models, the investigator will develop a wide variety of parametric and nonparametric modal regression models for both independent and dependent (time series or spatial) data by imposing model assumptions on the conditional mode of a dependent variable Y given covariates x. The new method avoids nonparametric estimation of the conditional density of Y given x, which is difficult when the dimension of x is large.
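To make the target of the paragraph above concrete, the conditional mode and one possible estimating criterion can be written out as follows. This is a sketch of a kernel-based formulation that appears in the modal regression literature, not necessarily the one the project adopts: the linear form of the mode, the Gaussian kernel $\phi$, and the bandwidth $h$ are all assumptions introduced here for illustration.

```latex
\operatorname{Mode}(Y \mid x) = \arg\max_{y} f(y \mid x),
\qquad
\operatorname{Mode}(Y \mid x) = x^{\top}\beta \quad \text{(assumed linear form)}

\hat{\beta} = \arg\max_{\beta} \frac{1}{n} \sum_{i=1}^{n}
  \phi_h\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\phi_h(t) = \frac{1}{h}\,\phi\!\left(\frac{t}{h}\right)
```

Under these assumptions the criterion involves only a one-dimensional kernel applied to the residuals, so no estimate of the full conditional density $f(y \mid x)$ is needed, whatever the dimension of $x$.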
The investigator will develop a modal expectation-maximization algorithm to simplify the computation of modal regression. The convergence rate and sampling properties of the resulting estimators will be systematically studied. For high-dimensional data, the investigator will consider a new feature selection tool and variable selection methods for modal regression.
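As an illustration of what a modal expectation-maximization iteration can look like, the following is a minimal sketch for a linear modal regression with a Gaussian kernel. It is not the project's software: the function name `modal_linear_regression`, the fixed `bandwidth`, the least-squares starting value, and the stopping rule are all hypothetical choices made for this example. Each iteration computes kernel weights from the current residuals (E-step) and then solves a weighted least-squares problem (M-step), which increases the kernel criterion, the average of Gaussian kernel values of the residuals.

```python
import numpy as np


def modal_linear_regression(X, y, bandwidth, n_iter=200, tol=1e-8):
    """Hypothetical helper: linear modal regression by an MEM-style iteration.

    Assumes Mode(Y | x) = x @ beta and a Gaussian kernel with a fixed,
    user-supplied bandwidth. Returns the estimated coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Starting value (an assumption of this sketch): ordinary least squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    for _ in range(n_iter):
        resid = y - X @ beta

        # E-step: Gaussian kernel weights, normalized to sum to one.
        w = np.exp(-0.5 * (resid / bandwidth) ** 2)
        total = w.sum()
        if total == 0.0:
            raise ValueError("all kernel weights underflowed; increase bandwidth")
        w /= total

        # M-step: weighted least squares, solved by rescaling rows by sqrt(w).
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break

    return beta


if __name__ == "__main__":
    # Synthetic data: the conditional mode is 1 + 2x, but 20% of the
    # responses are shifted upward, which pulls the conditional mean away.
    rng = np.random.default_rng(0)
    n = 500
    x = rng.uniform(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])
    noise = rng.normal(0.0, 0.1, n)
    noise[rng.random(n) < 0.2] += 5.0
    y = 1.0 + 2.0 * x + noise

    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_modal = modal_linear_regression(X, y, bandwidth=0.5)
    print("least squares:", beta_ols)
    print("modal (sketch):", beta_modal)
```

In this sketch the bandwidth is fixed by hand; in practice its choice matters, and like most EM-type schemes the iteration can stop at a local maximum, so the result may depend on the starting value.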
In addition, the investigator will develop a new sufficient dimension reduction method to reduce the dimension of covariates for modal regression. Furthermore, the investigator will develop a modal clustering tool for heterogeneous/mixture data where multiple modal regression curves exist. The modal clustering method can serve as an alternative tool for mixture regression models to reveal the clustered/inhomogeneous data structure and provide a natural way to estimate the number of components/clusters, which has long been a challenging problem.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.