Active CONTINUING GRANT National Science Foundation (US)

CAREER: Machine Learning Methods for Spatial Data with Applications in Ecology

$5.64M USD

Funder	National Science Foundation (US)
Recipient Organization	Oregon State University
Country	United States
Start Date	Jun 01, 2021
End Date	May 31, 2026
Duration	1,825 days
Number of Grantees	1
Roles	Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2046678`

Grant Description

Species distribution models (SDMs) are widely used tools in ecology and natural resource management. SDMs are built by correlating observations of a species (e.g., whether it is present or absent) with environmental features. Once built, they can be used to predict how likely a species is to occur at a new site or interpreted to understand why species live where they do.

The spatial aspects of species and environmental data present challenges for the machine learning methods often used to build SDMs, and this award focuses on two of those challenges. First, one must assess the quality of an SDM in order to determine the validity of its predictions and interpretation. To assess quality, some data are often held out from model building.

Then, the model’s ability to predict the unseen data is used as a measure of its quality. With spatial data, however, randomly selecting data to hold out can lead to optimistic bias in quality estimates, because held-out points often lie close to, and so resemble, the points used for model building. This award will support research into methods for assessing model quality that account for spatial characteristics of the data in order to produce unbiased estimates of model quality.
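One common way to reduce this optimistic bias is to hold out whole spatial blocks instead of individual points. The sketch below illustrates that general idea only and is not the method the award proposes. The function names, the square-grid blocking, and the round-robin fold assignment are all assumptions made for illustration.

```python
import math


def spatial_block_folds(coords, block_size, n_folds):
    """Assign each (x, y) point to a fold so that all points falling in
    the same square grid block share a fold (hypothetical helper)."""
    blocks = [
        (math.floor(x / block_size), math.floor(y / block_size))
        for x, y in coords
    ]
    # Round-robin over the sorted block ids keeps the assignment deterministic.
    block_to_fold = {b: i % n_folds for i, b in enumerate(sorted(set(blocks)))}
    return [block_to_fold[b] for b in blocks]


def train_test_splits(folds, n_folds):
    """Yield (train_indices, test_indices) for each fold."""
    for k in range(n_folds):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test


# Toy coordinates in arbitrary units; nearby points land in the same block.
coords = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.7, 0.8), (0.2, 1.6), (1.1, 1.9)]
folds = spatial_block_folds(coords, block_size=1.0, n_folds=2)
splits = list(train_test_splits(folds, n_folds=2))
```

Because neighbouring points are held out together, a model cannot score well on a test point simply by having seen its near neighbours during training. Block size matters in practice and would normally be chosen with the range of spatial autocorrelation in mind.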

A second challenge arises when the data supplied to an SDM come from the growing repositories collected by community science programs. Under-reporting is a common phenomenon in biodiversity surveys (since one typically cannot observe all individuals of all species during an observation), and community science is no exception. The error introduced by under-reporting can be corrected by conducting multiple observations at the same site and estimating the probability of detecting the species, but community science programs are often not structured this way.
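The arithmetic behind the repeat-visit correction can be sketched as follows. This is a simplified illustration, not the award's estimator: it assumes the per-visit detection probability `p` is already known and constant across visits and sites, whereas in practice `p` is estimated jointly with occupancy from the repeat-visit data. The function names are hypothetical.

```python
def prob_detected_at_least_once(p, visits):
    """Probability that an occupied site yields at least one detection
    over `visits` independent visits, each detecting with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if visits < 1:
        raise ValueError("visits must be at least 1")
    return 1.0 - (1.0 - p) ** visits


def corrected_occupancy(naive_occupancy, p, visits):
    """Scale the naive occupancy (fraction of sites with any detection)
    up by the chance of detecting the species at an occupied site."""
    return min(1.0, naive_occupancy / prob_detected_at_least_once(p, visits))


# Example: the species was detected at 35% of sites, each visited 3 times,
# with an assumed per-visit detection probability of 0.5.
estimate = corrected_occupancy(naive_occupancy=0.35, p=0.5, visits=3)
```

The naive occupancy under-counts because occupied sites where the species was never detected look identical to empty sites. With only one visit per site, `p` generally cannot be separated from occupancy without further assumptions, which is why the lack of repeat-visit structure in community science data is a problem.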

This award will support research to create groups of multiple observations after the fact, so that under-reporting error can be accounted for better in this growing data resource. In addition to these scientific aims, this award will support education and outreach to graduate, undergraduate, and pre-college students, including the production of a set of benchmark datasets, a new introductory computer science course, and modules for STEM clubs and camps.
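As a rough illustration of grouping observations after the fact, the sketch below snaps each observation to a square grid cell and treats every cell as a "site" with a detection history. Grid snapping is a deliberately naive stand-in chosen for brevity. The award frames this as an open spatial clustering problem, and nothing here should be read as its solution. The function name, the `(x, y, detected)` record layout, and the cell size are assumptions.

```python
import math
from collections import defaultdict


def group_into_sites(observations, cell_size):
    """Group (x, y, detected) records by grid cell and return a mapping
    from cell id to a list of 0/1 detections (a detection history)."""
    sites = defaultdict(list)
    for x, y, detected in observations:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        sites[cell].append(1 if detected else 0)
    return dict(sites)


# Toy observations: three close together, one far away.
observations = [
    (0.12, 0.31, False),
    (0.18, 0.35, True),
    (0.15, 0.39, False),
    (2.40, 1.10, False),
]
histories = group_into_sites(observations, cell_size=0.5)

# Only sites with repeat visits carry information about detection probability.
repeat_visit_sites = {c: h for c, h in histories.items() if len(h) > 1}
```

A grid ignores habitat boundaries, so a single cell may mix observations from sites whose occupancy differs. Avoiding that kind of violated assumption in the downstream model is the motivation for a more careful clustering method.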

The research contributions of this award will enable scientists to build better models of spatial phenomena. The anticipated framework for cross-validation that accounts for domain adaptation between training and test folds will produce better generalization performance estimates by accounting for spatial autocorrelation and admitting target testing distributions.
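One standard way to evaluate a model against a target distribution that differs from the held-out data is importance weighting, sketched below. The description does not say that the anticipated framework uses this technique, so it is an assumption. The weights are taken as given here (for example, an estimated ratio of target to held-out density at each point), and the function name is hypothetical.

```python
def importance_weighted_error(losses, weights):
    """Self-normalised importance-weighted mean of per-example losses.

    `weights` are assumed to approximate the density ratio
    target(x) / held_out(x) for each held-out example.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(l * w for l, w in zip(losses, weights)) / total


# Toy 0/1 losses on four held-out points; the last two resemble the
# target conditions more closely, so they receive larger weights.
losses = [0, 0, 1, 1]
weights = [0.5, 0.5, 1.5, 1.5]

unweighted = sum(losses) / len(losses)
weighted = importance_weighted_error(losses, weights)
```

In this toy case the weighted error is higher than the unweighted one because the model's mistakes fall on the points most representative of the target conditions.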

In SDM, this means that researchers will be able to correct for bias induced by autocorrelation and provide climate model projections to obtain estimates of how species will fare under global change. In addition, it will define and propose solutions to a new type of spatial clustering problem: creating spatial modeling abstractions aimed at meeting the assumptions of a subsequent modeling phase.

For science and management questions informed by SDMs, this will improve the ability to correct for observational errors, which in turn should translate to better habitat models. The methods will be applicable beyond the motivating applications in SDM to a variety of spatial domains. The education plan will build bridges between ecology and computer science, while drawing on best educational practices to improve recruitment and retention of underserved students.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Oregon State University
