Completed STANDARD GRANT National Science Foundation (US)

HNDS-I: From Stacks to Stats: Unlocking International Census Data from Print Volumes

$10M USD

Funder National Science Foundation (US)
Recipient Organization University of Minnesota-Twin Cities
Country United States
Start Date Sep 01, 2021
End Date Aug 31, 2025
Duration 1,460 days
Number of Grantees 5
Roles Principal Investigator; Co-Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2121891
Grant Description

Nearly every country in the world conducts a census. These censuses generate information about who people are, their education, and where they work and live. Census data are important for understanding how human populations change over time.

However, most of these data are found only in printed volumes on library shelves. It is challenging to get data from printed pages ready for analysis, so these data are difficult to use for research. This project will collect census volumes; scan the data inside; convert the data into digital formats that are easy to analyze; and make them freely available to researchers worldwide.

Making these data easy to access will allow researchers, policy makers, and others to answer questions about important issues like aging, migration, fertility, and mortality.

This project builds on work that created the IPUMS International Historical Geographic Information System (IHGIS). IHGIS software currently processes and documents data from PDF documents. This project extends the IHGIS tools to work with print volumes by using optical character recognition to convert scanned images into digital data tables.

To do so, the project addresses special problems related to scanning data tables. For example, there is no way to tell from the number itself that a scanned 3,222 should be 8,222 instead. This project develops software to determine the right value; for example, within a given age group in a digitized table, the software might check whether the number of people attending school and the number of people not attending school add up to the total number of people in the age group, and use that relationship to recover the correct 8,222 value.
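The additive check described above can be illustrated with a minimal Python sketch. It is not the IHGIS software: the function names, the table of look-alike digits, and the figures 1,500 and 9,722 are all assumptions made up for illustration. It tries single-digit substitutions on each value and keeps only those that make the two parts sum to the total.

```python
# Hypothetical look-alike digits that OCR might confuse (an assumption,
# not a list taken from the project).
CONFUSABLE = {"3": "8", "8": "30", "0": "8", "1": "7", "7": "1", "5": "6", "6": "5"}


def single_digit_variants(value):
    """Yield integers formed by swapping one digit for a look-alike digit."""
    digits = str(value)
    for i, d in enumerate(digits):
        for alt in CONFUSABLE.get(d, ""):
            if i == 0 and alt == "0" and len(digits) > 1:
                continue  # skip variants with a leading zero
            yield int(digits[:i] + alt + digits[i + 1:])


def repair_row(attending, not_attending, total):
    """Return consistent (attending, not_attending, total) candidates.

    If the row already satisfies attending + not_attending == total it is
    returned unchanged; otherwise every single-digit substitution that
    restores the identity is listed.
    """
    if attending + not_attending == total:
        return [(attending, not_attending, total)]
    fixes = []
    for v in single_digit_variants(attending):
        if v + not_attending == total:
            fixes.append((v, not_attending, total))
    for v in single_digit_variants(not_attending):
        if attending + v == total:
            fixes.append((attending, v, total))
    for v in single_digit_variants(total):
        if attending + not_attending == v:
            fixes.append((attending, not_attending, v))
    return fixes


# Illustrative row: 8,222 attending was scanned as 3,222.
print(repair_row(3222, 1500, 9722))
```

When more than one candidate survives, a check like this cannot choose between them on its own, so a real pipeline would presumably need additional constraints or manual review.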

More complicated problems arise with multidimensional tables, such as a table that contains the levels of education attained for people in different age groups across many different geographic regions. The software uses structured information about the content of tables to determine consistency across specific row and column elements and performs the checks to find scanning errors.
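One way such row and column checks might work is sketched below in Python, under stated assumptions: the table is two-dimensional, it carries printed row and column totals, and at most one interior cell was misread. The function names and all figures are hypothetical and are not drawn from the IHGIS tools.

```python
def find_suspect_cells(cells, row_totals, col_totals):
    """Return (row, col) positions where a failing row meets a failing column."""
    bad_rows = [r for r, row in enumerate(cells) if sum(row) != row_totals[r]]
    bad_cols = [
        c for c in range(len(col_totals))
        if sum(row[c] for row in cells) != col_totals[c]
    ]
    return [(r, c) for r in bad_rows for c in bad_cols]


def suggest_fix(cells, row_totals, col_totals):
    """Suggest (row, col, value) when exactly one cell is implicated.

    The value is accepted only if the row total and the column total
    imply the same correction; otherwise None is returned.
    """
    suspects = find_suspect_cells(cells, row_totals, col_totals)
    if len(suspects) != 1:
        return None
    r, c = suspects[0]
    from_row = row_totals[r] - (sum(cells[r]) - cells[r][c])
    from_col = col_totals[c] - (sum(row[c] for row in cells) - cells[r][c])
    return (r, c, from_row) if from_row == from_col else None


# Illustrative table: rows are age groups, columns are education levels.
# The cell at row 1, column 1 should be 8222 but was scanned as 3222.
cells = [
    [120, 340, 95],
    [210, 3222, 400],
    [80, 150, 60],
]
row_totals = [555, 8832, 290]
col_totals = [410, 8712, 555]

print(suggest_fix(cells, row_totals, col_totals))
```

If the misread number is itself a printed total, only a row or only a column fails, so this sketch returns None rather than guessing. Tables with more dimensions would need the same idea applied to each set of marginal totals.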

Automating otherwise labor-intensive problems like these will provide a large collection of data for countries and time periods where digitally published data are not available. Thanks to the geographic detail, historical depth, and global coverage, researchers will be able to use the data to study change over time and differences between places within and between countries.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Minnesota-Twin Cities
