The National Household Food Acquisition and Purchase Survey (FoodAPS), sponsored by the United States Department of Agriculture's (USDA) Economic Research Service (ERS) and Food and Nutrition Service (FNS), examines the food purchasing behavior of various subgroups of the U.S. population. These subgroups include participants in the Supplemental Nutrition Assistance Program (SNAP) and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), as well as households that are eligible for but do not participate in these programs. Participants in these social protection programs constitute small proportions of the U.S. population; obtaining an adequate number of such participants in a survey would be challenging without stratified sampling that targets SNAP- and WIC-participating households. This document describes how the U.S. Census Bureau (which is planning to conduct future versions of the FoodAPS survey on behalf of USDA) created sampling strata to flag the FoodAPS-targeted subpopulations using machine learning applications in linked survey and administrative data. We describe the data, modeling techniques, and how well the sampling flags target low-income households and households receiving WIC and SNAP benefits. We additionally situate these efforts in the nascent literature on the use of big data and machine learning for the improvement of survey efficiency.
-
Estimating the U.S. Citizen Voting-Age Population (CVAP) Using Blended Survey Data, Administrative Record Data, and Modeling: Technical Report
April 2023
Authors:
J. David Brown,
Danielle H. Sandler,
Lawrence Warren,
Moises Yi,
Misty L. Heggeness,
Joseph L. Schafer,
Matthew Spence,
Marta Murray-Close,
Carl Lieberman,
Genevieve Denoeux,
Lauren Medina
Working Paper Number:
CES-23-21
This report develops a method using administrative records (AR) to fill in responses for nonresponding American Community Survey (ACS) housing units rather than adjusting survey weights to account for selection of a subset of nonresponding housing units for follow-up interviews and for nonresponse bias. The method also inserts AR and modeling in place of edits and imputations for ACS survey citizenship item nonresponses. We produce Citizen Voting-Age Population (CVAP) tabulations using this enhanced CVAP method and compare them to published estimates. The enhanced CVAP method produces a 0.74 percentage point lower citizen share, and it is 3.05 percentage points lower for voting-age Hispanics. The latter result can be partly explained by omissions of voting-age Hispanic noncitizens with unknown legal status from ACS household responses. Weight adjustments may be less effective at addressing nonresponse bias under those conditions.
-
Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report
October 2020
Authors:
John M. Abowd,
J. David Brown,
Lawrence Warren,
Moises Yi,
Misty L. Heggeness,
William R. Bell,
Michael B. Hawes,
Andrew Keller,
Vincent T. Mule Jr.,
Joseph L. Schafer,
Matthew Spence
Working Paper Number:
CES-20-33
This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
-
Incorporating Administrative Data in Survey Weights for the 2018-2022 Survey of Income and Program Participation
October 2024
Working Paper Number:
CES-24-58
Response rates to the Survey of Income and Program Participation (SIPP) have declined over time, raising the potential for nonresponse bias in survey estimates. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we modify various parts of the SIPP weighting algorithm to incorporate such data. We create these new weights for the 2018 through 2022 SIPP panels and examine how the new weights affect survey estimates. Our results show that before weighting adjustments, SIPP respondents in these panels have higher socioeconomic status than the general population. Existing weighting procedures reduce many of these differences. Comparing SIPP estimates between the production weights and the administrative data-based weights yields changes that are not uniform across the joint income and program participation distribution. Unlike in other Census Bureau household surveys, there is no large increase in nonresponse bias in SIPP due to the COVID-19 pandemic. In summary, the magnitude and sign of nonresponse bias in SIPP are complicated, and the existing weighting procedures may change the sign of nonresponse bias for households with certain incomes and program benefit statuses.
-
Producing U.S. Population Statistics Using Multiple Administrative Sources
November 2023
Working Paper Number:
CES-23-58
We identify several challenges encountered when constructing U.S. administrative record-based (AR-based) population estimates for 2020. Though the AR estimates are higher than the 2020 Census at the national level, they are over 15 percent lower in 5 percent of counties, suggesting that locational accuracy can be improved. Other challenges include how to achieve comprehensive coverage, maintain consistent coverage across time, filter out nonresidents and people not alive on the reference date, uncover missing links across person and address records, and predict demographic characteristics when multiple ones are reported or when they are missing. We discuss several ways of addressing these issues, e.g., building in redundancy with more sources, linking children to their parents' addresses, and conducting additional record linkage for people without Social Security Numbers and for addresses not initially linked to the Census Bureau's Master Address File. We discuss modeling to predict lower levels of geography for people lacking those geocodes, the probability that a person is a U.S. resident on the reference date, the probability that an address is the person's residence on the reference date, and the probability a person is in each demographic characteristic category. Regression results illustrate how many of these challenges and solutions affect the AR county population estimates.
-
Evaluation of Commercial School and Teacher Lists to Enhance Survey Frames
July 2014
Working Paper Number:
carra-2014-07
This report summarizes the potential for teacher lists obtained from commercial vendors for enhancing sampling frames for the National Teacher and Principal Survey (NTPS). We investigate three separate vendor lists, and compare coverage rates across a range of school and teacher characteristics. Across all vendors, coverage rates are higher for regular, non-charter schools. Vendor A stands out as having higher coverage rates than the other two, and we recommend further evaluating Vendor A's teacher lists during the upcoming 2014-2015 NTPS Field Test.
-
Potential Bias When Using Administrative Data to Measure the Family Income of School-Aged Children
January 2025
Working Paper Number:
CES-25-03
Researchers and practitioners increasingly rely on administrative data sources to measure family income. However, administrative data sources are often incomplete in their coverage of the population, giving rise to potential bias in family income measures, particularly if coverage deficiencies are not well understood. We focus on the school-aged child population, due to its particular import to research and policy, and because of the unique challenges of linking children to family income information. We find that two of the most significant administrative sources of family income information that permit linking of children and parents (IRS Form 1040 and SNAP participation records) usefully complement each other, potentially reducing coverage bias when used together. In a case study considering how best to measure economic disadvantage rates in the public school student population, we demonstrate the sensitivity of family income statistics to assumptions about individuals who do not appear in administrative data sources.
-
Receipt of Public and Private Food Assistance Across the Rural-Urban Continuum Before and During the COVID-19 Pandemic: Analysis of Current Population Survey Data
August 2025
Working Paper Number:
CES-25-51
Background: The nutrition safety net in the United States is critical to supporting food security among households in need. Food assistance in the United States includes both government-funded food programs and private community-based providers who distribute food to households in need. The COVID-19 pandemic impacted experiences of food security and use of private and public food assistance resources. However, this may have differed for households residing in urban versus rural areas. We explored receipt of Supplemental Nutrition Assistance Program (SNAP) benefits or food from community-based emergency food providers across a detailed measure of the rural-urban continuum before and during the COVID-19 pandemic.
Methods: We linked restricted-use Current Population Survey Food Security Supplement data to census tract-level United States Department of Agriculture Rural-Urban Commuting Area codes to estimate prevalence of self-reported SNAP participation and receipt of emergency food support across temporal (2015-2019 versus 2020-2021) and socio-spatial (urban, large rural city/town, small rural town, or isolated rural town/area) dimensions. We report prevalences as point estimates with 95% confidence intervals, all weighted for national representation.
Results: The weighted prevalence of self-reported SNAP participation was 8.9% (8.7-9.2%) in 2015-2019 and 9.1% (8.5-9.5%) in 2020-2021 in urban areas, 11.4% (10.8-12.2%) in 2015-2019 and 11.6% (10.5-12.9%) in 2020-2021 in large rural towns/cities, 13.4% (12.3-14.6%) in 2015-2019 and 12.3% (10.5-14.5%) in 2020-2021 in small rural towns, and 9.7% (8.6-10.9%) in 2015-2019 and 10.9% (8.8-13.4%) in 2020-2021 in isolated rural towns. The weighted prevalence of self-reported receipt of emergency food was 4.9% (4.8-5.1%) in 2015-2019 and 6.2% (5.8-6.5%) in 2020-2021 in urban areas, 6.8% (6.2-7.4%) in 2015-2019 and 7.6% (6.6-8.6%) in 2020-2021 in large rural towns/cities, 8.1% (7.3-9.1%) in 2015-2019 and 7.1% (5.7-8.8%) in 2020-2021 in small rural towns, and 6.8% (5.9-7.7%) in 2015-2019 and 8.5% (6.7-10.6%) in 2020-2021 in isolated rural towns.
Conclusion: Households in rural communities use public and private food assistance at higher rates than households in urban areas, but there is variation across communities depending on the level of rurality.
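As a rough illustration of the weighted point estimates described in the Methods paragraph, the sketch below computes a weighted prevalence from per-household indicators and survey weights. The function name and the toy inputs are hypothetical, not taken from the paper, and the sketch deliberately omits the 95% confidence intervals, which would require the survey's design information.

```python
def weighted_prevalence(indicators, weights):
    """Return the weighted share of units whose indicator is truthy.

    indicators: 0/1 (or bool) flags, e.g. self-reported SNAP receipt (hypothetical input)
    weights:    survey weights, one per unit (hypothetical input)
    """
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sum the weights of units that report receipt, then normalize by the total weight.
    receiving = sum(w for y, w in zip(indicators, weights) if y)
    return receiving / total


# Toy data: four households, two of which report receipt.
example = weighted_prevalence([1, 0, 0, 1], [1.0, 3.0, 4.0, 2.0])
```

With these toy inputs the reporting households carry 3 of 10 weight units, so the weighted prevalence differs from the unweighted share of one half.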
-
Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets
June 2024
Working Paper Number:
CES-24-27
This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC package to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
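The idea of nonresponse-style weights for non-linkage can be sketched in a simplified form. The example below is not the twangRDC interface and does not use gradient boosting; as a stated substitute, it estimates the linkage rate within hypothetical cells and gives each linked record the inverse of its cell's rate. The function name, the cell labels, and the data are all assumptions for illustration.

```python
from collections import defaultdict


def linkage_weights(records):
    """Inverse linkage-rate weights by cell (a simplified stand-in for a propensity model).

    records: list of (cell, is_linked) tuples; 'cell' is a hypothetical grouping
             such as an age-by-region category.
    Returns one weight per record: total/linked count of the cell for linked
    records, and 0.0 for unlinked records.
    """
    totals = defaultdict(int)
    linked = defaultdict(int)
    for cell, is_linked in records:
        totals[cell] += 1
        if is_linked:
            linked[cell] += 1

    weights = []
    for cell, is_linked in records:
        if is_linked:
            # A linked record stands in for the unlinked records in its own cell.
            weights.append(totals[cell] / linked[cell])
        else:
            weights.append(0.0)
    return weights


# Toy data: cell "a" links fully, cell "b" links half of its records.
sample = [("a", True), ("a", True), ("b", True), ("b", False)]
sample_weights = linkage_weights(sample)
```

In this toy case the weights sum to the number of records, so the linked subset is rescaled to represent the full file within each cell. A model-based propensity estimate would replace the cell rates with predicted linkage probabilities.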
-
Capturing More Than Poverty: School Free and Reduced-Price Lunch Data and Household Income
December 2017
Working Paper Number:
carra-2017-09
Educational researchers often use National School Lunch Program (NSLP) data as a proxy for student poverty. Under NSLP policy, students whose household income is less than 130 percent of the poverty line qualify for free lunch and students whose household income is between 130 percent and 185 percent of the poverty line qualify for reduced-price lunch. Linking school administrative records for all 8th graders in a California public school district to household-level IRS income tax data, we examine how well NSLP data capture student disadvantage. We find both that there is substantial disadvantage in household income not captured by NSLP category data, and that NSLP categories capture disadvantage on test scores above and beyond household income.
View Full
Paper PDF
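The income thresholds quoted in the abstract can be written as a small classifier. This is a minimal sketch: the function name, the "full-price" label, and the handling of incomes exactly at 130 or 185 percent of the poverty line are assumptions, since the abstract only says "less than 130 percent" and "between 130 percent and 185 percent".

```python
def nslp_category(household_income, poverty_line):
    """Classify a household using the thresholds stated in the abstract.

    household_income and poverty_line must be in the same units (e.g. annual dollars).
    Boundary handling at exactly 130% and 185% is an assumption of this sketch.
    """
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    ratio = household_income / poverty_line
    if ratio < 1.30:
        return "free"
    if ratio <= 1.85:
        return "reduced-price"
    # Label for households above 185%; hypothetical, not a term used in the abstract.
    return "full-price"


# Toy values, not actual poverty guidelines.
low = nslp_category(25_000, 20_000)    # ratio 1.25
mid = nslp_category(30_000, 20_000)    # ratio 1.50
high = nslp_category(40_000, 20_000)   # ratio 2.00
```

Because the categories are coarse bands of the income-to-poverty ratio, two households in the same category can have quite different incomes, which is consistent with the paper's finding that category data miss some variation in household income.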
-
The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey
April 2014
Working Paper Number:
carra-2014-08
Record linkage across survey and administrative records sources can greatly enrich data and improve their quality. The linkage can reduce respondent burden and nonresponse follow-up costs. This is particularly important in an era of declining survey response rates and tight budgets. Record linkage also creates statistical bias, however. The U.S. Census Bureau links person records through its Person Identification Validation System (PVS), assigning each record a Protected Identification Key (PIK). It is not possible to reliably assign a PIK to every record, either due to insufficient identifying information or because the information does not uniquely match any of the administrative records used in the person validation process. Non-random ability to assign a PIK can potentially inject bias into statistics using linked data. This paper studies the nature of this bias using the 2009 and 2010 American Community Survey (ACS). The ACS is well-suited for this analysis, as it contains a rich set of person characteristics that can describe the bias. We estimate probit models for whether a record is assigned a PIK. The results suggest that young children, minorities, residents of group quarters, immigrants, recent movers, low-income individuals, and non-employed individuals are less likely to receive a PIK in the 2009 ACS. Changes to the PVS process in 2010 significantly addressed the young children deficit, attenuated the other biases, and increased the validated records share from 88.1 to 92.6 percent (person-weighted).