-
The Design of Sampling Strata for the National Household Food Acquisition and Purchase Survey
February 2025
Working Paper Number: CES-25-13
The National Household Food Acquisition and Purchase Survey (FoodAPS), sponsored by the United States Department of Agriculture's (USDA) Economic Research Service (ERS) and Food and Nutrition Service (FNS), examines the food purchasing behavior of various subgroups of the U.S. population. These subgroups include participants in the Supplemental Nutrition Assistance Program (SNAP) and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), as well as households that are eligible for but do not participate in these programs. Participants in these social protection programs constitute small proportions of the U.S. population; obtaining an adequate number of such participants in a survey would be challenging absent stratified sampling to target SNAP- and WIC-participating households. This document describes how the U.S. Census Bureau (which is planning to conduct future versions of the FoodAPS survey on behalf of USDA) created sampling strata to flag the FoodAPS targeted subpopulations using machine learning applications in linked survey and administrative data. We describe the data, the modeling techniques, and how well the sampling flags target low-income households and households receiving WIC and SNAP benefits. We additionally situate these efforts in the nascent literature on the use of big data and machine learning to improve survey efficiency.
-
Potential Bias When Using Administrative Data to Measure the Family Income of School-Aged Children
January 2025
Working Paper Number: CES-25-03
Researchers and practitioners increasingly rely on administrative data sources to measure family income. However, administrative data sources are often incomplete in their coverage of the population, giving rise to potential bias in family income measures, particularly if coverage deficiencies are not well understood. We focus on the school-aged child population, due to its particular import to research and policy and because of the unique challenges of linking children to family income information. We find that two of the most significant administrative sources of family income information that permit linking of children and parents (IRS Form 1040 and SNAP participation records) usefully complement each other, potentially reducing coverage bias when used together. In a case study considering how best to measure economic disadvantage rates in the public school student population, we demonstrate the sensitivity of family income statistics to assumptions about individuals who do not appear in administrative data sources.
-
The Census Historical Environmental Impacts Frame
October 2024
Working Paper Number: CES-24-66
The Census Bureau's Environmental Impacts Frame (EIF) is a microdata infrastructure that combines individual-level information on residence, demographics, and economic characteristics with environmental amenities and hazards from 1999 through the present day. To better understand the long-run consequences and intergenerational effects of exposure to a changing environment, we expand the EIF by extending it backward to 1940. The Historical Environmental Impacts Frame (HEIF) combines the Census Bureau's historical administrative data, publicly available 1940 address information from the 1940 Decennial Census, and historical environmental data. This paper discusses the creation of the HEIF as well as the unique challenges that arise with using the Census Bureau's historical administrative data.
-
Nonresponse and Coverage Bias in the Household Pulse Survey: Evidence from Administrative Data
October 2024
Working Paper Number: CES-24-60
The Household Pulse Survey (HPS), conducted by the U.S. Census Bureau, is a unique survey that provided timely data on the effects of the COVID-19 Pandemic on American households and continues to provide data on other emergent social and economic issues. Because the survey has a response rate in the single digits and offers only an online response mode, there are concerns about nonresponse and coverage bias. In this paper, we match administrative data from government agencies and third-party data to HPS respondents to examine how representative they are of the U.S. population. For comparison, we create a benchmark of American Community Survey (ACS) respondents and nonrespondents and include the ACS respondents as another point of reference. Overall, we find that the HPS is less representative of the U.S. population than the ACS. However, performance varies across administrative variables, and the existing weighting adjustments appear to greatly improve the representativeness of the HPS. Additionally, we examine household characteristics by email domain to assess the coverage effects of a 2023 change that limited email contacts to addresses on the contact frame with deliverability rates of at least 90 percent; we find no clear change in the representativeness of the HPS afterward.
-
Incorporating Administrative Data in Survey Weights for the 2018-2022 Survey of Income and Program Participation
October 2024
Working Paper Number: CES-24-58
Response rates to the Survey of Income and Program Participation (SIPP) have declined over time, raising the potential for nonresponse bias in survey estimates. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we modify various parts of the SIPP weighting algorithm to incorporate such data. We create these new weights for the 2018 through 2022 SIPP panels and examine how the new weights affect survey estimates. Our results show that before weighting adjustments, SIPP respondents in these panels have higher socioeconomic status than the general population. Existing weighting procedures reduce many of these differences. Comparing SIPP estimates between the production weights and the administrative data-based weights yields changes that are not uniform across the joint income and program participation distribution. Unlike other Census Bureau household surveys, there is no large increase in nonresponse bias in SIPP due to the COVID-19 Pandemic. In summary, the magnitude and sign of nonresponse bias in SIPP is complicated, and the existing weighting procedures may change the sign of nonresponse bias for households with certain incomes and program benefit statuses.
-
Earnings Through the Stages: Using Tax Data to Test for Sources of Error in CPS ASEC Earnings and Inequality Measures
September 2024
Working Paper Number: CES-24-52
In this paper, I explore the impact of generalized coverage error, item non-response bias, and measurement error on measures of earnings and earnings inequality in the CPS ASEC. I match addresses selected for the CPS ASEC to administrative data from 1040 tax returns. I then compare earnings statistics in the tax data for wage and salary earnings in samples corresponding to seven stages of the CPS ASEC survey production process. I also compare the statistics using the actual survey responses. The statistics I examine include mean earnings, the Gini coefficient, percentile earnings shares, and shares of the survey weight for a range of percentiles. I examine how the accuracy of the statistics calculated using the survey data is affected by including imputed responses for both those who did not respond to the full CPS ASEC and those who did not respond to the earnings question. I find that generalized coverage error and item nonresponse bias are dominated by measurement error, and that an important aspect of measurement error is households reporting no wage and salary earnings in the CPS ASEC when there are such earnings in the tax data. I find that the CPS ASEC sample misses earnings at the high end of the distribution from the initial selection stage and that the final survey weights exacerbate this.
-
Citizenship Question Effects on Household Survey Response
June 2024
Working Paper Number: CES-24-31
Several small-sample studies have predicted that a citizenship question in the 2020 Census would cause a large drop in self-response rates. In contrast, minimal effects were found in Poehler et al.'s (2020) analysis of the 2019 Census Test randomized controlled trial (RCT). We reconcile these findings by analyzing associations between characteristics about the addresses in the 2019 Census Test and their response behavior by linking to independently constructed administrative data. We find significant heterogeneity in sensitivity to the citizenship question among households containing Hispanics, naturalized citizens, and noncitizens. Response drops the most for households containing noncitizens ineligible for a Social Security number (SSN). It falls more for households with Latin American-born immigrants than those with immigrants from other countries. Response drops less for households with U.S.-born Hispanics than households with noncitizens from Latin America. Reductions in responsiveness occur not only through lower unit self-response rates, but also by increased household roster omissions and internet break-offs. The inclusion of a citizenship question increases the undercount of households with noncitizens. Households with noncitizens also have much higher citizenship question item nonresponse rates than those only containing citizens. The use of tract-level characteristics and significant heterogeneity among Hispanics, the foreign-born, and noncitizens help explain why the effects found by Poehler et al. were so small. Linking administrative microdata with the RCT data expands what we can learn from the RCT.
-
The Icing on the Cake: The Effects of Monetary Incentives on Income Data Quality in the SIPP
January 2024
Working Paper Number: CES-24-03
Accurate measurement of key income variables plays a crucial role in economic research and policy decision-making. However, item nonresponse and measurement error in survey data can bias estimates, which can in turn lead to sub-optimal policy decisions and inefficient allocation of resources. While various studies have documented item nonresponse and measurement error in economic data, few have investigated interventions that could reduce them. In this paper, we investigate the impact of monetary incentives on reducing item nonresponse and measurement error for labor and investment income in the Survey of Income and Program Participation (SIPP). Our study utilizes a randomized incentive experiment in Waves 1 and 2 of the 2014 SIPP, which allows us to assess the effectiveness of incentives in reducing item nonresponse and measurement error. We find that households receiving incentives had item nonresponse rates 1.3 percentage points lower for earnings and 1.5 percentage points lower for Social Security income. Relative to households that did not receive incentives, measurement error at the intensive margin was 6.31 percentage points lower for interest income and 16.48 percentage points lower for dividend income. These findings provide valuable insights for data producers and users and highlight the importance of strategies to improve data quality in economic research.
-
Incorporating Administrative Data in Survey Weights for the Basic Monthly Current Population Survey
January 2024
Working Paper Number: CES-24-02
Response rates to the Current Population Survey (CPS) have declined over time, raising the potential for nonresponse bias in key population statistics. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we take two approaches. First, we use administrative data to build a non-parametric nonresponse adjustment step while leaving the calibration to population estimates unchanged. Second, we use administratively linked data in the calibration process, matching income data from the Internal Revenue Service and state agencies, demographic data from the Social Security Administration and the decennial census, and industry data from the Census Bureau's Business Register to both responding and nonresponding households. We use the matched data in the household nonresponse adjustment of the CPS weighting algorithm, which changes the weights of respondents to account for differential nonresponse rates among subpopulations.
After running the experimental weighting algorithm, we compare estimates of the unemployment rate and labor force participation rate between the experimental weights and the production weights. Before March 2020, estimates of the labor force participation rates using the experimental weights are 0.2 percentage points higher than the original estimates, with minimal effect on unemployment rate. After March 2020, the new labor force participation rates are similar, but the unemployment rate is about 0.2 percentage points higher in some months during the height of COVID-related interviewing restrictions. These results are suggestive that if there is any nonresponse bias present in the CPS, the magnitude is comparable to the typical margin of error of the unemployment rate estimate. Additionally, the results are overall similar across demographic groups and states, as well as using alternative weighting methodology. Finally, we discuss how our estimates compare to those from earlier papers that calculate estimates of bias in key CPS labor force statistics.
This paper is for research purposes only. No changes to production are being implemented at this time.
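The household nonresponse adjustment this abstract describes can be illustrated with a generic weighting-class adjustment: respondents' base weights in each adjustment cell are inflated so that they carry the full weight of that cell's sample, respondents and nonrespondents alike. The sketch below is a minimal illustration of that general technique, not the CPS production algorithm; the cell labels, field names, and function are assumptions for the example.

```python
# Illustrative weighting-class nonresponse adjustment. Each sampled
# household carries a base weight, a response indicator, and an
# adjustment cell (e.g., a class defined from linked administrative
# income and demographic data, as in the paper's approach).
from collections import defaultdict

def nonresponse_adjust(households):
    """Inflate respondent weights so each cell's respondents carry the
    total base weight of that cell (respondents + nonrespondents)."""
    total = defaultdict(float)  # total base weight per cell
    resp = defaultdict(float)   # responding base weight per cell
    for h in households:
        total[h["cell"]] += h["weight"]
        if h["responded"]:
            resp[h["cell"]] += h["weight"]
    adjusted = {}
    for h in households:
        if h["responded"]:
            factor = total[h["cell"]] / resp[h["cell"]]
            adjusted[h["id"]] = h["weight"] * factor
    return adjusted

sample = [
    {"id": 1, "cell": "low_income", "weight": 100.0, "responded": True},
    {"id": 2, "cell": "low_income", "weight": 100.0, "responded": False},
    {"id": 3, "cell": "high_income", "weight": 100.0, "responded": True},
    {"id": 4, "cell": "high_income", "weight": 100.0, "responded": True},
]
print(nonresponse_adjust(sample))  # household 1's weight doubles to 200.0
```

Because household 2 (a nonrespondent) shares a cell with household 1, household 1's weight is doubled, while the fully responding high-income cell is unchanged; the value of linked administrative data is that cells can be defined on characteristics observed for nonrespondents too.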
-
Connected and Uncooperative: The Effects of Homogenous and Exclusive Social Networks on Survey Response Rates and Nonresponse Bias
January 2024
Working Paper Number: CES-24-01
Social capital, the strength of people's friendship networks and community ties, has been hypothesized as an important determinant of survey participation. Investigating this hypothesis has been difficult given data constraints. In this paper, we provide insights by investigating how response rates and nonresponse bias in the American Community Survey are correlated with county-level social network data from Facebook. We find that areas of the United States where people have more exclusive and homogenous social networks have higher nonresponse bias and lower response rates. These results provide further evidence that the effects of social capital may not be simply a matter of whether people are socially isolated or not, but also what types of social connections people have and the sociodemographic heterogeneity of their social networks.