-
Revisiting the Unintended Consequences of Ban the Box
August 2025
Working Paper Number: CES-25-58
Ban-the-Box (BTB) policies are intended to help formerly incarcerated individuals find employment by delaying the point at which employers can ask about criminal records. We revisit the finding in Doleac and Hansen (2020) that BTB causes statistical discrimination against minority men. We correct miscoded BTB laws and show that estimates from the Current Population Survey (CPS) remain quantitatively similar, while those from the American Community Survey (ACS) now fail to reject the null hypothesis of no effect of BTB on employment. In contrast to the published estimates, these ACS results are statistically significantly different from the CPS results, indicating a lack of robustness across datasets. We do not find evidence that these differences are due to sample composition or survey weights, and only limited evidence that they are explained by the surveys' different frequencies. Differences in sample size may also matter: the ACS has a much larger sample and therefore more statistical power to detect effects near the corrected CPS estimates.
-
Earnings Measurement Error, Nonresponse and Administrative Mismatch in the CPS
July 2025
Working Paper Number: CES-25-48
Using the Current Population Survey Annual Social and Economic Supplement matched to Social Security Administration Detailed Earnings Records, we link observations across consecutive years to investigate the relationship between item nonresponse and measurement error in the earnings questions. Linking individuals across consecutive years allows us to observe switching from response to nonresponse and vice versa. We estimate OLS, IV, and finite mixture models under varying assumptions, separately for men and women. We find that those who respond in both years of the survey exhibit less measurement error than those who respond in only one year. Our findings suggest a trade-off between survey response and data quality that should be considered by survey designers, data collectors, and data users.
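As a toy illustration of the comparison the abstract describes, the sketch below simulates linked survey-admin earnings for "always responders" and "switchers" and compares the spread of the log measurement error in each group. The group sizes and error magnitudes are invented for illustration, not estimates from the paper:

```python
import math
import random
import statistics

random.seed(0)
# Simulated linked survey-admin earnings. Group sizes and error magnitudes are
# illustrative assumptions, not estimates from the paper.
admin = [random.lognormvariate(10.5, 0.6) for _ in range(2000)]
groups = ["both"] * 1400 + ["one"] * 600            # responded in both years vs. only one
sd = {"both": 0.05, "one": 0.20}                    # log-scale measurement-error sd
survey = [a * math.exp(random.gauss(0, sd[g])) for a, g in zip(admin, groups)]

# Measurement error on the log scale: log(survey report) - log(admin record).
err = {"both": [], "one": []}
for a, s, g in zip(admin, survey, groups):
    err[g].append(math.log(s) - math.log(a))

for g in ("both", "one"):
    print(g, "error sd:", round(statistics.stdev(err[g]), 3))
```

With linked administrative records as the benchmark, the group comparison reduces to comparing the dispersion of survey-minus-admin differences across response histories.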
-
The Design of Sampling Strata for the National Household Food Acquisition and Purchase Survey
February 2025
Working Paper Number: CES-25-13
The National Household Food Acquisition and Purchase Survey (FoodAPS), sponsored by the United States Department of Agriculture's (USDA) Economic Research Service (ERS) and Food and Nutrition Service (FNS), examines the food purchasing behavior of various subgroups of the U.S. population. These subgroups include participants in the Supplemental Nutrition Assistance Program (SNAP) and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), as well as households that are eligible for, but do not participate in, these programs. Participants in these social protection programs constitute small proportions of the U.S. population; obtaining an adequate number of such participants in a survey would be challenging absent stratified sampling to target SNAP- and WIC-participating households. This document describes how the U.S. Census Bureau (which is planning to conduct future versions of the FoodAPS survey on behalf of USDA) created sampling strata to flag the FoodAPS targeted subpopulations using machine learning applications in linked survey and administrative data. We describe the data, the modeling techniques, and how well the sampling flags target low-income households and households receiving WIC and SNAP benefits. We additionally situate these efforts in the nascent literature on the use of big data and machine learning for the improvement of survey efficiency.
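The stratification idea can be sketched as follows: a (hypothetical) model score predicting SNAP participation is thresholded into sampling strata, the rare "likely participant" stratum is oversampled, and base weights are set to the inverse sampling rates. All scores, cutoffs, and sample sizes here are illustrative assumptions, not FoodAPS parameters:

```python
import random

random.seed(1)
# Hypothetical frame: each household carries a model-predicted probability of SNAP
# participation. Score distribution, cutoff, and sample sizes are illustrative.
frame = [{"id": i, "score": random.betavariate(1, 6)} for i in range(10_000)]

CUTOFF = 0.30
likely = [h for h in frame if h["score"] >= CUTOFF]      # rare, targeted stratum
unlikely = [h for h in frame if h["score"] < CUTOFF]

# Oversample the 'likely participant' stratum: equal sample sizes from unequal strata.
n = 300
sample = random.sample(likely, n) + random.sample(unlikely, n)

# Base weights (inverse sampling rates) keep estimates design-unbiased despite oversampling.
weights = {"likely": len(likely) / n, "unlikely": len(unlikely) / n}
print(len(likely), len(unlikely), weights)
```

The targeted stratum ends up sampled at a much higher rate than its frame share, which is exactly what makes a rare subpopulation reachable without an enormous overall sample.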
-
When and Why Does Nonresponse Occur? Comparing the Determinants of Initial Unit Nonresponse and Panel Attrition
September 2023
Working Paper Number: CES-23-44
Though unit nonresponse threatens data quality in both cross-sectional and panel surveys, little is understood about how initial nonresponse and later panel attrition may be theoretically or empirically distinct phenomena. This study advances current knowledge of the determinants of both unit nonresponse and panel attrition within the context of the U.S. Census Bureau's Survey of Income and Program Participation (SIPP) panel survey, which I link with high-quality federal administrative records, paradata, and geographic data. By exploiting the SIPP's interpenetrated sampling design and relying on cross-classified random effects modeling, this study quantifies the relative effects of sample household, interviewer, and place characteristics on baseline nonresponse and later attrition, addressing a critical gap in the literature. Given the reliance on successful record linkages between survey sample households and federal administrative data in the nonresponse research, this study also undertakes an explicitly spatial analysis of the place-based characteristics associated with successful record linkages in the U.S.
-
Some Open Questions on Multiple-Source Extensions of Adaptive-Survey Design Concepts and Methods
February 2023
Working Paper Number: CES-23-03
Adaptive survey design is a framework for making data-driven decisions about survey data collection operations. This paper discusses open questions related to the extension of adaptive principles and capabilities when capturing data from multiple data sources. Here, the concept of 'design' encompasses the focused allocation of resources required for the production of high-quality statistical information in a sustainable and cost-effective way. This conceptual framework leads to a discussion of six groups of issues including: (i) the goals for improvement through adaptation; (ii) the design features that are available for adaptation; (iii) the auxiliary data that may be available for informing adaptation; (iv) the decision rules that could guide adaptation; (v) the necessary systems to operationalize adaptation; and (vi) the quality, cost, and risk profiles of the proposed adaptations (and how to evaluate them). A multiple data source environment creates significant opportunities, but also introduces complexities that are a challenge in the production of high-quality statistical information.
-
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning
November 2021
Working Paper Number: CES-21-35
This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
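A minimal sketch of the ML-MI idea, with invented establishments and link probabilities standing in for the paper's actual model output: each imputation draws one linked establishment per respondent in proportion to its predicted link probability, and the spread of an estimate across imputations reflects linkage uncertainty:

```python
import random
import statistics

random.seed(2)
# Invented candidate sets: (establishment, predicted link probability, employment).
# In the paper, the probabilities come from a trained ML model; here they are assumptions.
candidates = {
    "r1": [("estA", 0.7, 120), ("estB", 0.3, 15)],
    "r2": [("estC", 0.9, 40),  ("estD", 0.1, 800)],
    "r3": [("estE", 0.5, 60),  ("estF", 0.5, 65)],
}

def draw_links():
    """One imputation: draw a linked establishment per respondent, weighted by link probability."""
    return [random.choices(cands, weights=[p for _, p, _ in cands])[0][2]
            for cands in candidates.values()]

M = 200  # number of imputed link configurations
means = [statistics.mean(draw_links()) for _ in range(M)]

# Averaging over imputations gives the point estimate; the between-imputation variance
# (one ingredient of Rubin's combining rules) captures linkage uncertainty.
point = statistics.mean(means)
between_var = statistics.variance(means)
print(round(point, 1), round(between_var, 1))
```

Because no single link is treated as certain, downstream analyses inherit the linkage uncertainty through the variation across imputed datasets rather than ignoring it.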
-
The Need to Account for Complex Sampling Features when Analyzing Establishment Survey Data: An Illustration using the 2013 Business Research and Development and Innovation Survey (BRDIS)
January 2017
Working Paper Number: CES-17-62
The importance of correctly accounting for complex sampling features when generating finite population inferences based on complex sample survey data sets has now been clearly established in a variety of fields, in both statistical and non-statistical domains. Unfortunately, recent studies of analytic error suggest that many secondary analysts of survey data ultimately do not account for these sampling features when analyzing their data, for a variety of possible reasons (e.g., poor documentation, or a data producer that does not provide the information in a public-use data set). Research in this area has focused exclusively on analyses of household survey data and individual respondents. No research to date has considered how analysts approach the data collected in establishment surveys, or whether published articles advancing science based on analyses of establishment behaviors and outcomes correctly account for complex sampling features. This article presents alternative analyses of real data from the 2013 Business Research and Development and Innovation Survey (BRDIS) and shows that a failure to account for the complex design features of the sample underlying these data can lead to substantial differences in inferences about the BRDIS target population of establishments.
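The consequence of ignoring design features can be shown with a small simulation: a skewed population is sampled with a take-all stratum for large units and a low rate for small units. Pooling the sample without weights badly biases the mean, while the design-based weighted estimate does not. All stratum sizes and sampling rates below are illustrative, not the BRDIS design:

```python
import random
import statistics

random.seed(3)
# Skewed population like R&D spending: a few large firms dominate the total.
large = [random.expovariate(1 / 5000) for _ in range(200)]     # take-all stratum
small = [random.expovariate(1 / 50) for _ in range(9800)]      # sampled at ~2%

samp = {"large": list(large), "small": random.sample(small, 196)}
weight = {"large": 1.0, "small": 9800 / 196}

# Naive analysis: pool the sample and ignore the design entirely.
naive_mean = statistics.mean(samp["large"] + samp["small"])

# Design-based analysis: weight each unit by the inverse of its sampling rate.
wsum = sum(weight[s] * x for s, xs in samp.items() for x in xs)
wn = sum(weight[s] * len(xs) for s, xs in samp.items())
weighted_mean = wsum / wn

true_mean = statistics.mean(large + small)
print(round(naive_mean), round(weighted_mean), round(true_mean))
```

Because large units are sampled with certainty, they are massively over-represented in the pooled sample; the weighted estimator undoes that over-representation.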
-
File Matching with Faulty Continuous Matching Variables
January 2017
Working Paper Number: CES-17-45
We present LFCMV, a Bayesian file linking methodology designed to link records using continuous matching variables in situations where we do not expect values of these matching variables to agree exactly across matched pairs. The method involves a linking model for the distance between the matching variables of records in one file and the matching variables of their linked records in the second. This linking model is conditional on a vector indicating the links. We specify a mixture model for the distance component of the linking model, as this latent structure allows the distance between matching variables in linked pairs to vary across types of linked pairs. Finally, we specify a model for the linking vector. We describe the Gibbs sampling algorithm for sampling from the posterior distribution of this linkage model and use artificial data to illustrate model performance. We also introduce a linking application using public survey information and data from the U.S. Census of Manufactures, and use LFCMV to link the records.
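The core intuition, that linkage should be driven by the distance between continuous matching variables rather than by exact agreement, can be sketched with a toy nearest-neighbor matcher. This is a deliberate simplification of LFCMV, which models the distances with a Bayesian mixture; the data below are simulated:

```python
import random

random.seed(4)
# File A holds true values of a continuous matching variable; file B holds the same
# units measured with noise, so exact agreement essentially never occurs.
fileA = [{"id": i, "x": random.uniform(0, 100)} for i in range(50)]
fileB = [{"id": a["id"], "x": a["x"] + random.gauss(0, 0.2)} for a in fileA]
random.shuffle(fileB)

# Exact matching on the continuous variable fails under measurement noise...
exact = sum(any(b["x"] == a["x"] for b in fileB) for a in fileA)

# ...while matching on distance (here, simple nearest neighbor) recovers most links.
nearest = sum(
    min(fileB, key=lambda b, a=a: abs(b["x"] - a["x"]))["id"] == a["id"]
    for a in fileA
)
print(exact, nearest)
```

A full treatment would also quantify uncertainty about which pairs are links, which is what the paper's mixture model and Gibbs sampler provide.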
-
The Management and Organizational Practices Survey (MOPS): Cognitive Testing
January 2016
Working Paper Number: CES-16-53
All Census Bureau surveys must meet quality standards before they can be sent to the public for data collection. This paper outlines the pretesting process used to ensure that the Management and Organizational Practices Survey (MOPS) met those standards. The MOPS is the first large survey of management practices at U.S. manufacturing establishments. The first wave of the MOPS, issued for reference year 2010, was subject to internal expert review and two rounds of cognitive interviews. The results of this pretesting were used to make significant changes to the MOPS instrument and to ensure that high-quality data were collected. The second wave of the MOPS, featuring new questions on data in decision making (DDD) and uncertainty and issued for reference year 2015, was subject to two rounds of cognitive interviews and a round of usability testing. This paper illustrates the effort undertaken by the Census Bureau to ensure that all surveys released into the field are of high quality and, for those looking to utilize the MOPS data, provides insight into how respondents interpret the MOPS questionnaire.
-
Matching Addresses between Household Surveys and Commercial Data
July 2015
Working Paper Number: carra-2015-04
Matching third-party data sources to household surveys can benefit those surveys in a number of ways, but the utility of these new data sources depends critically on our ability to link units between data sets. To understand this better, this report discusses modifications to the existing match process that could improve our matches. While many changes to the matching procedure produce marginal improvements in match rates, substantial increases in match rates can only be achieved by relaxing the definition of a successful match. In the end, the results show that the most important factor determining the success of matching procedures is the quality and composition of the data sets being matched.
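A toy example of how relaxing the match definition raises match rates, using a hypothetical `normalize` helper with an illustrative abbreviation list (not the report's actual procedure):

```python
import re

def normalize(addr: str) -> str:
    """Uppercase, strip punctuation, and standardize suffixes (illustrative list only)."""
    addr = re.sub(r"[^\w\s]", "", addr.upper())
    subs = {"STREET": "ST", "AVENUE": "AVE", "NORTH": "N"}
    return " ".join(subs.get(tok, tok) for tok in addr.split())

survey = ["123 North Main Street", "45 Oak Avenue, Apt 2"]
commercial = ["123 N MAIN ST", "45 OAK AVE"]

strict = sum(s == c for s, c in zip(survey, commercial))                         # raw strings
norm = sum(normalize(s) == normalize(c) for s, c in zip(survey, commercial))     # standardized
relaxed = sum(normalize(c) in normalize(s) for s, c in zip(survey, commercial))  # building-level
print(strict, norm, relaxed)
```

Each looser definition (raw equality, standardized equality, containment that ignores unit numbers) admits more matches, which illustrates the report's point that large match-rate gains come from redefining success rather than from tweaking the procedure.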