CREAT - Census Bureau

Papers Containing Keyword(s): 'imputation model'

Viewing papers 1 through 8 of 8

Working Paper

Manufacturing Dispersion: How Data Cleaning Choices Affect Measured Misallocation and Productivity Growth in the Annual Survey of Manufactures

September 2025

Authors: T. Kirk White, Martin Rotemberg, Hang Kim

Working Paper Number: CES-25-67

Measurement of dispersion of productivity levels and productivity growth rates across businesses is a key input for answering a variety of important economic questions, such as understanding the allocation of economic inputs across businesses and over time. While item nonresponse is a readily quantifiable issue, we show there is also misreporting by respondents in the Annual Survey of Manufactures (ASM). Aware of these measurement issues, the Census Bureau edits and imputes survey responses before tabulation and dissemination. However, edit and imputation methods that are suitable for publishing aggregate totals may not be suitable for estimating other measures from the microdata. We show that the methods used dramatically affect estimates of productivity dispersion, allocative efficiency, and aggregate productivity growth. Using a Bayesian approach for editing and imputation, we model the joint distributions of all variables needed to estimate these measures, and we quantify the degree of uncertainty in the estimates due to imputations for faulty or missing data.
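The link between misreporting and measured dispersion can be seen in a minimal sketch. It uses a simple labor-productivity proxy (log output per worker) rather than the productivity measures used in the paper, and all plant data are hypothetical. It shows how one misreported value, such as output reported in dollars instead of thousands of dollars, inflates both the standard deviation and the 90-10 differential of log productivity.

```python
import math
import statistics


def log_labor_productivity(output, employment):
    """Log output per worker for each plant (a simple productivity proxy)."""
    return [math.log(y / n) for y, n in zip(output, employment)]


def dispersion(log_prod):
    """Return the standard deviation and 90-10 differential of log productivity."""
    sd = statistics.stdev(log_prod)
    deciles = statistics.quantiles(log_prod, n=10)  # 9 cut points: 10th..90th percentiles
    return sd, deciles[8] - deciles[0]


# Hypothetical plant-level data (output in thousands of dollars, employment in workers).
output = [120.0, 95.0, 210.0, 150.0, 80.0, 175.0, 130.0, 160.0, 110.0, 140.0]
employment = [10, 9, 15, 12, 8, 14, 11, 13, 10, 12]

clean_sd, clean_9010 = dispersion(log_labor_productivity(output, employment))

# Same data, but one plant reports output in dollars rather than thousands.
misreported = output.copy()
misreported[3] *= 1000
bad_sd, bad_9010 = dispersion(log_labor_productivity(misreported, employment))

print(f"clean:       sd={clean_sd:.3f}, 90-10={clean_9010:.3f}")
print(f"misreported: sd={bad_sd:.3f}, 90-10={bad_9010:.3f}")
```

A single unedited record is enough to move both statistics, which is why the choice of editing method matters for dispersion estimates even when it barely affects published totals.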
Working Paper

Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning

November 2021

Authors: Kristin McCue, John M. Abowd, Matthew D. Shapiro, Trivellore Raghunathan, Margaret C. Levenstein, Joelle Abramowitz, Dhiren Patki, Ann M. Rodgers, Nada Wasi, Dawn Zinsser

Working Paper Number: CES-21-35

This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
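The abstract does not spell out the ML-MI procedure. One generic way to carry linkage uncertainty into analysis is to draw several alternative linkages from model-estimated match probabilities and analyze each linked file separately. The sketch below assumes such probabilities already exist; the classifier, the identifiers, and the `draw_linkages` helper are all hypothetical and are not the paper's implementation.

```python
import random


def draw_linkages(candidate_probs, m, seed=0):
    """Draw m alternative linkages (implicates) from estimated match probabilities.

    candidate_probs maps a respondent id to a list of (establishment_id, probability)
    pairs. The probabilities are assumed to come from some ML classifier (not shown)
    and may include a "no match" option encoded as establishment_id None.
    """
    rng = random.Random(seed)
    implicates = []
    for _ in range(m):
        link = {}
        for respondent, candidates in candidate_probs.items():
            ids = [est for est, _ in candidates]
            weights = [p for _, p in candidates]
            # Sample one candidate in proportion to its estimated match probability.
            link[respondent] = rng.choices(ids, weights=weights, k=1)[0]
        implicates.append(link)
    return implicates


# Hypothetical classifier output for three survey respondents.
candidate_probs = {
    "R1": [("E10", 1.0)],                             # unambiguous match
    "R2": [("E20", 0.6), ("E21", 0.3), (None, 0.1)],  # ambiguous, may be unmatched
    "R3": [("E30", 0.5), ("E31", 0.5)],
}
establishment_size = {"E10": 5, "E20": 800, "E21": 40, "E30": 12, "E31": 3000}

implicates = draw_linkages(candidate_probs, m=5)
for k, link in enumerate(implicates, start=1):
    linked = [est for est in link.values() if est is not None]
    mean_size = sum(establishment_size[e] for e in linked) / len(linked)
    print(f"implicate {k}: {link}, mean linked establishment size = {mean_size:.1f}")
```

Because the employment distribution is skewed, an ambiguous respondent linked to a very large or very small establishment can shift workplace-level statistics noticeably; the spread across implicates makes that uncertainty visible instead of hiding it behind a single "best" link.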
Working Paper

Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in On The Map

January 2017

Authors: Lars Vilhuber, John M. Abowd, Kevin L. McKinney, Andrew S. Green

Working Paper Number: CES-17-71

We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability, but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.
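For reference, the standard Rubin (1987) combining rules take the m implicate-level estimates, average them, and set the total variance to T = Ū + (1 + 1/m)B, where Ū is the average within-implicate variance and B is the between-implicate variance. The sketch below implements only these textbook rules; the paper's modified formulas that account for the QWI/LODES disclosure avoidance system are not reproduced here, and `rubin_combine` is a hypothetical helper name.

```python
import statistics


def rubin_combine(estimates, within_variances):
    """Combine m implicate-level results with Rubin's (1987) rules.

    estimates: point estimate from each implicate.
    within_variances: sampling variance of each estimate within its implicate.
    Returns (combined estimate, within, between, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two implicates with matching variances")
    q_bar = statistics.fmean(estimates)         # combined point estimate
    u_bar = statistics.fmean(within_variances)  # average within-implicate variance
    b = statistics.variance(estimates)          # between-implicate variance (m - 1 divisor)
    total = u_bar + (1 + 1 / m) * b             # Rubin total variability
    return q_bar, u_bar, b, total


# Three hypothetical implicates of one tabulation cell.
q_bar, u_bar, b, total = rubin_combine([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(f"estimate={q_bar}, within={u_bar}, between={b}, total={total:.4f}")
```

Here the between-implicate term dominates the total, which is the kind of situation the evaluation above reports for cells with very few jobs.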
Working Paper

R&D, Attrition and Multiple Imputation in BRDIS

January 2017

Authors: Juana Sanchez, Sydney Noelle Kahmann

Working Paper Number: CES-17-13

Multiple imputation in business establishment surveys such as BRDIS, an annual survey in which some companies are sampled every year or in multiple years, may improve estimates of total R&D and help researchers estimate models for subpopulations with small sample sizes. Using a panel of BRDIS companies from 2008 to 2013 linked to LBD data, this paper draws on missing-data visualization and other exploratory analyses to develop a multiple imputation strategy for item nonresponse in R&D expenditures. Because survey design characteristics drive much of the item and unit nonresponse, multiple imputation of missing BRDIS data changes estimates of total R&D significantly and alters the conclusions that complete-case analysis yields for models of the determinants of R&D investment.
Working Paper

Simultaneous Edit-Imputation for Continuous Microdata

December 2015

Authors: Jerome P. Reiter, Hang J. Kim, Lawrence H. Cox, Alan F. Karr, Quanli Wang

Working Paper Number: CES-15-44

Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods to identify the variables to change followed by hot deck imputation. We present an approach that fully integrates editing and imputation for continuous microdata under linear constraints. Our approach relies on a Bayesian hierarchical model that includes (i) a flexible joint probability model for the underlying true values of the data with support only on the set of values that satisfy all editing constraints, (ii) a model for latent indicators of the variables that are in error, and (iii) a model for the reported responses for variables in error. We illustrate the potential advantages of the Bayesian editing approach over existing approaches using simulation studies. We apply the model to edit faulty data from the 2007 U.S. Census of Manufactures. Supplementary materials for this article are available online.
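Detecting which records violate the linear constraints is the starting point for any edit-imputation method. This minimal sketch checks balance edits (components sum to a total) and ratio edits (a ratio lies within expert-specified bounds) for a single record. The variable names and bounds are hypothetical, and the code performs only the detection step, not the Bayesian error localization and imputation the paper proposes.

```python
def failed_edits(record, balance_edits, ratio_edits, tol=1e-6):
    """List the edit constraints a single record violates.

    balance_edits: (total, [components]) pairs requiring sum(components) == total.
    ratio_edits: (numerator, denominator, lower, upper) tuples requiring
                 lower <= numerator / denominator <= upper.
    """
    failures = []
    for total, components in balance_edits:
        if abs(sum(record[c] for c in components) - record[total]) > tol:
            failures.append(f"{total} != " + " + ".join(components))
    for num, den, lower, upper in ratio_edits:
        if record[den] == 0:
            failures.append(f"{num}/{den} undefined (zero denominator)")
            continue
        ratio = record[num] / record[den]
        if not lower <= ratio <= upper:
            failures.append(f"{num}/{den} = {ratio:g} outside [{lower}, {upper}]")
    return failures


# Hypothetical variables and bounds (dollar amounts in thousands; wages per worker).
balance = [("total_cost", ["wages", "materials"])]
ratios = [("wages", "employees", 10, 200)]

good = {"wages": 500, "materials": 1500, "total_cost": 2000, "employees": 10}
bad = dict(good, wages=500_000)  # wages reported in dollars instead of thousands

print(failed_edits(good, balance, ratios))  # []
print(failed_edits(bad, balance, ratios))
```

Note that one faulty value can trip several edits at once; deciding which variable is actually in error is the harder step that the hierarchical model addresses.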
Working Paper

Using Imputation Techniques to Evaluate Stopping Rules in Adaptive Survey Design

October 2014

Authors: Thais Paiva, Jerry Reiter

Working Paper Number: CES-14-40

Adaptive design methods for social surveys use data as they are collected to make decisions about the sampling design. In some cases, the decision is whether to continue or stop data collection. We evaluate this decision by proposing measures that compare the collected data with follow-up samples. The options are assessed by imputing the nonrespondents under different missingness scenarios, including missing not at random (MNAR). The variation in the utility measures is compared with the cost of the follow-up sample sizes. We apply the proposed method to the 2007 U.S. Census of Manufactures.
Working Paper

Comparing Methods for Imputing Employer Health Insurance Contributions in the Current Population Survey

August 2013

Authors: Alice Zawacki, Hubert P. Janicki, Brett O’Hara

Working Paper Number: CES-13-41

The degree to which firms contribute to the payment of workers' health insurance premiums is an important consideration in the measurement of income and for understanding the potential impact of the 2010 Affordable Care Act on employment-based health insurance participation. Currently the U.S. Census Bureau imputes employer contributions in the Annual Social and Economic Supplement of the Current Population Survey based on data from the 1977 National Medical Care Expenditure Survey. The goal of this paper is to assess the extent to which this imputation methodology produces estimates reflective of the current distribution of employer contributions. The paper uses recent contributions data from the Medical Expenditure Panel Survey-Insurance Component to estimate a new model to inform the imputation procedure and to compare the resulting distribution of contributions. These new estimates are compared with those produced under current production methods across employee and employer characteristics.
Working Paper

Plant-Level Productivity and Imputation of Missing Data in the Census of Manufactures

January 2011

Authors: T. Kirk White, Amil Petrin, Jerome P. Reiter

Working Paper Number: CES-11-02

In the U.S. Census of Manufactures, the Census Bureau imputes missing values using a combination of mean imputation, ratio imputation, and conditional mean imputation. It is well known that imputations based on these methods can result in underestimation of variability and potential bias in multivariate inferences. We show that this appears to be the case for the existing imputations in the Census of Manufactures. We then present an alternative strategy for handling the missing data based on multiple imputation. Specifically, we impute missing values via sequences of classification and regression trees, which offer a computationally straightforward and flexible approach for semi-automatic, large-scale multiple imputation. We also present an approach to evaluating these imputations based on posterior predictive checks. We use the multiple imputations, and the imputations currently employed by the Census Bureau, to estimate production function parameters and productivity dispersions. The results suggest that the two approaches provide quite different answers about productivity.
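Why mean imputation understates variability can be shown with a few hypothetical numbers: every filled-in value sits exactly at the mean, so it adds observations without adding spread. This sketch demonstrates only that shrinkage; it does not implement the classification-and-regression-tree multiple imputation the paper proposes.

```python
import statistics

# Hypothetical variable with 5 observed values and 5 missing values.
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
n_missing = 5

# Mean imputation: fill every missing value with the observed mean.
fill = statistics.fmean(observed)
mean_imputed = observed + [fill] * n_missing

var_observed = statistics.variance(observed)     # 10.0
var_imputed = statistics.variance(mean_imputed)  # about 4.44 (40/9)

print(f"variance of observed values:     {var_observed:.2f}")
print(f"variance after mean imputation:  {var_imputed:.2f}")
```

The sum of squared deviations stays at 40 while the sample size doubles, so the estimated variance falls from 10 to 40/9; dispersion measures computed from such completed data are biased toward zero.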