CREAT - Census Bureau

RECOVERING THE ITEM-LEVEL EDIT AND IMPUTATION FLAGS IN THE 1977-1997 CENSUSES OF MANUFACTURES

September 2014

Written by: T. Kirk White

Working Paper Number:

CES-14-37

Abstract

As part of processing the Census of Manufactures, the Census Bureau edits some data items and imputes for missing data and some data that is deemed erroneous. Until recently it was difficult for researchers using the plant-level microdata to determine which data items were changed or imputed during the editing and imputation process, because the edit/imputation processing flags were not available to researchers. This paper describes the process of reconstructing the edit/imputation flags for variables in the 1977, 1982, 1987, 1992, and 1997 Censuses of Manufactures using recently recovered Census Bureau files. Thepaper also reports summary statistics for the percentage of cases that are imputed for key variables. Excluding plants with fewer than 5 employees, imputation rates for several key variables range from 8% to 54% for the manufacturing sector as a whole, and from 1% to 72% at the 2-digit SIC industry level.

Document Tags and Keywords

Keywords:

analysis, production, econometric, data, manufacturing, report, industrial, microdata, sale, manufacturer, sector, imputation, inventory, record, datasets, imputed

Tags:

Census of Manufactures, Service Annual Survey, Center for Economic Studies, Administrative Records, Permanent Plant Number, Chicago Census Research Data Center, Economic Census

Similar Working Papers

The 10 most similar working papers to the working paper 'RECOVERING THE ITEM-LEVEL EDIT AND IMPUTATION FLAGS IN THE 1977-1997 CENSUSES OF MANUFACTURES' are listed below in order of similarity.

Working Paper
🔥

Plant-Level Productivity and Imputation of Missing Data in the Census of Manufactures

January 2011

Authors: T. Kirk White, Amil Petrin, Jerome P. Reiter

Working Paper Number:

CES-11-02

In the U.S. Census of Manufactures, the Census Bureau imputes missing values using a combination of mean imputation, ratio imputation, and conditional mean imputation. It is wellknown that imputations based on these methods can result in underestimation of variability and potential bias in multivariate inferences. We show that this appears to be the case for the existing imputations in the Census of Manufactures. We then present an alternative strategy for handling the missing data based on multiple imputation. Specifically, we impute missing values via sequences of classification and regression trees, which offer a computationally straightforward and flexible approach for semi-automatic, large-scale multiple imputation. We also present an approach to evaluating these imputations based on posterior predictive checks. We use the multiple imputations, and the imputations currently employed by the Census Bureau, to estimate production function parameters and productivity dispersions. The results suggest that the two approaches provide quite different answers about productivity.
View Full Paper PDF
Working Paper

Redesigning the Longitudinal Business Database

May 2021

Authors: Martha Stinson, T. Kirk White, Teresa C. Fort, Christopher Goetz, Nathan Goldschlag, Melissa Chow, Elisabeth Ruth Perlman, James Lawrence

Working Paper Number:

CES-21-08

In this paper we describe the U.S. Census Bureau's redesign and production implementation of the Longitudinal Business Database (LBD) first introduced by Jarmin and Miranda (2002). The LBD is used to create the Business Dynamics Statistics (BDS), tabulations describing the entry, exit, expansion, and contraction of businesses. The new LBD and BDS also incorporate information formerly provided by the Statistics of U.S. Businesses program, which produced similar year-to-year measures of employment and establishment flows. We describe in detail how the LBD is created from curation of the input administrative data, longitudinal matching, retiming of economic census-year births and deaths, creation of vintage consistent industry codes and noise factors, and the creation and cleaning of each year of LBD data. This documentation is intended to facilitate the proper use and understanding of the data by both researchers with approved projects accessing the LBD microdata and those using the BDS tabulations.
View Full Paper PDF
Working Paper

LEHD Data Documentation LEHD-OVERVIEW-S2008-rev1

December 2011

Authors: Lars Vilhuber, Kevin L. McKinney

Working Paper Number:

CES-11-43

View Full Paper PDF
Working Paper

CONSTRUCTION OF REGIONAL INPUT-OUTPUT TABLES FROM ESTABLISHMENT-LEVEL MICRODATA: ILLINOIS, 1982

August 1993

Authors: Eduardo Martins

Working Paper Number:

CES-93-12

This paper presents a new method for use in the construction of hybrid regional input-output tables, based primarily on individual returns from the Census of Manufactures. Using this method, input- output tables can be completed at a fraction of the cost and time involved in the completion of a full survey table. Special attention is paid to secondary production, a problem often ignored by input-output analysts. A new method to handle secondary production is presented. The method reallocates the amount of secondary production and its associated inputs, on an establishment basis, based on the assumption that the input structure for any given commodity is determined not by the industry in which the commodity was produced, but by the commodity itself -- the commodity-based technology assumption. A biproportional adjustment technique is used to perform the reallocations.
View Full Paper PDF
Working Paper

Simultaneous Edit-Imputation for Continuous Microdata

December 2015

Authors: Jerome P. Reiter, Hang J. Kim, Lawrence H. Cox, Alan F. Karr, Quanli Wang

Working Paper Number:

CES-15-44

Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods to identify the variables to change followed by hot deck imputation. We present an approach that fully integrates editing and imputation for continuous microdata under linear constraints. Our approach relies on a Bayesian hierarchical model that includes (i) a flexible joint probability model for the underlying true values of the data with support only on the set of values that satisfy all editing constraints, (ii) a model for latent indicators of the variables that are in error, and (iii) a model for the reported responses for variables in error. We illustrate the potential advantages of the Bayesian editing approach over existing approaches using simulation studies. We apply the model to edit faulty data from the 2007 U.S. Census of Manufactures. Supplementary materials for this article are available online.
View Full Paper PDF
Working Paper

The Census of Construction Industries Database

August 1998

Authors: Mark A Calabria

Working Paper Number:

CES-98-10

The Census of Construction Industries (CCI) is conducted every five years as part of the quinquennial Economic Census. The Census of Construction Industries covers all establishments with payroll that are engaged primarily in contract construction or construction on their own account for sale as defined in the Standard Industrial Classification Manual. As previously administered, the CCI is a partial census including all multi-establishments and all establishments with payroll above $480,000, one out of every five establishments with payroll between $480,000 and $120,000 and one out of eight remaining establishments. The resulting database contains for each year approximately 200,000 establishments in the building construction, heavy construction and special trade construction industrial classifications. This paper compares the content, survey procedures, and sample response of the 1982, 1987 and 1992 Censuses of Construction.
View Full Paper PDF
Working Paper

LEHD Infrastructure files in the Census RDC - Overview

June 2014

Authors: Lars Vilhuber, Kevin L. McKinney

Working Paper Number:

CES-14-26

The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2011 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureaus secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifcations made to the files to facilitate researcher access.
View Full Paper PDF
Working Paper

Dynamically Consistent Noise Infusion and Partially Synthetic Data as Confidentiality Protection Measures for Related Time Series

July 2012

Authors: Lars Vilhuber, John M. Abowd, Kevin L. McKinney, Bryce Stephens, Simon Woodcock, Kaj Gittings

Working Paper Number:

CES-12-13

The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure avoidance methods are required to protect the information about individuals and businesses that contribute to the underlying data. The QWI disclosure avoidance mechanism we describe here relies heavily on the use of noise infusion through a permanent multiplicative noise distortion factor, used for magnitudes, counts, differences and ratios. There is minimal suppression and no complementary suppressions. To our knowledge, the release in 2003 of the QWI was the first large-scale use of noise infusion in any official statistical product. We show that the released statistics are analytically valid along several critical dimensions { measures are unbiased and time series properties are preserved. We provide an analysis of the degree to which confidentiality is protected. Furthermore, we show how the judicious use of synthetic data, injected into the tabulation process, can completely eliminate suppressions, maintain analytical validity, and increase the protection of the underlying confidential data.
View Full Paper PDF
Working Paper

Collaborative Micro-productivity Project: Establishment-Level Productivity Dataset, 1972-2020

December 2023

Authors: Cheryl Grim, Zoltan Wolf, Cody Tuttle, G. Jacob Blackwood, Rachel Nesbit

Working Paper Number:

CES-23-65

We describe the process for building the Collaborative Micro-productivity Project (CMP) microdata and calculating establishment-level productivity numbers. The documentation is for version 7 and the data cover the years 1972-2020. These data have been used in numerous research papers and are used to create the experimental public-use data product Dispersion Statistics on Productivity (DiSP).
View Full Paper PDF
Working Paper

Newly Recovered Microdata on U.S. Manufacturing Plants from the 1950s and 1960s: Some Early Glimpses

September 2011

Authors: Randy Becker, Cheryl Grim

Working Paper Number:

CES-11-29

Longitudinally-linked microdata on U.S. manufacturing plants are currently available to researchers for 1963, 1967, and 1972-2009. In this paper, we provide a first look at recently recovered manufacturing microdata files from the 1950s and 1960s. We describe their origins and background, discuss their contents, and begin to explore their sample coverage. We also begin to examine whether the available establishment identifier(s) allow record linking. Our preliminary analyses suggest that longitudinally-linked Annual Survey of Manufactures microdata from the mid-1950s through the present ' containing 16 years of additional data ' appears possible though challenging. While a great deal of work remains, we see tremendous value in extending the manufacturing microdata series back into time. With these data, new lines of research become possible and many others can be revisited.
View Full Paper PDF

RECOVERING THE ITEM-LEVEL EDIT AND IMPUTATION FLAGS IN THE 1977-1997 CENSUSES OF MANUFACTURES

September 2014

Working Paper Number:

CES-14-37

Abstract

Document Tags and Keywords

The 10 most similar working papers to the working paper 'RECOVERING THE ITEM-LEVEL EDIT AND IMPUTATION FLAGS IN THE 1977-1997 CENSUSES OF MANUFACTURES' are listed below in order of similarity.

January 2011

Working Paper Number:

CES-11-02

May 2021

Working Paper Number:

CES-21-08

December 2011

Working Paper Number:

CES-11-43

August 1993

Working Paper Number:

CES-93-12

December 2015

Working Paper Number:

CES-15-44

August 1998

Working Paper Number:

CES-98-10

June 2014

Working Paper Number:

CES-14-26

July 2012

Working Paper Number:

CES-12-13

December 2023

Working Paper Number:

CES-23-65

September 2011

Working Paper Number:

CES-11-29