CREAT - Census Bureau

Redesigning the Longitudinal Business Database

May 2021

Written by: Martha Stinson, T. Kirk White, Teresa C. Fort, Christopher Goetz, Nathan Goldschlag, Melissa Chow, Elisabeth Ruth Perlman, James Lawrence

Working Paper Number:

CES-21-08

Abstract

In this paper we describe the U.S. Census Bureau's redesign and production implementation of the Longitudinal Business Database (LBD) first introduced by Jarmin and Miranda (2002). The LBD is used to create the Business Dynamics Statistics (BDS), tabulations describing the entry, exit, expansion, and contraction of businesses. The new LBD and BDS also incorporate information formerly provided by the Statistics of U.S. Businesses program, which produced similar year-to-year measures of employment and establishment flows. We describe in detail each step in creating the LBD: curation of the input administrative data, longitudinal matching, retiming of economic census-year births and deaths, creation of vintage-consistent industry codes and noise factors, and the creation and cleaning of each year of LBD data. This documentation is intended to facilitate the proper use and understanding of the data by both researchers with approved projects accessing the LBD microdata and those using the BDS tabulations.
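The entry, exit, expansion, and contraction categories that the BDS tabulates can be illustrated with a minimal sketch. It assumes two hypothetical dictionaries mapping a persistent establishment identifier to employment in consecutive years, and uses the conventional definitions in which job creation sums employment gains and job destruction sums employment losses. It is an illustration only, not the Census Bureau's production LBD or BDS code, and it ignores the longitudinal matching, retiming, and cleaning steps described above.

```python
from collections import Counter


def classify_flows(
    emp_prev: dict[str, int], emp_curr: dict[str, int]
) -> tuple[dict[str, str], Counter]:
    """Classify each establishment's year-over-year change and sum job flows.

    emp_prev and emp_curr are hypothetical mappings from an establishment ID
    to employment in the prior and current year. A missing ID counts as zero.
    """
    status: dict[str, str] = {}
    flows: Counter = Counter()
    for est in emp_prev.keys() | emp_curr.keys():
        before = emp_prev.get(est, 0)
        after = emp_curr.get(est, 0)
        change = after - before
        if before == 0 and after > 0:
            status[est] = "entry"        # positive employment only this year
        elif before > 0 and after == 0:
            status[est] = "exit"         # positive employment only last year
        elif change > 0:
            status[est] = "expansion"    # continuing and growing
        elif change < 0:
            status[est] = "contraction"  # continuing and shrinking
        else:
            status[est] = "constant"
        # Job creation sums gains (including entries); destruction sums losses.
        if change > 0:
            flows["job_creation"] += change
        elif change < 0:
            flows["job_destruction"] += -change
    return status, flows
```

For example, with prior-year employment {"A": 10, "B": 5, "C": 8} and current-year employment {"A": 12, "C": 3, "D": 4}, A is an expansion, B an exit, C a contraction, and D an entry; job creation is 6, job destruction is 10, and the net change of -4 matches the fall in total employment from 23 to 19.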

Document Tags and Keywords

Keywords:

data, work census, information census, payroll, enterprise, database, data census, report, quarterly, census data, agency, acquisition, accounting, yearly, longitudinal, recession, incorporated, employment data, economic census, business data, establishments data, warehousing, businesses census, record, census years, use census, census use

Tags:

Metropolitan Statistical Area, Annual Survey of Manufactures, Standard Statistical Establishment List, Internal Revenue Service, Standard Industrial Classification, Bureau of Labor Statistics, Social Security Administration, Service Annual Survey, Small Business Administration, Longitudinal Research Database, Center for Economic Studies, County Business Patterns, National Establishment Time Series, Company Organization Survey, Longitudinal Business Database, Employer Identification Numbers, Economic Census, Research Data Center, North American Industry Classification System, Longitudinal Employer Household Dynamics, Business Register, W-2, Quarterly Workforce Indicators, Business Employment Dynamics, Disclosure Review Board, CAAA, Business Dynamics Statistics, Federal Statistical Research Data Center, COVID-19

Similar Working Papers

The 10 most similar working papers to the working paper 'Redesigning the Longitudinal Business Database' are listed below in order of similarity.

Working Paper

LEHD Data Documentation LEHD-OVERVIEW-S2008-rev1

December 2011

Authors: Lars Vilhuber, Kevin L. McKinney

Working Paper Number:

CES-11-43

Working Paper

The Longitudinal Business Database

July 2002

Authors: Ron Jarmin, Javier Miranda

Working Paper Number:

CES-02-17

As the largest federal statistical agency and primary collector of data on businesses, households and individuals, the Census Bureau each year conducts numerous surveys intended to provide statistics on a wide range of topics about the population and economy of the United States. The Census Bureau's decennial population and quinquennial economic censuses are unique, providing information on all U.S. households and business establishments, respectively.
Working Paper

IMPROVING THE SYNTHETIC LONGITUDINAL BUSINESS DATABASE

February 2014

Authors: Javier Miranda, Jerome P. Reiter, Satkartar K. Kinney

Working Paper Number:

CES-14-12

In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version (now available for public use) of the U.S. Census Bureau's Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This article describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
Working Paper

LEHD Infrastructure Files in the Census RDC: Overview of S2004 Snapshot

April 2011

Authors: Lars Vilhuber, Kevin L. McKinney

Working Paper Number:

CES-11-13

The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2004 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's Research Data Center network.
Working Paper

Longitudinal Establishment And Enterprise Microdata (LEEM) Documentation

May 1998

Authors: Zoltan J Acs, Catherine Armington

Working Paper Number:

CES-98-09

This paper introduces and documents the new Longitudinal Enterprise and Establishment Microdata (LEEM) database, which has been constructed by Census' Economic Planning and Coordination Division under contract to the Office of Advocacy of the U.S. Small Business Administration. The LEEM links three years (1990, 1994, and 1995) of basic data for each private sector establishment with payroll in any of those years, along with data on the firm to which the establishment belongs each year. The LEEM data will facilitate both broader and more detailed analysis of patterns of job creation and destruction in the U.S., as well as research on the structure and dynamics of U.S. businesses. This paper provides documentation of the construction of LEEM data, summary data on most variables in the database, comparisons of the annual data with that of the nearly identical County Business Patterns, and distributions of establishments and their employment by the size of their firms. This is followed by a simple analysis of changes over time in the attributes of surviving establishments, and a brief discussion of turnover (business births and deaths) in the population and gross changes in employment associated with both establishment turnover and with surviving establishments. It concludes with a summary of the strengths and weaknesses of the LEEM.
Working Paper

A FIRST STEP TOWARDS A GERMAN SYNLBD: CONSTRUCTING A GERMAN LONGITUDINAL BUSINESS DATABASE

February 2014

Authors: Lars Vilhuber, Jorg Drechsler

Working Paper Number:

CES-14-13

One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
Working Paper

Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database

February 2011

Authors: Arnold P Reznek, Ron Jarmin, Javier Miranda, John M. Abowd, Jerome P. Reiter, Satkartar K. Kinney

Working Paper Number:

CES-11-04

In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.
Working Paper

LEHD Infrastructure files in the Census RDC - Overview

June 2014

Authors: Lars Vilhuber, Kevin L. McKinney

Working Paper Number:

CES-14-26

The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2011 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifications made to the files to facilitate researcher access.
Working Paper

RECOVERING THE ITEM-LEVEL EDIT AND IMPUTATION FLAGS IN THE 1977-1997 CENSUSES OF MANUFACTURES

September 2014

Authors: T. Kirk White

Working Paper Number:

CES-14-37

As part of processing the Census of Manufactures, the Census Bureau edits some data items and imputes for missing data and some data that is deemed erroneous. Until recently it was difficult for researchers using the plant-level microdata to determine which data items were changed or imputed during the editing and imputation process, because the edit/imputation processing flags were not available to researchers. This paper describes the process of reconstructing the edit/imputation flags for variables in the 1977, 1982, 1987, 1992, and 1997 Censuses of Manufactures using recently recovered Census Bureau files. The paper also reports summary statistics for the percentage of cases that are imputed for key variables. Excluding plants with fewer than 5 employees, imputation rates for several key variables range from 8% to 54% for the manufacturing sector as a whole, and from 1% to 72% at the 2-digit SIC industry level.
Working Paper

Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics

February 2016

Authors: Javier Miranda, Lars Vilhuber

Working Paper Number:

CES-16-10

We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).
