The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2011 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifications made to the files to facilitate researcher access.
-
LEHD Infrastructure Files in the Census RDC: Overview of S2004 Snapshot
April 2011
Working Paper Number:
CES-11-13
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2004 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's Research Data Center network.
-
LEHD Data Documentation LEHD-OVERVIEW-S2008-rev1
December 2011
Working Paper Number:
CES-11-43
-
LEHD Infrastructure S2014 files in the FSRDC
September 2018
Working Paper Number:
CES-18-27R
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2014 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifications made to the files to facilitate researcher access.
-
LEHD Snapshot Documentation, Release S2021_R2022Q4
November 2022
Working Paper Number:
CES-22-51
The Longitudinal Employer-Household Dynamics (LEHD) data at the U.S. Census Bureau is a quarterly database of linked employer-employee data covering over 95% of employment in the United States. These data are used to produce a number of public-use tabulations and tools, including the Quarterly Workforce Indicators (QWI), LEHD Origin-Destination Employment Statistics (LODES), Job-to-Job Flows (J2J), and Post-Secondary Employment Outcomes (PSEO) data products. Researchers on approved projects may also access the underlying LEHD microdata directly, in the form of the LEHD Snapshot restricted-use data product. This document provides a detailed overview of the LEHD Snapshot as of release S2021_R2022Q4, including user guidance, variable codebooks, and an overview of the approvals needed to obtain access. Updates to the documentation for this and future snapshot releases will be made available in HTML format on the LEHD website.
-
The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators
January 2006
Working Paper Number:
tp-2006-01
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Beginning in 2003 and building on this infrastructure, the Census Bureau has published the Quarterly Workforce Indicators (QWI), a new collection of data series that offers unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained due to the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. We describe the multiple imputation methods used to impute missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement. Both of these innovations are crucial to the success of the final product. Finally, we pay special attention to the details of the confidentiality protection system used to protect the identity and micro data values of the underlying entities used to form the published estimates. We provide a brief description of public-use and restricted-access data files with pointers to further documentation for researchers interested in using these data.
-
Redesigning the Longitudinal Business Database
May 2021
Working Paper Number:
CES-21-08
In this paper we describe the U.S. Census Bureau's redesign and production implementation of the Longitudinal Business Database (LBD) first introduced by Jarmin and Miranda (2002). The LBD is used to create the Business Dynamics Statistics (BDS), tabulations describing the entry, exit, expansion, and contraction of businesses. The new LBD and BDS also incorporate information formerly provided by the Statistics of U.S. Businesses program, which produced similar year-to-year measures of employment and establishment flows. We describe in detail how the LBD is created from curation of the input administrative data, longitudinal matching, retiming of economic census-year births and deaths, creation of vintage consistent industry codes and noise factors, and the creation and cleaning of each year of LBD data. This documentation is intended to facilitate the proper use and understanding of the data by both researchers with approved projects accessing the LBD microdata and those using the BDS tabulations.
-
Dynamically Consistent Noise Infusion and Partially Synthetic Data as Confidentiality Protection Measures for Related Time Series
July 2012
Working Paper Number:
CES-12-13
The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure avoidance methods are required to protect the information about individuals and businesses that contribute to the underlying data. The QWI disclosure avoidance mechanism we describe here relies heavily on the use of noise infusion through a permanent multiplicative noise distortion factor, used for magnitudes, counts, differences and ratios. There is minimal suppression and no complementary suppressions. To our knowledge, the release in 2003 of the QWI was the first large-scale use of noise infusion in any official statistical product. We show that the released statistics are analytically valid along several critical dimensions: measures are unbiased and time series properties are preserved. We provide an analysis of the degree to which confidentiality is protected. Furthermore, we show how the judicious use of synthetic data, injected into the tabulation process, can completely eliminate suppressions, maintain analytical validity, and increase the protection of the underlying confidential data.
-
The Creation of the Employment Dynamics Estimates
July 2002
Working Paper Number:
tp-2002-13
-
A FIRST STEP TOWARDS A GERMAN SYNLBD: CONSTRUCTING A GERMAN LONGITUDINAL BUSINESS DATABASE
February 2014
Working Paper Number:
CES-14-13
One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
-
Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in On The Map
January 2017
Working Paper Number:
CES-17-71
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.