The Census Bureau is developing a 'job frame' to provide detailed job-level employment data across the U.S. through linked administrative records such as unemployment insurance and IRS W-2 filings. This working paper summarizes the research conducted by the job frame development team on modifying and extending the LEHD Unit-to-Worker (U2W) imputation procedure for the job frame prototype. It provides a conceptual overview of the U2W imputation method, highlighting key challenges and tradeoffs in its current application. The paper then presents four imputation methodologies and evaluates their performance in areas such as establishment assignment accuracy, establishment size matching, and job separation rates. The results show that all methodologies perform similarly in assigning workers to the correct establishment. Non-spell-based methodologies excel in matching establishment sizes, while spell-based methodologies perform better in accurately tracking separation rates.
-
Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files
January 2017
Working Paper Number:
CES-17-34
Commuting flows and workplace employment data have a wide constituency of users including urban and regional planners, social science and transportation researchers, and businesses. The U.S. Census Bureau releases two national data products that give the magnitude and characteristics of home-to-work flows. The American Community Survey (ACS) tabulates households' responses on employment, workplace, and commuting behavior. The Longitudinal Employer-Household Dynamics (LEHD) program tabulates administrative records on jobs in the LEHD Origin-Destination Employment Statistics (LODES). Design differences across the datasets lead to divergence in a comparable statistic: county-to-county aggregate commute flows. To understand differences in the public-use data, this study compares ACS and LEHD source files, using identifying information and probabilistic matching to join person and job records. In our assessment, we compare commuting statistics for job frames linked on person, employment status, employer, and workplace, and we identify person and job characteristics as well as design features of the data frames that explain aggregate differences. We find a lower rate of within-county commuting and farther commutes in LODES. We attribute these greater distances to differences in workplace reporting and to uncertainty of establishment assignments in LEHD for workers at multi-unit employers. Minor contributing factors include differences in residence location and ACS workplace edits. The results of this analysis and the data infrastructure developed will support further work to understand and enhance commuting statistics in both datasets.
-
Developing a Residence Candidate File for Use With Employer-Employee Matched Data
January 2017
Working Paper Number:
CES-17-40
This paper describes the Longitudinal Employer-Household Dynamics (LEHD) program's ongoing efforts to use administrative records in a predictive model that describes residence locations for workers. This project was motivated by the discontinuation of a residence file produced elsewhere at the U.S. Census Bureau. The goal of the Residence Candidate File (RCF) process is to provide the LEHD Infrastructure Files with residence information that maintains currency with the changing state of administrative sources and represents uncertainty in location as a probability distribution. The discontinued file provided only a single residence per person/year, even when contributing administrative data may have contained multiple residences. This paper describes the motivation for the project, our methodology, the administrative data sources, the model estimation and validation results, and the file specifications. We find that the best prediction of the person-place model provides similar, but superior, accuracy compared with previous methods and performs well for workers in the LEHD jobs frame. We outline possibilities for further improvement in sources and modeling as well as recommendations on how to use the preference weights in downstream processing.
-
The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators
January 2006
Working Paper Number:
tp-2006-01
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Beginning in 2003 and building on this infrastructure, the Census Bureau has published the Quarterly Workforce Indicators (QWI), a new collection of data series that offers unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained due to the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. We describe the multiple imputation methods used to fill in missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement. Both of these innovations are crucial to the success of the final product. Finally, we pay special attention to the details of the confidentiality protection system used to protect the identity and micro data values of the underlying entities used to form the published estimates. We provide a brief description of public-use and restricted-access data files with pointers to further documentation for researchers interested in using these data.
-
LEHD Snapshot Documentation, Release S2021_R2022Q4
November 2022
Working Paper Number:
CES-22-51
The Longitudinal Employer-Household Dynamics (LEHD) data at the U.S. Census Bureau is a quarterly database of linked employer-employee data covering over 95% of employment in the United States. These data are used to produce a number of public-use tabulations and tools, including the Quarterly Workforce Indicators (QWI), LEHD Origin-Destination Employment Statistics (LODES), Job-to-Job Flows (J2J), and Post-Secondary Employment Outcomes (PSEO) data products. Researchers on approved projects may also access the underlying LEHD microdata directly, in the form of the LEHD Snapshot restricted-use data product. This document provides a detailed overview of the LEHD Snapshot as of release S2021_R2022Q4, including user guidance, variable codebooks, and an overview of the approvals needed to obtain access. Updates to the documentation for this and future snapshot releases will be made available in HTML format on the LEHD website.
-
The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers
October 2002
Working Paper Number:
tp-2002-17
In this paper, we describe the sensitivity of small-cell flow statistics to coding errors in the identity of the underlying entities. Specifically, we present results based on a comparison of the U.S. Census Bureau's Quarterly Workforce Indicators (QWI) before and after correcting for such errors in SSN-based identifiers in the underlying individual wage records. The correction used involves a novel application of existing statistical matching techniques. It is found that even a very conservative correction procedure has a sizable impact on the statistics. The average bias ranges from 0.25 percent up to 15 percent for flow statistics, and up to 5 percent for payroll aggregates.
-
Introducing the Medical Expenditure Panel Survey-Insurance Component with Administrative Records (MEPS-ICAR): Description, Data Construction Methodology, and Quality Assessment
August 2022
Working Paper Number:
CES-22-29
This report introduces a new dataset, the Medical Expenditure Panel Survey-Insurance Component with Administrative Records (MEPS-ICAR), consisting of MEPS-IC survey data on establishments and their health insurance benefits packages linked to Decennial Census data and administrative tax records on MEPS-IC establishments' workforces. These data include new measures of the characteristics of MEPS-IC establishments' parent firms, employee turnover, the full distribution of MEPS-IC workers' personal and family incomes, the geographic locations where those workers live, and improved workforce demographic detail. Next, this report details the methods used for producing the MEPS-ICAR. Broadly, the linking process begins by matching establishments' parent firms to their workforces using identifiers appearing in tax records. The linking process concludes by matching establishments to their own workforces by identifying the subset of their parent firm's workforce that best matches the expected size, total payroll, and residential geographic distribution of the establishment's workforce. Finally, this report presents statistics characterizing the match rate and the MEPS-ICAR data itself. Key results include that match rates are consistently high (exceeding 90%) across nearly all data subgroups and that the matched data exhibit a reasonable distribution of employment, payroll, and worker commute distances relative to expectations and external benchmarks. Notably, employment measures derived from tax records, but not used in the match itself, correspond with high fidelity to the employment levels that establishments report in the MEPS-IC. Cumulatively, the construction of the MEPS-ICAR significantly expands the capabilities of the MEPS-IC and presents many opportunities for analysts.
-
LEHD Infrastructure S2014 files in the FSRDC
September 2018
Working Paper Number:
CES-18-27R
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2014 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifications made to the files to facilitate researcher access.
-
Estimation of Job-to-Job Flow Rates under Partially Missing Geography
September 2012
Working Paper Number:
CES-12-29
Integration of data from different regions presents challenges for the calculation of entity-level longitudinal statistics with a strong geographic component: for example, movements between employers, migration, business dynamics, and health statistics. In this paper, we consider the estimation of worker-level employment statistics when the geographies (in our application, US states) over which such measures are defined are partially missing. We focus on the recent pilot set of job-to-job flow statistics produced by the US Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program, which measure the frequency of worker movements between jobs and into and out of nonemployment. LEHD's coverage of the labor force gradually increases during the 1990s and 2000s because some states have a longer time series than others, so employment transitions involving missing states are only partially or not at all observed. We propose and implement a method for estimating national-level job-to-job flow statistics that involves dropping observed states to recover the relationship between missing states and directly tabulated job-to-job flow rates. Using the estimated relationship between the observable characteristics of the missing states and changes in the employment measures, we provide estimates of the rates of job-to-job, job-to-nonemployment, and job-to-nonemployment-to-job flows were all states uniformly available.
-
Escaping poverty for low-wage workers: The role of employer characteristics and changes
June 2001
Working Paper Number:
tp-2001-02
-
JOB-TO-JOB (J2J) Flows: New Labor Market Statistics From Linked Employer-Employee Data
September 2014
Working Paper Number:
CES-14-34
Flows of workers across jobs are a principal mechanism by which labor markets allocate workers to optimize productivity. While these job flows are both large and economically important, they represent a significant gap in available economic statistics. A soon-to-be-released data product from the U.S. Census Bureau will fill this gap. The Job-to-Job (J2J) flow statistics provide estimates of worker flows across jobs, across different geographic labor markets, and by worker and firm characteristics, including direct job-to-job flows as well as job changes with intervening nonemployment. In this paper, we describe the creation of the public-use data product on job-to-job flows. The data underlying the statistics are the matched employer-employee data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics program. We describe definitional issues and the identification strategy for tracing worker movements between employers in administrative data. We then compare our data with related series and discuss similarities and differences. Lastly, we describe disclosure avoidance techniques for the public-use file and our methodology for estimating national statistics when there is partially missing geography.