Commuting flows and workplace employment data have a wide constituency of users, including urban and regional planners, social science and transportation researchers, and businesses. The U.S. Census Bureau releases two national data products that report the magnitude and characteristics of home-to-work flows. The American Community Survey (ACS) tabulates households' responses on employment, workplace, and commuting behavior. The Longitudinal Employer-Household Dynamics (LEHD) program tabulates administrative records on jobs in the LEHD Origin-Destination Employment Statistics (LODES). Design differences across the datasets lead to divergence in a comparable statistic: county-to-county aggregate commute flows. To understand differences in the public-use data, this study compares ACS and LEHD source files, using identifying information and probabilistic matching to join person and job records. In our assessment, we compare commuting statistics for job frames linked on person, employment status, employer, and workplace, and we identify person and job characteristics, as well as design features of the data frames, that explain aggregate differences. We find a lower rate of within-county commuting and longer commutes in LODES. We attribute these greater distances to differences in workplace reporting and to uncertainty in LEHD establishment assignments for workers at multi-unit employers. Minor contributing factors include differences in residence location and ACS workplace edits. The results of this analysis and the data infrastructure developed will support further work to understand and enhance commuting statistics in both datasets.
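The two statistics this abstract compares, the within-county commuting rate and county-to-county flow totals, can be illustrated with a minimal Python sketch. It assumes each source has already been reduced to hypothetical (home county, work county, job count) records; the county codes and counts are made up, and real ACS and LODES tabulations involve survey weights, edits, and confidentiality protection that are not modeled here.

```python
from collections import defaultdict

# A flow is (home_county, work_county, job_count); county codes are hypothetical.
Flow = tuple[str, str, int]


def within_county_share(flows: list[Flow]) -> float:
    """Fraction of all jobs whose home county equals the work county."""
    total = sum(count for _, _, count in flows)
    within = sum(count for home, work, count in flows if home == work)
    return within / total if total else 0.0


def county_pair_totals(flows: list[Flow]) -> dict[tuple[str, str], int]:
    """Aggregate job counts to county-to-county (home, work) pairs."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for home, work, count in flows:
        totals[(home, work)] += count
    return dict(totals)


def pair_differences(acs: list[Flow], lodes: list[Flow]) -> dict[tuple[str, str], int]:
    """LODES minus ACS job count for every county pair observed in either source."""
    a, b = county_pair_totals(acs), county_pair_totals(lodes)
    return {pair: b.get(pair, 0) - a.get(pair, 0) for pair in a.keys() | b.keys()}


# Illustrative made-up counts, not published figures.
acs = [("A", "A", 80), ("A", "B", 20)]
lodes = [("A", "A", 70), ("A", "B", 25), ("A", "C", 5)]
print(within_county_share(acs), within_county_share(lodes))
print(pair_differences(acs, lodes))
```

Taking the union of pairs matters: a flow that appears in only one source (here A to C) still contributes to the difference instead of silently dropping out.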
-
Developing a Residence Candidate File for Use With Employer-Employee Matched Data
January 2017
Working Paper Number:
CES-17-40
This paper describes the Longitudinal Employer-Household Dynamics (LEHD) program's ongoing efforts to use administrative records in a predictive model of workers' residence locations. The project was motivated by the discontinuation of a residence file produced elsewhere at the U.S. Census Bureau. The goal of the Residence Candidate File (RCF) process is to provide the LEHD Infrastructure Files with residence information that stays current as administrative sources change and that represents uncertainty in location as a probability distribution. The discontinued file provided only a single residence per person-year, even when the contributing administrative data may have contained multiple residences. This paper describes the motivation for the project, our methodology, the administrative data sources, the model estimation and validation results, and the file specifications. We find that the best prediction of the person-place model provides accuracy similar to, and slightly better than, that of previous methods and performs well for workers in the LEHD jobs frame. We outline possibilities for further improvement in sources and modeling, as well as recommendations on how to use the preference weights in downstream processing.
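The contrast between a single residence per person-year and a probability distribution over candidate residences can be sketched as follows. This is an illustration under stated assumptions, not the RCF model: the `Candidate` fields, the nonnegative `score`, and the simple normalization are all hypothetical stand-ins for whatever the person-place model actually produces.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    address_id: str  # hypothetical residence identifier, assumed unique per person-year
    county: str      # hypothetical geography of that residence
    score: float     # hypothetical nonnegative score from a person-place model


def preference_weights(cands: list[Candidate]) -> dict[str, float]:
    """Normalize one person-year's candidate scores into a probability distribution."""
    if not cands:
        return {}
    total = sum(c.score for c in cands)
    if total <= 0:
        # No usable signal: spread probability evenly across candidates.
        return {c.address_id: 1 / len(cands) for c in cands}
    return {c.address_id: c.score / total for c in cands}


def best_prediction(weights: dict[str, float]) -> str:
    """Single most likely residence, analogous to a one-residence-per-year file."""
    return max(weights, key=weights.get)


def county_distribution(cands: list[Candidate]) -> dict[str, float]:
    """Carry the full uncertainty downstream by summing weights within each county."""
    weights = preference_weights(cands)
    out: dict[str, float] = defaultdict(float)
    for c in cands:
        out[c.county] += weights[c.address_id]
    return dict(out)


cands = [Candidate("a1", "X", 3.0), Candidate("a2", "Y", 1.0), Candidate("a3", "X", 0.0)]
print(best_prediction(preference_weights(cands)), county_distribution(cands))
```

Keeping the weights, rather than only the argmax, lets a downstream tabulation allocate a worker fractionally across places when the evidence is split.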
-
The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators
January 2006
Working Paper Number:
tp-2006-01
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys, and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Beginning in 2003 and building on this infrastructure, the Census Bureau has published the Quarterly Workforce Indicators (QWI), a new collection of data series that offers unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained through the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. We describe the multiple imputation methods used to impute missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement. Both of these innovations are crucial to the success of the final product. Finally, we pay special attention to the details of the confidentiality protection system used to protect the identity and microdata values of the underlying entities used to form the published estimates. We provide a brief description of public-use and restricted-access data files with pointers to further documentation for researchers interested in using these data.
-
Design Comparison of LODES and ACS Commuting Data Products
October 2014
Working Paper Number:
CES-14-38
The Census Bureau produces two complementary data products, the American Community Survey (ACS) commuting and workplace data and the Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES), which can be used to answer spatial, economic, and demographic questions about workplaces and home-to-work flows. The products are complementary in the sense that they measure similar activities, but each has important unique characteristics that provide information the other cannot. In response to questions from data users, the Census Bureau has created this document to highlight the major design differences between these two data products. This report guides users on the relative advantages of each data product for various analyses and helps explain differences that may arise when using the products.
As an overview, these two data products are sourced from different inputs, cover different populations and time periods, are subject to different sets of edits and imputations, are released under different confidentiality protection mechanisms, and are tabulated at different geographic and characteristic levels. As a general rule, the two data products should not be expected to match exactly for arbitrary queries and may differ substantially for some queries.
Within this document, we compare the two data products by the design elements that were deemed most likely to contribute to differences in tabulated data. These elements are: Collection, Coverage, Geographic and Longitudinal Scope, Job Definition and Reference Period, Job and Worker Characteristics, Location Definitions (Workplace and Residence), Completeness of Geographic Information and Edits/Imputations, Geographic Tabulation Levels, Control Totals, Confidentiality Protection and Suppression, and Related Public-Use Data Products.
An in-depth data analysis comparing the two data products, either in aggregate or with the microdata, will be the subject of a future technical report. The Census Bureau has begun a pilot project to integrate ACS microdata with LEHD administrative data to develop an enhanced frame of employment status, place of work, and commuting. The Census Bureau will publish quality metrics for person match rates, residence and workplace match rates, and commute distance comparisons.
-
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning
November 2021
Working Paper Number:
CES-21-35
This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
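The multiple-imputation half of an ML-MI linkage can be sketched in a few lines: rather than keeping one best link per respondent, draw several links in proportion to predicted match probabilities. This is a minimal sketch, assuming a classifier has already produced match probabilities for each candidate establishment; the probability dictionary, the treatment of leftover mass as "no match," and the function name are illustrative assumptions, not the paper's implementation.

```python
import random
from typing import Optional


def draw_implicates(match_probs: dict[str, float], m: int, seed: int = 0) -> list[Optional[str]]:
    """Draw m independent linkages for one survey respondent.

    match_probs maps candidate establishment IDs to match probabilities,
    assumed to come from a trained classifier and to sum to at most 1.
    The leftover mass means "true workplace not among the candidates"
    and is drawn as None.
    """
    rng = random.Random(seed)
    ids = list(match_probs)
    weights = [match_probs[i] for i in ids]
    residual = max(0.0, 1.0 - sum(weights))
    population: list[Optional[str]] = [*ids, None]
    return rng.choices(population, weights=[*weights, residual], k=m)


# Hypothetical probabilities for one respondent; each draw defines one linked dataset.
print(draw_implicates({"est_17": 0.6, "est_42": 0.3}, m=5, seed=1))
```

Because each implicate is a separate plausible linkage, analyses repeated across implicates reflect how uncertain the link itself is, which a single best match would hide.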
-
Revisions to the LEHD Establishment Imputation Procedure and Applications to Administrative Job Frame
September 2024
Working Paper Number:
CES-24-51
The Census Bureau is developing a 'job frame' to provide detailed job-level employment data across the U.S. through linked administrative records such as unemployment insurance and IRS W-2 filings. This working paper summarizes the research conducted by the job frame development team on modifying and extending the LEHD Unit-to-Worker (U2W) imputation procedure for the job frame prototype. It provides a conceptual overview of the U2W imputation method, highlighting key challenges and tradeoffs in its current application. The paper then presents four imputation methodologies and evaluates their performance in areas such as establishment assignment accuracy, establishment size matching, and job separation rates. The results show that all methodologies perform similarly in assigning workers to the correct establishment. Non-spell-based methodologies excel in matching establishment sizes, while spell-based methodologies perform better in accurately tracking separation rates.
-
Optimal Probabilistic Record Linkage: Best Practice for Linking Employers in Survey and Administrative Data
March 2019
Working Paper Number:
CES-19-08
This paper illustrates an application of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across firms is highly asymmetric. To address these difficulties, this paper uses a supervised machine learning model to probabilistically link survey respondents in the Health and Retirement Study (HRS) with employers and establishments in the Census Business Register (BR) to create a new data source which we call the CenHRS. Multiple imputation is used to propagate uncertainty from the linkage step into subsequent analyses of the linked data. The linked data reveal new evidence that survey respondents' misreporting and selective nonresponse about employer characteristics are systematically correlated with wages.
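One standard way to propagate linkage uncertainty into subsequent analyses is Rubin's combining rules, shown here as a sketch; the abstract does not say which combining procedure the CenHRS work uses. Each implicate (one linked dataset) yields an estimate and its sampling variance, and the between-implicate spread inflates the total variance.

```python
from statistics import mean, variance


def rubin_combine(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Combine per-implicate results with Rubin's rules.

    estimates[k] and variances[k] are the point estimate and its sampling
    variance computed on the k-th linked (imputed) dataset.
    Returns the combined estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two implicates and one variance per estimate")
    q_bar = mean(estimates)          # combined point estimate
    u_bar = mean(variances)          # average within-implicate variance
    b = variance(estimates)          # between-implicate variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # linkage uncertainty inflates the variance
    return q_bar, total


# Hypothetical wage-gap estimates from three linked datasets.
print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

If every implicate produced the same estimate, the between term would vanish and the result would collapse to the usual single-dataset variance.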
-
Recalculating... : How Uncertainty in Local Labor Market Definitions Affects Empirical Findings
January 2017
Working Paper Number:
CES-17-49R
This paper evaluates the use of commuting zones as a local labor market definition. We revisit Tolbert and Sizer (1996) and demonstrate the sensitivity of definitions to two features of the methodology: the cluster dissimilarity cutoff (or, equivalently, the count of clusters) and uncertainty in the input data. We show how these features affect empirical estimates using a standard application of commuting zones and an example from the related literature. We conclude with advice to researchers on how to demonstrate the robustness of empirical findings to uncertainty in the definition of commuting zones.
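The cutoff sensitivity described above can be reproduced on a toy example. This sketch uses the proportional-flow dissimilarity as it is commonly described for Tolbert and Sizer, d_ij = 1 - (f_ij + f_ji) / min(rlf_i, rlf_j), where f is a commuting flow and rlf a resident labor force, combined with average-linkage agglomerative clustering. The three counties, their flows, and the cutoffs are made up, and the code is illustrative rather than a reproduction of the official procedure.

```python
from itertools import combinations


def dissimilarity(flows: dict[tuple[str, str], int], labor_force: dict[str, int], i: str, j: str) -> float:
    """Proportional-flow dissimilarity: 1 - (f_ij + f_ji) / min(rlf_i, rlf_j)."""
    shared = flows.get((i, j), 0) + flows.get((j, i), 0)
    return 1 - shared / min(labor_force[i], labor_force[j])


def average_linkage(d: dict[frozenset, float], x: frozenset, y: frozenset) -> float:
    """Mean pairwise dissimilarity between two clusters of counties."""
    return sum(d[frozenset((a, b))] for a in x for b in y) / (len(x) * len(y))


def cluster(counties: list[str], flows, labor_force, cutoff: float) -> list[frozenset]:
    """Merge the closest pair of clusters until no pair is below the cutoff."""
    d = {frozenset((a, b)): dissimilarity(flows, labor_force, a, b)
         for a, b in combinations(counties, 2)}
    clusters = [frozenset([c]) for c in counties]
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda p: average_linkage(d, *p))
        if average_linkage(d, x, y) >= cutoff:
            break  # closest remaining clusters are too dissimilar to join
        clusters = [c for c in clusters if c != x and c != y] + [x | y]
    return clusters


# Toy inputs: A and B exchange many commuters; C is nearly isolated.
counties = ["A", "B", "C"]
flows = {("A", "B"): 30, ("B", "A"): 20, ("A", "C"): 1}
labor_force = {"A": 100, "B": 100, "C": 100}
for cutoff in (0.4, 0.9, 0.999):
    print(cutoff, len(cluster(counties, flows, labor_force, cutoff)))
```

Small changes in the cutoff move the same inputs from three zones to two to one, which is the kind of definitional sensitivity the paper asks researchers to report.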
-
Revisiting the Effects of Unemployment Insurance Extensions on Unemployment: A Measurement Error-Corrected Regression Discontinuity Approach
March 2016
Working Paper Number:
carra-2016-01
The extension of Unemployment Insurance (UI) benefits was a key policy response to the Great Recession. However, these benefit extensions may have had detrimental labor market effects. While evidence on the individual labor supply response indicates small effects on unemployment, recent work by Hagedorn et al. (2015) uses a county border pair identification strategy to find that the total effects inclusive of effects on labor demand are substantially larger. By focusing on variation within border county pairs, this identification strategy requires counties in the pairs to be similar in terms of unobservable factors. We explore this assumption using an alternative regression discontinuity approach that controls for changes in unobservables by distance to the border. To do so, we must account for measurement error induced by using county-level aggregates. These new results provide no evidence of a large change in unemployment induced by differences in UI generosity across state boundaries. Further analysis suggests that individuals respond to UI benefit differences across boundaries by targeting job search in high-benefit states, thereby raising concerns of treatment spillovers in this setting. Taken together, these two results suggest that the effect of UI benefit extensions on unemployment remains an open question.
-
The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers
October 2002
Working Paper Number:
tp-2002-17
In this paper, we describe the sensitivity of small-cell flow statistics to coding errors in the identity of the underlying entities. Specifically, we present results based on a comparison of the U.S. Census Bureau's Quarterly Workforce Indicators (QWI) before and after correcting for such errors in SSN-based identifiers in the underlying individual wage records. The correction used involves a novel application of existing statistical matching techniques. We find that even a very conservative correction procedure has a sizable impact on the statistics. The average bias ranges from 0.25 percent up to 15 percent for flow statistics, and up to 5 percent for payroll aggregates.
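Why a single miscoded identifier distorts flow statistics can be seen in a small sketch: one continuous job whose person ID is mistyped in one quarter looks like two separate short jobs. The accession and separation rules below are simplified stand-ins, not the QWI's exact definitions, and the `fixes` mapping is a hypothetical output of some earlier matching step rather than the paper's correction procedure.

```python
from collections import Counter

Record = tuple[str, str, int]  # (person_id, employer_id, quarter); all hypothetical


def flow_counts(records: list[Record]) -> tuple[Counter, Counter]:
    """Simplified accessions and separations per quarter.

    A job (person, employer) accedes in quarter q if it is present in q but
    not q - 1, and separates in q if present in q but not q + 1. The first
    and last observed quarters are skipped because their neighbors are unseen.
    """
    jobs = set(records)
    if not jobs:
        return Counter(), Counter()
    first = min(q for _, _, q in jobs)
    last = max(q for _, _, q in jobs)
    accessions, separations = Counter(), Counter()
    for person, employer, q in jobs:
        if q > first and (person, employer, q - 1) not in jobs:
            accessions[q] += 1
        if q < last and (person, employer, q + 1) not in jobs:
            separations[q] += 1
    return accessions, separations


def apply_corrections(records: list[Record], fixes: dict[str, str]) -> list[Record]:
    """Replace miscoded person IDs using a mapping assumed to come from a matching step."""
    return [(fixes.get(p, p), e, q) for p, e, q in records]


# One continuous job, but the ID is mistyped in quarter 2.
raw = [("111", "E", 1), ("112", "E", 2), ("111", "E", 3)]
fixed = apply_corrections(raw, {"112": "111"})
print(flow_counts(raw))    # spurious accessions and separations appear
print(flow_counts(fixed))  # the continuous job produces no flows
```

In a small cell, a handful of such breaks can dominate the true hire and separation counts, which is consistent with the abstract's finding that flow statistics are far more sensitive than payroll aggregates.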
-
Workplace Concentration of Immigrants
November 2010
Working Paper Number:
CES-10-39R
To what extent do immigrants and the native-born work in separate workplaces? Do worker and employer characteristics explain the degree of workplace concentration? We explore these questions using a matched employer-employee database that extensively covers employers in selected MSAs. We find that immigrants are much more likely to have immigrant coworkers than are natives, and are particularly likely to work with their compatriots. We find much higher levels of concentration for small businesses than for large ones, that concentration varies substantially across industries, and that concentration is particularly high among immigrants with limited English skills. We also find evidence that neighborhood job networks are strongly positively associated with concentration. The effects of networks and language remain strong when type is defined by country of origin rather than simply immigrant status. The importance of these factors varies by immigrant country of origin; for example, not speaking English well has a particularly strong association with concentration for immigrants from Asian countries. Controlling for differences across MSAs, we find that observable employer and employee characteristics account for about half of the difference between immigrants and natives in the likelihood of having immigrant coworkers, with differences in industry, residential segregation and English speaking skills being the most important factors.
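A simple exposure measure makes the immigrant-native comparison concrete: for each worker, the share of immigrants among their coworkers at the same establishment, averaged by group. This is an illustrative sketch on made-up records, not necessarily the measure used in the paper; the tuple layout and function name are assumptions.

```python
from collections import defaultdict
from statistics import mean

Worker = tuple[str, str, bool]  # (worker_id, establishment_id, is_immigrant)


def immigrant_coworker_share(workers: list[Worker]) -> dict[str, float]:
    """Average share of immigrants among each worker's coworkers, by group.

    The worker themself is excluded, so the measure is not mechanically
    higher for immigrants. Single-worker establishments are skipped.
    """
    staff = defaultdict(list)
    for _, estab, imm in workers:
        staff[estab].append(imm)
    shares: dict[bool, list[float]] = {True: [], False: []}
    for _, estab, imm in workers:
        others = len(staff[estab]) - 1
        if others == 0:
            continue  # no coworkers to measure
        immigrant_others = sum(staff[estab]) - imm
        shares[imm].append(immigrant_others / others)
    return {
        "immigrant": mean(shares[True]) if shares[True] else float("nan"),
        "native": mean(shares[False]) if shares[False] else float("nan"),
    }


# Made-up records: establishment A is mixed, establishment B is all native-born.
workers = [("1", "A", True), ("2", "A", True), ("3", "A", False), ("4", "B", False), ("5", "B", False)]
print(immigrant_coworker_share(workers))
```

The gap between the two group averages is the raw concentration that the paper then decomposes using worker and employer characteristics.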