CREAT - Census Bureau

Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files

January 2017

Written by: Lars Vilhuber, Mark J. Kutzbach, Andrew S. Green

Working Paper Number:

CES-17-34

Abstract

Commuting flows and workplace employment data have a wide constituency of users including urban and regional planners, social science and transportation researchers, and businesses. The U.S. Census Bureau releases two national data products that give the magnitude and characteristics of home-to-work flows. The American Community Survey (ACS) tabulates households' responses on employment, workplace, and commuting behavior. The Longitudinal Employer-Household Dynamics (LEHD) program tabulates administrative records on jobs in the LEHD Origin-Destination Employment Statistics (LODES). Design differences across the datasets lead to divergence in a comparable statistic: county-to-county aggregate commute flows. To understand differences in the public use data, this study compares ACS and LEHD source files, using identifying information and probabilistic matching to join person and job records. In our assessment, we compare commuting statistics for job frames linked on person, employment status, employer, and workplace, and we identify person and job characteristics as well as design features of the data frames that explain aggregate differences. We find a lower rate of within-county commuting and farther commutes in LODES. We attribute these greater distances to differences in workplace reporting and to uncertainty of establishment assignments in LEHD for workers at multi-unit employers. Minor contributing factors include differences in residence location and ACS workplace edits. The results of this analysis and the data infrastructure developed will support further work to understand and enhance commuting statistics in both datasets.

Document Tags and Keywords

Keywords:

employ, employed, metropolitan, workplace, workforce, residential, mobility, resident, home, residence, moving, migration, commute

Tags:

Internal Revenue Service, Bureau of Labor Statistics, National Science Foundation, Center for Economic Studies, Office of Management and Budget, Current Population Survey, Decennial Census, Cornell University, Unemployment Insurance, North American Industry Classification System, American Community Survey, Longitudinal Employer Household Dynamics, Protected Identification Key, Quarterly Workforce Indicators, Social and Economic Supplement, Quarterly Census of Employment and Wages, Composite Person Record, Local Employment Dynamics, Office of Personnel Management, Master Address File, University of Michigan, 2010 Census, Multiple Worksite Report, LODES, LEHD Origin-Destination Employment Statistics

Similar Working Papers

The 10 most similar working papers to the working paper 'Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files' are listed below in order of similarity.

Working Paper

LODES Design and Methodology Report: Methodology Version 7

August 2025

Authors: Matthew R. Graham, Mark J. Kutzbach, Andrew Foote

Working Paper Number:

CES-25-52

The purpose of this report is to document the important features of Version 7 of the LEHD Origin-Destination Employment Statistics (LODES) processing system. This includes data sources, data processing methodology, confidentiality protection methodology, some quality measures, and a high-level description of the published data. The intended audience for this document includes LODES data users, Local Employment Dynamics (LED) Partnership members, U.S. Census Bureau management, program quality auditors, and current and future research and development staff members.
Working Paper

Developing a Residence Candidate File for Use With Employer-Employee Matched Data

January 2017

Authors: Matthew R. Graham, Mark J. Kutzbach, Danielle H. Sandler

Working Paper Number:

CES-17-40

This paper describes the Longitudinal Employer-Household Dynamics (LEHD) program's ongoing efforts to use administrative records in a predictive model that describes residence locations for workers. This project was motivated by the discontinuation of a residence file produced elsewhere at the U.S. Census Bureau. The goal of the Residence Candidate File (RCF) process is to provide the LEHD Infrastructure Files with residence information that maintains currency with the changing state of administrative sources and represents uncertainty in location as a probability distribution. The discontinued file provided only a single residence per person/year, even when contributing administrative data may have contained multiple residences. This paper describes the motivation for the project, our methodology, the administrative data sources, the model estimation and validation results, and the file specifications. We find that the best prediction of the person-place model provides similar, but superior, accuracy compared with previous methods and performs well for workers in the LEHD jobs frame. We outline possibilities for further improvement in sources and modeling as well as recommendations on how to use the preference weights in downstream processing.
Working Paper

The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators

January 2006

Authors: Lars Vilhuber, John M. Abowd, Kevin L. McKinney, Bryce Stephens, Fredrik Andersson, Marc Roemer, Simon Woodcock

Working Paper Number:

tp-2006-01

The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Beginning in 2003 and building on this infrastructure, the Census Bureau has published the Quarterly Workforce Indicators (QWI), a new collection of data series that offers unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained due to the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. We describe the multiple imputation methods used to impute missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement. Both of these innovations are crucial to the success of the final product. Finally, we pay special attention to the details of the confidentiality protection system used to protect the identity and micro data values of the underlying entities used to form the published estimates. We provide a brief description of public-use and restricted-access data files with pointers to further documentation for researchers interested in using these data.
Working Paper

Design Comparison of LODES and ACS Commuting Data Products

October 2014

Authors: Matthew R. Graham, Mark J. Kutzbach, Brian McKenzie

Working Paper Number:

CES-14-38

The Census Bureau produces two complementary data products, the American Community Survey (ACS) commuting and workplace data and the Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES), which can be used to answer spatial, economic, and demographic questions relating to workplaces and home-to-work flows. The products are complementary in the sense that they measure similar activities but each has important unique characteristics that provide information that the other measure cannot. As a result of questions from data users, the Census Bureau has created this document to highlight the major design differences between these two data products. This report guides users on the relative advantages of each data product for various analyses and helps explain differences that may arise when using the products. As an overview, these two data products are sourced from different inputs, cover different populations and time periods, are subject to different sets of edits and imputations, are released under different confidentiality protection mechanisms, and are tabulated at different geographic and characteristic levels. As a general rule, the two data products should not be expected to match exactly for arbitrary queries and may differ substantially for some queries. Within this document, we compare the two data products by the design elements that were deemed most likely to contribute to differences in tabulated data. These elements are: Collection, Coverage, Geographic and Longitudinal Scope, Job Definition and Reference Period, Job and Worker Characteristics, Location Definitions (Workplace and Residence), Completeness of Geographic Information and Edits/Imputations, Geographic Tabulation Levels, Control Totals, Confidentiality Protection and Suppression, and Related Public-Use Data Products.
An in-depth data analysis, in aggregate or with the microdata, between the two data products will be the subject of a future technical report. The Census Bureau has begun a pilot project to integrate ACS microdata with LEHD administrative data to develop an enhanced frame of employment status, place of work, and commuting. The Census Bureau will publish quality metrics for person match rates, residence and workplace match rates, and commute distance comparisons.
Working Paper

Recalculating... : How Uncertainty in Local Labor Market Definitions Affects Empirical Findings

January 2017

Authors: Lars Vilhuber, Mark J. Kutzbach, Andrew Foote

Working Paper Number:

CES-17-49R

This paper evaluates the use of commuting zones as a local labor market definition. We revisit Tolbert and Sizer (1996) and demonstrate the sensitivity of definitions to two features of the methodology: a cluster dissimilarity cutoff, or the count of clusters, and uncertainty in the input data. We show how these features impact empirical estimates using a standard application of commuting zones and an example from related literature. We conclude with advice to researchers on how to demonstrate the robustness of empirical findings to uncertainty in the definition of commuting zones.
Working Paper

Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning

November 2021

Authors: Kristin McCue, John M. Abowd, Matthew D. Shapiro, Trivellore Raghunathan, Margaret C. Levenstein, Joelle Abramowitz, Dhiren Patki, Ann M. Rodgers, Nada Wasi, Dawn Zinsser

Working Paper Number:

CES-21-35

This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
Working Paper

Confidentiality Protection in the Census Bureau Quarterly Workforce Indicators

February 2006

Authors: Lars Vilhuber, John M. Abowd, Bryce Stephens

Working Paper Number:

tp-2006-02

The Quarterly Workforce Indicators are new estimates developed by the Census Bureau's Longitudinal Employer-Household Dynamics Program as a part of its Local Employment Dynamics partnership with 37 state Labor Market Information offices. These data provide detailed quarterly statistics on employment, accessions, layoffs, hires, separations, full-quarter employment (and related flows), job creations, job destructions, and earnings (for flow and stock categories of workers). The data are released for NAICS industries (and 4-digit SICs) at the county, workforce investment board, and metropolitan area levels of geography. The confidential microdata - unemployment insurance wage records, ES-202 establishment employment, and Title 13 demographic and economic information - are protected using a permanent multiplicative noise distortion factor. This factor distorts all input sums, counts, differences and ratios. The released statistics are analytically valid - measures are unbiased and time series properties are preserved. The confidentiality protection is manifested in the release of some statistics that are flagged as "significantly distorted to preserve confidentiality." These statistics differ from the undistorted statistics by a significant proportion. Even for the significantly distorted statistics, the data remain analytically valid for time series properties. The released data can be aggregated; however, published aggregates are less distorted than custom post-release aggregates. In addition to the multiplicative noise distortion, confidentiality protection is provided by the estimation process for the QWIs, which multiply imputes all missing data (including missing establishment, given UI account, in the UI wage record data) and dynamically re-weights the establishment data to provide state-level comparability with the BLS's Quarterly Census of Employment and Wages.
Working Paper

The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers

October 2002

Authors: Lars Vilhuber, John M. Abowd

Working Paper Number:

tp-2002-17

In this paper, we describe the sensitivity of small-cell flow statistics to coding errors in the identity of the underlying entities. Specifically, we present results based on a comparison of the U.S. Census Bureau's Quarterly Workforce Indicators (QWI) before and after correcting for such errors in SSN-based identifiers in the underlying individual wage records. The correction used involves a novel application of existing statistical matching techniques. It is found that even a very conservative correction procedure has a sizable impact on the statistics. The average bias ranges from 0.25 percent up to 15 percent for flow statistics, and up to 5 percent for payroll aggregates.
Working Paper

Workplace Concentration of Immigrants

November 2010

Authors: John Haltiwanger, Kristin McCue, Fredrik Andersson, Monica Garcia-Perez, Seth Sanders

Working Paper Number:

CES-10-39R

To what extent do immigrants and the native-born work in separate workplaces? Do worker and employer characteristics explain the degree of workplace concentration? We explore these questions using a matched employer-employee database that extensively covers employers in selected MSAs. We find that immigrants are much more likely to have immigrant coworkers than are natives, and are particularly likely to work with their compatriots. We find much higher levels of concentration for small businesses than for large ones, that concentration varies substantially across industries, and that concentration is particularly high among immigrants with limited English skills. We also find evidence that neighborhood job networks are strongly positively associated with concentration. The effects of networks and language remain strong when type is defined by country of origin rather than simply immigrant status. The importance of these factors varies by immigrant country of origin; for example, not speaking English well has a particularly strong association with concentration for immigrants from Asian countries. Controlling for differences across MSAs, we find that observable employer and employee characteristics account for about half of the difference between immigrants and natives in the likelihood of having immigrant coworkers, with differences in industry, residential segregation and English speaking skills being the most important factors.
Working Paper

Employees in the US Nonprofit Sector

May 2026

Authors: Stephanie Karol, Jennifer Mayo

Working Paper Number:

CES-26-33

The nonprofit sector employs roughly 10% of the American workforce, making it the third largest workforce behind the retail and manufacturing sectors. Despite this, relatively little is known about its employees. This paper is the first to use comprehensive administrative tax data, covering the near-universe of workers in the US, to quantify and explain the causes of the nonprofit pay differential. Unconditionally, we find the nonprofit earnings penalty to be 12% relative to for-profit workers. Estimating an 'AKM' worker-firm job ladder model, we show that most of the penalty is causal and not driven by selection. We also document considerable heterogeneity across industries, both in terms of earnings premia/penalties and worker selection, and show that nonprofit and for-profit earnings have been converging over time.
