CREAT - Census Bureau

Developing a Residence Candidate File for Use With Employer-Employee Matched Data

January 2017

Written by: Matthew R. Graham, Mark J. Kutzbach, Danielle H. Sandler

Working Paper Number:

CES-17-40

Abstract

This paper describes the Longitudinal Employer-Household Dynamics (LEHD) program's ongoing efforts to use administrative records in a predictive model that describes residence locations for workers. This project was motivated by the discontinuation of a residence file produced elsewhere at the U.S. Census Bureau. The goal of the Residence Candidate File (RCF) process is to provide the LEHD Infrastructure Files with residence information that maintains currency with the changing state of administrative sources and represents uncertainty in location as a probability distribution. The discontinued file provided only a single residence per person/year, even when contributing administrative data may have contained multiple residences. This paper describes the motivation for the project, our methodology, the administrative data sources, the model estimation and validation results, and the file specifications. We find that the best prediction of the person-place model provides similar, but superior, accuracy compared with previous methods and performs well for workers in the LEHD jobs frame. We outline possibilities for further improvement in sources and modeling as well as recommendations on how to use the preference weights in downstream processing.

Document Tags and Keywords

Keywords:

estimating, data, employee, employed, job, consolidated, imputation, department, workforce, worker, housing, residential, employer household, residence, datasets, reside

Tags:

Internal Revenue Service, Center for Economic Studies, Decennial Census, Housing and Urban Development, Postal Service, Department of Housing and Urban Development, American Community Survey, Longitudinal Employer Household Dynamics, Protected Identification Key, Employment History File, Employer Characteristics File, Individual Characteristics File, Department of Health and Human Services, Quarterly Workforce Indicators, NUMIDENT, Composite Person Record, Master Address File, 2010 Census, Probability Density Function, Indian Health Service, PIKed, MAFID, Center for Administrative Records Research and Applications, MAF-ARF, HHS, Selective Service System, LODES, LEHD Origin-Destination Employment Statistics

Similar Working Papers

The 10 most similar working papers to the working paper 'Developing a Residence Candidate File for Use With Employer-Employee Matched Data' are listed below in order of similarity.

Working Paper

Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files

January 2017

Authors: Lars Vilhuber, Mark J. Kutzbach, Andrew S. Green

Working Paper Number:

CES-17-34

Commuting flows and workplace employment data have a wide constituency of users including urban and regional planners, social science and transportation researchers, and businesses. The U.S. Census Bureau releases two national data products that give the magnitude and characteristics of home-to-work flows. The American Community Survey (ACS) tabulates households' responses on employment, workplace, and commuting behavior. The Longitudinal Employer-Household Dynamics (LEHD) program tabulates administrative records on jobs in the LEHD Origin-Destination Employment Statistics (LODES). Design differences across the datasets lead to divergence in a comparable statistic: county-to-county aggregate commute flows. To understand differences in the public use data, this study compares ACS and LEHD source files, using identifying information and probabilistic matching to join person and job records. In our assessment, we compare commuting statistics for job frames linked on person, employment status, employer, and workplace and we identify person and job characteristics as well as design features of the data frames that explain aggregate differences. We find a lower rate of within-county commuting and farther commutes in LODES. We attribute these greater distances to differences in workplace reporting and to uncertainty of establishment assignments in LEHD for workers at multi-unit employers. Minor contributing factors include differences in residence location and ACS workplace edits. The results of this analysis and the data infrastructure developed will support further work to understand and enhance commuting statistics in both datasets.
Working Paper

LODES Design and Methodology Report: Methodology Version 7

August 2025

Authors: Matthew R. Graham, Mark J. Kutzbach, Andrew Foote

Working Paper Number:

CES-25-52

The purpose of this report is to document the important features of Version 7 of the LEHD Origin-Destination Employment Statistics (LODES) processing system. This includes data sources, data processing methodology, confidentiality protection methodology, some quality measures, and a high-level description of the published data. The intended audience for this document includes LODES data users, Local Employment Dynamics (LED) Partnership members, U.S. Census Bureau management, program quality auditors, and current and future research and development staff members.
Working Paper

The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators

January 2006

Authors: Lars Vilhuber, John M. Abowd, Kevin L. McKinney, Bryce Stephens, Fredrik Andersson, Marc Roemer, Simon Woodcock

Working Paper Number:

tp-2006-01

The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Beginning in 2003 and building on this infrastructure, the Census Bureau has published the Quarterly Workforce Indicators (QWI), a new collection of data series that offers unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained due to the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. We describe the multiple imputation methods used to impute missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement. Both of these innovations are crucial to the success of the final product. Finally, we pay special attention to the details of the confidentiality protection system used to protect the identity and micro data values of the underlying entities used to form the published estimates. We provide a brief description of public-use and restricted-access data files with pointers to further documentation for researchers interested in using these data.
Working Paper

Design Comparison of LODES and ACS Commuting Data Products

October 2014

Authors: Matthew R. Graham, Mark J. Kutzbach, Brian McKenzie

Working Paper Number:

CES-14-38

The Census Bureau produces two complementary data products, the American Community Survey (ACS) commuting and workplace data and the Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES), which can be used to answer spatial, economic, and demographic questions relating to workplaces and home-to-work flows. The products are complementary in the sense that they measure similar activities but each has important unique characteristics that provide information that the other measure cannot. As a result of questions from data users, the Census Bureau has created this document to highlight the major design differences between these two data products. This report guides users on the relative advantages of each data product for various analyses and helps explain differences that may arise when using the products. As an overview, these two data products are sourced from different inputs, cover different populations and time periods, are subject to different sets of edits and imputations, are released under different confidentiality protection mechanisms, and are tabulated at different geographic and characteristic levels. As a general rule, the two data products should not be expected to match exactly for arbitrary queries and may differ substantially for some queries. Within this document, we compare the two data products by the design elements that were deemed most likely to contribute to differences in tabulated data. These elements are: Collection, Coverage, Geographic and Longitudinal Scope, Job Definition and Reference Period, Job and Worker Characteristics, Location Definitions (Workplace and Residence), Completeness of Geographic Information and Edits/Imputations, Geographic Tabulation Levels, Control Totals, Confidentiality Protection and Suppression, and Related Public-Use Data Products.
An in-depth data analysis, in aggregate or with the microdata, between the two data products will be the subject of a future technical report. The Census Bureau has begun a pilot project to integrate ACS microdata with LEHD administrative data to develop an enhanced frame of employment status, place of work, and commuting. The Census Bureau will publish quality metrics for person match rates, residence and workplace match rates, and commute distance comparisons.
Working Paper

Estimating the U.S. Citizen Voting-Age Population (CVAP) Using Blended Survey Data, Administrative Record Data, and Modeling: Technical Report

April 2023

Authors: J. David Brown, Danielle H. Sandler, Lawrence Warren, Moises Yi, Misty L. Heggeness, Joseph L. Schafer, Matthew Spence, Marta Murray-Close, Carl Lieberman, Genevieve Denoeux, Lauren Medina

Working Paper Number:

CES-23-21

This report develops a method using administrative records (AR) to fill in responses for nonresponding American Community Survey (ACS) housing units rather than adjusting survey weights to account for selection of a subset of nonresponding housing units for follow-up interviews and for nonresponse bias. The method also inserts AR and modeling in place of edits and imputations for ACS survey citizenship item nonresponses. We produce Citizen Voting-Age Population (CVAP) tabulations using this enhanced CVAP method and compare them to published estimates. The enhanced CVAP method produces a 0.74 percentage point lower citizen share, and it is 3.05 percentage points lower for voting-age Hispanics. The latter result can be partly explained by omissions of voting-age Hispanic noncitizens with unknown legal status from ACS household responses. Weight adjustments may be less effective at addressing nonresponse bias under those conditions.
Working Paper

LEHD Infrastructure S2014 files in the FSRDC

September 2018

Authors: Lars Vilhuber

Working Paper Number:

CES-18-27R

The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2014 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifications made to the files to facilitate researcher access.
Working Paper

Integrating Multiple U.S. Census Bureau Data Assets to Create Standardized Profiles of Program Participants

January 2026

Authors: Joyce K. Hahn, Robert Dominy III, Samuel Glick, Katlyn King, MariTere Molinet, JJ Naddeo, Margaret Sabelhous, Aaron Weinstock

Working Paper Number:

CES-26-01

The Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act) directed federal agencies to systematically use data when making policy decisions. In response, the U.S. Census Bureau established the Evidence Group within its Center for Economic Studies (CES). With an interdisciplinary team of economists, sociologists, and statisticians, the Evidence Group can support the broader federal government in their efforts to use existing data to improve program operations without increasing respondent burden. For federal agencies administering social safety net and business assistance programs in particular, the team provides a no-cost evidence-building service that links program records to Census Bureau data assets and creates a series of standardized tables describing participants, their economic outcomes prior to program entry, and the communities where they live. These tables provide partner agencies with the detailed information they need to better understand their participants and potentially make their programs more accountable and effective in reaching their target populations. In this working paper, we describe the standardized tables themselves as well as the data assets available at the Census Bureau to create these tables, the data files produced by the table production process, and the methodology used to merge and harmonize data on participants and subsequently calculate unbiased and accurate estimates. We conclude with a brief discussion of steps taken to ensure confidentiality and data security. This documentation is intended to facilitate proper use and understanding of the standardized tables by partner agencies as well as researchers who are interested in leveraging these tools to explore characteristics of their samples of interest.
Working Paper

LEHD Snapshot Documentation, Release S2021_R2022Q4

November 2022

Authors: Kevin L. McKinney, Erika McEntarfer, Matthew R. Graham, Stephen Tibbets, Lee Tucker

Working Paper Number:

CES-22-51

The Longitudinal Employer-Household Dynamics (LEHD) data at the U.S. Census Bureau is a quarterly database of linked employer-employee data covering over 95% of employment in the United States. These data are used to produce a number of public-use tabulations and tools, including the Quarterly Workforce Indicators (QWI), LEHD Origin-Destination Employment Statistics (LODES), Job-to-Job Flows (J2J), and Post-Secondary Employment Outcomes (PSEO) data products. Researchers on approved projects may also access the underlying LEHD microdata directly, in the form of the LEHD Snapshot restricted-use data product. This document provides a detailed overview of the LEHD Snapshot as of release S2021_R2022Q4, including user guidance, variable codebooks, and an overview of the approvals needed to obtain access. Updates to the documentation for this and future snapshot releases will be made available in HTML format on the LEHD website.
Working Paper

Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report

October 2020

Authors: John M. Abowd, J. David Brown, Lawrence Warren, Moises Yi, Misty L. Heggeness, William R. Bell, Michael B. Hawes, Andrew Keller, Vincent T. Mule Jr., Joseph L. Schafer, Matthew Spence

Working Paper Number:

CES-20-33

This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
Working Paper

Experimental Capture/recapture Estimation Using Census and Administrative Data

June 2026

Authors: William R. Bell, Andrew Keller, Mary H. Mulry, Thomas Mule

Working Paper Number:

CES-26-38

This report expands upon the innovation of utilizing administrative records and third-party data implemented in the 2020 Census. The 2020 Census used administrative records and third-party data in address canvassing and nonresponse followup operations. The Census Bureau also has a long history of using administrative records of births, deaths, and other information to produce Demographic Analysis coverage estimates. Since 1980, the Census Bureau has produced capture-recapture coverage estimates by conducting an independent post-enumeration survey and utilizing dual system estimation approaches. This report presents the research results of attempting to see if administrative records and third-party data could be utilized to produce capture-recapture coverage estimates. This work uses an Expectation Maximization Log Linear Modeling approach previously researched by Statistics Netherlands and Statistics New Zealand. This report documents some of the experimental results from an evaluation that was part of the 2020 Census Program for Evaluation, Experiments, and Assessments.
