CREAT - Census Bureau

Non-Random Assignment of Individual Identifiers and Selection into Linked Data: Implications for Research

January 2026

Written by: Liana Christin Landivar, Kyle Raze, Nicole Perales

Working Paper Number:

CES-26-06

Abstract

The U.S. Census Bureau's Person Identification Validation System facilitates anonymous linkages between survey and administrative records by assigning Protected Identification Keys (PIKs) to person records. While PIK assignment is generally accurate, some person records are not successfully assigned a PIK, which can lead to sample selection bias in analyses of linked data. Using the American Community Survey (ACS) and the Current Population Survey Annual Social and Economic Supplement (CPS ASEC) between 2005 and 2022, we corroborate and extend existing findings on the drivers of PIK assignment, showing that the rate of PIK assignment varies widely across socio-demographic subgroups. Using earnings as a test case, we then show that limiting a survey sample of wage earners to person records with PIKs or successful linkages to W-2 wage records tends to overestimate self-reported wage earnings, on average, indicative of linkage-induced selection bias. In a validation exercise, we demonstrate that reweighting methods, such as inverse probability weighting or entropy balancing, can mitigate this bias.
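As a rough illustration of the inverse probability weighting idea mentioned in the abstract, the sketch below uses simulated data only (no ACS, CPS ASEC, or W-2 records). The subgroup indicator `z`, the linkage probabilities, the earnings values, and the helper names `fit_logit` and `ipw_mean` are all hypothetical assumptions made for this example; it is not the authors' specification.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipw_mean(outcome, linked, X, base_weight=None):
    """Mean of `outcome` among linked records, reweighted by 1 / P(linked | X)."""
    if base_weight is None:
        base_weight = np.ones(len(outcome))
    beta = fit_logit(X, linked.astype(float))
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    w = base_weight[linked] / p_hat[linked]
    return np.sum(w * outcome[linked]) / np.sum(w)

# Simulated (hypothetical) population: subgroup z earns less and links less often.
rng = np.random.default_rng(0)
n = 20000
z = rng.binomial(1, 0.3, size=n)
earnings = 50000 - 15000 * z + rng.normal(0, 5000, size=n)
linked = rng.random(n) < np.where(z == 1, 0.6, 0.95)
X = np.column_stack([np.ones(n), z])

full_mean = earnings.mean()              # benchmark: all records
linked_mean = earnings[linked].mean()    # linked records only, unweighted
reweighted_mean = ipw_mean(earnings, linked, X)

print(round(full_mean), round(linked_mean), round(reweighted_mean))
```

In this simulated setup, the unweighted linked-only mean sits above the full-sample mean because the lower-earning subgroup is under-represented among linked records, and the reweighted mean moves back toward the full-sample value. The correction relies on linkage being unrelated to earnings once `z` is accounted for, which is built into the simulation and would be an assumption in real data.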

Document Tags and Keywords

Keywords:

estimating, survey, respondent, earnings, bias, matching, race, socioeconomic, census bureau, irs, sampling, household surveys, earner, linkage

Tags:

Internal Revenue Service, Bureau of Labor Statistics, Social Security Administration, Current Population Survey, American Community Survey, Social Security Number, Protected Identification Key, W-2, National Academy of Sciences, Social and Economic Supplement, Census Bureau Disclosure Review Board, PIKed, Person Validation System, Person Identification Validation System, Individual Taxpayer Identification Numbers, SSA Numident, COVID-19, CPS ASEC

Similar Working Papers

The 10 most similar working papers to the working paper 'Non-Random Assignment of Individual Identifiers and Selection into Linked Data: Implications for Research' are listed below in order of similarity.

Working Paper

The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey

April 2014

Authors: Adela Luque, J. David Brown, Brittany Bond, Amy B. O'Hara

Working Paper Number:

carra-2014-08

Record linkage across survey and administrative records sources can greatly enrich data and improve their quality. The linkage can reduce respondent burden and nonresponse follow-up costs. This is particularly important in an era of declining survey response rates and tight budgets. Record linkage also creates statistical bias, however. The U.S. Census Bureau links person records through its Person Identification Validation System (PVS), assigning each record a Protected Identification Key (PIK). It is not possible to reliably assign a PIK to every record, either due to insufficient identifying information or because the information does not uniquely match any of the administrative records used in the person validation process. Non-random ability to assign a PIK can potentially inject bias into statistics using linked data. This paper studies the nature of this bias using the 2009 and 2010 American Community Survey (ACS). The ACS is well-suited for this analysis, as it contains a rich set of person characteristics that can describe the bias. We estimate probit models for whether a record is assigned a PIK. The results suggest that young children, minorities, residents of group quarters, immigrants, recent movers, low-income individuals, and non-employed individuals are less likely to receive a PIK in the 2009 ACS. Changes to the PVS process in 2010 significantly addressed the young children deficit, attenuated the other biases, and increased the validated records share from 88.1 to 92.6 percent (person-weighted).
Working Paper

Differences in Disability Insurance Allowance Rates

August 2025

Authors: Joshua Mitchell, Daniel Thompson

Working Paper Number:

CES-25-54

Allowance rates for disability insurance applications vary by race and ethnicity, but it is unclear to what extent these differences are artifacts of other differing socio-economic and health characteristics, or selection issues in SSA's race and ethnicity data. This paper uses the 2015 American Community Survey linked to 2015-2019 SSA administrative data to investigate DI application allowance rates among non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, non-Hispanic American Indian/Alaska Native, and Hispanic applicants aged 25-65. The analysis uses regression, propensity score matching, and inverse probability weighting to estimate differences in allowance rates among applicants who are similar on observable characteristics. Relative to raw comparisons, differences by race and ethnicity in multivariate analyses are substantially smaller in magnitude and are generally not statistically significant.
Working Paper

Foreign-Born and Native-Born Migration in the U.S.: Evidence from IRS Administrative and Census Survey Records

July 2018

Authors: Thomas B. Foster, Mark Ellis, Lee Fiorio

Working Paper Number:

carra-2018-07

This paper details efforts to link administrative records from the Internal Revenue Service (IRS) to American Community Survey (ACS) and 2010 Census microdata for the study of migration among foreign-born and native-born populations in the United States. Specifically, we (1) document our linkage strategy and methodology for inferring migration in IRS records; (2) model selection into and survival across IRS records to determine suitability for research applications; and (3) gauge the efficacy of the IRS records by demonstrating how they can be used to validate and potentially improve migration responses for native-born and foreign-born respondents in ACS microdata. Our results show little evidence of selection or survival bias in the IRS records, suggesting broad generalizability to the nation as a whole. Moreover, we find that the combined IRS 1040, 1099, and W-2 records may provide important information on populations, such as the foreign-born, that may be difficult to reach with traditional Census Bureau surveys. Finally, while preliminary, the results of our comparison of IRS and ACS migration responses show that IRS records may be useful in improving ACS migration measurement for respondents whose migration response is proxy, allocated, or imputed. Taking these results together, we discuss the potential application of our longitudinal IRS dataset to innovations in migration research on both the native-born and foreign-born populations of the United States.
Working Paper

Employment and Earnings Trajectories of HUD Program Participants

May 2026

Authors: Sonya R. Porter, Rachel M. Shattuck, Denise Flanagan-Doyle, Thomas B. Foster, Ethan Krohn, Kyle Raze, David Pritchard, Veronica Garrison, Jacqueline Bachand

Working Paper Number:

CES-26-31

Federal housing assistance programs, such as those run by the U.S. Department of Housing and Urban Development (HUD), have been shown to reduce rent burden and improve housing stability for program participants, which may in turn have downstream impacts on their labor market attachment and career trajectories. However, existing studies from individual cities or states provide mixed evidence on the association of housing assistance with labor market outcomes. By linking HUD administrative records to matched employee-employer earnings records from the Longitudinal Employer-Household Dynamics (LEHD) program, we document how the labor market trajectories of program participants change as they enter and exit federal housing assistance programs, examining outcomes over a 14-year window surrounding entry or exit. In our analysis of entry, we find that the employment rates and earnings of first-time HUD program participants begin to increase upon entering a HUD program, which represents a reversal of prior declining trends in these outcomes. Suggestive of a positive association, these increases in employment and earnings trends exceed those of low-income non-participants from the American Community Survey (ACS). In our analysis of exits, we find that program participants who eventually leave a HUD program have increasing pre-exit trends in employment and earnings that then flatten upon exiting. Comparing these negative changes in trend to the relatively stable trajectories of those who remain in HUD programs throughout the analysis suggests that exits are associated with diminished employment and earnings trajectories.
Working Paper

The Opportunities and Challenges of Linked IRS Administrative and Census Survey Records in the Study of Migration

July 2018

Authors: Thomas B. Foster, Mark Ellis, Lee Fiorio

Working Paper Number:

carra-2018-06

This paper details efforts to link administrative records from the Internal Revenue Service (IRS) to American Community Survey (ACS) and 2010 Census microdata for the study of migration in the United States. Specifically, we (1) document our linkage strategy and methodology for inferring migration in IRS records; (2) model selection into and survival across IRS records to determine suitability for research applications; and (3) gauge the efficacy of the IRS records by demonstrating how they can be used to validate and potentially improve migration responses in ACS microdata. Our results show little evidence of selection or survival bias in the IRS records, suggesting broad generalizability to the nation as a whole. Moreover, we find that the combined IRS 1040, 1099, and W-2 records may provide important information on populations that are hard to reach with traditional Census surveys. Finally, while preliminary, the results of our comparison of IRS and ACS migration responses show that IRS records may be useful in improving ACS migration measurement for respondents whose migration response is proxy, allocated, or imputed. Taking these results together, we discuss the potential applications of our longitudinal IRS dataset to innovations in migration research in the United States.
Working Paper

The Design of Sampling Strata for the National Household Food Acquisition and Purchase Survey

February 2025

Authors: Jonathan Eggleston, Mark Klee, Linden McBride

Working Paper Number:

CES-25-13

The National Household Food Acquisition and Purchase Survey (FoodAPS), sponsored by the United States Department of Agriculture's (USDA) Economic Research Service (ERS) and Food and Nutrition Service (FNS), examines the food purchasing behavior of various subgroups of the U.S. population. These subgroups include participants in the Supplemental Nutrition Assistance Program (SNAP) and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), as well as households who are eligible for but don't participate in these programs. Participants in these social protection programs constitute small proportions of the U.S. population; obtaining an adequate number of such participants in a survey would be challenging absent stratified sampling to target SNAP and WIC participating households. This document describes how the U.S. Census Bureau (which is planning to conduct future versions of the FoodAPS survey on behalf of USDA) created sampling strata to flag the FoodAPS targeted subpopulations using machine learning applications in linked survey and administrative data. We describe the data, modeling techniques, and how well the sampling flags target low-income households and households receiving WIC and SNAP benefits. We additionally situate these efforts in the nascent literature on the use of big data and machine learning for the improvement of survey efficiency.
Working Paper

Childhood Housing and Adult Earnings: A Between-Siblings Analysis of Housing Vouchers and Public Housing

January 2013

Authors: John Haltiwanger, Fredrik Andersson, Daniel Weinberg, Mark J. Kutzbach, Giordano Palloni, Henry O. Pollakowski

Working Paper Number:

CES-13-48RR

To date, research on the long-term effects of childhood participation in voucher-assisted and public housing has been limited by the lack of data and suitable identification strategies. We create a national-level longitudinal data set that enables us to analyze how children's housing experiences affect adult earnings and incarceration rates. While naive estimates suggest there are substantial negative consequences to childhood participation in voucher-assisted and public housing, this result appears to be driven largely by selection of households into housing assistance programs. To mitigate this source of bias, we employ household fixed-effects specifications that use only within-household (across-sibling) variation for identification. Compared to naive specifications, household fixed-effects estimates for earnings are universally more positive, and they suggest that there are positive and statistically significant benefits from childhood residence in assisted housing on young adult earnings for nearly all demographic groups. Childhood participation in assisted housing also reduces the likelihood of incarceration across all household race/ethnicity groups. Time spent in voucher-assisted or public housing is especially beneficial for females from non-Hispanic Black households, who experience substantial increases in expected earnings and lower incarceration rates.
Working Paper

National Experimental Wellbeing Statistics - Version 1

February 2023

Authors: Nikolas Mittag, Joshua Mitchell, Adam Bee, Jonathan Rothbaum, Carl Sanders, Lawrence Schmidt, Matthew Unrath

Working Paper Number:

CES-23-04

This is the U.S. Census Bureau's first release of the National Experimental Wellbeing Statistics (NEWS) project. The NEWS project aims to produce the best possible estimates of income and poverty given all available survey and administrative data. We link survey, decennial census, administrative, and third-party data to address measurement error in income and poverty statistics. We estimate improved (pre-tax money) income and poverty statistics for 2018 by addressing several possible sources of bias documented in prior research. We address biases from 1) unit nonresponse through improved weights, 2) missing income information in both survey and administrative data through improved imputation, and 3) misreporting by combining or replacing survey responses with administrative information. Reducing survey error substantially affects key measures of well-being: We estimate median household income is 6.3 percent higher than in survey estimates, and poverty is 1.1 percentage points lower. These changes are driven by subpopulations for which survey error is particularly relevant. For householders aged 65 and over, median household income is 27.3 percent higher and poverty is 3.3 percentage points lower than in survey estimates. We do not find a significant impact on median household income for householders under 65 or on child poverty. Finally, we discuss plans for future releases: addressing other potential sources of bias, releasing additional years of statistics, extending the income concepts measured, and including smaller geographies such as state and county.
Working Paper

The Measurement of Medicaid Coverage in the SIPP: Evidence from California, 1990-1996

September 2002

Authors: David Card, Andrew KG Hildreth, Lara D Shore-Sheppard

Working Paper Number:

CES-02-21

This paper studies the accuracy of reported Medicaid coverage in the Survey of Income and Program Participation (SIPP) using a unique data set formed by matching SIPP survey responses to administrative records from the State of California. Overall, we estimate that the SIPP underestimates Medicaid coverage in the California population by about 10 percent. Among SIPP respondents who can be matched to administrative records, we estimate that the probability someone reports Medicaid coverage in a month when they are actually covered is around 85 percent. The corresponding probability for low-income children is even higher, at least 90 percent. These estimates suggest that the SIPP provides reasonably accurate coverage reports for those who are actually in the Medicaid system. On the other hand, our estimate of the false positive rate (the rate of reported coverage for those who are not covered in the administrative records) is relatively high: 2.5 percent for the sample as a whole, and up to 20 percent for poor children. Some of this is due to errors in the recording of Social Security numbers in the administrative system, rather than to problems in the SIPP.
Working Paper

Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

June 2024

Authors: Narayan Sastry, Todd Gardner, Matthew Cefalu, John Sullivan, Elizabeth Fussell

Working Paper Number:

CES-24-27

This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC package to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
