CREAT - Census Bureau

The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software

July 2014

Written by: Deborah Wagner, Mary Layne

Working Paper Number:

carra-2014-01

Abstract

The Census Bureau's Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. The PVS matches incoming files to reference files created with data from the Social Security Administration (SSA) Numerical Identification file, and SSA data with addresses obtained from federal files. This paper describes the PVS methodology from editing input data to creating the final file.
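As a rough illustration of what "probabilistic matching" can mean in practice, the sketch below scores an incoming record against reference records with summed log-likelihood agreement weights, a common formulation in the record linkage literature. It is not the PVS implementation: the field names, the m- and u-probabilities, the threshold, and the functions `match_score` and `best_match` are all hypothetical and chosen only for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not PVS parameters).
# m = P(field agrees | both records describe the same person)
# u = P(field agrees | the records describe different people)
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "dob": (0.98, 0.001),
}


def match_score(record, reference):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    score = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if record.get(field) == reference.get(field):
            score += math.log2(m / u)              # agreement adds evidence
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return score


def best_match(record, references, threshold):
    """Return the id of the highest-scoring reference at or above threshold, else None."""
    best_id, best_score = None, float("-inf")
    for ref in references:
        s = match_score(record, ref)
        if s > best_score:
            best_id, best_score = ref["id"], s
    return best_id if best_score >= threshold else None


# Toy reference file and incoming record (made-up data).
references = [
    {"id": "A", "first_name": "ANN", "last_name": "LEE", "dob": "1980-01-01"},
    {"id": "B", "first_name": "BOB", "last_name": "RAY", "dob": "1975-06-30"},
]
record = {"first_name": "ANN", "last_name": "LEE", "dob": "1980-01-01"}
assigned = best_match(record, references, threshold=10.0)
```

In this toy setup, a record that scores below the threshold against every reference record receives no identifier, which mirrors the general idea that not every incoming record can be linked.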

Document Tags and Keywords

Keywords:

data, database, data census, census data, record, matching, census file, records census, irs, ssa, datasets, identifier, census records, race census

Tags:

Internal Revenue Service, Social Security Administration, Service Annual Survey, Current Population Survey, Decennial Census, Housing and Urban Development, Social Security Number, Protected Identification Key, National Opinion Research Center, Master Address File, Indian Health Service, Person Validation System, Indian Housing Information Center, Person Identification Validation System, Individual Taxpayer Identification Numbers, MAFID, Center for Administrative Records Research and Applications, Census Numident, Census Bureau Person Identification Validation System, SSA Numident, Census Edited File, DOB, Selective Service System

Similar Working Papers

The 10 most similar working papers to the working paper 'The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software' are listed below in order of similarity.

Working Paper

Estimating Record Linkage False Match Rate for the Person Identification Validation System

July 2014

Authors: Deborah Wagner, Mary Layne, Cynthia Rothhaas

Working Paper Number:

carra-2014-02

The Census Bureau Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. This paper presents a method to measure the false match rate in PVS following the approach of Belin and Rubin (1995). The Belin and Rubin methodology requires truth data to estimate a mixture model. The parameters from the mixture model are used to obtain point estimates of the false match rate for each of the PVS search modules. The truth data requirement is satisfied by the unique access the Census Bureau has to high quality name, date of birth, address and Social Security Number (SSN) data. Truth data are quickly created for the Belin and Rubin model and do not involve a clerical review process. These truth data are used to create estimates for the Belin and Rubin parameters, making the approach more feasible. Both observed and modeled false match rates are computed for all search modules in federal administrative records data and commercial data.
Working Paper

Person Matching in Historical Files using the Census Bureau's Person Validation System

September 2014

Authors: Amy B. O'Hara, Catherine G. Massey

Working Paper Number:

carra-2014-11

The recent release of the 1940 Census manuscripts enables the creation of longitudinal data spanning the whole of the twentieth century. Linked historical and contemporary data would allow unprecedented analyses of the causes and consequences of health, demographic, and economic change. The Census Bureau is uniquely equipped to provide high quality linkages of person records across datasets. This paper summarizes the linkage techniques employed by the Census Bureau and discusses utilization of these techniques to append protected identification keys to the 1940 Census.
Working Paper

Matching Addresses between Household Surveys and Commercial Data

July 2015

Authors: Quentin Brummet

Working Paper Number:

carra-2015-04

Matching third-party data sources to household surveys can benefit household surveys in a number of ways, but the utility of these new data sources depends critically on our ability to link units between data sets. To understand this better, this report discusses potential modifications to the existing match process that could potentially improve our matches. While many changes to the matching procedure produce marginal improvements in match rates, substantial increases in match rates can only be achieved by relaxing the definition of a successful match. In the end, the results show that the most important factor determining the success of matching procedures is the quality and composition of the data sets being matched.
Working Paper

Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report

October 2020

Authors: John M. Abowd, J. David Brown, Lawrence Warren, Moises Yi, Misty L. Heggeness, William R. Bell, Michael B. Hawes, Andrew Keller, Vincent T. Mule Jr., Joseph L. Schafer, Matthew Spence

Working Paper Number:

CES-20-33

This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
Working Paper

Creating Linked Historical Data: An Assessment of the Census Bureau's Ability to Assign Protected Identification Keys to the 1960 Census

September 2014

Authors: Catherine G. Massey

Working Paper Number:

carra-2014-12

In order to study social phenomena over the course of the 20th century, the Census Bureau is investigating the feasibility of digitizing historical census records and linking them to contemporary data. However, historical censuses have limited personally identifiable information available to match on. In this paper, I discuss the problems associated with matching older censuses to contemporary data files, and I describe the matching process used to match a small sample of the 1960 census to the Social Security Administration Numeric Identification System.
Working Paper

Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

June 2024

Authors: Narayan Sastry, Todd Gardner, Matthew Cefalu, John Sullivan, Elizabeth Fussell

Working Paper Number:

CES-24-27

This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
Working Paper

Assessing Coverage and Quality of the 2007 Prototype Census Kidlink Database

September 2015

Authors: Adela Luque, Deborah Wagner

Working Paper Number:

carra-2015-07

The Census Bureau is conducting research to expand the use of administrative records data in censuses and surveys to decrease respondent burden and reduce costs while improving data quality. Much of this research (e.g., Rastogi and O'Hara (2012), Luque and Bhaskar (2014)) hinges on the ability to integrate multiple data sources by linking individuals across files. One of the Census Bureau's record linkage methodologies for data integration is the Person Identification Validation System or PVS. PVS assigns anonymous and unique IDs (Protected Identification Keys or PIKs) that serve as linkage keys across files. Prior research showed that integrating 'known associates' information into PVS's reference files could potentially enhance PVS's PIK assignment rates. The term 'known associates' refers to people who are likely to be associated with each other because of a known common link (such as family relationships or people sharing a common address), and thus, to be observed together in different files. One of the results from this prior research was the creation of the 2007 Census Kidlink file, a child-level file linking a child's Social Security Number (SSN) record to the SSN of those identified as the child's parents. In this paper, we examine to what extent the 2007 Census Kidlink methodology was able to link parents' SSNs to children's SSN records, and also evaluate the quality of those links. We find that in approximately 80 percent of cases, at least one parent was linked to the child's record. Younger children and noncitizens have a higher percentage of cases where neither parent could be linked to the child. Using 2007 tax data as a benchmark, our quality evaluation results indicate that in at least 90 percent of the cases, the parent-child link agreed with those found in the tax data.
Based on our findings, we propose improvements to the 2007 Kidlink methodology to increase child-parent links, and discuss how the creation of the file could be operationalized moving forward.
Working Paper

Full Report of the Comparisons of Administrative Record Rosters to Census Self-Responses and NRFU Household Member Responses

March 2023

Authors: Cristina Tello-Trillo, Andrew Keller, Mary H. Mulry, Thomas Mule

Working Paper Number:

CES-23-08

One of the U.S. Census Bureau's innovations in the 2020 U.S. Census was the use of administrative records (AR) to create household rosters for enumerating some addresses when a self response was not available but high-quality ARs were. The goal was to reduce the cost of fieldwork during the Nonresponse Followup operation (NRFU). The original plan had NRFU beginning in mid-May and continuing through late July 2020. However, the COVID-19 pandemic forced the delay of NRFU and caused the Internal Revenue Service to postpone the income tax filing deadline, resulting in an interruption in the delivery of ARs to the U.S. Census Bureau. The delays were not anticipated when U.S. Census Bureau staff conducted the research on AR enumeration with the 2010 Census data in preparation for the 2020 Census or during the fine tuning of plans for using ARs during the 2018 End-to-End Census Test. These circumstances raised questions about whether the quality of the AR household rosters was high enough for use in enumeration. To aid in investigating the concern about the quality of the AR rosters, our analyses compared AR rosters to self-response rosters and NRFU household member responses at addresses where both ARs and a self-response were available.
Working Paper

The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey

April 2014

Authors: Adela Luque, J. David Brown, Brittany Bond, Amy B. O'Hara

Working Paper Number:

carra-2014-08

Record linkage across survey and administrative records sources can greatly enrich data and improve their quality. The linkage can reduce respondent burden and nonresponse follow-up costs. This is particularly important in an era of declining survey response rates and tight budgets. Record linkage also creates statistical bias, however. The U.S. Census Bureau links person records through its Person Identification Validation System (PVS), assigning each record a Protected Identification Key (PIK). It is not possible to reliably assign a PIK to every record, either due to insufficient identifying information or because the information does not uniquely match any of the administrative records used in the person validation process. Non-random ability to assign a PIK can potentially inject bias into statistics using linked data. This paper studies the nature of this bias using the 2009 and 2010 American Community Survey (ACS). The ACS is well-suited for this analysis, as it contains a rich set of person characteristics that can describe the bias. We estimate probit models for whether a record is assigned a PIK. The results suggest that young children, minorities, residents of group quarters, immigrants, recent movers, low-income individuals, and non-employed individuals are less likely to receive a PIK using 2009 ACS. Changes to the PVS process in 2010 significantly addressed the young children deficit, attenuated the other biases, and increased the validated records share from 88.1 to 92.6 percent (person-weighted).
Working Paper

Estimating the U.S. Citizen Voting-Age Population (CVAP) Using Blended Survey Data, Administrative Record Data, and Modeling: Technical Report

April 2023

Authors: J. David Brown, Danielle H. Sandler, Lawrence Warren, Moises Yi, Misty L. Heggeness, Joseph L. Schafer, Matthew Spence, Marta Murray-Close, Carl Lieberman, Genevieve Denoeux, Lauren Medina

Working Paper Number:

CES-23-21

This report develops a method using administrative records (AR) to fill in responses for nonresponding American Community Survey (ACS) housing units rather than adjusting survey weights to account for selection of a subset of nonresponding housing units for follow-up interviews and for nonresponse bias. The method also inserts AR and modeling in place of edits and imputations for ACS survey citizenship item nonresponses. We produce Citizen Voting-Age Population (CVAP) tabulations using this enhanced CVAP method and compare them to published estimates. The enhanced CVAP method produces a 0.74 percentage point lower citizen share, and it is 3.05 percentage points lower for voting-age Hispanics. The latter result can be partly explained by omissions of voting-age Hispanic noncitizens with unknown legal status from ACS household responses. Weight adjustments may be less effective at addressing nonresponse bias under those conditions.
