The Census Bureau is conducting research to expand the use of administrative records data in censuses and surveys to decrease respondent burden and reduce costs while improving data quality. Much of this research (e.g., Rastogi and O'Hara (2012), Luque and Bhaskar (2014)) hinges on the ability to integrate multiple data sources by linking individuals across files. One of the Census Bureau's record linkage methodologies for data integration is the Person Identification Validation System or PVS. PVS assigns anonymous and unique IDs (Protected Identification Keys or PIKs) that serve as linkage keys across files. Prior research showed that integrating 'known associates' information into PVS's reference files could potentially enhance PVS's PIK assignment rates. The term 'known associates' refers to people who are likely to be associated with each other because of a known common link (such as family relationships or people sharing a common address), and thus to be observed together in different files. One of the results from this prior research was the creation of the 2007 Census Kidlink file, a child-level file linking a child's Social Security Number (SSN) record to the SSNs of those identified as the child's parents. In this paper, we examine to what extent the 2007 Census Kidlink methodology was able to link parents' SSNs to children's SSN records, and also evaluate the quality of those links. We find that in approximately 80 percent of cases, at least one parent was linked to the child's record. Younger children and noncitizens have a higher percentage of cases where neither parent could be linked to the child. Using 2007 tax data as a benchmark, our quality evaluation results indicate that in at least 90 percent of the cases, the parent-child links agreed with those found in the tax data.
Based on our findings, we propose improvements to the 2007 Kidlink methodology to increase child-parent links, and discuss how the creation of the file could be operationalized moving forward.
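The two headline measures in the abstract above (the share of children with at least one linked parent, and the rate of agreement with a benchmark file) can be illustrated with a minimal sketch. Everything here is hypothetical: the record layout, the identifiers, the toy data, and the choice to count a child as agreeing when any linked parent also appears in the benchmark are assumptions made for illustration, not the paper's actual definitions or data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ChildRecord:
    """Hypothetical child-level record with optional parent identifiers."""
    child_id: str
    mother_id: Optional[str]  # None when no mother could be linked
    father_id: Optional[str]  # None when no father could be linked


def share_with_any_parent(records: List[ChildRecord]) -> float:
    """Fraction of child records with at least one linked parent."""
    if not records:
        return 0.0
    linked = sum(
        1 for r in records
        if r.mother_id is not None or r.father_id is not None
    )
    return linked / len(records)


def benchmark_agreement(
    records: List[ChildRecord],
    benchmark: Dict[str, Set[str]],
) -> float:
    """Among children with both a link and a benchmark entry, the fraction
    whose linked parents overlap the benchmark parents (assumed definition)."""
    compared = 0
    agreed = 0
    for r in records:
        linked = {p for p in (r.mother_id, r.father_id) if p is not None}
        bench = benchmark.get(r.child_id)
        if not linked or not bench:
            continue  # nothing to compare for this child
        compared += 1
        if linked & bench:
            agreed += 1
    return agreed / compared if compared else 0.0


# Toy data, invented for illustration only.
records = [
    ChildRecord("c1", "m1", "f1"),
    ChildRecord("c2", "m2", None),
    ChildRecord("c3", None, None),
    ChildRecord("c4", None, "f4"),
    ChildRecord("c5", "m5", "f5"),
]
benchmark = {"c1": {"m1"}, "c2": {"x9"}, "c3": {"m3"}, "c4": {"f4"}}

link_rate = share_with_any_parent(records)
agreement_rate = benchmark_agreement(records, benchmark)
print(f"at least one parent linked: {link_rate:.0%}")
print(f"agreement with benchmark: {agreement_rate:.0%}")
```

Restricting the agreement rate to children present in both sources keeps it from being driven by coverage gaps in the benchmark. A stricter definition (for example, requiring every linked parent to match) would be an equally defensible choice.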
-
Where Are Your Parents? Exploring Potential Bias in Administrative Records on Children
March 2024
Working Paper Number: CES-24-18
This paper examines potential bias in the Census Household Composition Key's (CHCK) probabilistic parent-child linkages. By linking CHCK data to the American Community Survey (ACS), we reveal disparities in parent-child linkages among specific demographic groups and find that characteristics of children who can and cannot be linked to the CHCK differ considerably from the larger population. In particular, we find that children from low-income, less educated households and of Hispanic origin are less likely to be linked to a mother or a father in the CHCK. We also highlight some data considerations when using the CHCK.
-
Person Matching in Historical Files using the Census Bureau's Person Validation System
September 2014
Working Paper Number: carra-2014-11
The recent release of the 1940 Census manuscripts enables the creation of longitudinal data spanning the whole of the twentieth century. Linked historical and contemporary data would allow unprecedented analyses of the causes and consequences of health, demographic, and economic change. The Census Bureau is uniquely equipped to provide high quality linkages of person records across datasets. This paper summarizes the linkage techniques employed by the Census Bureau and discusses utilization of these techniques to append protected identification keys to the 1940 Census.
-
The Use of Administrative Records and the American Community Survey to Study the Characteristics of Undercounted Young Children in the 2010 Census
May 2018
Working Paper Number: carra-2018-05
Children under age five are historically one of the most difficult segments of the population to enumerate in the U.S. decennial census. The persistent undercount of young children is highest among Hispanics and racial minorities. In this study, we link 2010 Census data to administrative records from government and third-party data sources, such as Medicaid enrollment data and tenant rental assistance program records from the Department of Housing and Urban Development, to identify differences between children reported and not reported in the 2010 Census. In addition, we link children in administrative records to the American Community Survey to identify various characteristics of households with children under age five who may have been missed in the last census. This research contributes to what is known about the demographic, socioeconomic, and household characteristics of young children undercounted by the census. Our research also informs the potential benefits of using administrative records and surveys to supplement the U.S. Census Bureau's child population enumeration efforts in future decennial censuses.
-
Creating Linked Historical Data: An Assessment of the Census Bureau's Ability to Assign Protected Identification Keys to the 1960 Census
September 2014
Working Paper Number: carra-2014-12
In order to study social phenomena over the course of the 20th century, the Census Bureau is investigating the feasibility of digitizing historical census records and linking them to contemporary data. However, historical censuses have limited personally identifiable information available to match on. In this paper, I discuss the problems associated with matching older censuses to contemporary data files, and I describe the matching process used to match a small sample of the 1960 census to the Social Security Administration Numeric Identification System.
-
Understanding the Quality of Alternative Citizenship Data Sources for the 2020 Census
August 2018
Working Paper Number: CES-18-38R
This paper examines the quality of citizenship data in self-reported survey responses compared to administrative records and evaluates options for constructing an accurate count of resident U.S. citizens. Person-level discrepancies between survey-collected citizenship data and administrative records are more pervasive than previously reported in studies comparing survey and administrative data aggregates. Our results imply that survey-sourced citizenship data produce significantly lower estimates of the noncitizen share of the population than would be produced from currently available administrative records; both the survey-sourced and administrative data have shortcomings that could contribute to this difference. Our evidence is consistent with noncitizen respondents misreporting their own citizenship status and failing to report that of other household members. At the same time, currently available administrative records may miss some naturalizations and capture others with a delay. The evidence in this paper also suggests that adding a citizenship question to the 2020 Census would lead to lower self-response rates in households potentially containing noncitizens, resulting in higher fieldwork costs and a lower-quality population count.
-
Full Report of the Comparisons of Administrative Record Rosters to Census Self-Responses and NRFU Household Member Responses
March 2023
Working Paper Number: CES-23-08
One of the U.S. Census Bureau's innovations in the 2020 U.S. Census was the use of administrative records (AR) to create household rosters for enumerating some addresses when a self-response was not available but high-quality ARs were. The goal was to reduce the cost of fieldwork during the Nonresponse Followup operation (NRFU). The original plan had NRFU beginning in mid-May and continuing through late July 2020. However, the COVID-19 pandemic forced the delay of NRFU and caused the Internal Revenue Service to postpone the income tax filing deadline, resulting in an interruption in the delivery of ARs to the U.S. Census Bureau. The delays were not anticipated when U.S. Census Bureau staff conducted the research on AR enumeration with the 2010 Census data in preparation for the 2020 Census or during the fine tuning of plans for using ARs during the 2018 End-to-End Census Test. These circumstances raised questions about whether the quality of the AR household rosters was high enough for use in enumeration. To aid in investigating the concern about the quality of the AR rosters, our analyses compared AR rosters to self-response rosters and NRFU household member responses at addresses where both ARs and a self-response were available.
-
Producing U.S. Population Statistics Using Multiple Administrative Sources
November 2023
Working Paper Number: CES-23-58
We identify several challenges encountered when constructing U.S. administrative record-based (AR-based) population estimates for 2020. Though the AR estimates are higher than the 2020 Census at the national level, they are over 15 percent lower in 5 percent of counties, suggesting that locational accuracy can be improved. Other challenges include how to achieve comprehensive coverage, maintain consistent coverage across time, filter out nonresidents and people not alive on the reference date, uncover missing links across person and address records, and predict demographic characteristics when multiple ones are reported or when they are missing. We discuss several ways of addressing these issues, e.g., building in redundancy with more sources, linking children to their parents' addresses, and conducting additional record linkage for people without Social Security Numbers and for addresses not initially linked to the Census Bureau's Master Address File. We discuss modeling to predict lower levels of geography for people lacking those geocodes, the probability that a person is a U.S. resident on the reference date, the probability that an address is the person's residence on the reference date, and the probability a person is in each demographic characteristic category. Regression results illustrate how many of these challenges and solutions affect the AR county population estimates.
-
Coverage of Children in the American Community Survey Based on California Birth Records
September 2023
Working Paper Number: CES-23-46
The U.S. Census Bureau's American Community Survey (ACS) collects information on individuals and households. The ACS provides survey-based estimates of children drawn from a sample of the U.S. population. However, survey responses may not match administrative records, such as birth records. Birth records should provide a complete account of all births, along with child-parent relationships and demographic characteristics. California is a state that has both a large population of children and a high undercount for young children. This paper uses California as a case study to examine differences between reported and unreported children in the ACS based on state birth records. Child reporting rates were lower for more recent data years, for younger children, for Black and Hispanic mothers, and for more complex households. Child reporting rates were higher for more educated mothers and for households above the poverty line. Using mother's race and Hispanic ethnicity from the birth records combined with poverty indices from the ACS, this analysis also finds that child reporting does not uniformly vary with poverty status across all race and ethnicity groups. This research builds support for the utility of state birth records in analyzing the undercount of children.
-
Noncitizen Coverage and Its Effects on U.S. Population Statistics
August 2023
Working Paper Number: CES-23-42
We produce population estimates with the same reference date, April 1, 2020, as the 2020 Census of Population and Housing by combining 31 types of administrative record (AR) and third-party sources, including several new to the Census Bureau with a focus on noncitizens. Our AR census national population estimate is higher than other Census Bureau official estimates: 1.8% greater than the 2020 Demographic Analysis high estimate, 3.0% more than the 2020 Census count, and 3.6% higher than the vintage-2020 Population Estimates Program estimate. Our analysis suggests that inclusion of more noncitizens, especially those with unknown legal status, explains the higher AR census estimate. About 19.8% of AR census noncitizens have addresses that cannot be linked to an address in the 2020 Census collection universe, compared to 5.7% of citizens, raising the possibility that the 2020 Census did not collect data for a significant fraction of noncitizens residing in the United States under the residency criteria used for the census. We show differences in estimates by age, sex, Hispanic origin, geography, and socioeconomic characteristics symptomatic of the differences in noncitizen coverage.
-
The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software
July 2014
Working Paper Number: carra-2014-01
The Census Bureau's Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. The PVS matches incoming files to reference files created with data from the Social Security Administration (SSA) Numerical Identification file, and SSA data with addresses obtained from federal files. This paper describes the PVS methodology from editing input data to creating the final file.
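The PVS abstract above mentions probabilistic matching of incoming records against reference files. As a rough illustration of that general idea only, the sketch below scores candidate pairs with textbook agreement and disagreement weights and keeps the best-scoring reference record above a cutoff. It is not the PVS implementation: the field names, the m/u probabilities, the threshold, and the identifiers `K1` and `K2` are all invented assumptions.

```python
import math
from typing import Dict, Optional, Tuple

Record = Dict[str, object]

# Hypothetical per-field probabilities (assumed, not estimated from data):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS: Dict[str, Tuple[float, float]] = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "birth_year": (0.98, 0.02),
}


def match_weight(a: Record, b: Record) -> float:
    """Sum of log2 likelihood-ratio weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement weight (> 0)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (< 0)
    return total


def best_reference_match(
    record: Record,
    reference: Dict[str, Record],
    threshold: float,
) -> Optional[str]:
    """Return the key of the highest-weight reference record, or None
    when no candidate reaches the threshold."""
    best_key: Optional[str] = None
    best_weight = float("-inf")
    for key, candidate in reference.items():
        weight = match_weight(record, candidate)
        if weight > best_weight:
            best_key, best_weight = key, weight
    return best_key if best_weight >= threshold else None


# Toy records, invented for illustration only.
reference = {
    "K1": {"first_name": "ANA", "last_name": "LOPEZ", "birth_year": 1980},
    "K2": {"first_name": "ANA", "last_name": "LOPES", "birth_year": 1981},
}
incoming = {"first_name": "ANA", "last_name": "LOPEZ", "birth_year": 1980}
print(best_reference_match(incoming, reference, threshold=10.0))
```

Exact string equality is used here for brevity. A production system would typically add blocking to limit the candidate pairs and approximate string comparison to tolerate spelling variation.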