-
Creating Linked Historical Data: An Assessment of the Census Bureau's Ability to Assign Protected Identification Keys to the 1960 Census
September 2014
Working Paper Number: carra-2014-12
In order to study social phenomena over the course of the 20th century, the Census Bureau is investigating the feasibility of digitizing historical census records and linking them to contemporary data. However, historical censuses contain limited personally identifiable information to match on. In this paper, I discuss the problems associated with matching older censuses to contemporary data files, and I describe the process used to link a small sample of the 1960 census to the Social Security Administration Numerical Identification System.
-
Person Matching in Historical Files using the Census Bureau's Person Validation System
September 2014
Working Paper Number: carra-2014-11
The recent release of the 1940 Census manuscripts enables the creation of longitudinal data spanning the whole of the twentieth century. Linked historical and contemporary data would allow unprecedented analyses of the causes and consequences of health, demographic, and economic change. The Census Bureau is uniquely equipped to provide high-quality linkages of person records across datasets. This paper summarizes the linkage techniques employed by the Census Bureau and discusses how these techniques are used to append protected identification keys to the 1940 Census.
-
Dynamics of Race: Joining, Leaving, and Staying in the American Indian/Alaska Native Race Category between 2000 and 2010
August 2014
Working Paper Number: carra-2014-10
For decades, each census has recorded an American Indian and Alaska Native population increase substantially larger than expected. Changes in racial reporting seem to play an important role in the observed net increases, though research has been hampered by data limitations. We address previously unanswerable questions about race response change among American Indians and Alaska Natives (hereafter 'American Indians') using uniquely suited (but not nationally representative) linked data from the 2000 and 2010 decennial censuses (N = 3.1 million) and the 2006-2010 American Community Survey (N = 188,131). To what extent do people change responses to include or exclude American Indian? How are people who change responses similar to or different from those who do not? How are people who join a group similar to or different from those who leave it? We find considerable race response change by people in our data, especially by multiple-race and/or Hispanic American Indians. This turnover is hidden in cross-sectional comparisons because people joining the group are similar in number and characteristics to those who leave the group. People in our data who changed their race response to add or drop American Indian differ from those who kept the same race response in 2000 and 2010 and from those who moved between a single-race and multiple-race American Indian response. Those who consistently reported American Indian (including those who added or dropped another race response) were relatively likely to report a tribe, live in an American Indian area, report American Indian ancestry, and live in the West. There are significant differences between those who joined and those who left a specific American Indian response group, but poor model fit indicates general similarity between joiners and leavers. Response changes should be considered when conceptualizing and operationalizing 'the American Indian and Alaska Native population.'
-
America's Churning Races: Race and Ethnic Response Changes between Census 2000 and the 2010 Census
August 2014
Working Paper Number: carra-2014-09
Race and ethnicity responses can change over time and across contexts, a component of population change not usually taken into account. To what extent do race and/or Hispanic origin responses change? Is change more common to or from some race/ethnic groups than others? Does the propensity to change responses vary by characteristics of the individual? To what extent do these changes affect researchers? We use internal Census Bureau data from the 2000 and 2010 censuses in which individuals' responses have been linked across years. Approximately 9.8 million people (about 6 percent) in our large, non-representative linked data have a different race and/or Hispanic origin response in 2010 than they did in 2000. Several groups experienced considerable fluidity in racial identification: American Indians and Alaska Natives, Native Hawaiians and Other Pacific Islanders, and multiple-race response groups, as well as Hispanics when reporting a race. In contrast, race and ethnic responses for single-race non-Hispanic whites, blacks, and Asians were relatively consistent over the decade, as were ethnicity responses by Hispanics. People who change their race and/or Hispanic origin response(s) are doing so in a wide variety of ways, as anticipated by previous research. For example, people's responses change from multiple races to a single race, from a single race to multiple races, from one single race to another, and some people add or drop a Hispanic response. The inflow of people to each race/Hispanic group is in many cases similar in size to the outflow from the same group, such that cross-sectional data would show a small net change. We find response changes across ages, sexes, regions, and response modes, with variation across groups. Researchers should consider the implications of changing race and Hispanic origin responses when conducting analyses and interpreting results.
-
Within and Across County Variation in SNAP Misreporting: Evidence from Linked ACS and Administrative Records
July 2014
Working Paper Number: carra-2014-05
This paper examines sub-state spatial and temporal variation in misreporting of participation in the Supplemental Nutrition Assistance Program (SNAP) using several years of the American Community Survey linked to SNAP administrative records from New York (2008-2010) and Texas (2006-2009). I calculate county false-negative (FN) and false-positive (FP) rates for each year of observation and find that, within a given state and year, there is substantial heterogeneity in FN rates across counties. In addition, I find evidence that FN rates (but not FP rates) persist over time within counties. This persistence in FN rates is strongest among more populous counties, suggesting that when noise from sampling variation is not an issue, some counties have consistently high FN rates while others have consistently low FN rates. This finding is important for understanding how misreporting might bias estimates of sub-state SNAP participation rates, changes in those participation rates, and effects of program participation. This presentation was given at the CARRA Seminar, June 27, 2013.
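As an illustration of the county-year FN and FP rates described above, the sketch below assumes the definitions commonly used in the SNAP misreporting literature: the FN rate is the share of administratively recorded recipients who do not report receipt in the survey, and the FP rate is the share of administrative non-recipients who do report receipt. The `LinkedRecord` type and its fields are hypothetical, and the counts are unweighted, whereas an actual ACS analysis would likely apply survey weights.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LinkedRecord:
    """Hypothetical linked ACS/administrative record (field names are assumptions)."""
    county: str
    year: int
    admin_snap: bool   # SNAP receipt according to administrative records (treated as truth)
    survey_snap: bool  # SNAP receipt as reported in the survey


def county_error_rates(records):
    """Return {(county, year): (fn_rate, fp_rate)} using unweighted counts.

    FN rate: share of administrative recipients who did not report receipt.
    FP rate: share of administrative non-recipients who reported receipt.
    A rate is None when its denominator is zero.
    """
    # Per cell: [recipients, false negatives, non-recipients, false positives]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for r in records:
        cell = counts[(r.county, r.year)]
        if r.admin_snap:
            cell[0] += 1
            if not r.survey_snap:
                cell[1] += 1
        else:
            cell[2] += 1
            if r.survey_snap:
                cell[3] += 1

    rates = {}
    for key, (n_rec, n_fn, n_non, n_fp) in counts.items():
        fn_rate = n_fn / n_rec if n_rec else None
        fp_rate = n_fp / n_non if n_non else None
        rates[key] = (fn_rate, fp_rate)
    return rates
```

Persistence over time could then be examined by comparing each county's FN rate across years, for example by correlating a county's rate in one year with its rate in the next.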
-
2010 American Community Survey Match Study
July 2014
Working Paper Number: carra-2014-03
Using administrative records data from federal government agencies and commercial sources, the 2010 ACS Match Study measures administrative records coverage of 2010 ACS addresses, persons, and persons at addresses at different levels of geography as well as by demographic characteristics and response mode. The 2010 ACS Match Study represents a continuation of the research undertaken in the 2010 Census Match Study, the first national-level evaluation of administrative records data coverage. Preliminary results indicate that administrative records provide substantial coverage for addresses and persons in the 2010 ACS (92.7 and 92.1 percent, respectively), and less extensive, though still substantial, coverage for person-address pairs (74.3 percent). In addition, some variation in address, person, and/or person-address coverage is found across demographic and response mode groups. This research informs future uses of administrative records in survey and decennial census operations to address the increasing costs of data collection and declining response rates.
-
Estimating Record Linkage False Match Rate for the Person Identification Validation System
July 2014
Working Paper Number: carra-2014-02
The Census Bureau Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. This paper presents a method to measure the false match rate in PVS following the approach of Belin and Rubin (1995). The Belin and Rubin methodology requires truth data to estimate a mixture model. The parameters from the mixture model are used to obtain point estimates of the false match rate for each of the PVS search modules. The truth data requirement is satisfied by the unique access the Census Bureau has to high-quality name, date of birth, address, and Social Security Number (SSN) data. Truth data are quickly created for the Belin and Rubin model and do not involve a clerical review process. These truth data are used to create estimates for the Belin and Rubin parameters, making the approach more feasible. Both observed and modeled false match rates are computed for all search modules in federal administrative records data and commercial data.
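The abstract distinguishes observed from modeled false match rates. The sketch below covers only the simpler observed rate, under the assumption that it is the share of links made by a search module whose assigned identifier disagrees with the identifier in the truth data; it does not implement the Belin and Rubin mixture model. The tuple layout and the module names in the usage note are hypothetical.

```python
from collections import Counter


def observed_false_match_rates(links):
    """Observed false match rate per search module.

    links: iterable of (module, assigned_pik, true_pik) tuples, where
    assigned_pik is the identifier the matcher assigned and true_pik is the
    identifier from truth data (a hypothetical layout, for illustration).
    Returns {module: false_links / total_links}.
    """
    total = Counter()
    false = Counter()
    for module, assigned_pik, true_pik in links:
        total[module] += 1
        if assigned_pik != true_pik:
            false[module] += 1
    # Counter returns 0 for modules with no false links
    return {module: false[module] / total[module] for module in total}
```

For example, with links `[("geo", "p1", "p1"), ("geo", "p2", "p3"), ("name_dob", "p4", "p4")]`, the hypothetical `geo` module has one false link out of two and the `name_dob` module has none.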
-
The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software
July 2014
Working Paper Number: carra-2014-01
The Census Bureau's Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. PVS matches incoming files to reference files created from the Social Security Administration (SSA) Numerical Identification file and from SSA data combined with addresses obtained from federal files. This paper describes the PVS methodology, from editing input data to creating the final file.
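The abstract does not say which probabilistic matching model PVS uses. As a generic illustration only, the sketch below uses the Fellegi-Sunter framework, a widely used basis for probabilistic record linkage: each compared field contributes a log-likelihood ratio based on assumed m-probabilities (agreement given a true match) and u-probabilities (agreement given a non-match), and the summed weight is compared against two thresholds. The field names, probabilities, and thresholds are hypothetical and are not PVS parameters.

```python
import math


def fellegi_sunter_weight(agreements, m_probs, u_probs):
    """Sum of log2 likelihood ratios over compared fields.

    agreements: {field: bool}, whether the two records agree on the field
    m_probs:    {field: P(agree | records refer to the same person)}
    u_probs:    {field: P(agree | records refer to different people)}
    """
    weight = 0.0
    for field, agree in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agree:
            weight += math.log2(m / u)               # agreement raises the weight
        else:
            weight += math.log2((1 - m) / (1 - u))   # disagreement lowers it
    return weight


def classify(weight, upper, lower):
    """Link above the upper threshold, reject below the lower one, otherwise hold for review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


# Hypothetical parameters for illustration only
m_probs = {"name": 0.95, "dob": 0.90, "zip": 0.80}
u_probs = {"name": 0.01, "dob": 0.01, "zip": 0.05}
```

With these illustrative values, agreement on all three fields yields a weight of about 17 bits, while a disagreement on `zip` alone subtracts roughly 6.2 bits relative to that.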
-
Comparison of Survey, Federal, and Commercial Address Data Quality
June 2014
Working Paper Number: carra-2014-06
This report summarizes matching of survey, commercial, and administrative records housing units to the Census Bureau Master Address File (MAF). We document overall MAF match rates in each data set and evaluate differences in match rates across a variety of housing characteristics. Results show that over 90 percent of records in survey data from the American Housing Survey (AHS) match to the MAF. Commercial data from CoreLogic match at much lower rates, in part due to missing address information and poor match rates for multi-unit buildings. MAF match rates for administrative records from the Department of Housing and Urban Development are also high and open the possibility of using this information in surveys such as the AHS.