-
Playing with Matches: An Assessment of Accuracy in Linked Historical Data
June 2016
Working Paper Number:
carra-2016-05
This paper evaluates linkage quality achieved by various record linkage techniques used in historical demography. I create benchmark, or truth, data by linking the 2005 Current Population Survey Annual Social and Economic Supplement to the Social Security Administration's Numeric Identification System by Social Security Number. By comparing simulated linkages to the benchmark data, I examine the value added (in terms of number and quality of links) from incorporating text-string comparators, adjusting age, and using a probabilistic matching algorithm. I find that text-string comparators and probabilistic approaches are useful for increasing the linkage rate, but use of text-string comparators may decrease accuracy in some cases. Overall, probabilistic matching offers the best balance between linkage rates and accuracy.
View Full
Paper PDF
-
Black Pioneers, Intermetropolitan Movers, and Housing Desegregation
March 2016
Working Paper Number:
CES-16-23
In this project, we examine the mobility choices of black households between 1960 and 2000. We use household-level Decennial Census data geocoded down to the census tract level. Our results indicate that, for black households, one's status as an intermetropolitan migrant, especially from an urban area outside the South, is a powerful predictor of pioneering into a white neighborhood. Moreover, and perhaps even more importantly, the ratio of these intermetropolitan black arrivals to the incumbent metropolitan black population is a powerful predictor of whether a metropolitan area experiences substantial declines in housing segregation.
View Full
Paper PDF
-
Documenting the Business Register and Related Economic Business Data
March 2016
Working Paper Number:
CES-16-17
The Business Register (BR) is a comprehensive database of business establishments in the United States and provides resources for the U.S. Census Bureau's economic programs for sample selection, research, and survey operations. It is maintained using information from several federal agencies including the Census Bureau, Internal Revenue Service, Bureau of Labor Statistics, and the Social Security Administration. This paper provides a detailed description of the sources and functions of the BR. An overview of the BR as a linking tool and bridge to other Census Bureau data for additional business characteristics is also given.
View Full
Paper PDF
-
Urban-Suburban Migration in the United States, 1955-2000
February 2016
Working Paper Number:
CES-16-08
This study uses census microdata from 1960 to 2010 to look at the rates of suburbanization in the 100 largest metro areas. Looking at the racial and ethnic composition of the population, and then further breaking down these groups by income, it's clear that more affluent people were more likely to move to the suburbs. Also, the White non-Hispanic population has long been the most suburbanized group. A majority of the White population lived in suburbs by 1960 in the 100 largest metro areas, while most of the Black non-Hispanic population lived in urban core areas as late as 2000. The Hispanic and Asian populations went from majority urban to majority suburban during this period.
View Full
Paper PDF
-
The Timing of Teenage Births: Estimating the Effect on High School Graduation and Later Life Outcomes
January 2016
Working Paper Number:
CES-16-39R
We examine the long-term outcomes for a population of teenage mothers who give birth to their children around the end of their high school year. We compare mothers whose high school education was interrupted by childbirth, because the child was born before the mother's expected graduation date, to mothers who did not experience the same disruption to their education. We find that mothers who give birth during the school year are seven percent less likely to graduate from high school, are less likely to be married, and have more children than their counterparts who gave birth just a few months later. The labor market outcomes for these two sets of teenage mothers are not statistically different, but with a lower likelihood of marriage and more children, the households of the treated mothers are more likely to fall below the poverty threshold. While differences in educational attainment have narrowed over time, the differences in labor market outcomes and family structure have remained stable.
View Full
Paper PDF
-
Disconnected Geography: A Spatial Analysis of Disconnected Youth in the United States
January 2016
Working Paper Number:
CES-16-37
Since the Great Recession, US policy and advocacy groups have sought to better understand its effect on a group of especially vulnerable young adults who are not enrolled in school or training programs and not participating in the labor market, so-called 'disconnected youth.' This article distinguishes between disconnected youth and unemployed youth and examines the spatial clustering of these two groups across counties in the US. The focus is to ascertain whether there are differences in underlying contextual factors among groups of counties that are mutually exclusive and spatially disparate (non-adjacent), comprising two types of spatial clusters: high rates of disconnected youth and high rates of unemployed youth. Using restricted, household-level census data inside the Census Research Data Center (RDC) under special permission by the US Census Bureau, we were able to define these two groups using detailed household questionnaires that are not available to researchers outside the RDC. The geospatial patterns in the two types of clusters suggest that places with high concentrations of disconnected youth are distinctly different in terms of underlying characteristics from places with high concentrations of unemployed youth. These differences include, among other things, arrests for synthetic drug production, enclaves of poor in rural areas, persistent poverty in areas, educational attainment in the populace, children in poverty, persons without health insurance, the social capital index, and elders who receive disability benefits. This article provides some preliminary evidence regarding the social forces underlying the two types of observed geospatial clusters and discusses how they differ.
View Full
Paper PDF
-
When Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources: Exploring Methods to Assign Responses
December 2015
Working Paper Number:
carra-2015-08
The U.S. Census Bureau is researching uses of administrative records and third party data in survey and decennial census operations. One potential use of administrative records is to utilize these data when race and Hispanic origin responses are missing. When federal and third party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across sources. We explore different methods to assign one race and one Hispanic response when these responses are discrepant. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data by demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records and third party data compared to the 2010 Census. Minority groups and individuals ages 0-17 are more likely to have missing race or Hispanic origin data in administrative records and third party data. Larger households tend to have more missing race data in administrative records and third party data than smaller households.
View Full
Paper PDF
-
Assessing Coverage and Quality of the 2007 Prototype Census Kidlink Database
September 2015
Working Paper Number:
carra-2015-07
The Census Bureau is conducting research to expand the use of administrative records data in censuses and surveys to decrease respondent burden and reduce costs while improving data quality. Much of this research (e.g., Rastogi and O'Hara (2012), Luque and Bhaskar (2014)) hinges on the ability to integrate multiple data sources by linking individuals across files. One of the Census Bureau's record linkage methodologies for data integration is the Person Identification Validation System or PVS. PVS assigns anonymous and unique IDs (Protected Identification Keys or PIKs) that serve as linkage keys across files. Prior research showed that integrating 'known associates' information into PVS's reference files could potentially enhance PVS's PIK assignment rates. The term 'known associates' refers to people who are likely to be associated with each other because of a known common link (such as family relationships or people sharing a common address), and thus, to be observed together in different files. One of the results from this prior research was the creation of the 2007 Census Kidlink file, a child-level file linking a child's Social Security Number (SSN) record to the SSN of those identified as the child's parents. In this paper, we examine to what extent the 2007 Census Kidlink methodology was able to link parents' SSNs to children's SSN records, and also evaluate the quality of those links. We find that in approximately 80 percent of cases, at least one parent was linked to the child's record. Younger children and noncitizens have a higher percentage of cases where neither parent could be linked to the child. Using 2007 tax data as a benchmark, our quality evaluation results indicate that in at least 90 percent of the cases, the parent-child link agreed with those found in the tax data. Based on our findings, we propose improvements to the 2007 Kidlink methodology to increase child-parent links, and discuss how the creation of the file could be operationalized moving forward.
View Full
Paper PDF
-
Exploring Administrative Records Use for Race and Hispanic Origin Item Non-Response
December 2014
Working Paper Number:
carra-2014-16
Race and Hispanic origin data are required to produce official statistics in the United States. Data collected through the American Community Survey and decennial census address missing data through traditional imputation methods, often relying on information from neighbors. These methods work well if neighbors share similar characteristics; however, the shape and patterns of neighborhoods in the United States are changing. Administrative records may provide more accurate data compared to traditional imputation methods for missing race and Hispanic origin responses. This paper first describes the characteristics of persons with missing demographic data, then assesses the coverage of administrative records data for respondents who do not answer race and Hispanic origin questions in Census data. The paper also discusses the distributional impact of using administrative records race and Hispanic origin data to complete missing responses in a decennial census or survey context.
View Full
Paper PDF
-
Creating Linked Historical Data: An Assessment of the Census Bureau's Ability to Assign Protected Identification Keys to the 1960 Census
September 2014
Working Paper Number:
carra-2014-12
In order to study social phenomena over the course of the 20th century, the Census Bureau is investigating the feasibility of digitizing historical census records and linking them to contemporary data. However, historical censuses have limited personally identifiable information available to match on. In this paper, I discuss the problems associated with matching older censuses to contemporary data files, and I describe the matching process used to match a small sample of the 1960 census to the Social Security Administration Numeric Identification System.
View Full
Paper PDF