-
Using Small-Area Estimation (SAE) to Estimate Prevalence of Child Health Outcomes at the Census Regional-, State-, and County-Levels
November 2022
Working Paper Number:
CES-22-48
In this study, we implement small-area estimation to assess the prevalence of child health outcomes at the county, state, and regional levels, using national survey data.
-
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning
November 2021
Working Paper Number:
CES-21-35
This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
-
Validating Abstract Representations of Spatial Population Data while considering Disclosure Avoidance
February 2020
Working Paper Number:
CES-20-05
This paper furthers a research agenda for modeling populations along spatial networks and expands upon an empirical analysis to a full U.S. county (Gaboardi, 2019, Ch. 1,2). Specific foci are the necessity of, and methods for, validating and benchmarking spatial data when conducting social science research with aggregated and ambiguous population representations. In order to promote the validation of publicly available data, access to highly restricted census microdata was requested, and granted, in order to determine the levels of accuracy and error associated with a network-based population modeling framework. Primary findings reinforce the utility of a novel network allocation method, populated polygons to networks (pp2n), in terms of accuracy, computational complexity, and real runtime (Gaboardi, 2019, Ch. 2). Also, a pseudo-benchmark dataset's performance against the true census microdata shows promise in modeling populations along networks.
-
Optimal Probabilistic Record Linkage: Best Practice for Linking Employers in Survey and Administrative Data
March 2019
Working Paper Number:
CES-19-08
This paper illustrates an application of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across firms is highly asymmetric. To address these difficulties, this paper uses a supervised machine learning model to probabilistically link survey respondents in the Health and Retirement Study (HRS) with employers and establishments in the Census Business Register (BR) to create a new data source which we call the CenHRS. Multiple imputation is used to propagate uncertainty from the linkage step into subsequent analyses of the linked data. The linked data reveal new evidence that survey respondents' misreporting and selective nonresponse about employer characteristics are systematically correlated with wages.
-
Disclosure Avoidance Techniques Used for the 1970 through 2010 Decennial Censuses of Population and Housing
November 2018
Working Paper Number:
CES-18-47
The U.S. Census Bureau conducts the decennial censuses under Title 13 of the U.S. Code with the Section 9 mandate to not 'use the information furnished under the provisions of this title for any purpose other than the statistical purposes for which it is supplied; or make any publication whereby the data furnished by any particular establishment or individual under this title can be identified; or permit anyone other than the sworn officers and employees of the Department or bureau or agency thereof to examine the individual reports (13 U.S.C. § 9 (2007)).' The Census Bureau applies disclosure avoidance techniques to its publicly released statistical products in order to protect the confidentiality of its respondents and their data.
-
LEHD Infrastructure S2014 files in the FSRDC
September 2018
Working Paper Number:
CES-18-27R
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2014 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifications made to the files to facilitate researcher access.
-
The Use of Administrative Records and the American Community Survey to Study the Characteristics of Undercounted Young Children in the 2010 Census
May 2018
Working Paper Number:
carra-2018-05
Children under age five are historically one of the most difficult segments of the population to enumerate in the U.S. decennial census. The persistent undercount of young children is highest among Hispanics and racial minorities. In this study, we link 2010 Census data to administrative records from government and third party data sources, such as Medicaid enrollment data and tenant rental assistance program records from the Department of Housing and Urban Development, to identify differences between children reported and not reported in the 2010 Census. In addition, we link children in administrative records to the American Community Survey to identify various characteristics of households with children under age five who may have been missed in the last census. This research contributes to what is known about the demographic, socioeconomic, and household characteristics of young children undercounted by the census. Our research also informs the potential benefits of using administrative records and surveys to supplement the U.S. Census Bureau child population enumeration efforts in future decennial censuses.
-
Who are the people in my neighborhood? The 'contextual fallacy' of measuring individual context with census geographies
February 2018
Working Paper Number:
CES-18-11
Scholars deploy census-based measures of neighborhood context throughout the social sciences and epidemiology. Decades of research confirm that variation in how individuals are aggregated into geographic units to create variables that control for social, economic or political contexts can dramatically alter analyses. While most researchers are aware of the problem, they have lacked the tools to determine its magnitude in the literature and in their own projects. By using confidential access to the complete 2010 U.S. Decennial Census, we are able to construct, for all persons in the US, individual-specific contexts, which we group according to the Census-assigned block, block group, and tract. We compare these individual-specific measures to the published statistics at each scale, and we then determine the magnitude of variation in context for an individual with respect to the published measures using a simple statistic, the standard deviation of individual context (SDIC). For three key measures (percent Black, percent Hispanic, and Entropy, a measure of ethno-racial diversity), we find that block-level Census statistics frequently do not capture the actual context of individuals within them. More problematic, we uncover systematic spatial patterns in the contextual variables at all three scales. Finally, we show that within-unit variation is greater in some parts of the country than in others. We publish county-level estimates of the SDIC statistics that enable scholars to assess whether mis-specification in context variables is likely to alter analytic findings when measured at any of the three common Census units.
-
Has Falling Crime Invited Gentrification?
January 2017
Working Paper Number:
CES-17-27
Over the past two decades, crime has fallen dramatically in cities in the United States. We explore whether, in the face of falling central city crime rates, households with more resources and options were more likely to move into central cities overall and more particularly into low-income and/or majority-minority central city neighborhoods. We use confidential, geocoded versions of the 1990 and 2000 Decennial Census and the 2010, 2011, and 2012 American Community Survey to track moves to different neighborhoods in 244 Core Based Statistical Areas (CBSAs) and their largest central cities. Our dataset includes over four million household moves across the three time periods. We focus on three household types typically considered gentrifiers: high-income, college-educated, and white households. We find that declines in city crime are associated with increases in the probability that high-income and college-educated households choose to move into central city neighborhoods, including low-income and majority-minority central city neighborhoods. Moreover, we find little evidence that households with lower incomes and without college degrees are more likely to move to cities when violent crime falls. These results hold during the 1990s as well as the 2000s and for the 100 largest metropolitan areas, where crime declines were greatest. There is weaker evidence that white households are disproportionately drawn to cities as crime falls in the 100 largest metropolitan areas from 2000 to 2010.
-
Disconnected Geography: A Spatial Analysis of Disconnected Youth in the United States
January 2016
Working Paper Number:
CES-16-37
Since the Great Recession, US policy and advocacy groups have sought to better understand its effect on a group of especially vulnerable young adults who are not enrolled in school or training programs and not participating in the labor market, so-called 'disconnected youth.' This article distinguishes between disconnected youth and unemployed youth and examines the spatial clustering of these two groups across counties in the US. The focus is to ascertain whether there are differences in underlying contextual factors among groups of counties that are mutually exclusive and spatially disparate (non-adjacent), comprising two types of spatial clusters: high rates of disconnected youth and high rates of unemployed youth. Using restricted, household-level census data inside the Census Research Data Center (RDC) under special permission by the US Census Bureau, we were able to define these two groups using detailed household questionnaires that are not available to researchers outside the RDC. The geospatial patterns in the two types of clusters suggest that places with high concentrations of disconnected youth are distinctly different in terms of underlying characteristics from places with high concentrations of unemployed youth. These differences include, among other things, arrests for synthetic drug production, enclaves of poor in rural areas, persistent poverty in areas, educational attainment in the populace, children in poverty, persons without health insurance, the social capital index, and elders who receive disability benefits. This article provides some preliminary evidence regarding the social forces underlying the two types of observed geospatial clusters and discusses how they differ.