-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
August 2025
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-25-57
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
-
Differences in Disability Insurance Allowance Rates
August 2025
Working Paper Number:
CES-25-54
Allowance rates for disability insurance (DI) applications vary by race and ethnicity, but it is unclear to what extent these differences are artifacts of other differing socio-economic and health characteristics, or of selection issues in the Social Security Administration's (SSA's) race and ethnicity data. This paper uses the 2015 American Community Survey linked to 2015-2019 SSA administrative data to investigate DI application allowance rates among non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, non-Hispanic American Indian/Alaska Native, and Hispanic applicants aged 25-65. The analysis uses regression, propensity score matching, and inverse probability weighting to estimate differences in allowance rates among applicants who are similar on observable characteristics. Relative to raw comparisons, differences by race and ethnicity in multivariate analyses are substantially smaller in magnitude and are generally not statistically significant.
-
Locating Hispanic Americans, 1900-2020
July 2025
Working Paper Number:
CES-25-50
This study examines Hispanic Americans' residential settlement patterns nationwide over the last 120 years. Drawing on newly available neighborhood data for the whole country as early as 1900, it documents the direction and timing of changes in two aspects of their location. First, it charts Hispanics' transition from a predominantly rural population to a majority-metropolitan one by 1930, and also their growing presence in all regions of the U.S. while still maintaining a predominance in the West and Texas. Second, it provides the first evidence of the long-term trajectory of their segregation from whites in the metropolitan areas where they were settling. As shown by studies of more recent decades, Hispanics were never as segregated as African Americans. Nonetheless, similar to African Americans, their segregation from whites increased to high levels through the middle of the century, followed by slow decline. For both groups, metropolitan segregation was driven mainly by segregation among central city neighborhoods prior to the 1940s. But new forms of segregation (a growing city/suburb divide and increasing segregation among suburban places) have become the largest contributors to segregation today.
-
Credit Access in the United States
July 2025
Working Paper Number:
CES-25-45
We construct new population-level linked administrative data to study households' access to credit in the United States. These data reveal large differences in credit access by race, class, and hometown. By age 25, Black individuals, those who grew up in low-income families, and those who grew up in certain areas (including the Southeast and Appalachia) have significantly lower credit scores than other groups. Consistent with lower scores generating credit constraints, these individuals have smaller balances, more credit inquiries, higher credit card utilization rates, and greater use of alternative higher-cost forms of credit. Tests for alternative definitions of algorithmic bias in credit scores yield results in opposite directions. From a calibration perspective, group-level differences in credit scores understate differences in delinquency: conditional on a given credit score, Black individuals and those from low-income families fall delinquent at relatively higher rates. From a balance perspective, these groups receive lower credit scores even when comparing those with the same future repayment behavior. Addressing both of these biases and expanding credit access to groups with lower credit scores requires addressing group-level differences in delinquency rates. These delinquencies emerge soon after individuals access credit in their early twenties, often due to missed payments on credit cards, student loans, and other bills. Comprehensive measures of individuals' income profiles, income volatility, and observed wealth explain only a small portion of these repayment gaps. In contrast, we find that the large variation in repayment across hometowns mostly reflects the causal effect of childhood exposure to these places. Places that promote upward income mobility also promote repayment and expand credit access even conditional on income, suggesting that common place-level factors may drive behaviors in both credit and labor markets. 
We discuss suggestive evidence for several mechanisms that drive our results, including the role of social and cultural capital. We conclude that gaps in credit access by race, class, and hometown have roots in childhood environments.
-
Geographic Immobility in the United States: Assessing the Prevalence and Characteristics of Those Who Never Migrate Across State Lines Using Linked Federal Tax Microdata
March 2025
Working Paper Number:
CES-25-19
This paper explores the prevalence and characteristics of those who never migrate at the state scale in the U.S. Studying people who never migrate requires regular and frequent observation of their residential location for a lifetime, or at least for many years. A novel U.S. population-sized longitudinal dataset that links individual-level Internal Revenue Service (IRS) and Social Security Administration (SSA) administrative records supplies this information annually, along with information on income and socio-demographic characteristics. We use these administrative microdata to follow a cohort aged between 15 and 50 in 2001 from 2001 to 2016, differentiating those who lived in the same state every year during this period (i.e., never made an interstate move) from those who lived in more than one state (i.e., made at least one interstate move). We find those who never made an interstate move comprised 75 percent of the total population of this age cohort. This percentage varies by year of age but never falls below 62 percent, even for those who were teenagers or young adults in 2001. There are also variations in these percentages by sex, race, nativity, and income, with the latter having the largest effects. We also find substantial variation in these percentages across states. Our findings suggest a need for more research on geographically immobile populations in the U.S.
-
Places versus People: The Ins and Outs of Labor Market Adjustment to Globalization
December 2024
Working Paper Number:
CES-24-78
We analyze the distinct adjustment paths of U.S. labor markets (places) and U.S. workers (people) to increased Chinese import competition during the 2000s. Using comprehensive register data for 2000-2019, we document that employment levels more than fully rebound in trade-exposed places after 2010, while employment-to-population ratios remain depressed and manufacturing employment further atrophies. The adjustment of places to trade shocks is generational: affected areas recover primarily by adding non-manufacturing workers who were below working age when the shock occurred. Entrants are disproportionately native-born Hispanics, foreign-born immigrants, women, and the college-educated, who find employment in relatively low-wage service sectors like medical services, education, retail, and hospitality. Using the panel structure of the employer-employee data, we decompose changes in the employment composition of places into trade-induced shifts in the gross flows of people across sectors, locations, and non-employment status. Contrary to standard models, trade shocks reduce geographic mobility, with both in- and out-migration remaining depressed through 2019. The employment recovery instead stems almost entirely from young adults and foreign-born immigrants taking their first U.S. jobs in affected areas, with minimal contributions from cross-sector transitions of former manufacturing workers. Although worker inflows into non-manufacturing more than fully offset manufacturing employment losses in trade-exposed locations after 2010, incumbent workers neither fully recover earnings losses nor predominantly exit the labor market, but rather age in place as communities undergo rapid demographic and industrial transitions.
-
Garage Entrepreneurs or just Self-Employed? An Investigation into Nonemployer Entrepreneurship
October 2024
Working Paper Number:
CES-24-61
Nonemployers, businesses without employees, account for most businesses in the U.S. yet are poorly understood. We use restricted administrative and survey data to describe nonemployer dynamics, overall performance, and performance by demographic group. We find that the eventual outcome (migration to employer status, continuing as a nonemployer, or exit) is closely related to receipts growth. We provide estimates of employment creation by firms that began as nonemployers and became employers (migrants), estimating that, relative to all firms born in 1996, nonemployer migrants accounted for 3-17% of all net jobs in the seventh year after startup. Moreover, we find that migrants' employment creation declined by 54% for the cohorts born between 1996 and 2014. Our results are consistent with increased adjustment frictions in recent periods, and suggest that accessibility to transformative entrepreneurship for everyday Americans has declined.
-
Comparison of Child Reporting in the American Community Survey and Federal Income Tax Returns Based on California Birth Records
September 2024
Working Paper Number:
CES-24-55
This paper takes advantage of administrative records from California, a state with a large child population and a significant historical undercount of children in Census Bureau data, dependent information in Internal Revenue Service (IRS) Form 1040 records, and the American Community Survey to characterize undercounted children and compare child reporting. While IRS Form 1040 records offer potential utility for adjusting child undercounting in Census Bureau surveys, this analysis finds overlapping reporting issues among various demographic and economic groups. Specifically, older children, children of Non-Hispanic Black mothers and Hispanic mothers, children or parents with lower English proficiency, children whose mothers did not complete high school, and families with lower income-to-poverty ratios were less frequently reported in IRS Form 1040 records than other groups. Therefore, using IRS Form 1040 dependent records may have limitations for accurately representing populations with characteristics associated with the undercount of children in surveys.
-
Estimating the Potential Impact of Combined Race and Ethnicity Reporting on Long-Term Earnings Statistics
September 2024
Working Paper Number:
CES-24-48
We use place of birth information from the Social Security Administration linked to earnings data from the Longitudinal Employer-Household Dynamics Program and detailed race and ethnicity data from the 2010 Census to study how long-term earnings differentials vary by place of birth for different self-identified race and ethnicity categories. We focus on foreign-born persons from countries that are heavily Hispanic and from countries in the Middle East and North Africa (MENA). We find substantial heterogeneity of long-term earnings differentials within country of birth, some of which will be difficult to detect when the reporting format changes from the current two-question version to the new single-question version, because they depend on self-identifications that place the individual in two distinct categories within the single-question format: specifically, Hispanic and White or Black, and MENA and White or Black. We also study the U.S.-born children of these same immigrants. Long-term earnings differences for the second generation also vary as a function of self-identified ethnicity and race in ways that changing to the single-question format could affect.
-
Citizenship Question Effects on Household Survey Response
June 2024
Working Paper Number:
CES-24-31
Several small-sample studies have predicted that a citizenship question in the 2020 Census would cause a large drop in self-response rates. In contrast, minimal effects were found in Poehler et al.'s (2020) analysis of the 2019 Census Test randomized controlled trial (RCT). We reconcile these findings by analyzing associations between characteristics of the addresses in the 2019 Census Test and their response behavior, by linking to independently constructed administrative data. We find significant heterogeneity in sensitivity to the citizenship question among households containing Hispanics, naturalized citizens, and noncitizens. Response drops the most for households containing noncitizens ineligible for a Social Security number (SSN). It falls more for households with Latin American-born immigrants than for those with immigrants from other countries. Response drops less for households with U.S.-born Hispanics than for households with noncitizens from Latin America. Reductions in responsiveness occur not only through lower unit self-response rates, but also through increased household roster omissions and internet break-offs. The inclusion of a citizenship question increases the undercount of households with noncitizens. Households with noncitizens also have much higher citizenship question item nonresponse rates than those containing only citizens. The use of tract-level characteristics and significant heterogeneity among Hispanics, the foreign-born, and noncitizens help explain why the effects found by Poehler et al. were so small. Linking administrative microdata with the RCT data expands what we can learn from the RCT.