-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
August 2025
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-25-57
For the last half-century, it has been common and accepted practice for statistical agencies, including the United States Census Bureau, to protect the confidentiality of aggregate tabular data products with strategies different from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
-
Education and Mortality: Evidence for the Silent Generation from Linked Census and Administrative Data
August 2025
Working Paper Number:
CES-25-56
We quantify the effect of education on mortality using a linkage of the full count 1940, 2000, and 2010 US census files and the Numident death records file. Our sample is composed of children aged 0-18 in 1940, observed living with at least one parent, for whom we can construct a rich set of parental and neighborhood characteristics. We estimate effects of educational attainment in 1940 on survival to 2000, as well as the effects of completed education, observed in 2000, on 10-year survival to 2010. The educational gradients in longevity that we estimate are robust to the inclusion of detailed individual, parental, household, neighborhood and county covariates. Given our full population census sample, we also explore rich patterns of heterogeneity and examine the effect of mediators of the education-mortality relationship. The mediators we consider in this study explain more than half of the relationship between education and mortality. We further show that the mechanisms underlying the education-mortality gradient might be different at different margins of educational attainment.
-
Differences in Disability Insurance Allowance Rates
August 2025
Working Paper Number:
CES-25-54
Allowance rates for Social Security Disability Insurance (DI) applications vary by race and ethnicity, but it is unclear to what extent these differences are artifacts of differences in other socio-economic and health characteristics, or of selection issues in the Social Security Administration's (SSA's) race and ethnicity data. This paper uses the 2015 American Community Survey linked to 2015-2019 SSA administrative data to investigate DI application allowance rates among non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, non-Hispanic American Indian/Alaska Native, and Hispanic applicants aged 25-65. The analysis uses regression, propensity score matching, and inverse probability weighting to estimate differences in allowance rates among applicants who are similar on observable characteristics. Relative to raw comparisons, differences by race and ethnicity in multivariate analyses are substantially smaller in magnitude and are generally not statistically significant.
-
Credit Access in the United States
July 2025
Working Paper Number:
CES-25-45
We construct new population-level linked administrative data to study households' access to credit in the United States. These data reveal large differences in credit access by race, class, and hometown. By age 25, Black individuals, those who grew up in low-income families, and those who grew up in certain areas (including the Southeast and Appalachia) have significantly lower credit scores than other groups. Consistent with lower scores generating credit constraints, these individuals have smaller balances, more credit inquiries, higher credit card utilization rates, and greater use of alternative higher-cost forms of credit. Tests for alternative definitions of algorithmic bias in credit scores yield results in opposite directions. From a calibration perspective, group-level differences in credit scores understate differences in delinquency: conditional on a given credit score, Black individuals and those from low-income families fall delinquent at relatively higher rates. From a balance perspective, these groups receive lower credit scores even when comparing those with the same future repayment behavior. Addressing both of these biases and expanding credit access to groups with lower credit scores requires addressing group-level differences in delinquency rates. These delinquencies emerge soon after individuals access credit in their early twenties, often due to missed payments on credit cards, student loans, and other bills. Comprehensive measures of individuals' income profiles, income volatility, and observed wealth explain only a small portion of these repayment gaps. In contrast, we find that the large variation in repayment across hometowns mostly reflects the causal effect of childhood exposure to these places. Places that promote upward income mobility also promote repayment and expand credit access even conditional on income, suggesting that common place-level factors may drive behaviors in both credit and labor markets. 
We discuss suggestive evidence for several mechanisms that drive our results, including the role of social and cultural capital. We conclude that gaps in credit access by race, class, and hometown have roots in childhood environments.
-
Re-assessing the Spatial Mismatch Hypothesis
April 2025
Working Paper Number:
CES-25-23
We use detailed location information from the Longitudinal Employer-Household Dynamics (LEHD) database to develop new evidence on the effects of spatial mismatch on the relative earnings of Black workers in large US cities. We classify workplaces by the size of the pay premiums they offer in a two-way fixed effects model, providing a simple metric for defining 'good' jobs. We show that: (a) Black workers earn nearly the same average wage premiums as whites; (b) in most cities Black workers live closer to jobs, and closer to good jobs, than do whites; (c) Black workers typically commute shorter distances than whites; and (d) people who commute further earn higher average pay premiums, but the elasticity with respect to distance traveled is slightly lower for Black workers. We conclude that geographic proximity to good jobs is unlikely to be a major source of the racial earnings gaps in major U.S. cities today.
-
EITC Participation Results and IRS-Census Match Methodology, Tax Year 2021
December 2024
Working Paper Number:
CES-24-75
The Earned Income Tax Credit (EITC), enacted in 1975, offers a refundable tax credit to low-income working families. This paper provides taxpayer and dollar participation estimates for the EITC covering tax year 2021. The estimates derive from an approach that relies on linking the 2022 Current Population Survey Annual Social and Economic Supplement (CPS ASEC) to IRS administrative data. This approach, called the Exact Match, uses survey data to identify EITC-eligible taxpayers and IRS administrative data to indicate which eligible taxpayers claimed and received the credit. Overall, in tax year 2021, eligible taxpayers participated in the EITC program at a rate of 78 percent, while dollar participation was 81 percent.
-
Income, Wealth, and Environmental Inequality in the United States
October 2024
Working Paper Number:
CES-24-57
This paper explores the relationships between air pollution, income, wealth, and race by combining administrative data from U.S. tax returns between 1979 and 2016, various measures of air pollution, and sociodemographic information from linked survey and administrative data. In the first year of our data, the relationship between income and ambient pollution levels nationally is approximately zero for both non-Hispanic White and Black individuals. However, at every single percentile of the national income distribution, Black individuals are exposed, on average, to higher levels of pollution than White individuals. By 2016, the relationship between income and air pollution had steepened, primarily for Black individuals, driven by changes in where rich and poor Black individuals live. We use quasi-random shocks to income to examine the causal effect of changes in income and wealth on pollution exposure over a five-year horizon, finding that these income-pollution elasticities map closely to the values implied by our descriptive patterns. We calculate that Black-White differences in income can explain about 10 percent of the observed gap in air pollution levels in 2016.
-
Comparison of Child Reporting in the American Community Survey and Federal Income Tax Returns Based on California Birth Records
September 2024
Working Paper Number:
CES-24-55
This paper uses birth records from California (a state with a large child population and a significant historical undercount of children in Census Bureau data), dependent information in Internal Revenue Service (IRS) Form 1040 records, and the American Community Survey to characterize undercounted children and compare child reporting across sources. While IRS Form 1040 records offer potential utility for adjusting child undercounting in Census Bureau surveys, this analysis finds overlapping reporting issues among various demographic and economic groups. Specifically, older children, children of Non-Hispanic Black mothers and Hispanic mothers, children or parents with lower English proficiency, children whose mothers did not complete high school, and families with lower income-to-poverty ratios were less frequently reported in IRS 1040 records than other groups. Therefore, using IRS 1040 dependent records may have limitations for accurately representing populations with characteristics associated with the undercount of children in surveys.
-
Estimating the Potential Impact of Combined Race and Ethnicity Reporting on Long-Term Earnings Statistics
September 2024
Working Paper Number:
CES-24-48
We use place-of-birth information from the Social Security Administration linked to earnings data from the Longitudinal Employer-Household Dynamics Program and detailed race and ethnicity data from the 2010 Census to study how long-term earnings differentials vary by place of birth for different self-identified race and ethnicity categories. We focus on foreign-born persons from countries that are heavily Hispanic and from countries in the Middle East and North Africa (MENA). We find substantial heterogeneity of long-term earnings differentials within country of birth, some of which will be difficult to detect when the reporting format changes from the current two-question version to the new single-question version, because these differentials depend on self-identifications that place the individual in two distinct categories within the single-question format: specifically, Hispanic and White or Black, and MENA and White or Black. We also study the USA-born children of these same immigrants. Long-term earnings differences for the second generation also vary as a function of self-identified ethnicity and race in ways that changing to the single-question format could affect.
-
Changing Opportunity: Sociological Mechanisms Underlying Growing Class Gaps and Shrinking Race Gaps in Economic Mobility
July 2024
Working Paper Number:
CES-24-38
We show that intergenerational mobility changed rapidly by race and class in recent decades and use these trends to study the causal mechanisms underlying changes in economic mobility. For white children in the U.S. born between 1978 and 1992, earnings increased for children from high-income families but decreased for children from low-income families, increasing earnings gaps by parental income ('class') by 30%. Earnings increased for Black children at all parental income levels, reducing white-Black earnings gaps for children from low-income families by 30%. Class gaps grew and race gaps shrank similarly for non-monetary outcomes such as educational attainment, standardized test scores, and mortality rates. Using a quasi-experimental design, we show that the divergent trends in economic mobility were caused by differential changes in childhood environments, as proxied by parental employment rates, within local communities defined by race, class, and childhood county. Outcomes improve across birth cohorts for children who grow up in communities with increasing parental employment rates, with larger effects for children who move to such communities at younger ages. Children's outcomes are most strongly related to the parental employment rates of peers they are more likely to interact with, such as those in their own birth cohort, suggesting that the relationship between children's outcomes and parental employment rates is mediated by social interaction. Our findings imply that community-level changes in one generation can propagate to the next generation and thereby generate rapid changes in economic mobility.