-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
August 2025
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-25-57
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
-
The Effect of the Minimum Wage on Childcare Establishments
August 2025
Working Paper Number:
CES-25-53
Childcare is essential for working families, yet it remains increasingly unaffordable and inaccessible for parents and offers poverty-level wages to many employees. While research suggests minimum wage policies may improve the welfare of low-wage workers, there is also evidence they may increase firm exits, especially among smaller, low-profit firms, which could reduce access and harm consumer well-being. This study is the first to examine these trade-offs in the childcare industry, a labor-intensive, highly regulated sector where capital-labor substitution is limited, and to provide evidence on how minimum wage policies affect a dual-sector labor market in the U.S., where self-employed and waged providers serve overlapping markets. Using variation from state-level minimum wage increases between 1995 and 2019 and unique microdata, I implement a cross-state county border discontinuity design to estimate impacts on the stocks, flows, and composition of childcare establishments. I find that while county-level aggregate establishment stocks and employment remained stable, establishment-level turnover increased, and employment decreased. I reconcile these findings by showing that minimum wage increases prompted reallocation, with larger establishments in the waged-sector more likely to enter and less likely to exit, making this one of the first studies to link null aggregate effects to shifts in establishment composition. Finally, I show that minimum wage increases may negatively affect the self-employed sector, resulting in fewer owners with advanced degrees and more with only high school education. These findings suggest that minimum wage policies reshape who provides care in ways that could affect both quality and access.
-
Receipt of Public and Private Food Assistance Across the Rural-Urban Continuum Before and During the COVID-19 Pandemic: Analysis of Current Population Survey Data
August 2025
Working Paper Number:
CES-25-51
Background: The nutrition safety net in the United States is critical to supporting food security among households in need. Food assistance in the United States includes both government-funded food programs and private community-based providers who distribute food to households in need. The COVID-19 pandemic affected experiences of food security and the use of private and public food assistance resources. However, these effects may have differed between households residing in urban and rural areas. We explored receipt of Supplemental Nutrition Assistance Program (SNAP) benefits or food from community-based emergency food providers across a detailed measure of the rural-urban continuum before and during the COVID-19 pandemic.
Methods: We linked restricted use Current Population Survey Food Security Supplement data to census-tract level United States Department of Agriculture Rural-Urban Commuting Area codes to estimate prevalence of self-reported SNAP participation and receipt of emergency food support across temporal (2015-2019 versus 2020-2021) and socio-spatial (urban, large rural city/town, small rural town, or isolated rural town/area) dimensions. We report prevalences as point estimates with 95% confidence intervals, all weighted for national representation.
Results: The weighted prevalence of self-reported SNAP participation was 8.9% (8.7-9.2%) in 2015-2019 and 9.1% (8.5-9.5%) in 2020-2021 in urban areas, 11.4% (10.8-12.2%) in 2015-2019 and 11.6% (10.5-12.9%) in 2020-2021 in large rural towns/cities, 13.4% (12.3-14.6%) in 2015-2019 and 12.3% (10.5-14.5%) in 2020-2021 in small rural towns, and 9.7% (8.6-10.9%) in 2015-2019 and 10.9% (8.8-13.4%) in 2020-2021 in isolated rural towns. The weighted prevalence of self-reported receipt of emergency food was 4.9% (4.8-5.1%) in 2015-2019 and 6.2% (5.8-6.5%) in 2020-2021 in urban areas, 6.8% (6.2-7.4%) in 2015-2019 and 7.6% (6.6-8.6%) in 2020-2021 in large rural towns/cities, 8.1% (7.3-9.1%) in 2015-2019 and 7.1% (5.7-8.8%) in 2020-2021 in small rural towns, and 6.8% (5.9-7.7%) in 2015-2019 and 8.5% (6.7-10.6%) in 2020-2021 in isolated rural towns.
Conclusion: Households in rural communities use public and private food assistance at higher rates than households in urban areas, but there is variation across communities depending on the level of rurality.
-
The Rural/Urban Volunteering Divide
June 2025
Working Paper Number:
CES-25-42
Are rural residents more likely to volunteer than those living in urban places? Although early sociological theory posited that rural residents were more likely to experience social bonds connecting them to their community, increasing their odds of volunteer engagement, empirical support is limited. Drawing upon the full population of rural and urban respondents to the United States Census Bureau's Current Population Survey (CPS) Volunteering Supplement (2002-2015), we found that rural respondents are more likely to report volunteering compared to urban respondents, although these differences are decreasing over time. Moreover, we found that propensities for rural and urban volunteerism vary based on differences in both individual and place-based characteristics; further, the size of these effects differs across rural and urban places. These findings have important implications for theory and empirical analysis.
-
Consequences of Eviction for Parenting and Non-parenting College Students
June 2025
Working Paper Number:
CES-25-35
Amidst rising and increasingly unaffordable rents, 7.6 million people are threatened with eviction each year across the United States, and eviction rates are twice as high for renters with children. One important and neglected population who may experience unique levels of housing insecurity is college students, especially given that one in five college students are parents. In this study, we link 11.9 million student records to eviction filings from housing courts, demographic characteristics reported in decennial census and survey data, incomes reported on tax returns by students and their parents, and dates of birth and death from the Social Security Administration. Parenting students are more likely than non-parenting students to identify as female (62.81% vs. 55.94%) and Black (19.66% vs. 14.30%), be over 30 years old (42.73% vs. 20.25%), and have parents with lower household incomes ($100,000 vs. $140,000). Parenting students threatened with eviction (i.e., had an eviction filed against them) are much more likely than non-threatened parenting students to identify as female (81.18% vs. 62.81%) and Black (56.84% vs. 19.66%). In models adjusted for individual and institutional characteristics, we find that being threatened with an eviction was significantly associated with reduced likelihood of degree completion, reduced post-enrollment income, reduced likelihood of being married post-enrollment, and increased post-enrollment mortality. Among parenting students, 38.38% (95% confidence interval (CI): 32.50-44.26%) of non-threatened students completed a bachelor's degree compared to just 15.36% (CI: 11.61-19.11%) of students threatened with eviction. Our findings highlight the long-term economic and health impacts of housing insecurity during college, especially for parenting students. Housing stability for parenting students may have substantial multigenerational benefits for economic mobility and population health.
-
Food Security Status Across the Rural-Urban Continuum Before and During the COVID-19 Pandemic
January 2025
Working Paper Number:
CES-25-01
Background: Food security, defined as consistent access to sufficient food to support an active life, is a crucial social determinant of health. A key dimension affecting food security is position along the rural-urban continuum, as there are important socio-economic and environmental differences between communities related to urbanicity or rurality that impact food access. The COVID-19 pandemic created social and economic shocks that altered financial and food security, which may have had differential effects by rurality and urbanicity. However, there has been limited research on how food security differs across the shades of the rural-urban community spectrum, as most often researchers have characterized communities as either urban or rural.
Methods: In this study, which linked restricted use Current Population Survey Food Security Supplement data to census-tract level United States Department of Agriculture Rural-Urban Commuting Area codes, we estimated the prevalence of household food security across temporal (2015-2019 versus 2020-2021) and socio-spatial (urban, large rural city/town, small rural town, or isolated rural town/area) dimensions in order to characterize variations before and during the COVID-19 pandemic by urbanicity/rurality. We report prevalences as point estimates with 95% confidence intervals.
Results: The prevalence of food security was 87.7% (87.5-88.0%) in 2015-2019 and 88.8% (88.4-89.3%) in 2020-2021 for urban areas, 85.5% (84.7-86.2%) in 2015-2019 and 87.1% (85.7-88.3%) in 2020-2021 for large rural towns/cities, 82.8% (81.5-84.1%) in 2015-2019 and 87.3% (85.7-89.2%) in 2020-2021 for small rural towns, and 87.6% (86.3-88.8%) in 2015-2019 and 90.9% (88.7-92.7%) in 2020-2021 for isolated rural towns/areas.
Conclusion: These findings show that rural communities' experiences of food security vary and that aggregating households in these environments may mask areas of concern and concentrated need.
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63R
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
-
Exploring New Ways to Classify Industries for Energy Analysis and Modeling
November 2022
Working Paper Number:
CES-22-49
Combustion, other emitting processes and fossil energy use outside the power sector have become urgent concerns given the United States' commitment to achieving net-zero greenhouse gas emissions by 2050. Industry is an important end user of energy and relies on fossil fuels used directly for process heating and as feedstocks for a diverse range of applications. Fuel and energy use by industry is heterogeneous, meaning even a single product group can vary broadly in its production routes and associated energy use. In the United States, the North American Industry Classification System (NAICS) serves as the standard for statistical data collection and reporting. In turn, data based on NAICS are the foundation of most United States energy modeling. Thus, the effectiveness of NAICS at representing energy use is a limiting condition for current expansive planning to improve energy efficiency and alternatives to fossil fuels in industry. Facility-level data could be used to build more detail into heterogeneous sectors and thus supplement data from Bureau of the Census and U.S. Energy Information Administration reporting at NAICS code levels, but such data are scarce. This work explores alternative classification schemes for industry based on energy use characteristics and validates an approach to estimate facility-level energy use from publicly available greenhouse gas emissions data from the U.S. Environmental Protection Agency (EPA). The approaches in this study can facilitate understanding of current, as well as possible future, energy demand.
First, current approaches to the construction of industrial taxonomies are summarized along with their usefulness for industrial energy modeling. Unsupervised machine learning techniques are then used to detect clusters in data reported from the U.S. Department of Energy's Industrial Assessment Center program. Clusters of Industrial Assessment Center data show similar levels of correlation between energy use and explanatory variables as three-digit NAICS codes. Interestingly, the clusters each include a large cross section of NAICS codes, which lends additional support to the idea that NAICS may not be particularly suited for correlation between energy use and the variables studied. Fewer clusters are needed for the same level of correlation as shown in NAICS codes. Initial assessment shows a reasonable level of separation using support vector machines with higher than 80% accuracy, so machine learning approaches may be promising for further analysis. The IAC data is focused on smaller and medium-sized facilities and is biased toward higher energy users for a given facility type. Cladistics, an approach for classification developed in biology, is adapted to energy and process characteristics of industries. Cladistics applied to industrial systems seeks to understand the progression of organizations and technology as a type of evolution, wherein traits are inherited from previous systems but evolve due to the emergence of inventions and variations and a selection process driven by adaptation to pressures and favorable outcomes. A cladogram is presented for evolutionary directions in the iron and steel sector. Cladograms are a promising tool for constructing scenarios and summarizing directions of sectoral innovation.
The cladogram of iron and steel is based on the drivers of energy use in the sector. Phylogenetic inference is similar to machine learning approaches as it is based on a machine-led search of the solution space, therefore avoiding some of the subjectivity of other classification systems. Our prototype approach for constructing an industry cladogram is based on process characteristics according to the innovation framework derived from Schumpeter to capture evolution in a given sector. The resulting cladogram represents a snapshot in time based on detailed study of process characteristics. This work could be an important tool for the design of scenarios for more detailed modeling. Cladograms reveal groupings of emerging or dominant processes and their implications in a way that may be helpful for policymakers and entrepreneurs, allowing them to see the larger picture, other good ideas, or competitors. Constructing a cladogram could be a good first step to analysis of many industries (e.g. nitrogenous fertilizer production, ethyl alcohol manufacturing), to understand their heterogeneity, emerging trends, and coherent groupings of related innovations.
Finally, validation is performed for facility-level energy estimates from the EPA Greenhouse Gas Reporting Program. Facility-level data availability continues to be a major challenge for industrial modeling. The method outlined by McMillan et al. (2016) and McMillan and Ruth (2019) allows facility-level energy use to be estimated from mandatory greenhouse gas reporting. The validation provided here is an important step for further use of this data for industrial energy modeling.
-
Introducing the Medical Expenditure Panel Survey-Insurance Component with Administrative Records (MEPS-ICAR): Description, Data Construction Methodology, and Quality Assessment
August 2022
Working Paper Number:
CES-22-29
This report introduces a new dataset, the Medical Expenditure Panel Survey-Insurance Component with Administrative Records (MEPS-ICAR), consisting of MEPS-IC survey data on establishments and their health insurance benefits packages linked to Decennial Census data and administrative tax records on MEPS-IC establishments' workforces. These data include new measures of the characteristics of MEPS-IC establishments' parent firms, employee turnover, the full distribution of MEPS-IC workers' personal and family incomes, the geographic locations where those workers live, and improved workforce demographic detail. Next, this report details the methods used for producing the MEPS-ICAR. Broadly, the linking process begins by matching establishments' parent firms to their workforces using identifiers appearing in tax records. The linking process concludes by matching establishments to their own workforces by identifying the subset of their parent firm's workforce that best matches the expected size, total payroll, and residential geographic distribution of the establishment's workforce. Finally, this report presents statistics characterizing the match rate and the MEPS-ICAR data itself. Key results include that match rates are consistently high (exceeding 90%) across nearly all data subgroups and that the matched data exhibit a reasonable distribution of employment, payroll, and worker commute distances relative to expectations and external benchmarks. Notably, employment measures derived from tax records, but not used in the match itself, correspond with high fidelity to the employment levels that establishments report in the MEPS-IC. Cumulatively, the construction of the MEPS-ICAR significantly expands the capabilities of the MEPS-IC and presents many opportunities for analysts.
-
Climate Change, The Food Problem, and the Challenge of Adaptation through Sectoral Reallocation
September 2021
Working Paper Number:
CES-21-29
This paper combines local temperature treatment effects with a quantitative macroeconomic model to assess the potential for global reallocation between agricultural and non-agricultural production to reduce the costs of climate change. First, I use firm-level panel data from a wide range of countries to show that extreme heat reduces productivity less in manufacturing and services than in agriculture, implying that hot countries could achieve large potential gains through adapting to global warming by shifting labor toward manufacturing and increasing imports of food. To investigate the likelihood that such gains will be realized, I embed the estimated productivity effects in a model of sectoral specialization and trade covering 158 countries. Simulations suggest that climate change does little to alter the geography of agricultural production, however, as high trade barriers in developing countries temper the influence of shifting comparative advantage. Instead, climate change accentuates the existing pattern, known as 'the food problem,' in which poor countries specialize heavily in relatively low productivity agricultural sectors to meet subsistence consumer needs. The productivity effects of climate change reduce welfare by 6-10% for the poorest quartile of the world with trade barriers held at current levels, but by nearly 70% less in an alternative policy counterfactual that moves low-income countries to OECD levels of trade openness.