-
Estimating the Graduate Coverage of Post-Secondary Employment Outcomes
September 2025
Working Paper Number:
CES-25-61
This paper proposes a new methodology for estimating the coverage rate of the Post-Secondary Employment Outcomes data product (PSEO), both as a share of new graduates and as a share of total working-age degree holders in the United States. This paper also assesses how representative PSEO is of the broader population of college graduates across an array of institutional and individual characteristics.
-
Revisiting the Unintended Consequences of Ban the Box
August 2025
Working Paper Number:
CES-25-58
Ban-the-Box (BTB) policies intend to help formerly incarcerated individuals find employment by delaying when employers can ask about criminal records. We revisit the finding in Doleac and Hansen (2020) that BTB causes statistical discrimination against minority men. We correct miscoded BTB laws and show that estimates from the Current Population Survey (CPS) remain quantitatively similar, while those from the American Community Survey (ACS) now fail to reject the null hypothesis of no effect of BTB on employment. In contrast to the published estimates, these ACS results are statistically significantly different from the CPS results, indicating a lack of robustness across datasets. We do not find evidence that these differences are due to sample composition or survey weights. There is limited evidence that these divergent results are explained by the different frequencies of these surveys. Differences in sample sizes may also lead to different estimates; the ACS has a much larger sample and more statistical power to detect effects near the corrected CPS estimates.
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
August 2025
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-25-57
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
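The idea that published tables can pin down the underlying records can be illustrated with a toy sketch. This is not the paper's procedure, which the abstract does not spell out; it is a brute-force enumeration over a hypothetical three-variable domain (the category labels, the two table definitions, and the function names `tabulate` and `reconstruct` are all assumptions made for illustration). When exactly one set of records is consistent with the published tables, an attacker knows the reconstruction is exact.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical, deliberately tiny attribute domains (not the Census schema).
SEXES = ("F", "M")
AGES = ("child", "adult", "senior")
RACES = ("A", "B")


def tabulate(records):
    """Produce two 'published' tables for one block: sex-by-age and age-by-race counts."""
    sex_by_age = Counter((sex, age) for sex, age, _ in records)
    age_by_race = Counter((age, race) for _, age, race in records)
    return sex_by_age, age_by_race


def reconstruct(published, block_size):
    """Enumerate every multiset of records whose tables match the published ones."""
    domain = list(product(SEXES, AGES, RACES))
    return [
        candidate
        for candidate in combinations_with_replacement(domain, block_size)
        if tabulate(candidate) == published
    ]


# A block where each age group holds one person: the tables identify every record.
unique_block = [("F", "adult", "A"), ("M", "child", "B"), ("M", "senior", "A")]
unique_solutions = reconstruct(tabulate(unique_block), len(unique_block))

# A block with two adults: the same tables admit two different sets of records.
ambiguous_block = [("F", "adult", "A"), ("M", "adult", "B")]
ambiguous_solutions = reconstruct(tabulate(ambiguous_block), len(ambiguous_block))
```

In the first block a single candidate survives, so the attacker can verify the reconstruction from published data alone; in the second, two candidates remain and the records are not uniquely determined. Enumeration only works at toy scale and is used here purely to make the uniqueness argument concrete.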
-
Education and Mortality: Evidence for the Silent Generation from Linked Census and Administrative Data
August 2025
Working Paper Number:
CES-25-56
We quantify the effect of education on mortality using a linkage of the full count 1940, 2000, and 2010 US census files and the Numident death records file. Our sample is composed of children aged 0-18 in 1940, observed living with at least one parent, for whom we can construct a rich set of parental and neighborhood characteristics. We estimate effects of educational attainment in 1940 on survival to 2000, as well as the effects of completed education, observed in 2000, on 10-year survival to 2010. The educational gradients in longevity that we estimate are robust to the inclusion of detailed individual, parental, household, neighborhood and county covariates. Given our full population census sample, we also explore rich patterns of heterogeneity and examine the effect of mediators of the education-mortality relationship. The mediators we consider in this study explain more than half of the relationship between education and mortality. We further show that the mechanisms underlying the education-mortality gradient might be different at different margins of educational attainment.
-
LODES Design and Methodology Report: Methodology Version 7
August 2025
Working Paper Number:
CES-25-52
The purpose of this report is to document the important features of Version 7 of the LEHD Origin-Destination Employment Statistics (LODES) processing system. This includes data sources, data processing methodology, confidentiality protection methodology, some quality measures, and a high-level description of the published data. The intended audience for this document includes LODES data users, Local Employment Dynamics (LED) Partnership members, U.S. Census Bureau management, program quality auditors, and current and future research and development staff members.
-
Differences in Disability Insurance Allowance Rates
August 2025
Working Paper Number:
CES-25-54
Allowance rates for disability insurance applications vary by race and ethnicity, but it is unclear to what extent these differences are artifacts of other differing socio-economic and health characteristics, or selection issues in SSA's race and ethnicity data. This paper uses the 2015 American Community Survey linked to 2015-2019 SSA administrative data to investigate DI application allowance rates among non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, non-Hispanic American Indian/Alaska Native, and Hispanic applicants aged 25-65. The analysis uses regression, propensity score matching, and inverse probability weighting to estimate differences in allowance rates among applicants who are similar on observable characteristics. Relative to raw comparisons, differences by race and ethnicity in multivariate analyses are substantially smaller in magnitude and are generally not statistically significant.
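As a rough illustration of why a raw gap can shrink once applicants are compared on observable characteristics, the following sketch applies inverse probability weighting to a small synthetic dataset. Everything in it is an assumption made for illustration: the data are invented, there is a single discrete covariate (`stratum`), and the propensity score is simply the within-stratum share of group 1. The paper's actual specification and covariates are not described in the abstract.

```python
from collections import defaultdict


def make_rows():
    """Synthetic applicants as (group, stratum, allowed); values are invented."""
    rows = []
    rows += [(1, "X", 1)] * 4 + [(1, "X", 0)] * 4  # group 1, stratum X: 4 of 8 allowed
    rows += [(0, "X", 1)] * 1 + [(0, "X", 0)] * 1  # group 0, stratum X: 1 of 2 allowed
    rows += [(1, "Y", 0)] * 2                      # group 1, stratum Y: 0 of 2 allowed
    rows += [(0, "Y", 0)] * 8                      # group 0, stratum Y: 0 of 8 allowed
    return rows


def raw_difference(rows):
    """Unadjusted difference in allowance rates, group 1 minus group 0."""
    def rate(group):
        outcomes = [y for g, _, y in rows if g == group]
        return sum(outcomes) / len(outcomes)
    return rate(1) - rate(0)


def ipw_difference(rows):
    """Weighted difference using weights 1/p for group 1 and 1/(1-p) for group 0."""
    total = defaultdict(int)
    in_group_1 = defaultdict(int)
    for g, s, _ in rows:
        total[s] += 1
        in_group_1[s] += g
    # Propensity score: share of group 1 within each stratum (must lie strictly in (0, 1)).
    propensity = {s: in_group_1[s] / total[s] for s in total}

    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for g, s, y in rows:
        p = propensity[s]
        w = 1 / p if g == 1 else 1 / (1 - p)
        weighted_sum[g] += w * y
        weight_total[g] += w
    return weighted_sum[1] / weight_total[1] - weighted_sum[0] / weight_total[0]


rows = make_rows()
raw_gap = raw_difference(rows)
adjusted_gap = ipw_difference(rows)
```

In this constructed example the two groups have identical allowance rates within each stratum, so the raw gap of about 0.3 comes entirely from group composition and the weighted gap is essentially zero. Real data would not be this clean, and the weights here are normalized within each group, which is one common choice among several.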
-
Earnings Measurement Error, Nonresponse and Administrative Mismatch in the CPS
July 2025
Working Paper Number:
CES-25-48
Using the Current Population Survey Annual Social and Economic Supplement matched to Social Security Administration Detailed Earnings Records, we link observations across consecutive years to investigate a relationship between item nonresponse and measurement error in the earnings questions. Linking individuals across consecutive years allows us to observe switching from response to nonresponse and vice versa. We estimate OLS, IV, and finite mixture models that allow for various assumptions separately for men and women. We find that those who respond in both years of the survey exhibit less measurement error than those who respond in one year. Our findings suggest a trade-off between survey response and data quality that should be considered by survey designers, data collectors, and data users.
-
Tapping Business and Household Surveys to Sharpen Our View of Work from Home
June 2025
Working Paper Number:
CES-25-36
Timely business-level measures of work from home (WFH) are scarce for the U.S. economy. We review prior survey-based efforts to quantify the incidence and character of WFH and describe new questions that we developed and fielded for the Business Trends and Outlook Survey (BTOS). Drawing on more than 150,000 firm-level responses to the BTOS, we obtain four main findings. First, nearly a third of businesses have employees who work from home, with tremendous variation across sectors. The share of businesses with WFH employees is nearly ten times larger in the Information sector than in Accommodation and Food Services. Second, employees work from home about 1 day per week, on average, and businesses expect similar WFH levels in five years. Third, feasibility aside, businesses' largest concern with WFH relates to productivity. Seven percent of businesses find that onsite work is more productive, while two percent find that WFH is more productive. Fourth, there is a low level of tracking and monitoring of WFH activities, with 70% of firms reporting they do not track employee days in the office and 75% reporting they do not monitor employees when they work from home. These lessons serve as a starting point for enhancing WFH-related content in the American Community Survey and other household surveys.
-
The Design of Sampling Strata for the National Household Food Acquisition and Purchase Survey
February 2025
Working Paper Number:
CES-25-13
The National Household Food Acquisition and Purchase Survey (FoodAPS), sponsored by the United States Department of Agriculture's (USDA) Economic Research Service (ERS) and Food and Nutrition Service (FNS), examines the food purchasing behavior of various subgroups of the U.S. population. These subgroups include participants in the Supplemental Nutrition Assistance Program (SNAP) and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), as well as households who are eligible for but don't participate in these programs. Participants in these social protection programs constitute small proportions of the U.S. population; obtaining an adequate number of such participants in a survey would be challenging absent stratified sampling to target SNAP and WIC participating households. This document describes how the U.S. Census Bureau (which is planning to conduct future versions of the FoodAPS survey on behalf of USDA) created sampling strata to flag the FoodAPS targeted subpopulations using machine learning applications in linked survey and administrative data. We describe the data, modeling techniques, and how well the sampling flags target low-income households and households receiving WIC and SNAP benefits. We additionally situate these efforts in the nascent literature on the use of big data and machine learning for the improvement of survey efficiency.
-
Potential Bias When Using Administrative Data to Measure the Family Income of School-Aged Children
January 2025
Working Paper Number:
CES-25-03
Researchers and practitioners increasingly rely on administrative data sources to measure family income. However, administrative data sources are often incomplete in their coverage of the population, giving rise to potential bias in family income measures, particularly if coverage deficiencies are not well understood. We focus on the school-aged child population, due to its particular import to research and policy, and because of the unique challenges of linking children to family income information. We find that two of the most significant administrative sources of family income information that permit linking of children and parents (IRS Form 1040 and SNAP participation records) usefully complement each other, potentially reducing coverage bias when used together. In a case study considering how best to measure economic disadvantage rates in the public school student population, we demonstrate the sensitivity of family income statistics to assumptions about individuals who do not appear in administrative data sources.