-
School-Based Disability Identification Varies by Student Family Income
December 2025
Authors:
Quentin Brummet,
Andrew Penner,
Emily Penner,
Leah R. Clark,
Michelle Spiegel,
Paul Y. Yoo,
Paul Hanselman,
Nicholas J. Ainsworth,
Christopher Cleveland,
Jacob Hibel,
Andrew Saultz,
Juan Camilo Cristancho
Working Paper Number:
CES-25-74
Currently, 18 percent of K-12 students in the United States receive additional supports through the identification of a disability. Socioeconomic status is viewed as central to understanding who gets identified as having a disability, yet limited large-scale evidence examines how disability identification varies for students from different income backgrounds. Using unique data linking information on Oregon students and their family income, we document pronounced income-based differences in how students are categorized for two school-based disability supports: special education services and Section 504 plans. We find that a quarter of students in the lowest income percentile receive supports through special education, compared with less than seven percent of students in the top income percentile. This pattern may partially reflect differences in underlying disability-related needs caused by poverty. However, we find the opposite pattern for 504 plans, where students in the top income percentiles are two times more likely to receive 504 plan supports. We further document substantial variation in these income-based differences by disability category, by race/ethnicity, and by grade level. Together, these patterns suggest that disability-related needs alone cannot account for the income-based differences that we observe and highlight the complex ways that income shapes the school and family processes that lead to variability in disability classification and services.
View Full
Paper PDF
-
Gifted Identification Across the Distribution of Family Income
December 2025
Authors:
Quentin Brummet,
Andrew Penner,
Emily Penner,
Leah R. Clark,
Michelle Spiegel,
Paul Hanselman,
Nicholas J. Ainsworth,
Aaron J. Ainsworth,
Christopher Cleveland,
Jacob Hibel,
Andrew Saultz
Working Paper Number:
CES-25-73
Currently, 6.1 percent of K-12 students in the United States receive gifted education. Using education and IRS data that provide information on students and their family income, we show pronounced differences in who schools identify as gifted across the distribution of family income. Under 4 percent of students in the lowest income percentile are identified as gifted, compared with 20 percent of those in the top income percentile. Income-based differences persist after accounting for student test scores and exist across students of different sexes and racial/ethnic groups, underscoring the importance of family resources for gifted identification in schools.
View Full
Paper PDF
-
Optimal Stratified Sampling for Probability-Based Online Panels
September 2025
Working Paper Number:
CES-25-69
Online probability-based panels have emerged as a cost-efficient means of conducting surveys in the 21st century. While there have been various recent advancements in sampling techniques for online panels, several critical aspects of sampling theory for online panels are lacking. Much of current sampling theory from the middle of the 20th century, when response rates were high, and online panels did not exist. This paper presents a mathematical model of stratified sampling for online panels that takes into account historical response rates and survey costs. Through some simplifying assumptions, the model shows that the optimal sample allocation for online panels can largely resemble the solution for a cross-sectional survey. To apply the model, I use the Census Household Panel to show how this method could improve the average precision of key estimates. Holding fielding costs constant, the new sample rates improve the average precision of estimates between 1.47 and 17.25 percent, depending on the importance weight given to an overall population mean compared to mean estimates for racial and ethnic subgroups.
View Full
Paper PDF
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
August 2025
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-25-57
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level'individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy. Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
View Full
Paper PDF
-
The Census Historical Environmental Impacts Frame
October 2024
Working Paper Number:
CES-24-66
The Census Bureau's Environmental Impacts Frame (EIF) is a microdata infrastructure that combines individual-level information on residence, demographics, and economic characteristics with environmental amenities and hazards from 1999 through the present day. To better understand the long-run consequences and intergenerational effects of exposure to a changing environment, we expand the EIF by extending it backward to 1940. The Historical Environmental Impacts Frame (HEIF) combines the Census Bureau's historical administrative data, publicly available 1940 address information from the 1940 Decennial Census, and historical environmental data. This paper discusses the creation of the HEIF as well as the unique challenges that arise with using the Census Bureau's historical administrative data.
View Full
Paper PDF
-
Comparison of Child Reporting in the American Community Survey and Federal Income Tax Returns Based on California Birth Records
September 2024
Working Paper Number:
CES-24-55
This paper takes advantage of administrative records from California, a state with a large child population and a significant historical undercount of children in Census Bureau data, dependent information in the Internal Revenue Service (IRS) Form 1040 records, and the American Community Survey to characterize undercounted children and compare child reporting. While IRS Form 1040 records offer potential utility for adjusting child undercounting in Census Bureau surveys, this analysis finds overlapping reporting issues among various demographic and economic groups. Specifically, older children, those of Non-Hispanic Black mothers and Hispanic mothers, children or parents with lower English proficiency, children whose mothers did not complete high school, and families with lower income-to-poverty ratio were less frequently reported in IRS 1040 records than other groups. Therefore, using IRS 1040 dependent records may have limitations for accurately representing populations with characteristics associated with the undercount of children in surveys.
View Full
Paper PDF
-
Incorporating Administrative Data in Survey Weights for the Basic Monthly Current Population Survey
January 2024
Working Paper Number:
CES-24-02
Response rates to the Current Population Survey (CPS) have declined over time, raising the potential for nonresponse bias in key population statistics. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we take two approaches. First, we use administrative data to build a non-parametric nonresponse adjustment step while leaving the calibration to population estimates unchanged. Second, we use administratively linked data in the calibration process, matching income data from the Internal Return Service and state agencies, demographic data from the Social Security Administration and the decennial census, and industry data from the Census Bureau's Business Register to both responding and nonresponding households. We use the matched data in the household nonresponse adjustment of the CPS weighting algorithm, which changes the weights of respondents to account for differential nonresponse rates among subpopulations.
After running the experimental weighting algorithm, we compare estimates of the unemployment rate and labor force participation rate between the experimental weights and the production weights. Before March 2020, estimates of the labor force participation rates using the experimental weights are 0.2 percentage points higher than the original estimates, with minimal effect on unemployment rate. After March 2020, the new labor force participation rates are similar, but the unemployment rate is about 0.2 percentage points higher in some months during the height of COVID-related interviewing restrictions. These results are suggestive that if there is any nonresponse bias present in the CPS, the magnitude is comparable to the typical margin of error of the unemployment rate estimate. Additionally, the results are overall similar across demographic groups and states, as well as using alternative weighting methodology. Finally, we discuss how our estimates compare to those from earlier papers that calculate estimates of bias in key CPS labor force statistics.
This paper is for research purposes only. No changes to production are being implemented at this time.
View Full
Paper PDF
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63R
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level'individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy. Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
View Full
Paper PDF
-
The 2010 Census Confidentiality Protections Failed, Here's How and Why
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63
Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
View Full
Paper PDF
-
Noncitizen Coverage and Its Effects on U.S. Population Statistics
August 2023
Working Paper Number:
CES-23-42
We produce population estimates with the same reference date, April 1, 2020, as the 2020 Census of Population and Housing by combining 31 types of administrative record (AR) and third-party sources, including several new to the Census Bureau with a focus on noncitizens. Our AR census national population estimate is higher than other Census Bureau official estimates: 1.8% greater than the 2020 Demographic Analysis high estimate, 3.0% more than the 2020 Census count, and 3.6% higher than the vintage-2020 Population Estimates Program estimate. Our analysis suggests that inclusion of more noncitizens, especially those with unknown legal status, explains the higher AR census estimate. About 19.8% of AR census noncitizens have addresses that cannot be linked to an address in the 2020 Census collection universe, compared to 5.7% of citizens, raising the possibility that the 2020 Census did not collect data for a significant fraction of noncitizens residing in the United States under the residency criteria used for the census. We show differences in estimates by age, sex, Hispanic origin, geography, and socioeconomic characteristics symptomatic of the differences in noncitizen coverage.
View Full
Paper PDF