-
Estimating the Potential Impact of Combined Race and Ethnicity Reporting on Long-Term Earnings Statistics
September 2024
Working Paper Number:
CES-24-48
We use place of birth information from the Social Security Administration linked to earnings data from the Longitudinal Employer-Household Dynamics Program and detailed race and ethnicity data from the 2010 Census to study how long-term earnings differentials vary by place of birth for different self-identified race and ethnicity categories. We focus on foreign-born persons from countries that are heavily Hispanic and from countries in the Middle East and North Africa (MENA). We find substantial heterogeneity of long-term earnings differentials within country of birth, some of which will be difficult to detect when the reporting format changes from the current two-question version to the new single-question version because they depend on self-identifications that place the individual in two distinct categories within the single-question format, specifically, Hispanic and White or Black, and MENA and White or Black. We also study the USA-born children of these same immigrants. Long-term earnings differences for the 2nd generation also vary as a function of self-identified ethnicity and race in ways that changing to the single-question format could affect.
-
The 2010 Census Confidentiality Protections Failed, Here's How and Why
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63
Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
-
An In-Depth Examination of Requirements for Disclosure Risk Assessment
October 2023
Authors:
Ron Jarmin,
John M. Abowd,
Ian M. Schmutte,
Jerome P. Reiter,
Nathan Goldschlag,
Victoria A. Velkoff,
Michael B. Hawes,
Robert Ashmead,
Ryan Cumings-Menon,
Sallie Ann Keller,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Pavel Zhuravlev
Working Paper Number:
CES-23-49
The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms leveled against differential privacy would be leveled against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near term, the counterfactual approach appears best-suited for privacy-utility analysis.
-
Mixed-Effects Methods For Search and Matching Research
September 2023
Working Paper Number:
CES-23-43
We study mixed-effects methods for estimating equations containing person and firm effects. In economics such models are usually estimated using fixed-effects methods. Recent enhancements to those fixed-effects methods include corrections to the bias in estimating the covariance matrix of the person and firm effects, which we also consider.
-
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning
November 2021
Working Paper Number:
CES-21-35
This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
-
U.S. Long-Term Earnings Outcomes by Sex, Race, Ethnicity, and Place of Birth
May 2021
Working Paper Number:
CES-21-07R
This paper is part of the Global Income Dynamics Project cross-country comparison of earnings inequality, volatility, and mobility. Using data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files, we produce a uniform set of earnings statistics for the U.S. From 1998 to 2019, we find U.S. earnings inequality has increased and volatility has decreased. The combination of increased inequality and reduced volatility suggests earnings growth differs substantially across different demographic groups. We explore this further by estimating 12-year average earnings for a single cohort of age 25-54 eligible workers. Differences in labor supply (hours paid and quarters worked) are found to explain almost 90% of the variation in worker earnings, although even after controlling for labor supply substantial earnings differences across demographic groups remain unexplained. Using a quantile regression approach, we estimate counterfactual earnings distributions for each demographic group. We find that at the bottom of the earnings distribution differences in characteristics such as hours paid, geographic division, industry, and education explain almost all the earnings gap; above the median, however, the contribution of the differences in the returns to characteristics becomes the dominant component.
-
Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report
October 2020
Authors:
John M. Abowd,
J. David Brown,
Lawrence Warren,
Moises Yi,
Misty L. Heggeness,
William R. Bell,
Michael B. Hawes,
Andrew Keller,
Vincent T. Mule Jr.,
Joseph L. Schafer,
Matthew Spence
Working Paper Number:
CES-20-33
This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
-
Male Earnings Volatility in LEHD before, during, and after the Great Recession
September 2020
Working Paper Number:
CES-20-31
This paper is part of a coordinated collection of papers on prime-age male earnings volatility. Each paper produces a similar set of statistics for the same reference population using a different primary data source. Our primary data source is the Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files. Using LEHD data from 1998 to 2016, we create a well-defined population frame to facilitate accurate estimation of temporal changes comparable to designed longitudinal samples of people. We show that earnings volatility, excluding increases during recessions, has declined over the analysis period, a finding robust to various sensitivity analyses. Although we find volatility is declining, the effect is not homogeneous, particularly for workers with tenuous labor force attachment for whom volatility is increasing. These 'not stable' workers have earnings volatility approximately 30 times larger than stable workers, but, more important for earnings volatility trends, we observe a large increase in the share of stable employment from 60% in 1998 to 67% in 2016, which we show is largely responsible for the decline in overall earnings volatility. To further emphasize the importance of not stable and/or low-earning workers, we also conduct comparisons with the PSID and show how changes over time in the share of workers at the bottom tail of the cross-sectional earnings distributions can produce either declining or increasing earnings volatility trends.
-
Total Error and Variability Measures for the Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap
September 2020
Working Paper Number:
CES-20-30
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees, and total quarterly payroll. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency Management. We account for errors due to coverage; record-level nonresponse; edit and imputation of item missing data; and statistical disclosure limitation. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs are a transition zone, where cells may be fit for use with caution. Tabulations involving one or two jobs, which are generally suppressed on fitness-for-use criteria in the QWI and synthesized in LODES, have substantial total variability but can still be used to estimate statistics for untabulated aggregates as long as the job count in the aggregate is more than 10.
-
United States Earnings Dynamics: Inequality, Mobility, and Volatility
September 2020
Working Paper Number:
CES-20-29
Using data from the Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files, we study changes over time and across sub-national populations in the distribution of real labor earnings. We consider four large MSAs (Detroit, Los Angeles, New York, and San Francisco) for the period 1998 to 2017, with particular attention paid to the subperiods before, during, and after the Great Recession. For the four large MSAs we analyze, there are clear national trends represented in each of the local areas, the most prominent of which is the increase in the share of earnings accruing to workers at the top of the earnings distribution in 2017 compared with 1998. However, the magnitude of these trends varies across MSAs, with New York and San Francisco showing relatively large increases and Los Angeles somewhere in the middle relative to Detroit, whose total real earnings distribution is relatively stable over the period. Our results contribute to the emerging literature on differences between national and regional economic outcomes, exemplifying what will be possible with a new data exploration tool, the Earnings and Mobility Statistics (EAMS) web application, currently under development at the U.S. Census Bureau.