CREAT: Census Research Exploration and Analysis Tool

Papers written by Author(s): 'John M. Abowd'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

Longitudinal Employer Household Dynamics - 30

National Science Foundation - 30

Alfred P Sloan Foundation - 25

Cornell University - 24

Bureau of Labor Statistics - 18

Quarterly Workforce Indicators - 17

American Community Survey - 16

Social Security Administration - 15

Unemployment Insurance - 15

Social Security Number - 14

Survey of Income and Program Participation - 14

National Institute on Aging - 14

LEHD Program - 14

Current Population Survey - 13

National Bureau of Economic Research - 13

Census Bureau Disclosure Review Board - 12

Internal Revenue Service - 12

Cornell Institute for Social and Economic Research - 12

Quarterly Census of Employment and Wages - 11

North American Industry Classification System - 11

Economic Census - 11

Research Data Center - 11

Employer Identification Numbers - 10

Disclosure Review Board - 9

AKM - 9

Social Security - 9

Business Register - 9

Sloan Foundation - 8

Center for Economic Studies - 8

Census Bureau Business Register - 8

Service Annual Survey - 8

Decennial Census - 7

International Trade Research Report - 7

Standard Industrial Classification - 7

2010 Census - 6

Protected Identification Key - 5

Federal Statistical Research Data Center - 5

Longitudinal Business Database - 5

MIT Press - 5

Ordinary Least Squares - 5

University of Michigan - 5

Local Employment Dynamics - 5

Employer Characteristics File - 5

American Economic Review - 5

Department of Labor - 5

Office of Personnel Management - 4

Statistics Canada - 4

University of Chicago - 4

Journal of Labor Economics - 4

Health and Retirement Study - 4

PSID - 4

Bureau of Economic Analysis - 4

American Statistical Association - 4

Federal Reserve Bank - 4

Chicago Census Research Data Center - 4

Special Sworn Status - 4

Census Numident - 3

National Academy of Sciences - 3

Person Validation System - 3

Census Edited File - 3

Department of Economics - 3

Department of Justice - 3

Metropolitan Statistical Area - 3

Longitudinal Research Database - 3

National Center for Health Statistics - 3

Public Use Micro Sample - 3

National Institutes of Health - 3

County Business Patterns - 3

Detailed Earnings Records - 3

Quarterly Journal of Economics - 3

Journal of Econometrics - 3

W-2 - 3

IZA - 3

Employment History File - 3

Individual Characteristics File - 3

Financial, Insurance and Real Estate Industries - 3

Viewing papers 1 through 10 of 42


  • Working Paper

    Estimating the Potential Impact of Combined Race and Ethnicity Reporting on Long-Term Earnings Statistics

    September 2024

    Working Paper Number:

    CES-24-48

    We use place of birth information from the Social Security Administration linked to earnings data from the Longitudinal Employer-Household Dynamics Program and detailed race and ethnicity data from the 2010 Census to study how long-term earnings differentials vary by place of birth for different self-identified race and ethnicity categories. We focus on foreign-born persons from countries that are heavily Hispanic and from countries in the Middle East and North Africa (MENA). We find substantial heterogeneity of long-term earnings differentials within country of birth, some of which will be difficult to detect when the reporting format changes from the current two-question version to the new single-question version because they depend on self-identifications that place the individual in two distinct categories within the single-question format, specifically, Hispanic and White or Black, and MENA and White or Black. We also study the USA-born children of these same immigrants. Long-term earnings differences for the 2nd generation also vary as a function of self-identified ethnicity and race in ways that changing to the single-question format could affect.
  • Working Paper

    The 2010 Census Confidentiality Protections Failed, Here's How and Why

    December 2023

    Working Paper Number:

    CES-23-63

    Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
  • Working Paper

    An In-Depth Examination of Requirements for Disclosure Risk Assessment

    October 2023

    Working Paper Number:

    CES-23-49

    The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms levied against differential privacy would be levied against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near-term, the counterfactual approach appears best-suited for privacy-utility analysis.
  • Working Paper

    Mixed-Effects Methods For Search and Matching Research

    September 2023

    Working Paper Number:

    CES-23-43

    We study mixed-effects methods for estimating equations containing person and firm effects. In economics such models are usually estimated using fixed-effects methods. Recent enhancements to those fixed-effects methods include corrections to the bias in estimating the covariance matrix of the person and firm effects, which we also consider.
  • Working Paper

    Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning

    November 2021

    Working Paper Number:

    CES-21-35

    This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
  • Working Paper

    U.S. Long-Term Earnings Outcomes by Sex, Race, Ethnicity, and Place of Birth

    May 2021

    Working Paper Number:

    CES-21-07R

    This paper is part of the Global Income Dynamics Project cross-country comparison of earnings inequality, volatility, and mobility. Using data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files we produce a uniform set of earnings statistics for the U.S. From 1998 to 2019, we find U.S. earnings inequality has increased and volatility has decreased. The combination of increased inequality and reduced volatility suggests earnings growth differs substantially across different demographic groups. We explore this further by estimating 12-year average earnings for a single cohort of age 25-54 eligible workers. Differences in labor supply (hours paid and quarters worked) are found to explain almost 90% of the variation in worker earnings, although even after controlling for labor supply substantial earnings differences across demographic groups remain unexplained. Using a quantile regression approach, we estimate counterfactual earnings distributions for each demographic group. We find that at the bottom of the earnings distribution differences in characteristics such as hours paid, geographic division, industry, and education explain almost all the earnings gap; however, above the median the contribution of the differences in the returns to characteristics becomes the dominant component.
  • Working Paper

    Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report

    October 2020

    Working Paper Number:

    CES-20-33

    This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
  • Working Paper

    Male Earnings Volatility in LEHD before, during, and after the Great Recession

    September 2020

    Working Paper Number:

    CES-20-31

    This paper is part of a coordinated collection of papers on prime-age male earnings volatility. Each paper produces a similar set of statistics for the same reference population using a different primary data source. Our primary data source is the Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files. Using LEHD data from 1998 to 2016, we create a well-defined population frame to facilitate accurate estimation of temporal changes comparable to designed longitudinal samples of people. We show that earnings volatility, excluding increases during recessions, has declined over the analysis period, a finding robust to various sensitivity analyses. Although we find volatility is declining, the effect is not homogeneous, particularly for workers with tenuous labor force attachment for whom volatility is increasing. These 'not stable' workers have earnings volatility approximately 30 times larger than stable workers, but more important for earnings volatility trends we observe a large increase in the share of stable employment from 60% in 1998 to 67% in 2016, which we show to largely be responsible for the decline in overall earnings volatility. To further emphasize the importance of not stable and/or low earning workers we also conduct comparisons with the PSID and show how changes over time in the share of workers at the bottom tail of the cross-sectional earnings distributions can produce either declining or increasing earnings volatility trends.
  • Working Paper

    Total Error and Variability Measures for the Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap

    September 2020

    Working Paper Number:

    CES-20-30

    We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees, and total quarterly payroll. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency Management. We account for errors due to coverage; record-level nonresponse; edit and imputation of item missing data; and statistical disclosure limitation. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs are a transition zone, where cells may be fit for use with caution. Tabulations involving one or two jobs, which are generally suppressed on fitness-for-use criteria in the QWI and synthesized in LODES, have substantial total variability but can still be used to estimate statistics for untabulated aggregates as long as the job count in the aggregate is more than 10.
  • Working Paper

    United States Earnings Dynamics: Inequality, Mobility, and Volatility

    September 2020

    Working Paper Number:

    CES-20-29

    Using data from the Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files, we study changes over time and across sub-national populations in the distribution of real labor earnings. We consider four large MSAs (Detroit, Los Angeles, New York, and San Francisco) for the period 1998 to 2017, with particular attention paid to the subperiods before, during, and after the Great Recession. For the four large MSAs we analyze, there are clear national trends represented in each of the local areas, the most prominent of which is the increase in the share of earnings accruing to workers at the top of the earnings distribution in 2017 compared with 1998. However, the magnitude of these trends varies across MSAs, with New York and San Francisco showing relatively large increases and Los Angeles somewhere in the middle relative to Detroit whose total real earnings distribution is relatively stable over the period. Our results contribute to the emerging literature on differences between national and regional economic outcomes, exemplifying what will be possible with a new data exploration tool, the Earnings and Mobility Statistics (EAMS) web application, currently under development at the U.S. Census Bureau.