CREAT: Census Research Exploration and Analysis Tool

Papers written by Author(s): 'John M. Abowd'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

Longitudinal Employer Household Dynamics - 30

National Science Foundation - 30

Cornell University - 25

Alfred P Sloan Foundation - 25

Bureau of Labor Statistics - 18

Quarterly Workforce Indicators - 17

American Community Survey - 16

Social Security Administration - 15

Unemployment Insurance - 15

Social Security Number - 14

Current Population Survey - 14

Survey of Income and Program Participation - 14

National Institute on Aging - 14

LEHD Program - 14

Census Bureau Disclosure Review Board - 13

National Bureau of Economic Research - 13

Internal Revenue Service - 12

Cornell Institute for Social and Economic Research - 12

Quarterly Census of Employment and Wages - 11

North American Industry Classification System - 11

Economic Census - 11

Research Data Center - 11

Employer Identification Numbers - 10

Disclosure Review Board - 9

Center for Economic Studies - 9

AKM - 9

Social Security - 9

Business Register - 9

Sloan Foundation - 8

Census Bureau Business Register - 8

Service Annual Survey - 8

Decennial Census - 7

2010 Census - 7

International Trade Research Report - 7

Standard Industrial Classification - 7

Protected Identification Key - 6

Federal Statistical Research Data Center - 6

Statistics Canada - 5

Longitudinal Business Database - 5

MIT Press - 5

Ordinary Least Squares - 5

University of Michigan - 5

Local Employment Dynamics - 5

Employer Characteristics File - 5

American Economic Review - 5

Department of Labor - 5

Office of Personnel Management - 4

Public Use Micro Sample - 4

National Academy of Sciences - 4

Person Validation System - 4

Census Edited File - 4

University of Chicago - 4

Journal of Labor Economics - 4

Health and Retirement Study - 4

PSID - 4

Bureau of Economic Analysis - 4

American Statistical Association - 4

Federal Reserve Bank - 4

Chicago Census Research Data Center - 4

Special Sworn Status - 4

Census Numident - 3

Some Other Race - 3

Office of Management and Budget - 3

1940 Census - 3

United States Census Bureau - 3

Personally Identifiable Information - 3

Department of Economics - 3

Department of Justice - 3

Metropolitan Statistical Area - 3

Longitudinal Research Database - 3

National Center for Health Statistics - 3

National Institutes of Health - 3

County Business Patterns - 3

Detailed Earnings Records - 3

Quarterly Journal of Economics - 3

Journal of Econometrics - 3

W-2 - 3

IZA - 3

Employment History File - 3

Individual Characteristics File - 3

Financial, Insurance and Real Estate Industries - 3

Viewing papers 21 through 30 of 43


  • Working Paper

    Modeling Endogenous Mobility in Wage Determination

    June 2015

    Working Paper Number:

    CES-15-18

    We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates.
  • Working Paper

    Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics

    September 2014

    Working Paper Number:

    CES-14-30

    We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau's Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.
  • Working Paper

    Dynamically Consistent Noise Infusion and Partially Synthetic Data as Confidentiality Protection Measures for Related Time Series

    July 2012

    Working Paper Number:

    CES-12-13

    The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure avoidance methods are required to protect the information about individuals and businesses that contribute to the underlying data. The QWI disclosure avoidance mechanism we describe here relies heavily on the use of noise infusion through a permanent multiplicative noise distortion factor, used for magnitudes, counts, differences and ratios. There is minimal suppression and no complementary suppressions. To our knowledge, the release in 2003 of the QWI was the first large-scale use of noise infusion in any official statistical product. We show that the released statistics are analytically valid along several critical dimensions: measures are unbiased and time series properties are preserved. We provide an analysis of the degree to which confidentiality is protected. Furthermore, we show how the judicious use of synthetic data, injected into the tabulation process, can completely eliminate suppressions, maintain analytical validity, and increase the protection of the underlying confidential data.
  • Working Paper

    Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data

    July 2011

    Working Paper Number:

    CES-11-20

    We quantify sources of variation in annual job earnings data collected by the Survey of Income and Program Participation (SIPP) to determine how much of the variation is the result of measurement error. Jobs reported in the SIPP are linked to jobs reported in an administrative database, the Detailed Earnings Records (DER) drawn from the Social Security Administration's Master Earnings File, a universe file of all earnings reported on W-2 tax forms. As a result of the match, each job potentially has two earnings observations per year: survey and administrative. Unlike previous validation studies, both of these earnings measures are viewed as noisy measures of some underlying true amount of annual earnings. While the existence of survey error resulting from respondent mistakes or misinterpretation is widely accepted, the idea that administrative data are also error-prone is new. Possible sources of employer reporting error, employee under-reporting of compensation such as tips, and general differences between how earnings may be reported on tax forms and in surveys necessitate discarding the assumption that administrative data are a true measure of the quantity that the survey was designed to collect. In addition, errors in matching SIPP and DER jobs, a necessary task in any use of administrative data, also contribute to measurement error in both earnings variables. We begin by comparing SIPP and DER earnings for different demographic and education groups of SIPP respondents. We also calculate different measures of changes in earnings for individuals switching jobs. We estimate a standard earnings equation model using SIPP and DER earnings and compare the resulting coefficients. Finally, exploiting the presence of individuals with multiple jobs and shared employers over time, we estimate an econometric model that includes random person and firm effects, a common error component shared by SIPP and DER earnings, and two independent error components that represent the variation unique to each earnings measure. We compare the variance components from this model and consider how the DER and SIPP differ across unobservable components.
  • Working Paper

    Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database

    February 2011

    Working Paper Number:

    CES-11-04

    In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.
  • Working Paper

    National Estimates of Gross Employment and Job Flows from the Quarterly Workforce Indicators with Demographic and Industry Detail

    June 2010

    Working Paper Number:

    CES-10-11

    The Quarterly Workforce Indicators (QWI) are local labor market data produced and released every quarter by the United States Census Bureau. Unlike any other local labor market series produced in the U.S. or the rest of the world, the QWI measure employment flows for workers (accessions and separations), jobs (creations and destructions) and earnings for demographic subgroups (age and gender), economic industry (NAICS industry groups), detailed geography (block (experimental), county, Core-Based Statistical Area, and Workforce Investment Area), and ownership (private, all) with fully interacted publication tables. The current QWI data cover 47 states, about 98% of the private workforce in those states, and about 92% of all private employment in the entire economy. State participation is sufficiently extensive to permit us to present the first national estimates constructed from these data. We focus on worker, job, and excess (churning) reallocation rates, rather than on levels of the basic variables. This permits comparison to existing series from the Job Openings and Labor Turnover Survey and the Business Employment Dynamics Series from the Bureau of Labor Statistics. The national estimates from the QWI are an important enhancement to existing series because they include demographic and industry detail for both worker and job flow data compiled from underlying micro-data that have been integrated at the job and establishment levels by the Longitudinal Employer-Household Dynamics Program at the Census Bureau. The estimates presented herein were compiled exclusively from public-use data series and are available for download.
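The worker, job, and excess (churning) reallocation rates named in this abstract can be illustrated with a minimal sketch. It assumes the conventional definitions from the gross-flows literature: worker reallocation is accessions plus separations, job reallocation is creations plus destructions, excess reallocation is the difference, and each is divided by average employment over the period. The class and function names are hypothetical, and the paper's exact definitions may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class CellFlows:
    """Quarterly flows for one tabulation cell (hypothetical container)."""
    accessions: float
    separations: float
    job_creations: float
    job_destructions: float
    employment_begin: float
    employment_end: float


def reallocation_rates(cell: CellFlows) -> dict:
    """Return worker, job, and excess reallocation rates for one cell.

    Assumed (conventional) definitions, not taken from the paper itself:
      worker = (accessions + separations) / average employment
      job    = (creations + destructions) / average employment
      excess = worker - job
    """
    avg_emp = (cell.employment_begin + cell.employment_end) / 2.0
    if avg_emp <= 0:
        raise ValueError("average employment must be positive")
    worker = (cell.accessions + cell.separations) / avg_emp
    job = (cell.job_creations + cell.job_destructions) / avg_emp
    return {"worker": worker, "job": job, "excess": worker - job}


# Illustrative numbers only: net change of +10 from both worker and job flows.
example = CellFlows(
    accessions=30, separations=20,
    job_creations=15, job_destructions=5,
    employment_begin=95, employment_end=105,
)
rates = reallocation_rates(example)
```

Working in rates rather than levels makes cells of very different sizes comparable, which is the reason the abstract gives for focusing on them.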
  • Working Paper

    A Formal Test of Assortative Matching in the Labor Market

    November 2009

    Working Paper Number:

    CES-09-40

    We estimate a structural model of job assignment in the presence of coordination frictions due to Shimer (2005). The coordination friction model places restrictions on the joint distribution of worker and firm effects from a linear decomposition of log labor earnings. These restrictions permit estimation of the unobservable ability and productivity differences between workers and their employers as well as the way workers sort into jobs on the basis of these unobservable factors. The estimation is performed on matched employer-employee data from the LEHD program of the U.S. Census Bureau. The estimated correlation between worker and firm effects from the earnings decomposition is close to zero, a finding that is often interpreted as evidence that there is no sorting by comparative advantage in the labor market. Our estimates suggest that this finding actually results from a lack of sufficient heterogeneity in the workforce and available jobs. Workers do sort into jobs on the basis of productive differences, but the effects of sorting are not visible because of the composition of workers and employers.
  • Working Paper

    Access Methods for United States Microdata

    August 2007

    Working Paper Number:

    CES-07-25

    Beyond the traditional methods of tabulations and public-use microdata samples, statistical agencies have developed four key alternatives for providing non-government researchers with access to confidential microdata to improve statistical modeling. The first, licensing, allows qualified researchers access to confidential microdata at their own facilities, provided certain security requirements are met. The second, statistical data enclaves, offer qualified researchers restricted access to confidential economic and demographic data at specific agency-controlled locations. Third, statistical agencies can offer remote access, through a computer interface, to the confidential data under automated or manual controls. Fourth, synthetic data developed from the original data but retaining the correlations in the original data have the potential for allowing a wide range of analyses.
  • Working Paper

    Confidentiality Protection in the Census Bureau Quarterly Workforce Indicators

    February 2006

    Working Paper Number:

    tp-2006-02

    The Quarterly Workforce Indicators are new estimates developed by the Census Bureau's Longitudinal Employer-Household Dynamics Program as a part of its Local Employment Dynamics partnership with 37 state Labor Market Information offices. These data provide detailed quarterly statistics on employment, accessions, layoffs, hires, separations, full-quarter employment (and related flows), job creations, job destructions, and earnings (for flow and stock categories of workers). The data are released for NAICS industries (and 4-digit SICs) at the county, workforce investment board, and metropolitan area levels of geography. The confidential microdata - unemployment insurance wage records, ES-202 establishment employment, and Title 13 demographic and economic information - are protected using a permanent multiplicative noise distortion factor. This factor distorts all input sums, counts, differences and ratios. The released statistics are analytically valid - measures are unbiased and time series properties are preserved. The confidentiality protection is manifested in the release of some statistics that are flagged as "significantly distorted to preserve confidentiality." These statistics differ from the undistorted statistics by a significant proportion. Even for the significantly distorted statistics, the data remain analytically valid for time series properties. The released data can be aggregated; however, published aggregates are less distorted than custom post-release aggregates. In addition to the multiplicative noise distortion, confidentiality protection is provided by the estimation process for the QWIs, which multiply imputes all missing data (including missing establishment, given UI account, in the UI wage record data) and dynamically re-weights the establishment data to provide state-level comparability with the BLS's Quarterly Census of Employment and Wages.
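The idea of a permanent multiplicative noise distortion factor can be illustrated with a rough sketch. It is a toy under stated assumptions, not the Census Bureau's production method: the bounds `A` and `B`, the uniform draw, the hash-based seeding, and the `secret` key are all hypothetical choices made only to show two properties the abstract describes, namely that the factor never changes for a given entity and that every input is distorted by at least a minimum percentage.

```python
import hashlib
import random
from typing import Mapping

# Hypothetical bounds: every factor differs from 1 by at least A and at most B.
A, B = 0.05, 0.15


def permanent_factor(establishment_id: str, secret: str = "demo-key") -> float:
    """Return the same distortion factor for an establishment on every call.

    Assumption: the factor is drawn uniformly from [1-B, 1-A] or [1+A, 1+B],
    using a random generator seeded from a keyed hash of the identifier.
    """
    digest = hashlib.sha256(f"{secret}:{establishment_id}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    magnitude = rng.uniform(A, B)
    sign = 1 if rng.random() < 0.5 else -1
    return 1.0 + sign * magnitude


def protected_total(values: Mapping[str, float]) -> float:
    """Sum establishment-level inputs after distorting each by its own factor."""
    return sum(permanent_factor(eid) * v for eid, v in values.items())


# Illustrative inputs only.
employment = {"est-001": 100.0, "est-002": 250.0, "est-003": 40.0}
released = protected_total(employment)
```

Because the factor depends only on the identifier and the key, repeated tabulations of the same inputs give the same protected value. In this toy version the draws above and below 1 are symmetric, so totals over many establishments tend to sit closer to the undistorted total than any single distorted input does.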
  • Working Paper

    The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators

    January 2006

    Working Paper Number:

    tp-2006-01

    The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Beginning in 2003 and building on this infrastructure, the Census Bureau has published the Quarterly Workforce Indicators (QWI), a new collection of data series that offers unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained due to the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. We describe the multiple imputation methods used to impute missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement. Both of these innovations are crucial to the success of the final product. Finally, we pay special attention to the details of the confidentiality protection system used to protect the identity and micro data values of the underlying entities used to form the published estimates. We provide a brief description of public-use and restricted-access data files with pointers to further documentation for researchers interested in using these data.