-
Estimation of Job-to-Job Flow Rates under Partially Missing Geography
September 2012
Working Paper Number:
CES-12-29
Integration of data from different regions presents challenges for the calculation of entity-level longitudinal statistics with a strong geographic component: for example, movements between employers, migration, business dynamics, and health statistics. In this paper, we consider the estimation of worker-level employment statistics when the geographies (in our application, US states) over which such measures are defined are partially missing. We focus on the recent pilot set of job-to-job flow statistics produced by the US Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program, which measure the frequency of worker movements between jobs and into and out of nonemployment. LEHD's coverage of the labor force increases gradually during the 1990s and 2000s because some states have a longer time series than others, so employment transitions involving missing states are only partially observed or not observed at all. We propose and implement a method for estimating national-level job-to-job flow statistics that involves dropping observed states to recover the relationship between missing states and directly tabulated job-to-job flow rates. Using the estimated relationship between the observable characteristics of the missing states and changes in the employment measures, we provide estimates of the rates of job-to-job, job-to-nonemployment, and job-to-nonemployment-to-job flows that would be observed if all states were uniformly available.
-
Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data
July 2011
Working Paper Number:
CES-11-20
We quantify sources of variation in annual job earnings data collected by the Survey of Income and Program Participation (SIPP) to determine how much of the variation is the result of measurement error. Jobs reported in the SIPP are linked to jobs reported in an administrative database, the Detailed Earnings Records (DER) drawn from the Social Security Administration's Master Earnings File, a universe file of all earnings reported on W-2 tax forms. As a result of the match, each job potentially has two earnings observations per year: survey and administrative. Unlike previous validation studies, we treat both of these earnings measures as noisy measures of some underlying true amount of annual earnings. While the existence of survey error resulting from respondent mistakes or misinterpretation is widely accepted, the idea that administrative data are also error-prone is new. Possible employer reporting error, employee under-reporting of compensation such as tips, and general differences between how earnings may be reported on tax forms and in surveys necessitate discarding the assumption that administrative data are a true measure of the quantity that the survey was designed to collect. In addition, errors in matching SIPP and DER jobs, a necessary task in any use of administrative data, contribute to measurement error in both earnings variables. We begin by comparing SIPP and DER earnings for different demographic and education groups of SIPP respondents. We also calculate different measures of changes in earnings for individuals switching jobs. We estimate a standard earnings equation model using SIPP and DER earnings and compare the resulting coefficients.
Finally, exploiting the presence of individuals with multiple jobs and shared employers over time, we estimate an econometric model that includes random person and firm effects, a common error component shared by SIPP and DER earnings, and two independent error components that represent the variation unique to each earnings measure. We compare the variance components from this model and consider how the DER and SIPP differ across unobservable components.
-
LEHD Infrastructure Files in the Census RDC: Overview of S2004 Snapshot
April 2011
Working Paper Number:
CES-11-13
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2004 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureau's Research Data Center network.
-
Exploring Differences in Employment between Household and Establishment Data
April 2009
Working Paper Number:
CES-09-09
Using a large data set that links individual Current Population Survey (CPS) records to employer-reported administrative data, we document substantial discrepancies in basic measures of employment status that persist even after controlling for known definitional differences between the two data sources. We hypothesize that reporting discrepancies should be most prevalent for marginal workers and marginal jobs, and find systematic associations between the incidence of reporting discrepancies and observable person and job characteristics that are consistent with this hypothesis. The paper discusses the implications of the reported findings for both micro and macro labor market analysis.
-
Access Methods for United States Microdata
August 2007
Working Paper Number:
CES-07-25
Beyond the traditional methods of tabulations and public-use microdata samples, statistical agencies have developed four key alternatives for providing non-government researchers with access to confidential microdata to improve statistical modeling. First, licensing allows qualified researchers access to confidential microdata at their own facilities, provided certain security requirements are met. Second, statistical data enclaves offer qualified researchers restricted access to confidential economic and demographic data at specific agency-controlled locations. Third, statistical agencies can offer remote access, through a computer interface, to the confidential data under automated or manual controls. Fourth, synthetic data developed from the original data but retaining the correlations in the original data have the potential to support a wide range of analyses.
-
Distribution Preserving Statistical Disclosure Limitation
September 2006
Working Paper Number:
tp-2006-04
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A misspecified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and an application to a large linked employer-employee database.
-
Integrated Longitudinal Employee-Employer Data for the United States
May 2004
Working Paper Number:
tp-2004-02
-
The 1990 Decennial Employer-Employee Dataset
October 2002
Working Paper Number:
CES-02-23
We describe the construction and assessment of a new matched employer-employee data set, the 1990 Decennial Employer-Employee Dataset (1990 DEED). By using place of work name and address, we link workers from the 1990 Long Form Sample to their place of work in the 1990 Standard Statistical Establishment List. The resulting data set is much larger and more representative across regional and industry dimensions than previous matched data sets for the United States. The known strengths and limitations of the data set are discussed in detail.
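The linkage of workers to establishments by place-of-work name and address can be illustrated with an exact match on standardized keys. This is a hypothetical sketch: the abbreviation table, the function names, and the rule of keeping only unambiguous matches are assumptions for illustration, and the abstract does not describe the actual matching procedure, which may well differ.

```python
import re

# Hypothetical abbreviation table; real address standardizers use much larger ones.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "INCORPORATED": "INC",
                 "COMPANY": "CO", "CORPORATION": "CORP"}

def standardize(text):
    """Uppercase, replace punctuation with spaces, and abbreviate common words."""
    words = re.sub(r"[^A-Z0-9 ]", " ", text.upper()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def link_workers(workers, establishments):
    """Exact match on (standardized employer name, standardized address).

    workers: dict worker_id -> (reported employer name, reported work address)
    establishments: dict establishment_id -> (name, address)
    Returns dict worker_id -> establishment_id for unambiguous matches only.
    """
    index = {}
    for est_id, (name, addr) in establishments.items():
        index.setdefault((standardize(name), standardize(addr)), []).append(est_id)
    links = {}
    for worker_id, (name, addr) in workers.items():
        candidates = index.get((standardize(name), standardize(addr)), [])
        if len(candidates) == 1:  # skip workers whose key matches zero or several establishments
            links[worker_id] = candidates[0]
    return links
```

Workers whose reported name and address do not standardize to a unique establishment key are left unlinked rather than guessed.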
-
Agent Heterogeneity and Learning: An Application to Labor Markets
October 2002
Working Paper Number:
tp-2002-20
I develop a matching model with heterogeneous workers, firms, and worker-firm matches, and apply it to longitudinal linked data on employers and employees. Workers vary in their marginal product when employed and their value of leisure when unemployed. Firms vary in their marginal product and cost of maintaining a vacancy. The marginal product of a worker-firm match also depends on a match-specific interaction between worker and firm that I call match quality. Agents have complete information about worker and firm heterogeneity, and symmetric but incomplete information about match quality. They learn its value slowly by observing production outcomes. There are two key results. First, under a Nash bargain, the equilibrium wage is linear in a person-specific component, a firm-specific component, and the posterior mean of beliefs about match quality. Second, in each period the separation decision depends only on the posterior mean of beliefs and person and firm characteristics. These results have several implications for an empirical model of earnings with person and firm effects. The first implies that residuals within a worker-firm match are a martingale; the second implies that the distribution of earnings is truncated.
I test predictions from the matching model using data from the Longitudinal Employer-Household Dynamics (LEHD) Program at the US Census Bureau. I present both fixed and mixed model specifications of the equilibrium wage function, taking account of structural aspects implied by the learning process. In the most general specification, earnings residuals have a completely unstructured covariance within a worker-firm match. I estimate and test a variety of more parsimonious error structures, including the martingale structure implied by the learning process. I find considerable support for the matching model in these data.
-
The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers
October 2002
Working Paper Number:
tp-2002-17
In this paper, we describe the sensitivity of small-cell flow statistics to coding errors in the identity of the underlying entities. Specifically, we present results based on a comparison of the U.S. Census Bureau's Quarterly Workforce Indicators (QWI) before and after correcting for such errors in SSN-based identifiers in the underlying individual wage records. The correction involves a novel application of existing statistical matching techniques. We find that even a very conservative correction procedure has a sizable impact on the statistics: the average bias ranges from 0.25 percent to 15 percent for flow statistics and reaches up to 5 percent for payroll aggregates.
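A toy example shows why small-cell flow statistics are so sensitive to identifier errors. The records, identifiers, and the one-line corruption below are hypothetical, and the paper's statistical-matching correction is not reproduced here. A single miscoded SSN in one quarter turns one continuing job into two spurious accessions and two spurious separations; the sketch counts only accessions.

```python
# Hypothetical wage records after correction: (person identifier, employer, quarter).
CORRECTED = [
    ("111", "F", 1), ("222", "F", 1), ("333", "F", 1),
    ("111", "F", 2), ("222", "F", 2), ("333", "F", 2),
    ("111", "F", 3), ("222", "F", 3), ("333", "F", 3), ("444", "F", 3),
]
# The same records before correction: worker 222 is miscoded as 252 in quarter 2.
RAW = [("252", e, q) if (p, q) == ("222", 2) else (p, e, q) for p, e, q in CORRECTED]

def accessions(records, employer, quarter):
    """Workers at `employer` in `quarter` who were not there in the prior quarter."""
    now = {p for p, e, q in records if e == employer and q == quarter}
    before = {p for p, e, q in records if e == employer and q == quarter - 1}
    return len(now - before)

def percent_bias(raw, corrected):
    """Bias of the uncorrected statistic, as a percent of the corrected value."""
    return 100.0 * (raw - corrected) / corrected
```

In quarter 3 the cell has one true accession (worker 444), and the error adds a second, a 100 percent bias; in quarter 2 the corrected count is zero, so a percentage bias is undefined even though the raw count is one.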