The third chapter investigates measurement error in SIPP annual job
earnings data linked to SSA administrative earnings data. The multiple
earnings measures provided by the survey and administrative data enable
the identification of components of true variation and variation due to
measurement error. We find that 18% of the variation in SIPP annual job
earnings can be attributed to measurement error. We also find that in
both the SIPP and the DER, measurement error is persistent over time.
A lower level of autocorrelation in the SIPP measurement error than in
the economic error component leads to a lower reliability ratio of 0.62 for
first-differenced earnings.
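The link between autocorrelation and reliability in first differences can be illustrated with a short calculation. The sketch below assumes a classical setup that the abstract does not spell out: a stationary true (economic) component and a stationary measurement-error component, uncorrelated with each other, each with its own first-order autocorrelation. The function names and the autocorrelation values (0.8 and 0.5) are hypothetical; only the 18% error share comes from the text.

```python
def reliability_levels(var_true, var_error):
    """Share of total variance due to the true component."""
    return var_true / (var_true + var_error)


def reliability_first_diff(var_true, var_error, rho_true, rho_error):
    """Reliability ratio of first-differenced earnings.

    For a stationary series with variance v and first-order autocorrelation
    rho, the variance of the first difference is 2 * v * (1 - rho).
    """
    diff_true = 2.0 * var_true * (1.0 - rho_true)
    diff_error = 2.0 * var_error * (1.0 - rho_error)
    return diff_true / (diff_true + diff_error)


# 18% of the variation attributed to measurement error (from the text);
# total variance is normalized to 1.
VAR_TRUE, VAR_ERROR = 0.82, 0.18
# Hypothetical autocorrelations, chosen only so that the error is less
# persistent than the true component.
RHO_TRUE, RHO_ERROR = 0.8, 0.5

levels = reliability_levels(VAR_TRUE, VAR_ERROR)
diffs = reliability_first_diff(VAR_TRUE, VAR_ERROR, RHO_TRUE, RHO_ERROR)
print(f"levels: {levels:.2f}, first differences: {diffs:.2f}")
```

Whenever the error is less persistent than the true component, differencing removes proportionally more signal than noise, so the first-difference ratio falls below the levels ratio. The reported 0.62 reflects the chapter's own estimated parameters, not the illustrative values used here.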
-
Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data
July 2011
Working Paper Number: CES-11-20
We quantify sources of variation in annual job earnings data collected by the Survey of Income and Program Participation (SIPP) to determine how much of the variation is the result of measurement error. Jobs reported in the SIPP are linked to jobs reported in an administrative database, the Detailed Earnings Records (DER) drawn from the Social Security Administration's Master Earnings File, a universe file of all earnings reported on W-2 tax forms. As a result of the match, each job potentially has two earnings observations per year: survey and administrative. Unlike in previous validation studies, both of these earnings measures are viewed as noisy measures of some underlying true amount of annual earnings. While the existence of survey error resulting from respondent mistakes or misinterpretation is widely accepted, the idea that administrative data are also error-prone is new. Possible sources of employer reporting error, employee under-reporting of compensation such as tips, and general differences between how earnings may be reported on tax forms and in surveys necessitate discarding the assumption that administrative data are a true measure of the quantity that the survey was designed to collect. In addition, errors in matching SIPP and DER jobs, a necessary task in any use of administrative data, also contribute to measurement error in both earnings variables. We begin by comparing SIPP and DER earnings for different demographic and education groups of SIPP respondents. We also calculate different measures of changes in earnings for individuals switching jobs. We estimate a standard earnings equation model using SIPP and DER earnings and compare the resulting coefficients. Finally, exploiting the presence of individuals with multiple jobs and shared employers over time, we estimate an econometric model that includes random person and firm effects, a common error component shared by SIPP and DER earnings, and two independent error components that represent the variation unique to each earnings measure. We compare the variance components from this model and consider how the DER and SIPP differ across unobservable components.
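To see why two noisy measures of the same job's earnings help separate shared variation from measure-specific variation, consider the following minimal sketch. It is not the paper's mixed model: person effects, firm effects, and the common error are lumped into a single shared part, and only a simple moment calculation is shown. All names, standard deviations, and the simulated data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def unique_error_variances(survey, admin):
    """Split each measure's variance into shared and measure-specific parts.

    Assumes survey = shared + e_survey and admin = shared + e_admin, with
    e_survey and e_admin independent of each other and of the shared part.
    """
    shared = covariance(survey, admin)
    return {
        "shared": shared,
        "survey_only": covariance(survey, survey) - shared,
        "admin_only": covariance(admin, admin) - shared,
    }


def simulate(n_jobs, sd_survey, sd_admin, seed=0):
    """Draw hypothetical log earnings for n_jobs jobs."""
    rng = random.Random(seed)
    survey, admin = [], []
    for _ in range(n_jobs):
        # Person effect + firm effect + common error, all shared.
        shared = rng.gauss(0, 0.6) + rng.gauss(0, 0.3) + rng.gauss(0, 0.2)
        survey.append(shared + rng.gauss(0, sd_survey))
        admin.append(shared + rng.gauss(0, sd_admin))
    return survey, admin


survey, admin = simulate(n_jobs=20000, sd_survey=0.4, sd_admin=0.2)
parts = unique_error_variances(survey, admin)
print({name: round(value, 3) for name, value in parts.items()})
```

The covariance between the two measures recovers only the shared variance. Splitting that shared part further into person, firm, and common-error components is what requires individuals with multiple jobs and employers shared across workers, as the abstract describes.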
-
Estimating the Relationship between Employer-Provided Health Insurance, Worker Mobility, and Wages
September 2002
Working Paper Number: tp-2002-23
In this paper, a joint model of wages, hazard of a job ending, and
probability of holding employer-provided health insurance is estimated,
taking account of unobservable person and job characteristics. A unique
data source, the 1990 and 1996 SIPP Panels linked to SSA administrative
job histories, enables the identification of random person and job effects
and the correlation of these effects across the three equations. The explicit
modeling of this correlation produces consistent estimates of the
effect of tenure on wages and the effect of health insurance on mobility.
Substantial levels of job-lock and significant annual returns to seniority
are found. Increasing the job-specific probability of obtaining employer-provided
health insurance from 60% to 63%, or increasing the job-specific
hourly wage rate by $0.80, are both associated with an equivalent decrease
in the hazard of the job ending. However, the dollar value of the wage
benefit is substantially higher.
-
The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers
October 2002
Working Paper Number: tp-2002-17
In this paper, we describe the sensitivity of small-cell flow statistics
to coding errors in the identity of the underlying entities. Specifically,
we present results based on a comparison of the U.S. Census Bureau's
Quarterly Workforce Indicators (QWI) before and after correcting for
such errors in SSN-based identifiers in the underlying individual wage
records. The correction used involves a novel application of existing
statistical matching techniques. It is found that even a very conservative
correction procedure has a sizable impact on the statistics. The
average bias ranges from 0.25 percent up to 15 percent for flow statistics,
and up to 5 percent for payroll aggregates.
-
Is it Who You Are, Where You Work, or With Whom You Work? Reassessing the Relationship Between Skill Segregation and Wage Inequality
June 2002
Working Paper Number: tp-2002-10
In a recent paper, Kremer & Maskin (QJE, forthcoming) develop an assignment model in
which increases in the dispersion and mean of the skill distribution can lead simultaneously
to increases in wage inequality and skill segregation. They then present evidence that,
concurrent with rising wage inequality, wage segregation increased for production workers in
the United States between 1975 and 1986. My paper argues that relying on wages as a proxy
for skill may be problematic. Using a newly developed longitudinal dataset linking virtually
the entire universe of workers in the state of Illinois to their employers, I decompose wages
into components due, not only to person and firm heterogeneity, but also to the characteristics
of their co-workers. Such "co-worker effects" capture the impact of a weighted sum of the
characteristics of all workers in a firm on each individual employee's wage. While rising wage
segregation can result from greater skill segregation, it may also be due to changes in the
variance of co-worker effects in the economy, or to changes in the covariance between the
person, firm, and co-worker components of wages.
Due to the limited availability of demographic information on workers, I rely on the
person-specific component of wages to proxy for co-worker "skills." Because these person
effects are unknown ex ante, I implement an iterative estimation approach where they are
first obtained from a preliminary regression that excludes any role for co-workers. Because
virtually all person and firm effects are identified, the approach yields consistent estimates
of the co-worker parameters. My estimates imply that a one standard deviation increase
in both a firm's average person effect and experience level is associated, on average, with
wage increases of 3% to 5%. Firms that increase the wage premia they pay workers appear
to do so in conjunction with upgrading worker quality. Interestingly, the average effect
masks considerable variation in the relative importance of co-workers across industries. After
allowing the co-worker parameters to vary across two-digit industries, I find that industry
average co-worker effects explain 26% of observed inter-industry wage differentials. Finally,
I decompose the overall distribution of wages into components due to persons, firms, and co-workers.
While co-worker effects do indeed serve to exacerbate wage inequality, the tendency
for high- and low-skilled workers to sort non-randomly into firms plays a considerably more
prominent role.
-
Comparing Measures of Earnings Instability Based on Survey and Administrative Reports
August 2010
Working Paper Number: CES-10-15
In Celik, Juhn, McCue, and Thompson (2009), we found that estimated levels of earnings instability based on data from the Current Population Survey (CPS) and the Survey of Income and Program Participation (SIPP) were reasonably close to each other and to others' estimates from the Panel Study of Income Dynamics (PSID), but estimates from unemployment insurance (UI) earnings were much larger. Given that the UI data are from administrative records which are often posited to be more accurate than survey reports, this raises concerns that measures based on survey data understate true earnings instability. To address this, we use links between survey samples from the SIPP and UI earnings records in the LEHD database to identify sources of differences in work history and earnings information. Substantial work has been done comparing earnings levels from administrative records to those collected in the SIPP and CPS, but our understanding of earnings instability would benefit from further examination of differences across sources in the properties of changes in earnings. We first compare characteristics of the overall and matched samples to address issues of selection in the matching process. We then compare earnings levels and jobs in the SIPP and LEHD data to identify differences between them. Finally, we begin to examine how such differences affect estimates of earnings instability. Our preliminary findings suggest that differences in earnings changes for those in the lower tail of the earnings distribution account for much of the difference in instability estimates.
-
Modeling Labor Markets with Heterogeneous Agents and Matches
May 2002
Working Paper Number: tp-2002-19
I present a matching model with heterogeneous workers, firms, and worker-firm
matches. The model generalizes the seminal Jovanovic (1979) model to the case of
heterogeneous agents. The equilibrium wage is linear in a person-specific component,
a firm-specific component, and a match-specific component that varies with tenure.
Under certain conditions, the equilibrium wage takes a simpler structure where the
match-specific component does not vary with tenure. I discuss fixed- and mixed-effect
methods for estimating wage models with this structure on longitudinal linked
employer-employee data. The fixed-effect specification relies on restrictive identification
conditions, but is feasible for very large databases. The mixed model requires less
restrictive identification conditions, but is feasible only on relatively small databases.
Both the fixed and mixed models generate empirical person, firm, and match effects
with characteristics that are consistent with predictions from the matching model; the
mixed model more so than the fixed model. Shortcomings of the fixed model appear to
be artifacts of the identification conditions.
-
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning
November 2021
Working Paper Number: CES-21-35
This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
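With multiple imputation, an analysis is run once per imputed set of links and the results are then pooled. The abstract does not say how the pooling is done; the sketch below assumes Rubin's combining rules, the standard choice for multiple imputation, applied to a single scalar estimate. The function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def combine_imputations(estimates, within_variances):
    """Pool one scalar estimate across M imputed datasets (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need matching lists with at least two imputations")
    pooled = mean(estimates)
    within = mean(within_variances)   # average sampling variance
    between = variance(estimates)     # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from three imputed linkages of the same survey.
pooled, total = combine_imputations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, round(total, 4))
```

The between-imputation term is what carries linkage uncertainty into the final standard error: when the imputed links disagree, the pooled variance grows accordingly.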
-
Estimating the "True" Cost of Job Loss: Evidence Using Matched Data from California 1991-2000
June 2009
Working Paper Number: CES-09-14
Estimates of the cost of job displacement from survey and administrative data differ markedly. This paper uses a unique match of data between the Displaced Worker Survey (DWS) and administrative wage records from California to examine the sources of this discrepancy. When we use similar estimation methods and account for measurement error in survey wages correlated with worker demographics, estimates of earnings losses at displacement are similar from both datasets and significantly larger than those based on the DWS alone. Correcting also for measurement errors in reported displacements suggests that both sources of such estimates may yield lower bounds for the true cost of displacement.
-
Longitudinal analysis of SSN response on SIPP 1990-1993 panel
September 2000
Working Paper Number: tp-2000-01
This document describes the analysis of the SIPP-SSN match quality and the file resulting from that analysis, as distributable to the Census RDCs.
-
Displaced workers, early leavers, and re-employment wages
November 2002
Working Paper Number: tp-2002-18
In this paper, we lay out a search model that explicitly takes into account the
information flow prior to a mass layoff. Using universal wage data files that allow
us to identify individuals working with healthy and displacing firms both at
the time of displacement and in any other time period, we test the predictions
of the model on re-employment wage differentials. Workers leaving a "distressed"
firm have higher re-employment wages than workers who stay with the
distressed firm until displacement. This result is robust to the inclusion of controls
for worker quality and unobservable firm characteristics.