-
The Alpha Beta Gamma of the Labor Market
April 2022
Working Paper Number:
CES-22-10
Using a large panel dataset of US workers, we calibrate a search-theoretic model of the labor market, where workers are heterogeneous with respect to the parameters governing their employment transitions. We first approximate heterogeneity with a discrete number of latent types, and then calibrate type-specific parameters by matching type-specific moments. Heterogeneity is well approximated by 3 types: α's, β's and γ's. Workers of type α find employment quickly because they have large gains from trade, and stick to their jobs because their productivity is similar across jobs. Workers of type γ find employment slowly because they have small gains from trade, and are unlikely to stick to their job because they keep searching for jobs in the right tail of the productivity distribution. During the Great Recession, the magnitude and persistence of aggregate unemployment were driven by γ's, who are vulnerable to shocks and, once displaced, cycle through multiple unemployment spells before finding stable employment.
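The type-specific transition parameters the abstract describes can be illustrated with a toy simulation. The monthly job-finding and job-loss rates below are hypothetical stand-ins, not the paper's calibrated values; the point is only that slow finders with unstable jobs (the γ's) generate much higher steady-state unemployment than fast, stable finders (the α's):

```python
import random

# Hypothetical type-specific monthly transition rates (illustrative only;
# not the paper's calibrated values).
TYPES = {
    "alpha": {"find": 0.45, "lose": 0.01},  # fast finders, stable jobs
    "beta":  {"find": 0.30, "lose": 0.03},
    "gamma": {"find": 0.10, "lose": 0.08},  # slow finders, unstable jobs
}

def steady_state_unemployment(find, lose):
    """Flow steady state: u = s / (s + f) for separation rate s, finding rate f."""
    return lose / (lose + find)

def simulate_spells(find, lose, months=120, seed=0):
    """Simulate one worker's employment history; return fraction of months unemployed."""
    rng = random.Random(seed)
    employed = True
    unemployed_months = 0
    for _ in range(months):
        if employed:
            if rng.random() < lose:
                employed = False
        else:
            unemployed_months += 1
            if rng.random() < find:
                employed = True
    return unemployed_months / months

for name, p in TYPES.items():
    u = steady_state_unemployment(p["find"], p["lose"])
    print(f"{name}: steady-state unemployment rate = {u:.3f}")
```

The closed form u = s / (s + f) is the standard flow steady state; the simulated spell histories give the kind of worker-level moments the calibration matches.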
View Full
Paper PDF
-
Estimating the Costs of Covering Dependents through Employer-Sponsored Plans
January 2017
Working Paper Number:
CES-17-48
Several health reform microsimulation models use synthetic firms to estimate how changes in federal and state policies will affect employers' offers of health insurance, as well as the price of health insurance for workers and firms. These models typically rely on distinct measures of the average costs of single and dependent coverage, for employees and employers, which do not capture the joint distribution of these costs. Since some firms pay a large share of the premium for single policies but a lower share for dependent coverage, or the reverse, simulation models that do not account for the joint distribution of premium costs may not be sufficient to answer certain policy questions. To address this issue, we developed a method to extract estimates of the joint distribution of employer and employee costs of health insurance coverage from the Medical Expenditure Panel Survey - Insurance Component (MEPS-IC). This paper describes how these distributions were constructed and how they were incorporated into the Urban Institute's Health Insurance Policy Simulation Model (HIPSM). The estimates presented in this paper and those available in supplementary datasets may be useful for other simulation models that need to utilize information on the joint distribution of single and dependent employee premium contributions.
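The gap between marginal and joint premium-cost information can be seen in a toy example. The firm-level shares below are invented for illustration; a simulator that carries only the two marginal means would assign every firm the same pair of shares and miss the negative comovement between single and dependent contributions:

```python
# Invented pairs of (employer share of single premium, employer share of
# dependent premium) -- illustrative only, not MEPS-IC estimates.
firms = [
    (0.90, 0.50),  # pays most of single coverage, less of dependent
    (0.50, 0.85),  # the reverse pattern
    (0.80, 0.80),
    (0.70, 0.60),
]

mean_single = sum(s for s, d in firms) / len(firms)
mean_dep = sum(d for s, d in firms) / len(firms)

# A model using only the marginal means would give every firm
# (mean_single, mean_dep), erasing the comovement in the joint data.
cov = sum((s - mean_single) * (d - mean_dep) for s, d in firms) / len(firms)
print(mean_single, mean_dep, cov)
```

Here the covariance is negative: firms generous on single coverage tend to be less generous on dependent coverage, a pattern only the joint distribution preserves.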
View Full
Paper PDF
-
A Comparison of Training Modules for Administrative Records Use in Nonresponse Followup Operations: The 2010 Census and the American Community Survey
January 2017
Working Paper Number:
CES-17-47
While modeling work in preparation for the 2020 Census has shown that administrative records can be predictive of Nonresponse Followup (NRFU) enumeration outcomes, there is scope to examine the robustness of the models by using more recent training data. The models deployed for workload removal from the 2015 and 2016 Census Tests were based on associations of the 2010 Census with administrative records. Training the same models with more recent data from the American Community Survey (ACS) can identify any changes in parameter associations over time that might reduce the accuracy of model predictions. Furthermore, more recent training data would allow for the incorporation of new administrative record sources not available in 2010. However, differences in ACS methodology and the smaller sample size may limit its applicability. This paper replicates earlier results and examines model predictions based on the ACS in comparison with NRFU outcomes. The evaluation consists of a comparison of predicted counts and household compositions with actual 2015 NRFU outcomes. The main finding is an overall validation of the methodology using independent data.
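The kind of prediction model being retrained can be sketched as a small logistic model. The indicators and coefficients below are illustrative stand-ins, not estimates from the 2010 Census or the ACS; the comparison of summed predicted probabilities against actual counts mirrors the evaluation described above:

```python
import math

def predict_occupied(has_irs_return, has_medicare, coef=(-1.0, 1.5, 0.8)):
    """Hypothetical logistic model for the probability that a nonresponding
    address is occupied, given administrative-record indicators.
    Coefficients are invented for illustration."""
    b0, b1, b2 = coef
    z = b0 + b1 * has_irs_return + b2 * has_medicare
    return 1.0 / (1.0 + math.exp(-z))

# A model trained on 2010 data and one retrained on ACS data would be
# compared by summing predicted probabilities over addresses and checking
# the totals against actual NRFU outcomes.
addresses = [(1, 1), (1, 0), (0, 1), (0, 0)]
predicted_count = sum(predict_occupied(a, b) for a, b in addresses)
print(round(predicted_count, 2))
```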
View Full
Paper PDF
-
File Matching with Faulty Continuous Matching Variables
January 2017
Working Paper Number:
CES-17-45
We present LFCMV, a Bayesian file linking methodology designed to link records using continuous matching variables in situations where we do not expect values of these matching variables to agree exactly across matched pairs. The method involves a linking model for the distance between the matching variables of records in one file and the matching variables of their linked records in the second. This linking model is conditional on a vector indicating the links. We specify a mixture model for the distance component of the linking model, as this latent structure allows the distance between matching variables in linked pairs to vary across types of linked pairs. Finally, we specify a model for the linking vector. We describe the Gibbs sampling algorithm for sampling from the posterior distribution of this linkage model and use artificial data to illustrate model performance. We also introduce a linking application using public survey information and data from the U.S. Census of Manufactures and use LFCMV to link the records.
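A stripped-down version of the mixture intuition: the posterior probability that a candidate pair is a true link can be computed from a two-component model of the distance between matching variables. The prior and component scales below are illustrative, not the paper's specification, and a real Gibbs sampler would update the entire linking vector jointly rather than scoring pairs one at a time:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution -- used for both mixture components."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def match_probability(distance, p_true=0.5, sigma_true=1.0, sigma_false=10.0):
    """Posterior probability that a candidate pair is a true link, from a
    two-component mixture on the distance between matching variables.
    True links have tightly concentrated distances (sigma_true); non-links
    are diffuse (sigma_false). All parameter values are illustrative."""
    like_true = normal_pdf(distance, 0.0, sigma_true)
    like_false = normal_pdf(distance, 0.0, sigma_false)
    num = p_true * like_true
    return num / (num + (1 - p_true) * like_false)

print(match_probability(0.5))  # small distance: high link probability
print(match_probability(8.0))  # large distance: low link probability
```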
View Full
Paper PDF
-
Geography in Reduced Form
January 2017
Working Paper Number:
CES-17-10
Geography models have introduced and estimated a set of competing explanations for the persistent relationships between firm and location characteristics, but cannot identify these forces. I introduce a solution method for models in arbitrary geographies that generates reduced-form predictions and tests to identify forces acting through geographic linkages. This theoretical approach creates a new strategy for spatial empirics. Using the correct observables, the model shows that geographic forces can be taken into account without being directly estimated; establishment and employment density emerge as sufficient statistics for all geographic forces. I present two applications. First, the model can be used to evaluate whether geographic linkages matter and when simplified models suffice: the mono-centric model is a good fit for business services firms but cannot capture the geography of manufactures. Second, the model generates reduced-form tests that distinguish between spillovers and firm sorting and finds evidence of sorting.
View Full
Paper PDF
-
Simultaneous Edit-Imputation for Continuous Microdata
December 2015
Working Paper Number:
CES-15-44
Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods to identify the variables to change followed by hot deck imputation. We present an approach that fully integrates editing and imputation for continuous microdata under linear constraints. Our approach relies on a Bayesian hierarchical model that includes (i) a flexible joint probability model for the underlying true values of the data with support only on the set of values that satisfy all editing constraints, (ii) a model for latent indicators of the variables that are in error, and (iii) a model for the reported responses for variables in error. We illustrate the potential advantages of the Bayesian editing approach over existing approaches using simulation studies. We apply the model to edit faulty data from the 2007 U.S. Census of Manufactures. Supplementary materials for this article are available online.
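The two kinds of linear edits described above, balance edits and ratio edits, can be sketched as a simple constraint check that flags records for edit-imputation. The specific ratio bounds are hypothetical expert limits, not rules from the Census of Manufactures:

```python
def violates_constraints(record, tol=1e-6):
    """Flag a record that fails either illustrative linear edit:
    (1) balance edit: component variables must sum to the total;
    (2) ratio edit: payroll per employee must lie within expert bounds.
    Bounds are hypothetical, for illustration only."""
    comp_ok = abs(sum(record["components"]) - record["total"]) <= tol
    ratio = record["payroll"] / record["employment"]
    ratio_ok = 5_000 <= ratio <= 500_000  # hypothetical expert bounds
    return not (comp_ok and ratio_ok)

good = {"components": [10.0, 20.0, 30.0], "total": 60.0,
        "payroll": 2_000_000.0, "employment": 40}
bad = {"components": [10.0, 20.0, 30.0], "total": 70.0,
       "payroll": 2_000_000.0, "employment": 40}
print(violates_constraints(good), violates_constraints(bad))
```

In the integrated Bayesian approach, a flagged record is not fixed by an ad hoc rule; instead, latent error indicators and true values are sampled jointly, with the true values constrained to the set that satisfies all edits.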
View Full
Paper PDF
-
A FIRST STEP TOWARDS A GERMAN SYNLBD: CONSTRUCTING A GERMAN LONGITUDINAL BUSINESS DATABASE
February 2014
Working Paper Number:
CES-14-13
One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue that many lessons in this evolving field were learned in the early years of synthetic data generation and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The synthetic GLBD is a first step towards that goal.
View Full
Paper PDF
-
IMPROVING THE SYNTHETIC LONGITUDINAL BUSINESS DATABASE
February 2014
Working Paper Number:
CES-14-12
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a synthetic version, now available for public use, of the U.S. Census Bureau's Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This article describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
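A minimal caricature of the synthesis idea, assuming a single continuous variable and a normal synthesis model (the actual SynLBD procedures are far richer, modeling many variables sequentially and longitudinally):

```python
import random
import statistics

def synthesize(confidential, seed=0):
    """Toy synthesis sketch (illustrative, not the SynLBD models): fit a
    normal distribution to the confidential values and release draws from
    the fitted model instead of the real records."""
    rng = random.Random(seed)
    mu = statistics.mean(confidential)
    sigma = statistics.stdev(confidential)
    return [rng.gauss(mu, sigma) for _ in confidential]

real = [12.0, 15.0, 9.0, 22.0, 14.0, 18.0]
synthetic = synthesize(real)
# Synthetic records should track the distribution without reproducing
# any individual confidential record.
print([round(x, 1) for x in synthetic])
```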
View Full
Paper PDF
-
MISCLASSIFICATION IN BINARY CHOICE MODELS
May 2013
Working Paper Number:
CES-13-27
We derive the asymptotic bias from misclassification of the dependent variable in binary choice models. Measurement error is necessarily non-classical in this case, which leads to bias in linear and non-linear models even if only the dependent variable is mismeasured. A Monte Carlo study and an application to food stamp receipt show that the bias formulas are useful for analyzing the sensitivity of substantive conclusions, interpreting biased coefficients, and identifying features of the estimates that are robust to misclassification. Using administrative records linked to survey data as validation data, we examine estimators that are consistent under misclassification. They can improve estimates if their assumptions hold, but can aggravate the problem if the assumptions are invalid. The estimators differ in their robustness to such violations, which can be improved by incorporating additional information. We propose tests for the presence and nature of misclassification that can help to choose an estimator.
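A quick Monte Carlo illustrates the attenuation that misclassification of the outcome induces. This sketch uses a linear probability model, where the well-known result is that flipping outcomes with error rates a0 (false positive) and a1 (false negative) shrinks the slope by the factor (1 - a0 - a1); the paper's formulas cover non-linear models as well:

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

rng = random.Random(42)
n = 50_000
xs = [rng.random() for _ in range(n)]
# True linear probability model: P(y = 1 | x) = 0.2 + 0.5 * x
y_true = [1 if rng.random() < 0.2 + 0.5 * x else 0 for x in xs]

# Misclassify: flip 1 -> 0 with probability a1, 0 -> 1 with probability a0.
a0, a1 = 0.05, 0.15
y_obs = [(1 - y) if rng.random() < (a1 if y else a0) else y for y in y_true]

b_true = ols_slope(xs, y_true)
b_obs = ols_slope(xs, y_obs)
# The observed slope should be attenuated by roughly (1 - a0 - a1) = 0.8.
print(round(b_true, 2), round(b_obs, 2), round(b_obs / b_true, 2))
```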
View Full
Paper PDF
-
Estimating the Distribution of Plant-Level Manufacturing Energy Efficiency with Stochastic Frontier Regression
March 2007
Working Paper Number:
CES-07-07
A feature commonly used to distinguish between parametric/statistical models and engineering models is that engineering models explicitly represent best practice technologies while the parametric/statistical models are typically based on average practice. Measures of energy intensity based on average practice are less useful in the corporate management of energy or for public policy goal setting. In the context of company or plant level energy management, it is more useful to have a measure of energy intensity capable of representing where a company or plant lies within a distribution of performance. In other words, is the performance close to (or far from) the industry best practice? This paper presents a parametric/statistical approach that can be used to measure best practice, thereby providing a measure of the difference, or 'efficiency gap', at a plant, company, or overall industry level. The approach requires plant level data and applies a stochastic frontier regression analysis to energy use. Stochastic frontier regression analysis separates the energy intensity into three components: systematic effects, inefficiency, and statistical (random) error. The stochastic frontier can be viewed as a sub-vector input distance function. One advantage of this approach is that physical product mix can be included in the distance function, avoiding the problem of aggregating output to define a single energy/output ratio to measure energy intensity. The paper outlines the methods and gives an example of the analysis conducted for a non-public micro-dataset of wet corn refining plants.
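The three-way decomposition can be mimicked in a simulation, assuming illustrative frontier coefficients and a half-normal inefficiency term (the one-sided inefficiency distribution is standard in stochastic frontier analysis, but the parameter values here are invented):

```python
import random

rng = random.Random(7)

def simulate_plant(beta0=1.0, beta1=0.9):
    """Illustrative stochastic frontier: log energy use equals the best-practice
    frontier (a function of output) plus one-sided inefficiency u >= 0 plus
    two-sided random error v. All parameters are invented for illustration."""
    log_output = rng.uniform(2.0, 6.0)
    u = abs(rng.gauss(0.0, 0.3))   # inefficiency: can only push energy use up
    v = rng.gauss(0.0, 0.1)        # measurement / random error
    log_energy = beta0 + beta1 * log_output + u + v
    frontier = beta0 + beta1 * log_output
    return log_energy, frontier, u

plants = [simulate_plant() for _ in range(10_000)]
# The "efficiency gap" is observed log energy minus the frontier prediction;
# its average should be close to the mean of the inefficiency term.
gaps = [e - f for e, f, u in plants]
print(round(sum(gaps) / len(gaps), 2))
```

In an actual application the frontier and the variance components are estimated jointly from plant data; the simulation only shows how the gap separates inefficiency from noise on average.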
View Full
Paper PDF