Papers written by Author(s): 'Simon Woodcock'
Viewing papers 1 through 7 of 7
-
Working Paper: Dynamically Consistent Noise Infusion and Partially Synthetic Data as Confidentiality Protection Measures for Related Time Series
July 2012
Working Paper Number: CES-12-13
The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure avoidance methods are required to protect the information about individuals and businesses that contribute to the underlying data. The QWI disclosure avoidance mechanism we describe here relies heavily on the use of noise infusion through a permanent multiplicative noise distortion factor, used for magnitudes, counts, differences and ratios. There is minimal suppression and no complementary suppressions. To our knowledge, the release in 2003 of the QWI was the first large-scale use of noise infusion in any official statistical product. We show that the released statistics are analytically valid along several critical dimensions: measures are unbiased and time series properties are preserved. We provide an analysis of the degree to which confidentiality is protected. Furthermore, we show how the judicious use of synthetic data, injected into the tabulation process, can completely eliminate suppressions, maintain analytical validity, and increase the protection of the underlying confidential data.
-
Working Paper: Distribution Preserving Statistical Disclosure Limitation
September 2006
Working Paper Number: tp-2006-04
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.
-
Working Paper: The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators
January 2006
Working Paper Number: tp-2006-01
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. Beginning in 2003 and building on this infrastructure, the Census Bureau has published the Quarterly Workforce Indicators (QWI), a new collection of data series that offers unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained through the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. We describe the multiple imputation methods used to impute missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement. Both of these innovations are crucial to the success of the final product. Finally, we pay special attention to the details of the confidentiality protection system used to protect the identity and micro data values of the underlying entities used to form the published estimates. We provide a brief description of public-use and restricted-access data files with pointers to further documentation for researchers interested in using these data.
-
Working Paper: Multiply-Imputing Confidential Characteristics and File Links in Longitudinal Linked Data
June 2004
Working Paper Number: tp-2004-04
This paper describes ongoing research to protect confidentiality in longitudinal linked data through creation of multiply-imputed, partially synthetic data. We present two enhancements to the methods of [2]. The first is designed to preserve marginal distributions in the partially synthetic data. The second is designed to protect confidential links between sampling frames.
-
Working Paper: Agent Heterogeneity and Learning: An Application to Labor Markets
October 2002
Working Paper Number: tp-2002-20
I develop a matching model with heterogeneous workers, firms, and worker-firm matches, and apply it to longitudinal linked data on employers and employees. Workers vary in their marginal product when employed and their value of leisure when unemployed. Firms vary in their marginal product and cost of maintaining a vacancy. The marginal product of a worker-firm match also depends on a match-specific interaction between worker and firm that I call match quality. Agents have complete information about worker and firm heterogeneity, and symmetric but incomplete information about match quality. They learn its value slowly by observing production outcomes. There are two key results. First, under a Nash bargain, the equilibrium wage is linear in a person-specific component, a firm-specific component, and the posterior mean of beliefs about match quality. Second, in each period the separation decision depends only on the posterior mean of beliefs and person and firm characteristics. These results have several implications for an empirical model of earnings with person and firm effects. The first implies that residuals within a worker-firm match are a martingale; the second implies the distribution of earnings is truncated. I test predictions from the matching model using data from the Longitudinal Employer-Household Dynamics (LEHD) Program at the US Census Bureau. I present both fixed and mixed model specifications of the equilibrium wage function, taking account of structural aspects implied by the learning process. In the most general specification, earnings residuals have a completely unstructured covariance within a worker-firm match. I estimate and test a variety of more parsimonious error structures, including the martingale structure implied by the learning process. I find considerable support for the matching model in these data.
-
Working Paper: Modeling Labor Markets with Heterogeneous Agents and Matches
May 2002
Working Paper Number: tp-2002-19
I present a matching model with heterogeneous workers, firms, and worker-firm matches. The model generalizes the seminal Jovanovic (1979) model to the case of heterogeneous agents. The equilibrium wage is linear in a person-specific component, a firm-specific component, and a match-specific component that varies with tenure. Under certain conditions, the equilibrium wage takes a simpler structure where the match-specific component does not vary with tenure. I discuss fixed- and mixed-effect methods for estimating wage models with this structure on longitudinal linked employer-employee data. The fixed-effect specification relies on restrictive identification conditions, but is feasible for very large databases. The mixed model requires less restrictive identification conditions, but is feasible only on relatively small databases. Both the fixed and mixed models generate empirical person, firm, and match effects with characteristics that are consistent with predictions from the matching model, the mixed model more so than the fixed model. Shortcomings of the fixed model appear to be artifacts of the identification conditions.
-
Working Paper: The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators
March 2002
Working Paper Number: tp-2002-05