This paper describes ongoing research to protect confidentiality in longitudinal linked
data through creation of multiply-imputed, partially synthetic data. We present two enhancements to the methods
of [2]. The first is designed to preserve marginal distributions in the partially synthetic data. The second is
designed to protect confidential links between sampling frames.
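Analysts of multiply-imputed, partially synthetic data usually combine results across the m released implicates. The standard rules for this are those of Reiter (2003), in which the between-implicate variance enters divided by m. The sketch below assumes those standard rules, because the paragraph above does not say which combining rules the authors use. The function name and inputs are illustrative only.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, within_variances):
    """Combine per-implicate results with Reiter's (2003) partially synthetic rules.

    estimates:        point estimate q_l computed on each of the m synthetic implicates
    within_variances: estimated sampling variance u_l of each q_l
    Returns (q_bar, T_p): the combined point estimate and its variance.
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two implicates with matching variances")
    q_bar = mean(estimates)          # average of the point estimates
    u_bar = mean(within_variances)   # average within-implicate variance
    b = variance(estimates)          # between-implicate variance (divisor m - 1)
    t_p = u_bar + b / m              # total variance for partially synthetic data
    return q_bar, t_p
```

For example, `combine_partially_synthetic([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])` gives a combined estimate of 12.0 with variance 1 + 4/3, or about 2.33.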
-
Distribution Preserving Statistical Disclosure Limitation
September 2006
Working Paper Number:
tp-2006-04
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed,
partially synthetic data sets. These are data on actual respondents, but with confidential data
replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate
inferences because the distribution of synthetic data is completely determined by the model used
to generate them. We present two practical methods of generating synthetic values when the imputer
has only limited information about the true data generating process. One is applicable when
the true likelihood is known up to a monotone transformation. The second requires only limited
knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential
data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility
and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and
sampling error in the estimated transformation. We validate the approach with a simulation and
application to a large linked employer-employee database.
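The abstract describes one method for the case where the likelihood is known only up to a monotone transformation. One generic way to preserve a marginal distribution in that case works in three steps:

1. Map the confidential variable to normal scores through its empirical CDF.
2. Impute on that scale.
3. Map the draws back through the empirical quantile function.

The sketch below illustrates that idea under stated assumptions. It is not necessarily the authors' method. The shrink-plus-noise imputation step and the names `to_normal_scores`, `back_transform`, `synthesize`, and `noise_sd` are hypothetical placeholders for a real imputation model.

```python
import random
from statistics import NormalDist


def to_normal_scores(values):
    """Map each value to a standard-normal score through its rank (empirical CDF)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # (rank - 0.5) / n keeps the probability strictly inside (0, 1).
        scores[i] = std.inv_cdf((rank - 0.5) / n)
    return scores


def back_transform(z_values, observed):
    """Map normal-scale draws onto the empirical quantiles of the observed data."""
    sorted_obs = sorted(observed)
    n = len(sorted_obs)
    std = NormalDist()
    out = []
    for z in z_values:
        idx = min(int(std.cdf(z) * n), n - 1)  # empirical quantile index
        out.append(sorted_obs[idx])
    return out


def synthesize(observed, noise_sd=0.5, seed=0):
    """Hypothetical imputation on the normal scale, then back-transformation.

    z_syn = rho * z + noise with rho**2 + noise_sd**2 = 1, so z_syn is
    marginally N(0, 1) when z is, and the back-transformed values follow the
    observed marginal distribution up to sampling error.
    """
    if not 0.0 <= noise_sd <= 1.0:
        raise ValueError("noise_sd must lie in [0, 1]")
    rng = random.Random(seed)
    rho = (1.0 - noise_sd ** 2) ** 0.5
    z_syn = [rho * z + rng.gauss(0.0, noise_sd) for z in to_normal_scores(observed)]
    return back_transform(z_syn, observed)
```

Because every synthetic value is read off the observed quantile function, the released marginal matches the observed one up to sampling error. Whether that is acceptable for disclosure purposes is a separate question that this sketch does not address.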
-
Synthetic Data and Confidentiality Protection
September 2003
Working Paper Number:
tp-2003-10
-
Firm Market Power and the Earnings Distribution
December 2011
Working Paper Number:
CES-11-41
Using the Longitudinal Employer-Household Dynamics (LEHD) data from the United States Census Bureau, I compute firm-level measures of labor market (monopsony) power. To generate these measures, I extend the dynamic model proposed by Manning (2003) and estimate the labor supply elasticity facing each private non-farm firm in the US. While a link between monopsony power and earnings has traditionally been assumed, I provide the first direct evidence of the positive relationship between a firm's labor supply elasticity and the earnings of its workers. I also contrast the semistructural method with the more traditional use of concentration ratios to measure a firm's labor market power. In addition, I provide several alternative measures of labor market power which account for potential threats to identification such as endogenous mobility. Finally, I construct a counterfactual earnings distribution which allows the effects of firm market power to vary across the earnings distribution. I estimate the average firm's labor supply elasticity to be 1.08; however, my findings suggest significant variability in the distribution of firm market power across US firms, and that dynamic monopsony models are superior to concentration ratios in evaluating a firm's labor market power. I find that a one-unit increase in the labor supply elasticity to the firm is associated with wage gains of between 5 and 18 percent. While nontrivial, these estimates imply that firms do not fully exercise their labor market power over their workers. Furthermore, I find that the negative earnings impact of a firm's market power is strongest in the lower half of the earnings distribution, and that a one standard deviation increase in firms' labor supply elasticities reduces the variance of the earnings distribution by 9 percent.
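The textbook static monopsony condition shows how a labor supply elasticity translates into a wage markdown. It is offered only as an aid to reading the 1.08 estimate. It is not the dynamic Manning (2003) model the paper estimates.

```latex
% Static monopsony benchmark (textbook result, not the paper's dynamic model).
% The firm faces labor supply L(w) with elasticity \varepsilon = (dL/dw)(w/L).
% Setting marginal revenue product equal to the marginal cost of labor:
\mathrm{MRPL} = w\left(1 + \frac{1}{\varepsilon}\right)
\quad\Longrightarrow\quad
\frac{w}{\mathrm{MRPL}} = \frac{\varepsilon}{1 + \varepsilon}
% With \varepsilon = 1.08: w / MRPL = 1.08 / 2.08 \approx 0.52
```

A larger ε pushes the ratio toward 1. That is the intuition for why higher firm-level labor supply elasticities go with higher earnings.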
-
IT Investment and Firm Performance in U.S. Retail Trade
June 2002
Working Paper Number:
CES-02-14
We examine the relationships between investments in information technology (IT) and two measures of retail firm performance (productivity and establishment growth) over the 1992 to 1997 period. We use untapped firm and establishment microdata from the Censuses of Retail Trade and the Assets and Expenditures Survey. We show that large firms account for most retail IT investment, employment, and establishment growth. We find evidence of a significant relationship between IT investment intensity and productivity growth. We find no such evidence of a link between IT and growth in the number of establishments operated by retail firms.
-
Human Capital Spillovers in Manufacturing: Evidence from Plant-Level Production Functions
November 2002
Working Paper Number:
CES-02-27
I assess the magnitude of human capital spillovers in US cities by estimating plant-level production functions. I use a unique firm-worker matched dataset, obtained by combining the Census of Manufactures with the Census of Population. After controlling for a plant's own human capital, plant fixed effects, and industry-specific and state-specific transitory shocks, I find that the output of plants located in cities that experience large increases in the share of college graduates rises more than the output of similar plants located in cities that experience small increases in the share of college graduates. Several specification tests indicate that the estimated effect is not completely spurious. First, within a city, the spillover between plants that are geographically and economically close is positive, while the spillover between plants that are geographically close but economically distant is zero. Second, most of the estimated spillover comes from hi-tech plants; for non-hi-tech plants, the spillover is virtually zero. When I stratify the sample by the percentage of employees who are college educated, I find that the spillover is larger the larger the percentage of college-educated workers in the plant. Third, the density of physical capital in a city outside a plant has no effect on the plant's productivity. Consistent with a model that includes both standard and general equilibrium forces and spillovers, the estimated productivity differences between cities with high and low levels of human capital match remarkably well the differences in labor costs that are typically observed between cities with high and low levels of human capital. This is important because, in equilibrium, any productivity gain generated by human capital spillovers should be offset by increased costs.
-
Total Error and Variability Measures for the Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap
September 2020
Working Paper Number:
CES-20-30
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees, and total quarterly payroll. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency Management. We account for errors due to coverage; record-level nonresponse; edit and imputation of item missing data; and statistical disclosure limitation. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs are a transition zone, where cells may be fit for use with caution. Tabulations involving one or two jobs, which are generally suppressed on fitness-for-use criteria in the QWI and synthesized in LODES, have substantial total variability but can still be used to estimate statistics for untabulated aggregates as long as the job count in the aggregate is more than 10.
-
Agent Heterogeneity and Learning: An Application to Labor Markets
October 2002
Working Paper Number:
tp-2002-20
I develop a matching model with heterogeneous workers, firms, and worker-firm
matches, and apply it to longitudinal linked data on employers and employees. Workers
vary in their marginal product when employed and their value of leisure when unemployed.
Firms vary in their marginal product and cost of maintaining a vacancy. The
marginal product of a worker-firm match also depends on a match-specific interaction
between worker and firm that I call match quality. Agents have complete information
about worker and firm heterogeneity, and symmetric but incomplete information about
match quality. They learn its value slowly by observing production outcomes. There
are two key results. First, under a Nash bargain, the equilibrium wage is linear in a
person-specific component, a firm-specific component, and the posterior mean of beliefs
about match quality. Second, in each period the separation decision depends only on
the posterior mean of beliefs and person and firm characteristics. These results have
several implications for an empirical model of earnings with person and firm effects.
The first implies that residuals within a worker-firm match are a martingale; the second
implies the distribution of earnings is truncated.
I test predictions from the matching model using data from the Longitudinal
Employer-Household Dynamics (LEHD) Program at the US Census Bureau. I present
both fixed and mixed model specifications of the equilibrium wage function, taking
account of structural aspects implied by the learning process. In the most general
specification, earnings residuals have a completely unstructured covariance within a
worker-firm match. I estimate and test a variety of more parsimonious error structures,
including the martingale structure implied by the learning process. I find considerable
support for the matching model in these data.
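The learning mechanism can be illustrated with normal-normal Bayesian updating. This is a sketch under assumed conditions, not the paper's specification. It assumes that each period's production outcome equals match quality plus Gaussian noise, with known prior and noise variances. The function name and default parameters are hypothetical. Under these assumptions the sequence of posterior means is a martingale with respect to the agents' information. That property is the one the abstract carries over to wage residuals within a match.

```python
def posterior_path(outputs, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior mean of match quality after each observed production outcome.

    Assumes outcome_t = match_quality + noise_t, with
    match_quality ~ N(prior_mean, prior_var) and noise_t ~ N(0, noise_var).
    Updating is done in precision form: precision and precision-weighted sum
    both accumulate one term per observation.
    """
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    path = []
    for y in outputs:
        precision += 1.0 / noise_var
        weighted_sum += y / noise_var
        path.append(weighted_sum / precision)
    return path
```

With the default unit variances, `posterior_path([2.0, 4.0])` yields `[1.0, 2.0]`. Beliefs move gradually toward the data as evidence accumulates, which is the "learn its value slowly" behavior described above.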
-
Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics
February 2016
Working Paper Number:
CES-16-10
We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).
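The core idea of publishing observed values except in sensitive cells can be sketched as cell-level substitution. The abstract describes blending at the record level and several algorithms. This sketch shows only the simplest variant. The `Cell` type, the `blend_tabulation` name, and the rule "fewer than `min_count` establishments is sensitive" are all hypothetical assumptions, not the Census Bureau's actual sensitivity criteria.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    key: str                 # identifier of the published cell (e.g. an industry-by-area code)
    observed_total: float    # tabulation from the confidential microdata
    synthetic_total: float   # same tabulation from the synthetic microdata
    n_establishments: int    # number of contributing establishments


def blend_tabulation(cells, min_count=3):
    """Publish observed totals, substituting synthetic totals only in sensitive cells.

    A cell is treated as sensitive when it has fewer than `min_count`
    establishments (a hypothetical rule chosen for illustration).
    """
    table = {}
    for cell in cells:
        sensitive = cell.n_establishments < min_count
        table[cell.key] = cell.synthetic_total if sensitive else cell.observed_total
    return table
```

Unlike suppression, every cell receives a published value. Only the sensitive cells carry synthetic rather than observed figures.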
-
Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in On The Map
January 2017
Working Paper Number:
CES-17-71
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.
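The sketch below shows only the standard Rubin (1987) total variability that the abstract builds on. It does not reproduce the paper's special formulas, which adjust for the disclosure avoidance system. Each thread or implicate supplies a point estimate and a within-implicate variance. The function name is hypothetical.

```python
from statistics import mean, variance


def rubin_total_variance(estimates, within_variances):
    """Standard Rubin (1987) combining rules for m multiply imputed implicates.

    Returns (q_bar, T), where T = W_bar + (1 + 1/m) * B:
    W_bar is the average within-implicate variance and
    B is the between-implicate variance of the point estimates.
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two implicates with matching variances")
    q_bar = mean(estimates)           # combined point estimate
    w_bar = mean(within_variances)    # within-implicate variability
    b = variance(estimates)           # between-implicate variability (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b       # Rubin total variability
    return q_bar, t
```

For three threads with estimates 10, 12, and 14 and within-implicate variance 1 each, B = 4 and T = 1 + (4/3)(4), or about 6.33. Between-thread disagreement dominates the total in that case.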
-
New Approaches to Confidentiality Protection: Synthetic Data, Remote Access and Research Data Centers
June 2004
Working Paper Number:
tp-2004-03