-
Measuring Cross-Country Differences in Misallocation
January 2016
Working Paper Number:
CES-16-50R
We describe differences between the commonly used version of the U.S. Census of Manufactures available at the RDCs and what establishments themselves report. The originally reported data has substantially more dispersion in measured establishment productivity. Measured allocative efficiency is substantially higher in the cleaned data than the raw data: 4x higher in 2002, 20x in 2007, and 80x in 2012. Many of the important editing strategies at the Census, including industry analysts' manual edits and edits using tax records, are infeasible in non-U.S. datasets. We describe a new Bayesian approach for editing and imputation that can be used across contexts.
View Full
Paper PDF
-
When Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources: Exploring Methods to Assign Responses
December 2015
Working Paper Number:
carra-2015-08
The U.S. Census Bureau is researching uses of administrative records and third party data in survey and decennial census operations. One potential use of administrative records is to utilize these data when race and Hispanic origin responses are missing. When federal and third party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across sources. We explore different methods to assign one race and one Hispanic response when these responses are discrepant. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data by demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records and third party data compared to the 2010 Census. Minority groups and individuals ages 0-17 are more likely to have missing race or Hispanic origin data in administrative records and third party data. Larger households tend to have more missing race data in administrative records and third party data than smaller households.
View Full
Paper PDF
-
Statistics on the International Trade Administration's Global Markets Program
September 2015
Working Paper Number:
CES-15-17
Recent mandates for evidence-based policy choices from both the Executive and Legislative branches of the federal government underscore the importance of understanding the relationship between program participation and business outcomes. In this paper, we examine the correlations between participation in an export-promotion program and business outcomes. We use this experience to provide more general lessons learned about combining program data on treatments with Census Bureau micro data that can be used as a control. Note this paper does not evaluate a program, but instead provides critical information about a program.
The mission of the Commercial Service/Global Markets program is to help companies either start or increase their exports of goods and services. It pursues this mission through advocacy, events, and counseling. This study looks at a very small part of the overall program. While we cannot rule-out several sources of bias in our results, we do observe several consistent patterns across our models. In particular, program participation is positively correlated with export growth and change and, for small businesses, also with positive employment growth. However, overall, and for large firms in particular, there is a negative correlation with employment growth and counseling. The paper concludes with a 'Lessons Learned' section that highlights areas where measurement can be improved.
View Full
Paper PDF
-
Business Dynamics of Innovating Firms: Linking U.S. Patents with Administrative Data on Workers and Firms
July 2015
Working Paper Number:
CES-15-19
This paper discusses the construction of a new longitudinal database tracking inventors and patent-owning firms over time. We match granted patents between 2000 and 2011 to administrative databases of firms and workers housed at the U.S. Census Bureau. We use inventor information in addition to the patent assignee firm name to and improve on previous efforts linking patents to firms. The triangulated database allows us to maximize match rates and provide validation for a large fraction of matches. In this paper, we describe the construction of the database and explore basic features of the data. We find patenting firms, particularly young patenting firms, disproportionally contribute jobs to the U.S. economy. We find patenting is a relatively rare event among small firms but that most patenting firms are nevertheless small, and that patenting is not as rare an event for the youngest firms compared to the oldest firms. While manufacturing firms are more likely to patent than firms in other sectors, we find most patenting firms are in the services and wholesale sectors. These new data are a product of collaboration within the U.S. Department of Commerce, between the U.S. Census Bureau and the U.S. Patent and Trademark Office.
View Full
Paper PDF
-
Coverage and Agreement of Administrative Records and 2010 American Community Survey Demographic Data
November 2014
Working Paper Number:
carra-2014-14
The U.S. Census Bureau is researching possible uses of administrative records in decennial census and survey operations. The 2010 Census Match Study and American Community Survey (ACS) Match Study represent recent efforts by the Census Bureau to evaluate the extent to which administrative records provide data on persons and addresses in the 2010 Census and 2010 ACS. The 2010 Census Match Study also examines demographic response data collected in administrative records. Building on this analysis, we match data from the 2010 ACS to federal administrative records and third party data as well as to previous census data and examine administrative records coverage and agreement of ACS age, sex, race, and Hispanic origin responses. We find high levels of coverage and agreement for sex and age responses and variable coverage and agreement across race and Hispanic origin groups. These results are similar to findings from the 2010 Census Match Study.
View Full
Paper PDF
-
NOISE INFUSION AS A CONFIDENTIALITY PROTECTION MEASURE FOR GRAPH-BASED STATISTICS
September 2014
Working Paper Number:
CES-14-30
We use the bipartite graph representation of longitudinally linked em-ployer-employee data, and the associated projections onto the employer and em-ployee nodes, respectively, to characterize the set of potential statistical summar-ies that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightfor-ward extension of the dynamic noise-infusion method used in the U.S. Census Bureau's Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.
View Full
Paper PDF
-
2010 American Community Survey Match Study
July 2014
Working Paper Number:
carra-2014-03
Using administrative records data from federal government agencies and commercial sources, the 2010 ACS Match Study measures administrative records coverage of 2010 ACS addresses, persons, and persons at addresses at different levels of geography as well as by demographic characteristics and response mode. The 2010 ACS Match Study represents a continuation of the research undertaken in the 2010 Census Match Study, the first national-level evaluation of administrative records data coverage. Preliminary results indicate that administrative records provide substantial coverage for addresses and persons in the 2010 ACS (92.7 and 92.1 percent respectively), and less extensive though substantial coverage, for person-address pairs (74.3 percent). In addition, some variation in address, person and/or person-address coverage is found across demographic and response mode groups. This research informs future uses of administrative records in survey and decennial census operations to address the increasing costs of data collection and declining response rates.
View Full
Paper PDF
-
LEHD Infrastructure files in the Census RDC - Overview
June 2014
Working Paper Number:
CES-14-26
The Longitudinal Employer-Household Dynamics (LEHD) Program at the U.S. Census Bureau, with the support of several national research agencies, maintains a set of infrastructure files using administrative data provided by state agencies, enhanced with information from other administrative data sources, demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. This document describes the structure and content of the 2011 Snapshot of the LEHD Infrastructure files as they are made available in the Census Bureaus secure and restricted-access Research Data Center network. The document attempts to provide a comprehensive description of all researcher-accessible files, of their creation, and of any modifcations made to the files to facilitate researcher access.
View Full
Paper PDF
-
A FIRST STEP TOWARDS A GERMAN SYNLBD: CONSTRUCTING A GERMAN LONGITUDINAL BUSINESS DATABASE
February 2014
Working Paper Number:
CES-14-13
One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so in- tense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required in- vestments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with infor- mation comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a fu- ture step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
View Full
Paper PDF
-
IMPROVING THE SYNTHETIC LONGITUDINAL BUSINESS DATABASE
February 2014
Working Paper Number:
CES-14-12
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models de- signed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version'now available for public use'of the U. S. Census Bureau's Longitudinal Business Database (LBD), a longitudinal cen- sus of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This article describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
View Full
Paper PDF