A primary responsibility of the Center for Economic Studies (CES) of the U.S. Bureau of the Census is to facilitate researcher access to confidential economic microdata files. Benefits from this program accrue not only to policy makers--there is a growing awareness of the importance of microdata for analyzing both the descriptive and welfare implications of regulatory and environmental changes--but also, and importantly, to the statistical agencies themselves. In fact, there is substantial recent literature arguing for the proposition that the largest single improvement that the U.S. statistical system could make is to improve its analytic capabilities. In this paper I briefly discuss these benefits of greater access for analytical work and ways to achieve them. Due to the nature of business data, public use databases and masking technologies are not available as vehicles for releasing useful microdata files. I conclude that a combination of outside and inside research programs, carefully coordinated and integrated, is the best model for ensuring that statistical agencies reap the gains from analytic data users. For the United States, at least, this is fortuitous with respect to justifying access since any direct research with confidential data by outsiders must have a "statistical purpose". Until the advent of CES, it was virtually impossible for researchers to work with the economic microdata collected by the various economic censuses. While the CES program is quite large, as it now stands, researchers, or their representatives, must come to the Census Bureau in Washington, D.C. to access the data. The success of the program has led to increasing demands for data access in facilities outside of the Washington, D.C. area. Two options are considered: 1) Establish Census Bureau facilities in various universities or similar nonprofit research facilities and 2) Develop CES regional operations in existing Census Bureau regional offices.
-
Multiple Classification Systems For Economic Data: Can A Thousand Flowers Bloom? And Should They?
December 1991
Working Paper Number:
CES-91-08
The principle that the statistical system should provide flexibility--possibilities for generating multiple groupings of data to satisfy multiple objectives--if it is to satisfy users is universally accepted. Yet in practice, this goal has not been achieved. This paper discusses the feasibility of providing flexibility in the statistical system to accommodate multiple uses of the industrial data now primarily examined within the Standard Industrial Classification (SIC) system. In one sense, the question of feasibility is almost trivial. With today's computer technology, vast amounts of data can be manipulated and stored at very low cost. Reconfigurations of the basic data are very inexpensive compared to the cost of collecting the data. Flexibility in the statistical system implies more than the technical ability to regroup data. It requires that the basic data are sufficiently detailed to support user needs and are processed and maintained in a fashion that makes the use of a variety of aggregation rules possible. For this to happen, statistical agencies must recognize the need for high quality microdata and build this into their planning processes. Agencies need to view their missions from a multiple use perspective and move away from use of a primary reporting and collection vehicle. Although the categories used to report data must be flexible, practical considerations dictate that data collection proceed within a fixed classification system. It is simply too expensive for both respondents and statistical agencies to process survey responses in the absence of standardized forms, data entry programs, etc. I argue for a basic classification centered on commodities--products, services, raw materials and labor inputs--as the focus of data collection. The idea is to make the principal variables of interest--the commodities--the vehicle for the collection and processing of the data.
For completeness, the basic classification should include labor usage through some form of occupational classification. In most economic surveys at the Census Bureau, the reporting unit and the classified unit have been the establishment. But there is no need for this to be so. The basic principle to be followed in data collection is that the data should be collected in the most efficient way--efficiency being defined jointly in terms of statistical agency collection costs and respondent burdens.
-
The Importance of Establishment Data in Economic Research
August 1993
Working Paper Number:
CES-93-10
The importance and usefulness of establishment microdata for economic research and policy analysis is outlined and contrasted with traditional products of statistical agencies -- aggregate cross-section tabulations. It is argued that statistical agencies must begin to seriously rethink the way they view establishment data products.
-
Access Methods for United States Microdata
August 2007
Working Paper Number:
CES-07-25
Beyond the traditional methods of tabulations and public-use microdata samples, statistical agencies have developed four key alternatives for providing non-government researchers with access to confidential microdata to improve statistical modeling. The first, licensing, allows qualified researchers access to confidential microdata at their own facilities, provided certain security requirements are met. The second, statistical data enclaves, offers qualified researchers restricted access to confidential economic and demographic data at specific agency-controlled locations. Third, statistical agencies can offer remote access, through a computer interface, to the confidential data under automated or manual controls. Fourth, synthetic data developed from the original data but retaining the correlations in the original data have the potential for allowing a wide range of analyses.
-
The Center for Economic Studies 1982-2007: A Brief History
October 2009
Working Paper Number:
CES-09-35
More than half a century ago, visionaries representing both the Census Bureau and the external research community laid the foundation for the Center for Economic Studies (CES) and the Research Data Center (RDC) system. They saw a clear need for a system meeting the inextricably related requirements of providing more and better information from existing Census Bureau data collections while preserving respondent confidentiality and privacy. CES opened in 1982 to house new longitudinal business databases, develop them further, and make them available to qualified researchers. CES and the RDC system evolved to meet the designers' requirements. Research at CES and the RDCs meets the commitments of the Census Bureau (and, recently, of other agencies) to preserving confidentiality while contributing paradigm-shifting fundamental research in a range of disciplines and up-to-the-minute critical tools for decision-makers.
-
Lessons for Targeted Program Evaluation: A Personal and Professional History of the Survey of Program Dynamics
August 2007
Working Paper Number:
CES-07-24
The Survey of Program Dynamics (SPD) was created by the 1996 welfare reform legislation to facilitate its evaluation. This paper describes the evolution of that survey, discusses its implementation, and draws lessons for future evaluation. Large-scale surveys can be an important part of a portfolio of evaluation methods, but sufficient time must be given to data collection agencies if a high-quality longitudinal survey is expected. Such a survey must have both internal (agency) and external (policy analyst) buy-in. Investments in data analysis by agency staff, downplayed in favor of larger sample sizes given a fixed budget, could have contributed to more external acceptance. More attention up-front to reducing the potentially deleterious effects of attrition in longitudinal surveys, such as through the use of monetary incentives, might have been worthwhile. Given the problems encountered by the Census Bureau in producing the SPD, I argue that ongoing multi-purpose longitudinal surveys like the Survey of Income and Program Participation are potentially more valuable than episodic special-purpose surveys.
-
Evaluation And Use Of The Pollution Abatement Costs And Expenditures Survey Micro Data
January 1996
Working Paper Number:
CES-96-01
The Pollution Abatement Costs and Expenditures Survey (PACE) is an annual survey of manufacturing establishments' operating costs and capital investment expenditures for pollution abatement purposes. This paper provides a description and evaluation of the PACE micro data available at the Center for Economic Studies (CES). The paper provides an overview of the survey, how the sample is drawn, how the survey questionnaire has changed over time, an assessment of the data quality, and suggestions for the use of the data, as well as its limitations. Also included are suggestions for modifying the survey design and data processing procedures. The PACE data series, linked to the economic data in CES's Longitudinal Research Database (LRD), covers the years 1979-1993, excluding 1983 and 1987.
-
The Longitudinal Research Database (LRD): Status And Research Possibilities
July 1988
Working Paper Number:
CES-88-02
This paper discusses the development and use of the Longitudinal Research Data available at the Center for Economic Studies of the Bureau of the Census in terms of what has been accomplished thus far, what projects are currently in progress, and what plans are in place for the near future. The major achievement to date is the construction of the database itself, which contains data for manufacturing establishments collected by the Census in 1963, 1967, 1972, 1977 and 1982, and the Annual Survey of Manufactures for non-Census years from 1973 to 1985. These data now reside in the Center's computer in a consistent format across all years. In addition, a large software development task that greatly simplifies the task of selecting subsets of the database for specific research projects is well underway. Finally, a number of powerful microcomputers have been purchased for use by researchers for their statistical analysis. Current efforts underway at the Center include research on such policy-relevant issues as mergers and their impact on profits and production, high technology trade, import competition, plant level productivity, entry and exit, and productivity differences between large and small firms. Due to the confidentiality requirements of the Census data, most of this research is performed by Center staff and Special Sworn Employees. Under certain circumstances, the Center accepts user-written programs from outside researchers. These routines are executed by Center staff, and the resultant output is reviewed thoroughly for disclosure problems. The Center is also an active member of a task force working on methods to release "masked" or "cloned" microdata in public-use files that will protect the confidentiality of the data while at the same time providing a research tool for outside users. The Center research program contributes directly to future research possibilities. The current batch of research projects is adding insight into the nature of the LRD database.
This information is continually being incorporated into the Center's software system, thus facilitating yet more research activity. Moreover, since a good portion of the research involves linking the Longitudinal Research Data to other data files, such as the NSF/Census R&D data, the scope of the databases is continually being expanded. Furthermore, the Center is exploring the possibility of linking the demographic data collected by the Census Bureau to the LRD database.
-
Looking Back on Three Years of Using the Synthetic LBD Beta
February 2014
Working Paper Number:
CES-14-11
Distributions of business data are typically much more skewed than those for household or individual data and public knowledge of the underlying units is greater. As a result, national statistical offices (NSOs) rarely release establishment or firm-level business microdata due to the risk to respondent confidentiality. One potential approach for overcoming these risks is to release synthetic data where the establishment data are simulated from statistical models designed to mimic the distributions of the real underlying microdata. The US Census Bureau's Center for Economic Studies in collaboration with Duke University, the National Institute of Statistical Sciences, and Cornell University made available a synthetic public use file for the Longitudinal Business Database (LBD) comprising more than 20 million records for all business establishments with paid employees dating back to 1976. The resulting product, dubbed the SynLBD, was released in 2010 and is the first-ever comprehensive business microdata set publicly released in the United States including data on establishments' employment and payroll, birth and death years, and industrial classification. This paper documents the scope of projects that have requested and used the SynLBD.
-
Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
January 2017
Authors:
Lars Vilhuber,
John M. Abowd,
Daniel Weinberg,
Jerome P. Reiter,
Matthew D. Shapiro,
Robert F. Belli,
Noel Cressie,
David C. Folch,
Scott H. Holan,
Margaret C. Levenstein,
Kristen M. Olson,
Jolene Smyth,
Leen-Kiat Soh,
Bruce D. Spencer,
Seth E. Spielman,
Christopher K. Wikle
Working Paper Number:
CES-17-59R
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN's research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.
-
Disclosure Limitation and Confidentiality Protection in Linked Data
January 2018
Working Paper Number:
CES-18-07
Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkages in the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.