-
Synthetic Data and Confidentiality Protection
September 2003
Working Paper Number:
tp-2003-10
-
Integrated Longitudinal Employee-Employer Data for the United States
May 2004
Working Paper Number:
tp-2004-02
-
Resolving the Tension Between Access and Confidentiality: Past Experience and Future Plans at the U.S. Census Bureau
September 2009
Working Paper Number:
CES-09-33
This paper provides a historical context for access to U.S. Federal statistical data, with a primary focus on the U.S. Census Bureau. We review the various modes the Census Bureau uses to make data available to users and highlight the costs and benefits associated with each. We also describe specific improvements underway or under consideration at the Census Bureau to better serve its data users, and discuss the broad strategies statistical agencies employ to respond to the challenges of data access.
-
Disclosure Limitation and Confidentiality Protection in Linked Data
January 2018
Working Paper Number:
CES-18-07
Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkage of the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
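Input noise infusion, one of the methods named above, can be sketched as follows. This is a minimal illustration, not the Census Bureau's production procedure: it assumes each establishment receives one permanent multiplicative distortion factor, drawn here uniformly from magnitudes between hypothetical bounds `a` and `b` with a random sign, and that published cell totals are sums of the distorted values. The function names and parameter values are illustrative only.

```python
import random

def draw_factor(rng, a=0.05, b=0.15):
    """Draw a multiplicative distortion factor in [1-b, 1-a] or [1+a, 1+b].

    The lower bound a keeps every record at least somewhat distorted;
    a and b are hypothetical values chosen for illustration.
    """
    magnitude = rng.uniform(a, b)
    return 1 + magnitude if rng.random() < 0.5 else 1 - magnitude

def noisy_totals(records, a=0.05, b=0.15, seed=None):
    """Tabulate cell totals from (unit_id, cell, value) records.

    Each unit gets one factor, reused for every cell it contributes to,
    so the same unit is distorted consistently across tabulations.
    """
    rng = random.Random(seed)
    factors = {}
    totals = {}
    for unit_id, cell, value in records:
        if unit_id not in factors:
            factors[unit_id] = draw_factor(rng, a, b)
        totals[cell] = totals.get(cell, 0.0) + factors[unit_id] * value
    return totals, factors
```

Because each unit keeps the same factor across cells, repeated tabulations cannot simply be averaged to cancel the noise, while cells built from many comparably sized units tend to stay close to their true totals as the independent factors offset one another.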
-
Access Methods for United States Microdata
August 2007
Working Paper Number:
CES-07-25
Beyond the traditional methods of tabulations and public-use microdata samples, statistical agencies have developed four key alternatives for providing non-government researchers with access to confidential microdata to improve statistical modeling. The first, licensing, allows qualified researchers to access confidential microdata at their own facilities, provided certain security requirements are met. The second, statistical data enclaves, offers qualified researchers restricted access to confidential economic and demographic data at specific agency-controlled locations. Third, statistical agencies can offer remote access to the confidential data through a computer interface, under automated or manual controls. Fourth, synthetic data, which are developed from the original data but retain the correlations present in it, have the potential to support a wide range of analyses.
-
Public Use Microdata: Disclosure And Usefulness
September 1988
Working Paper Number:
CES-88-03
Official statistical agencies such as the Census Bureau and the Bureau of Labor Statistics collect enormous quantities of microdata in statistical surveys. These data are valuable for economic research and market and policy analysis. However, the data cannot be released to the public because of confidentiality commitments to individual respondents. These commitments, coupled with the strong research demand for microdata, have led the agencies to consider various proposals for releasing public use microdata. Most proposals for public use microdata call for the development of surrogate data that disguise the original data. Thus, they involve the addition of measurement errors to the data. In this paper, we examine disclosure issues and explore alternative masking methods for generating panels of useful economic microdata that can be released to researchers. While our analysis applies to all confidential microdata, applications using the Census Bureau's Longitudinal Research Data Base (LRD) are used for illustrative purposes throughout the discussion.
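The idea of disguising microdata by adding measurement error can be illustrated with simple additive noise masking. This is a hedged sketch, not one of the specific masking methods evaluated in the paper: it assumes a single numeric column and zero-mean Gaussian noise whose standard deviation is a hypothetical fraction `rel_sd` of the column's standard deviation.

```python
import random
import statistics

def mask_additive(values, rel_sd=0.1, seed=None):
    """Return a masked copy of values with zero-mean Gaussian noise added.

    The noise standard deviation is rel_sd times the population standard
    deviation of the original column, so the masked column keeps roughly
    the same mean while individual values are perturbed.
    """
    rng = random.Random(seed)
    noise_sd = rel_sd * statistics.pstdev(values)
    return [v + rng.gauss(0.0, noise_sd) for v in values]
```

Because the noise is independent of the data, the masked column's expected variance is inflated by a factor of about `1 + rel_sd**2`; an analyst who knows the noise parameter can correct for this, which is one way masking trades disclosure protection against usefulness.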
-
An In-Depth Examination of Requirements for Disclosure Risk Assessment
October 2023
Authors:
Ron Jarmin,
John M. Abowd,
Ian M. Schmutte,
Jerome P. Reiter,
Nathan Goldschlag,
Victoria A. Velkoff,
Michael B. Hawes,
Robert Ashmead,
Ryan Cumings-Menon,
Sallie Ann Keller,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Pavel Zhuravlev
Working Paper Number:
CES-23-49
The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms leveled against differential privacy would be leveled against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near term, the counterfactual approach appears best suited for privacy-utility analysis.
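The counterfactual comparison underlying differential privacy can be made concrete with the textbook Laplace mechanism for a single count. This sketch is an illustration under stated assumptions, not the 2020 Census production system: it assumes one count query with sensitivity 1 and compares the density of the published value when one person's record is included versus excluded. The function names are hypothetical.

```python
import math
import random

def dp_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so this scale yields epsilon-differential privacy.
    The difference of two i.i.d. exponentials with rate epsilon is
    Laplace-distributed with scale 1/epsilon.
    """
    return true_count + rng.expovariate(epsilon) - rng.expovariate(epsilon)

def laplace_pdf(x, mu, scale):
    """Density of a Laplace(mu, scale) distribution at x."""
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

def max_density_ratio(count_with, count_without, epsilon, grid):
    """Counterfactual comparison: the largest ratio, over outputs in grid,
    of the output density with a record versus without it."""
    scale = 1.0 / epsilon
    return max(
        laplace_pdf(x, count_with, scale) / laplace_pdf(x, count_without, scale)
        for x in grid
    )
```

For neighboring counts such as 11 and 10, the ratio never exceeds `math.exp(epsilon)` and reaches it for outputs at or above the larger count. That bound is what the counterfactual framework guarantees, regardless of what an attacker already knows.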
-
The Longitudinal Research Database (LRD): Status And Research Possibilities
July 1988
Working Paper Number:
CES-88-02
This paper discusses the development and use of the Longitudinal Research Database (LRD) available at the Center for Economic Studies of the Bureau of the Census in terms of what has been accomplished thus far, what projects are currently in progress, and what plans are in place for the near future. The major achievement to date is the construction of the database itself, which contains data for manufacturing establishments collected by the Census in 1963, 1967, 1972, 1977 and 1982, and the Annual Survey of Manufactures for non-Census years from 1973 to 1985. These data now reside in the Center's computer in a consistent format across all years. In addition, a large software development effort that greatly simplifies selecting subsets of the database for specific research projects is well underway. Finally, a number of powerful microcomputers have been purchased for researchers to use for statistical analysis. Current efforts at the Center include research on such policy-relevant issues as mergers and their impact on profits and production, high-technology trade, import competition, plant-level productivity, entry and exit, and productivity differences between large and small firms. Because of the confidentiality requirements of the Census data, most research using these data is performed by Center staff and Special Sworn Employees. Under certain circumstances, the Center accepts user-written programs from outside researchers; these routines are executed by Center staff, and the resulting output is reviewed thoroughly for disclosure problems. The Center is also an active member of a task force working on methods to release "masked" or "cloned" microdata in public-use files that protect the confidentiality of the data while still providing a research tool for outside users. The Center's research program contributes directly to future research possibilities, as the current batch of research projects is adding insight into the nature of the LRD.
This information is continually being incorporated into the Center's software system, facilitating still more research activity. Moreover, since a good portion of the research involves linking the LRD to other data files, such as the NSF/Census R&D data, the scope of the database is continually expanding. Furthermore, the Center is exploring the possibility of linking the demographic data collected by the Census Bureau to the LRD.
-
Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
January 2017
Authors:
Lars Vilhuber,
John M. Abowd,
Daniel Weinberg,
Jerome P. Reiter,
Matthew D. Shapiro,
Robert F. Belli,
Noel Cressie,
David C. Folch,
Scott H. Holan,
Margaret C. Levenstein,
Kristen M. Olson,
Jolene Smyth,
Leen-Kiat Soh,
Bruce D. Spencer,
Seth E. Spielman,
Christopher K. Wikle
Working Paper Number:
CES-17-59R
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN's research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.
-
Expanding the Role of Synthetic Data at the U.S. Census Bureau
February 2014
Working Paper Number:
CES-14-10
National statistical offices (NSOs) create official statistics from data collected from survey respondents, government administrative records and other sources. The raw source data are usually considered to be confidential. In the case of the U.S. Census Bureau, confidentiality of survey and administrative records microdata is mandated by statute, and this mandate to protect confidentiality is often at odds with the needs of users to extract as much information from the data as possible. Traditional disclosure protection techniques result in official data products that do not fully utilize the information content of the underlying microdata. Typically, these products take the form of simple aggregate tabulations. In a few cases, anonymized public-use microdata samples are made available, but these face a growing risk of re-identification from the increasing amount of information about individuals and firms available in the public domain. One approach for overcoming these risks is to release products based on synthetic data, where values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata. We discuss recent Census Bureau work to develop and deploy such products, as well as the benefits and challenges involved in extending the scope of synthetic data products in official statistics.
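A toy version of model-based synthesis: fit a simple parametric model to confidential microdata, then release draws from the fitted model instead of the records themselves. This sketch assumes two continuous variables that are roughly jointly normal; production synthetic data products use far richer models, and nothing here reflects the Census Bureau's actual specifications. All function names are hypothetical.

```python
import math
import random
import statistics

def fit_bivariate_normal(xs, ys):
    """Estimate means, standard deviations, and correlation of (x, y)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs, mx), statistics.pstdev(ys, my)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    rho = max(-1.0, min(1.0, cov / (sx * sy)))  # guard against rounding past +/-1
    return mx, my, sx, sy, rho

def synthesize(params, n, seed=None):
    """Draw n synthetic (x, y) pairs from the fitted bivariate normal.

    Correlated draws are built from two independent standard normals
    (a 2x2 Cholesky factorization written out by hand).
    """
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        out.append((x, y))
    return out
```

The synthetic records preserve the fitted means, spreads, and correlation, but no synthetic row corresponds to a real respondent. Analyses that depend on features the model omits (here, anything beyond a bivariate normal) will not be reproduced, which is the analytical-validity challenge that richer synthesis models aim to address.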