CREAT - Census Bureau

Disclosure Limitation and Confidentiality Protection in Linked Data

January 2018

Written by: Lars Vilhuber, John M. Abowd, Ian M. Schmutte

Working Paper Number:

CES-18-07

Abstract

Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkages in the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.

Document Tags and Keywords

Keywords:

data, statistical, database, census data, microdata, survey, disclosure, agency, respondent, confidentiality, linked census, information, statistician, privacy, record, employment statistics, irs, ssa, filing, employee data, datasets, statistical disclosure

Tags:

American Economic Association, Internal Revenue Service, Standard Industrial Classification, Bureau of Labor Statistics, Social Security Administration, Service Annual Survey, American Statistical Association, National Science Foundation, Center for Economic Studies, Stern School of Business, County Business Patterns, Federal Reserve Bank, Statistics Canada, National Longitudinal Survey of Youth, Department of Economics, Chicago Census Research Data Center, Survey of Income and Program Participation, Cornell University, Social Security, Unemployment Insurance, Research Data Center, North American Industry Classification System, American Community Survey, Social Security Number, Health and Retirement Study, National Institute on Aging, Alfred P Sloan Foundation, Longitudinal Employer Household Dynamics, Detailed Earnings Records, Summary Earnings Records, Federal Insurance Contribution Act, Sloan Foundation, National Center for Health Statistics, Quarterly Workforce Indicators, Medicaid Services, Centers for Medicare, European Union, National Institutes of Health, Quarterly Census of Employment and Wages, University of Michigan, Census Bureau Disclosure Review Board, Commodity Flow Survey, United Nations, Disclosure Review Board, Business Dynamics Statistics, Federal Statistical Research Data Center, LODES

Similar Working Papers

The 10 most similar working papers to the working paper 'Disclosure Limitation and Confidentiality Protection in Linked Data' are listed below in order of similarity.

Working Paper

Access Methods for United States Microdata

August 2007

Authors: John M. Abowd, Daniel Weinberg, Sandra Rowland, Philip Steel, Laura Zayatz

Working Paper Number:

CES-07-25

Beyond the traditional methods of tabulations and public-use microdata samples, statistical agencies have developed four key alternatives for providing non-government researchers with access to confidential microdata to improve statistical modeling. The first, licensing, allows qualified researchers access to confidential microdata at their own facilities, provided certain security requirements are met. The second, statistical data enclaves, offers qualified researchers restricted access to confidential economic and demographic data at specific agency-controlled locations. Third, statistical agencies can offer remote access, through a computer interface, to the confidential data under automated or manual controls. Fourth, synthetic data developed from the original data but retaining the correlations in the original data have the potential for allowing a wide range of analyses.
Working Paper

Resolving the Tension Between Access and Confidentiality: Past Experience and Future Plans at the U.S. Census Bureau

September 2009

Authors: Ron Jarmin, Lucia Foster, Lynn Riggs

Working Paper Number:

CES-09-33

This paper provides an historical context for access to U.S. Federal statistical data with a primary focus on the U.S. Census Bureau. We review the various modes used by the Census Bureau to make data available to users, and highlight the costs and benefits associated with each. We highlight some of the specific improvements underway or under consideration at the Census Bureau to better serve its data users, as well as discuss the broad strategies employed by statistical agencies to respond to the challenges of data access.
Working Paper

An In-Depth Examination of Requirements for Disclosure Risk Assessment

October 2023

Authors: Ron Jarmin, John M. Abowd, Ian M. Schmutte, Jerome P. Reiter, Nathan Goldschlag, Victoria A. Velkoff, Michael B. Hawes, Robert Ashmead, Ryan Cumings-Menon, Sallie Ann Keller, Daniel Kifer, Philip Leclerc, Rolando A. Rodríguez, Pavel Zhuravlev

Working Paper Number:

CES-23-49

The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms levied against differential privacy would be levied against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near-term, the counterfactual approach appears best-suited for privacy-utility analysis.
Working Paper

Dynamically Consistent Noise Infusion and Partially Synthetic Data as Confidentiality Protection Measures for Related Time Series

July 2012

Authors: Lars Vilhuber, John M. Abowd, Kevin L. McKinney, Bryce Stephens, Simon Woodcock, Kaj Gittings

Working Paper Number:

CES-12-13

The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure avoidance methods are required to protect the information about individuals and businesses that contribute to the underlying data. The QWI disclosure avoidance mechanism we describe here relies heavily on the use of noise infusion through a permanent multiplicative noise distortion factor, used for magnitudes, counts, differences and ratios. There is minimal suppression and no complementary suppressions. To our knowledge, the release in 2003 of the QWI was the first large-scale use of noise infusion in any official statistical product. We show that the released statistics are analytically valid along several critical dimensions: measures are unbiased and time series properties are preserved. We provide an analysis of the degree to which confidentiality is protected. Furthermore, we show how the judicious use of synthetic data, injected into the tabulation process, can completely eliminate suppressions, maintain analytical validity, and increase the protection of the underlying confidential data.
Working Paper

Why the Economics Profession Must Actively Participate in the Privacy Protection Debate

March 2019

Authors: Lars Vilhuber, John M. Abowd, Ian M. Schmutte, William N. Sexton

Working Paper Number:

CES-19-09

When Google or the U.S. Census Bureau publish detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everybody while supplying public information. To date, economists have not focused on the privacy loss inherent in data publication. In their stead, these issues have been advanced almost exclusively by computer scientists who are primarily interested in technical problems associated with protecting privacy. Economists should join the discussion, first, to determine where to balance privacy protection against data quality; a social choice problem. Furthermore, economists must ensure new privacy models preserve the validity of public data for economic research.
Working Paper

Synthetic Data and Confidentiality Protection

September 2003

Authors: Julia I. Lane, John M. Abowd

Working Paper Number:

tp-2003-10

Working Paper

Confidentiality Protection in the Census Bureau Quarterly Workforce Indicators

February 2006

Authors: Lars Vilhuber, John M. Abowd, Bryce Stephens

Working Paper Number:

tp-2006-02

The Quarterly Workforce Indicators are new estimates developed by the Census Bureau's Longitudinal Employer-Household Dynamics Program as a part of its Local Employment Dynamics partnership with 37 state Labor Market Information offices. These data provide detailed quarterly statistics on employment, accessions, layoffs, hires, separations, full-quarter employment (and related flows), job creations, job destructions, and earnings (for flow and stock categories of workers). The data are released for NAICS industries (and 4-digit SICs) at the county, workforce investment board, and metropolitan area levels of geography. The confidential microdata - unemployment insurance wage records, ES-202 establishment employment, and Title 13 demographic and economic information - are protected using a permanent multiplicative noise distortion factor. This factor distorts all input sums, counts, differences and ratios. The released statistics are analytically valid - measures are unbiased and time series properties are preserved. The confidentiality protection is manifested in the release of some statistics that are flagged as "significantly distorted to preserve confidentiality." These statistics differ from the undistorted statistics by a significant proportion. Even for the significantly distorted statistics, the data remain analytically valid for time series properties. The released data can be aggregated; however, published aggregates are less distorted than custom post-release aggregates. In addition to the multiplicative noise distortion, confidentiality protection is provided by the estimation process for the QWIs, which multiply imputes all missing data (including missing establishment, given UI account, in the UI wage record data) and dynamically re-weights the establishment data to provide state-level comparability with the BLS's Quarterly Census of Employment and Wages.
Working Paper

Synthetic Data for Small Area Estimation in the American Community Survey

April 2013

Authors: Joseph W. Sakshaug, Trivellore Raghunathan

Working Paper Number:

CES-13-19

Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
Working Paper

New Approaches to Confidentiality Protection: Synthetic Data, Remote Access and Research Data Centers

June 2004

Authors: Julia I. Lane, John M. Abowd

Working Paper Number:

tp-2004-03

Working Paper

Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?

January 2017

Authors: Lars Vilhuber, John M. Abowd, Daniel Weinberg, Jerome P. Reiter, Matthew D. Shapiro, Robert F. Belli, Noel Cressie, David C. Folch, Scott H. Holan, Margaret C. Levenstein, Kristen M. Olson, Jolene Smyth, Leen-Kiat Soh, Bruce D. Spencer, Seth E. Spielman, Christopher K. Wikle

Working Paper Number:

CES-17-59R

The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN's research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.