CREAT - Census Bureau

An In-Depth Examination of Requirements for Disclosure Risk Assessment

October 2023

Written by: Ron Jarmin, John M. Abowd, Ian M. Schmutte, Jerome P. Reiter, Nathan Goldschlag, Victoria A. Velkoff, Michael B. Hawes, Robert Ashmead, Ryan Cumings-Menon, Sallie Ann Keller, Daniel Kifer, Philip Leclerc, Rolando A. Rodríguez, Pavel Zhuravlev

Working Paper Number:

CES-23-49

Abstract

The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms leveled against differential privacy would be leveled against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near term, the counterfactual approach appears best suited for privacy-utility analysis.
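For reference, the counterfactual comparison underlying differential privacy is usually written as follows. This is the standard textbook definition, not a formula reproduced from this paper. A randomized mechanism M is ε-differentially private if, for every pair of databases D and D' that differ in one respondent's record and every set of outputs S, the first inequality holds. The second line is a standard consequence that links this counterfactual framing to prior-to-posterior comparisons. For an attacker who already knows every other record D_{-i}, seeing a (discrete) published output o can change the odds that respondent i's record x_i is a rather than b by at most a factor of e^ε.

```latex
\begin{aligned}
&\Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S]
  && \text{for all neighboring } D, D' \text{ and all output sets } S,\\[4pt]
&\frac{\Pr[x_i = a \mid M(D) = o,\, D_{-i}]}{\Pr[x_i = b \mid M(D) = o,\, D_{-i}]}
  \le e^{\varepsilon}\,
  \frac{\Pr[x_i = a \mid D_{-i}]}{\Pr[x_i = b \mid D_{-i}]}
  && \text{for all records } a, b \text{ and outputs } o.
\end{aligned}
```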

Document Tags and Keywords

Keywords:

analysis, data, statistical, disclosure, respondent, confidentiality, statistician, privacy, enforcement, statistical disclosure, public, publicly

Tags:

Internal Revenue Service, Stern School of Business, Statistics Canada, Longitudinal Business Database, Decennial Census, Survey of Income and Program Participation, Federal Register, American Community Survey, Agency for Healthcare Research and Quality, 2010 Census, Disclosure Review Board, Census Edited File, Federal Statistical Research Data Center, Opportunity Atlas

Similar Working Papers

The 10 most similar working papers to the working paper 'An In-Depth Examination of Requirements for Disclosure Risk Assessment' are listed below in order of similarity.

Working Paper

The 2010 Census Confidentiality Protections Failed, Here's How and Why

December 2023

Authors: Lars Vilhuber, John M. Abowd, Ethan Lewis, Nathan Goldschlag, Robert Ashmead, Daniel Kifer, Philip Leclerc, Rolando A. Rodríguez, Tamara Adams, David Darais, Sourya Dey, Simson L. Garfinkel, Scott Moore, Ramy N. Tadros

Working Paper Number:

CES-23-63

Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
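The reconstruction idea in this abstract, that enough overlapping published tables can pin down the underlying records, can be illustrated at toy scale. The sketch below is not the attack used in the paper, which works with 34 real tables at national scale. It brute-forces every possible set of records for a hypothetical three-person block and keeps those consistent with a few hypothetical published counts. The attribute categories, the tables, and the helpers `tabulate` and `reconstruct` are all illustrative assumptions. When exactly one candidate survives, the tables have fully disclosed the block.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical attribute domains for a toy block (not the real census categories).
SEXES = ("F", "M")
AGE_BINS = ("0-17", "18-64", "65+")
# Every possible (sex, age_bin) record; this order matches Python's tuple sort order.
RECORD_TYPES = list(product(SEXES, AGE_BINS))


def tabulate(records):
    """Hypothetical 'published' tables for one block: counts by sex,
    counts by age bin, and counts of adults (18+) by sex."""
    by_sex = Counter(sex for sex, _ in records)
    by_age = Counter(age for _, age in records)
    adults_by_sex = Counter(sex for sex, age in records if age != "0-17")
    return by_sex, by_age, adults_by_sex


def reconstruct(published, block_size):
    """Brute force: keep every multiset of block_size records whose
    tabulations reproduce the published tables exactly."""
    return [
        candidate
        for candidate in combinations_with_replacement(RECORD_TYPES, block_size)
        if tabulate(candidate) == published
    ]


# Block A: the tables admit exactly one record set, so the block is fully disclosed.
block_a = (("F", "18-64"), ("F", "18-64"), ("M", "0-17"))
solutions_a = reconstruct(tabulate(block_a), block_size=3)
print(len(solutions_a), solutions_a[0] == tuple(sorted(block_a)))  # 1 True

# Block B: two record sets fit the same tables, so the attacker cannot tell which is real.
block_b = (("F", "18-64"), ("M", "0-17"), ("M", "65+"))
solutions_b = reconstruct(tabulate(block_b), block_size=3)
print(len(solutions_b))  # 2
```

Brute force is only feasible for tiny blocks. The point is the logic: each additional table removes candidate record sets, and a block whose candidates shrink to one is exactly reconstructed.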
Working Paper

Why the Economics Profession Must Actively Participate in the Privacy Protection Debate

March 2019

Authors: Lars Vilhuber, John M. Abowd, Ian M. Schmutte, William N. Sexton

Working Paper Number:

CES-19-09

When Google or the U.S. Census Bureau publishes detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everybody while supplying public information. To date, economists have not focused on the privacy loss inherent in data publication. In their stead, these issues have been advanced almost exclusively by computer scientists who are primarily interested in technical problems associated with protecting privacy. Economists should join the discussion, first, to determine where to balance privacy protection against data quality, which is a social choice problem. Furthermore, economists must ensure that new privacy models preserve the validity of public data for economic research.
Working Paper

Disclosure Limitation and Confidentiality Protection in Linked Data

January 2018

Authors: Lars Vilhuber, John M. Abowd, Ian M. Schmutte

Working Paper Number:

CES-18-07

Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkage of the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
Working Paper

Access Methods for United States Microdata

August 2007

Authors: John M. Abowd, Daniel Weinberg, Sandra Rowland, Philip Steel, Laura Zayatz

Working Paper Number:

CES-07-25

Beyond the traditional methods of tabulations and public-use microdata samples, statistical agencies have developed four key alternatives for providing non-government researchers with access to confidential microdata to improve statistical modeling. The first, licensing, allows qualified researchers access to confidential microdata at their own facilities, provided certain security requirements are met. The second, statistical data enclaves, offers qualified researchers restricted access to confidential economic and demographic data at specific agency-controlled locations. Third, statistical agencies can offer remote access, through a computer interface, to the confidential data under automated or manual controls. Fourth, synthetic data, developed from the original data but retaining their correlations, have the potential to allow a wide range of analyses.
Working Paper

An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

August 2018

Authors: John M. Abowd, Ian M. Schmutte

Working Paper Number:

CES-18-35

Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.
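The "marginal cost equals marginal benefit" rule in this abstract can be illustrated with a deliberately simple, hypothetical setup. This is not the authors' model. Assume a single count is published with the Laplace mechanism at privacy-loss parameter ε and sensitivity 1, so its expected squared error is 2/ε². Assume also hypothetical linear disutilities: a per unit of privacy loss and b per unit of expected squared error. Welfare is then W(ε) = −aε − 2b/ε². Setting the marginal cost a equal to the marginal benefit 4b/ε³ gives ε* = (4b/a)^(1/3). The sketch checks this closed form against a grid search; all function names and weights are illustrative.

```python
def laplace_mse(epsilon: float, sensitivity: float = 1.0) -> float:
    """Expected squared error of a Laplace-noised count.

    Laplace noise with scale s = sensitivity / epsilon has variance 2 * s**2.
    """
    scale = sensitivity / epsilon
    return 2.0 * scale**2


def welfare(epsilon: float, a: float, b: float) -> float:
    """Hypothetical welfare: linear disutility a per unit of privacy loss
    plus linear disutility b per unit of expected squared error."""
    return -a * epsilon - b * laplace_mse(epsilon)


def optimal_epsilon_closed_form(a: float, b: float) -> float:
    # Marginal cost of more privacy loss is a; marginal benefit of the accuracy
    # gained is -d/d(eps) [2b / eps**2] = 4b / eps**3. Setting them equal gives:
    return (4.0 * b / a) ** (1.0 / 3.0)


def optimal_epsilon_grid(a: float, b: float, lo: float = 0.01, hi: float = 10.0,
                         steps: int = 100_000) -> float:
    """Numerical check: maximize welfare over an evenly spaced grid of epsilons."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(grid, key=lambda eps: welfare(eps, a, b))


a, b = 2.0, 1.0  # hypothetical preference weights
eps_star = optimal_epsilon_closed_form(a, b)  # 2 ** (1/3), about 1.26
eps_grid = optimal_epsilon_grid(a, b)
print(eps_star, eps_grid)
```

Because W is concave in ε (its second derivative, −12b/ε⁴, is negative), the point where the marginal conditions balance is the welfare maximum. Raising the weight on accuracy (b) pushes ε* up, and raising the weight on privacy (a) pushes it down.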
Working Paper

Disclosure Avoidance Techniques Used for the 1970 through 2010 Decennial Censuses of Population and Housing

November 2018

Authors: Laura McKenna

Working Paper Number:

CES-18-47

The U.S. Census Bureau conducts the decennial censuses under Title 13 of the U.S. Code with the Section 9 mandate to not 'use the information furnished under the provisions of this title for any purpose other than the statistical purposes for which it is supplied; or make any publication whereby the data furnished by any particular establishment or individual under this title can be identified; or permit anyone other than the sworn officers and employees of the Department or bureau or agency thereof to examine the individual reports' (13 U.S.C. § 9 (2007)). The Census Bureau applies disclosure avoidance techniques to its publicly released statistical products in order to protect the confidentiality of its respondents and their data.
Working Paper

Resolving the Tension Between Access and Confidentiality: Past Experience and Future Plans at the U.S. Census Bureau

September 2009

Authors: Ron Jarmin, Lucia Foster, Lynn Riggs

Working Paper Number:

CES-09-33

This paper provides an historical context for access to U.S. Federal statistical data with a primary focus on the U.S. Census Bureau. We review the various modes used by the Census Bureau to make data available to users, and highlight the costs and benefits associated with each. We highlight some of the specific improvements underway or under consideration at the Census Bureau to better serve its data users, as well as discuss the broad strategies employed by statistical agencies to respond to the challenges of data access.
Working Paper

Releasing Earnings Distributions using Differential Privacy: Disclosure Avoidance System For Post Secondary Employment Outcomes (PSEO)

April 2019

Authors: Kevin L. McKinney, Andrew Foote, Ashwin Machanavajjhala

Working Paper Number:

CES-19-13

The U.S. Census Bureau recently released data on earnings percentiles of graduates from postsecondary institutions. This paper describes and evaluates the disclosure avoidance system developed for these statistics. We propose a differentially private algorithm for releasing these data based on standard differentially private building blocks: constructing a histogram of earnings and applying the Laplace mechanism to recover a differentially private CDF of earnings. We demonstrate that our algorithm can release earnings distributions with low error and that it outperforms prior work based on the concept of smooth sensitivity from Nissim, Raskhodnikova and Smith (2007).
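The histogram-then-Laplace construction described in this abstract can be sketched as follows. This is a generic textbook version, not the PSEO production algorithm. It assumes that the bin edges are fixed without looking at the data and that neighboring datasets differ by adding or removing one person. Under those assumptions, each person falls in exactly one bin, so the histogram has L1 sensitivity 1 and Laplace noise with scale 1/ε per bin gives ε-differential privacy. The clipping, normalization, and percentile read-out are post-processing and consume no additional privacy budget. The function names, the $5,000 bins, and the lognormal test data are hypothetical.

```python
import numpy as np


def dp_earnings_cdf(earnings, bin_edges, epsilon, rng=None):
    """Sketch of an epsilon-DP earnings CDF from a Laplace-noised histogram.

    Assumptions: bin_edges are chosen without looking at the data, and
    neighboring datasets differ by adding or removing one person, so the
    histogram has L1 sensitivity 1 and Laplace(scale=1/epsilon) noise suffices.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip so every person lands in exactly one bin (np.histogram drops out-of-range values).
    clipped = np.clip(np.asarray(earnings, dtype=float), bin_edges[0], bin_edges[-1])
    counts, _ = np.histogram(clipped, bins=bin_edges)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    # Everything below is post-processing of the noisy counts: no extra privacy cost.
    noisy = np.clip(noisy, 0.0, None)
    cumulative = np.cumsum(noisy)
    if cumulative[-1] <= 0:
        # Degenerate case: fall back to a data-independent uniform CDF.
        return np.arange(1, counts.size + 1) / counts.size
    return cumulative / cumulative[-1]


def dp_percentile(cdf, bin_edges, q):
    """Return the upper edge of the first bin whose noisy CDF reaches q (0 <= q <= 1)."""
    idx = int(np.searchsorted(cdf, q, side="left"))
    return float(bin_edges[idx + 1])


# Usage with synthetic (not real) earnings data.
rng = np.random.default_rng(0)
earnings = rng.lognormal(mean=10.5, sigma=0.6, size=5_000)
edges = np.linspace(0.0, 200_000.0, 41)  # hypothetical $5,000-wide bins
cdf = dp_earnings_cdf(earnings, edges, epsilon=1.0, rng=rng)
print(dp_percentile(cdf, edges, 0.5), np.median(earnings))
```

Reporting bin upper edges keeps every released percentile on a fixed, data-independent grid. The price is that percentile resolution can be no finer than the bin width.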
Working Paper

Synthetic Data for Small Area Estimation in the American Community Survey

April 2013

Authors: Joseph W. Sakshaug, Trivellore Raghunathan

Working Paper Number:

CES-13-19

Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
Working Paper

Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods

January 2017

Authors: John M. Abowd, Ian M. Schmutte

Working Paper Number:

CES-17-37

We consider the problem of determining the optimal accuracy of public statistics when increased accuracy requires a loss of privacy. To formalize this allocation problem, we use tools from statistics and computer science to model the publication technology used by a public statistical agency. We derive the demand for accurate statistics from first principles to generate interdependent preferences that account for the public-good nature of both data accuracy and privacy loss. We first show data accuracy is inefficiently undersupplied by a private provider. Solving the appropriate social planner's problem produces an implementable publication strategy. We implement the socially optimal publication plan for statistics on income and health status using data from the American Community Survey, National Health Interview Survey, Federal Statistical System Public Opinion Survey and Cornell National Social Survey. Our analysis indicates that welfare losses from providing too much privacy protection and, therefore, too little accuracy can be substantial.