Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
Document Tags and Keywords
Keywords
Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to produce contextually relevant keywords. By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting its most significant topics. This approach not only enhances searchability but also surfaces connections that go beyond potentially domain-specific author-defined keywords.
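The idea behind embedding-based keyword extraction can be sketched in a few lines. The real pipeline uses BERT embeddings via KeyBERT; the toy `embed` function below (a bag-of-words counter) merely stands in for those embeddings so the ranking step is self-contained. Candidate phrases are scored by cosine similarity to the whole-document vector, and the top scorers become the keywords:

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; KeyBERT would use BERT vectors here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extract_keywords(doc, candidates, top_n=3):
    # Rank candidate phrases by similarity to the document as a whole.
    doc_vec = embed(doc)
    scored = [(cand, cosine(embed(cand), doc_vec)) for cand in candidates]
    return sorted(scored, key=lambda x: -x[1])[:top_n]

doc = "census confidentiality protections and microdata reconstruction attacks"
candidates = ["census confidentiality", "reconstruction attacks", "working paper"]
print(extract_keywords(doc, candidates, top_n=2))
```

Phrases that share the document's vocabulary score high, while an off-topic phrase like "working paper" scores zero; with real BERT embeddings, semantically related phrases score high even without exact word overlap.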
Tags
The model is also able to label words and phrases by part of speech, including "organizations." By filtering for frequent words and phrases labeled as organizations, papers are linked to the specific institutions, datasets, and other organizations they reference.
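The filtering step described above can be sketched as follows. The labeled phrases here are hard-coded placeholders for the output of the real tagging model (whose labels and thresholds are not specified in this document); the sketch only illustrates the "filter by label, keep frequent phrases" logic:

```python
from collections import Counter

# Hypothetical (phrase, label) pairs, standing in for the output of a
# part-of-speech / entity tagging model run over the working papers.
labeled_phrases = [
    ("Census Bureau", "ORG"), ("Census Bureau", "ORG"),
    ("Summary File 1", "ORG"), ("Census Bureau", "ORG"),
    ("confidentiality", "NOUN"),
]

def extract_tags(pairs, min_count=2):
    # Keep phrases labeled as organizations that occur frequently enough.
    orgs = Counter(p for p, label in pairs if label == "ORG")
    return [p for p, n in orgs.most_common() if n >= min_count]

print(extract_tags(labeled_phrases))  # → ['Census Bureau']
```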
Similar Working Papers
Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.
Doc2Vec represents entire documents as fixed-length vectors, capturing semantic meaning from the context of words within each document. The model learns a unique vector for every document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection. The document vectors are compared using cosine similarity to determine the most similar working papers.
Papers identified with 🔥 are in the top 20% of similarity.
The 10 most similar working papers to the working paper 'The 2010 Census Confidentiality Protections Failed, Here's How and Why'
are listed below in order of similarity.
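The ranking and 🔥 flagging described above reduce to comparing fixed-length document vectors with cosine similarity. The vectors below are hypothetical 3-dimensional stand-ins (real Doc2Vec vectors have hundreds of dimensions), and the 20% cutoff follows the description above:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two dense vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def most_similar(query_id, vectors, top_n=10):
    # Rank all other papers by similarity to the query paper's vector;
    # flag the top 20% of ranked papers with a fire emoji.
    scores = sorted(
        ((pid, cosine(vectors[query_id], v))
         for pid, v in vectors.items() if pid != query_id),
        key=lambda x: -x[1],
    )
    cutoff = max(1, int(0.2 * len(scores)))
    return [(pid, s, "🔥" if i < cutoff else "")
            for i, (pid, s) in enumerate(scores[:top_n])]

# Hypothetical document vectors for five papers.
vectors = {
    "paper_A": [1.0, 0.2, 0.0],
    "paper_B": [0.9, 0.3, 0.1],
    "paper_C": [0.0, 1.0, 0.9],
    "paper_D": [0.1, 0.9, 1.0],
    "paper_E": [1.0, 0.0, 0.1],
}
for pid, score, flame in most_similar("paper_A", vectors, top_n=3):
    print(f"{pid}: {score:.3f} {flame}")
```

In practice the vectors would come from a trained Doc2Vec model (e.g., gensim's implementation); only the comparison step is shown here.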
What is CREAT?
Overview
The Census Research Exploration and Analysis Tool is a data
tool from the Center for Economic Studies (CES) at the US Census Bureau that uses natural
language processing and artificial intelligence tools to analyze, categorize, and sort the
economic research contained in the CES working paper series. The goal of this
project is to help CES researchers, managers, and other internal stakeholders explore
connections among existing research, form new collaborations, and separate research into
discrete topics. Working papers are sortable by author, tag, and keyword. For more
information, see the CREAT one-pager.
Keywords and Tags
Keywords and tags are automatically extracted from the text of the working papers. Keywords are
either one or two words, and are extracted to most closely match the research paper as a whole.
Tags are institutions, datasets, and other proper nouns that occur with relative frequency
among all working papers. Due to the automatic extraction, accuracy is not guaranteed.
Authors
Authors are extracted from each paper and matched using fuzzy matching, so that different forms of the same author's name (e.g., with or without a middle initial or title) are consolidated.
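Name consolidation of this kind can be sketched with the standard library's `difflib`. The real pipeline's matcher and threshold are not specified in this document; the 0.8 threshold below is an illustrative assumption:

```python
from difflib import SequenceMatcher

def same_author(a, b, threshold=0.8):
    # Normalize punctuation and case, then compare similarity ratios;
    # the 0.8 threshold is an assumption for illustration.
    norm = lambda s: s.lower().replace(".", "").strip()
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold

print(same_author("John M. Abowd", "John Abowd"))  # → True
print(same_author("John Abowd", "Jane Smith"))     # → False
```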