This paper describes a novel database and an associated suicide event prediction model that surmount longstanding barriers in suicide risk factor research. The database commingles person-level records from the National Violent Death Reporting System (NVDRS) and the American Community Survey (ACS) to establish a case-control study sample that includes all identified suicide cases in sixteen U.S. states during 2005-2011, while faithfully reflecting the sociodemographics of the general population in those states. It supports a statistical model of individual suicide risk that accommodates person-level factors and the moderation of these factors by their community rates. Named the United States Multi-Level Suicide Data Set (US-MSDS), the database was first developed outside the Research Data Center (RDC) laboratory using publicly available ACS microdata, and then reconstructed inside the laboratory using restricted-access ACS microdata. Analyses of the restricted-access version largely reinforced, and in some respects extended, the findings obtained from the public-data version. This experience shows that the analytic precision achievable with restricted-access ACS data can play an important role in social research, while also indicating that publicly available ACS data have considerable value for conducting preliminary analyses and preparing to use an RDC laboratory. The database development strategy may interest scientists investigating sociodemographic risk factors for other types of low-frequency mortality.
-
Mortality in a Multi-State Cohort of Former State Prisoners, 2010-2015
February 2022
Working Paper Number:
CES-22-06
Previous studies report that individuals who have been imprisoned have higher mortality rates than their demographic counterparts in the general population, particularly non-Hispanic white former prisoners. Most of these studies have been based on a single state's prison system, and the extent to which their findings can be generalized has not been established. In this study we explore the role that race/Hispanic origin, other demographic characteristics, and custodial/criminal history factors play in post-release mortality, including the timing of deaths. We also assess whether conditional release to community supervision or reimprisonment may explain the higher post-release mortality found among non-Hispanic whites. In the second part of the analysis, we estimate standardized mortality ratios (SMRs) by sex, age group, and race/Hispanic origin using the U.S. general population as the reference. The data come from state prison releases recorded in the Bureau of Justice Statistics' (BJS) National Corrections Reporting Program (NCRP). The NCRP records were linked to the Census Numident to identify deaths occurring within five years of prison release. We also linked NCRP records to previous decennial censuses and survey responses to obtain self-reported race and Hispanic origin where available. We found that non-Hispanic white former prisoners were more likely to die within five years after prison release, and more likely to die in the initial weeks after release, than racial minorities and Hispanics. Reimprisonment, age at release, and a history of multiple prison terms had a similar influence on the odds of dying across all race/Hispanic origin groups. Other factors, such as the type of release and the duration of the last term in prison, were associated with higher risks of mortality for some groups but not for others.
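An SMR compares the deaths actually observed in a group with the deaths that would be expected if the group died at the reference population's rates. The sketch below shows generic indirect standardization under stated assumptions: each stratum is one sex, age-group, and race/Hispanic-origin cell, and the reference rates are deaths per person-year in the U.S. general population. The `Stratum` class, the function name, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One sex x age-group x race/Hispanic-origin cell (hypothetical layout)."""
    observed_deaths: int    # deaths among former prisoners in this cell
    person_years: float     # follow-up time contributed by former prisoners in this cell
    reference_rate: float   # deaths per person-year in the reference population, same cell


def standardized_mortality_ratio(strata: list[Stratum]) -> float:
    """Indirectly standardized SMR: total observed deaths / total expected deaths."""
    observed = sum(s.observed_deaths for s in strata)
    # Expected deaths apply the reference population's rates to the cohort's person-years.
    expected = sum(s.person_years * s.reference_rate for s in strata)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Made-up numbers for illustration only.
cells = [
    Stratum(observed_deaths=30, person_years=10_000, reference_rate=0.001),
    Stratum(observed_deaths=20, person_years=5_000, reference_rate=0.002),
]
print(standardized_mortality_ratio(cells))  # prints 2.5 (50 observed vs. 20 expected deaths)
```

An SMR above 1 means the group experienced more deaths than the reference rates predict for a population with the same composition across the strata.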
-
Who are the people in my neighborhood? The 'contextual fallacy' of measuring individual context with census geographies
February 2018
Working Paper Number:
CES-18-11
Scholars deploy census-based measures of neighborhood context throughout the social sciences and epidemiology. Decades of research confirm that variation in how individuals are aggregated into geographic units to create variables that control for social, economic or political contexts can dramatically alter analyses. While most researchers are aware of the problem, they have lacked the tools to determine its magnitude in the literature and in their own projects. By using confidential access to the complete 2010 U.S. Decennial Census, we are able to construct, for all persons in the US, individual-specific contexts, which we group according to the Census-assigned block, block group, and tract. We compare these individual-specific measures to the published statistics at each scale, and we then determine the magnitude of variation in context for an individual with respect to the published measures using a simple statistic, the standard deviation of individual context (SDIC). For three key measures (percent Black, percent Hispanic, and Entropy, a measure of ethno-racial diversity), we find that block-level Census statistics frequently do not capture the actual context of individuals within them. More problematic, we uncover systematic spatial patterns in the contextual variables at all three scales. Finally, we show that within-unit variation is greater in some parts of the country than in others. We publish county-level estimates of the SDIC statistics that enable scholars to assess whether mis-specification in context variables is likely to alter analytic findings when measured at any of the three common Census units.
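The abstract does not give the exact SDIC formula, so the sketch below rests on an assumption: for each census unit, it computes the root-mean-square gap between each resident's individual-specific context value and the unit's published statistic. This makes "variation in context for an individual with respect to the published measures" concrete. How the individual contexts themselves are built is taken as given input here. The function `sdic_by_unit` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def sdic_by_unit(individual_context, unit_of, published):
    """Root-mean-square gap between each person's own context and the
    published statistic for the census unit (block, block group, or tract)
    they live in. This is an assumed definition, not the paper's verified formula.

    individual_context: person id -> individual-specific context value (e.g. percent Black)
    unit_of:            person id -> census unit id
    published:          census unit id -> published statistic for that unit
    """
    sum_sq = defaultdict(float)
    count = defaultdict(int)
    for person, value in individual_context.items():
        unit = unit_of[person]
        sum_sq[unit] += (value - published[unit]) ** 2
        count[unit] += 1
    return {unit: math.sqrt(sum_sq[unit] / count[unit]) for unit in sum_sq}


# Hypothetical toy data: two blocks with published percent-Black values.
contexts = {"p1": 40.0, "p2": 60.0, "p3": 20.0}
units = {"p1": "block_A", "p2": "block_A", "p3": "block_B"}
published_pct = {"block_A": 50.0, "block_B": 20.0}
print(sdic_by_unit(contexts, units, published_pct))  # prints {'block_A': 10.0, 'block_B': 0.0}
```

A value of 0 means the published statistic matches every resident's own context. Larger values flag units where the published figure is a poor proxy for individuals' actual surroundings.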
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63R
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
August 2025
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-25-57
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
-
Leaving Home: Modeling the Effect of Civic and Economic Structure on Individual Migration Patterns
June 2002
Working Paper Number:
CES-02-16
This research analyzes the effect of community structure on individuals' probabilities of moving between 1985 and 1990. Using the full Census sample long-form microdata for 1990, we re-allocate adult persons in 1990 to their 1985 county of residence. Then, using origin-county macro-structural variables (derived from the Economic Census microdata) and individual characteristics (from Decennial Census microdata), we develop a two-level hierarchical linear model. In level 1, we construct a logistic equation modeling individual probabilities of moving. In level 2, we model the contextual effects of origin community structure on these models. These contextual effects fall into two categories: 1) economic conditions that comprise the usual aggregate 'push' factors and 2) civic community factors that act to retain people in their community. Results specify the relationship between community context and individual migration patterns, and demonstrate effects of local economic structure and local civic structure on these individual probabilities. Most notably, we find that civic attributes of communities are associated with a propensity to stay in place, net of community economic factors and individual characteristics.
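As a hedged illustration, the two levels described above can be written as a generic random-intercept logistic model. This is not necessarily the authors' exact specification. Here person $i$ lives in 1985 origin county $j$, $\mathbf{x}_{ij}$ collects individual characteristics, and $\mathbf{w}_j$ collects the county's economic and civic variables. All symbols are introduced here for illustration.

```latex
\begin{aligned}
\text{Level 1:}\quad & \operatorname{logit} \Pr(\text{move}_{ij} = 1) = \beta_{0j} + \boldsymbol{\beta}_1^{\top} \mathbf{x}_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \boldsymbol{\gamma}_{01}^{\top} \mathbf{w}_j + u_{0j},
  \qquad u_{0j} \sim \mathcal{N}(0, \tau^2)
\end{aligned}
```

Under this sketch, the finding that civic attributes are associated with staying in place corresponds to negative level-2 coefficients on the civic components of $\mathbf{w}_j$, holding the economic components and $\mathbf{x}_{ij}$ fixed.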
-
RESIDENTIAL MOBILITY ACROSS LOCAL AREAS IN THE UNITED STATES AND THE GEOGRAPHIC DISTRIBUTION OF THE HEALTHY POPULATION
February 2014
Working Paper Number:
CES-14-14
Determining whether population dynamics provide competing explanations to place effects for observed geographic patterns of population health is critical for understanding health inequality. We focus on the working-age population where health disparities are greatest and analyze detailed data on residential mobility collected for the first time in the 2000 US census. Residential mobility over a 5-year period is frequent and selective, with some variation by race and gender. Even so, we find little evidence that mobility biases cross-sectional snapshots of local population health. Areas undergoing large or rapid population growth or decline may be exceptions. Overall, place of residence is an important health indicator; yet, the frequency of residential mobility raises questions of interpretation from etiological or policy perspectives, complicating simple understandings that residential exposures alone explain the association between place and health. Psychosocial stressors related to contingencies of social identity associated with being black, urban, or poor in the U.S. may also have adverse health impacts that track with structural location even with movement across residential areas.
-
Examining Racial Identity Responses Among People with Middle Eastern and North African Ancestry in the American Community Survey
March 2024
Working Paper Number:
CES-24-14
People with Middle Eastern and North African (MENA) backgrounds living in the United States are defined and classified as White by current Federal standards for race and ethnicity, yet many MENA people do not identify as White in surveys, such as those conducted by the U.S. Census Bureau. Instead, they often select 'Some Other Race', if it is provided, and write in MENA responses such as Arab, Iranian, or Middle Eastern. In processing survey data for public release, the Census Bureau classifies these responses as White in accordance with Federal guidance set by the U.S. Office of Management and Budget. Research that uses these edited public data relies on limited information on MENA people's racial identification. To address this limitation, we obtained unedited race responses in the nationally representative American Community Survey from 2005-2019 to better understand how people of MENA ancestry report their race. We also use these data to compare the demographic, cultural, socioeconomic, and contextual characteristics of MENA individuals who identify as White versus those who do not identify as White. We find that one in four MENA people do not select White alone as their racial identity, despite official guidance that defines 'White' as people having origins in any of the original peoples of Europe, the Middle East, or North Africa. A variety of individual and contextual factors are associated with this choice, and some of these factors operate differently for U.S.-born and foreign-born MENA people living in the United States.
-
Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets
June 2024
Working Paper Number:
CES-24-27
This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third-party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, and as a result not all records are assigned an identifier. This article is a tutorial on using the twangRDC package to generate nonresponse weights that account for non-linkage of person records across US Census Bureau datasets.
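The idea behind nonresponse weights for non-linkage is to up-weight linked records by the inverse of their estimated probability of being linked, so that the linked subset again resembles the full file. The sketch below does not reproduce the twangRDC API. It also deliberately swaps gradient boosting for a much simpler weighting-cell estimate, in which the linkage probability is the observed linkage rate within a cell. The function `linkage_weights`, the `"linked"` field, and the cell definition are hypothetical.

```python
from collections import Counter


def linkage_weights(records, cell_of):
    """Inverse-probability weights for non-linkage, estimated within cells.

    records: list of dicts, each with a boolean "linked" field (hypothetical schema)
    cell_of: function mapping a record to its weighting-cell label
    Returns one weight per record: 1 / P(linked | cell) for linked records,
    0.0 for unlinked records (they drop out of linked-data analyses).
    """
    total = Counter(cell_of(r) for r in records)
    linked = Counter(cell_of(r) for r in records if r["linked"])
    weights = []
    for r in records:
        if r["linked"]:
            cell = cell_of(r)
            weights.append(total[cell] / linked[cell])  # equals 1 / linkage rate in the cell
        else:
            weights.append(0.0)
    return weights


# Hypothetical file: half of "young" records link, all "old" records link.
people = [
    {"age_group": "young", "linked": True},
    {"age_group": "young", "linked": False},
    {"age_group": "young", "linked": True},
    {"age_group": "young", "linked": False},
    {"age_group": "old", "linked": True},
    {"age_group": "old", "linked": True},
]
print(linkage_weights(people, lambda r: r["age_group"]))  # prints [2.0, 0.0, 2.0, 0.0, 1.0, 1.0]
```

In a boosted version, a model would replace the cell-level rate with a predicted probability for each record, and the weighting step would stay the same. Presumably, boosting's advantage is that it can capture interactions among many covariates without the analyst defining cells by hand.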
-
An In-Depth Examination of Requirements for Disclosure Risk Assessment
October 2023
Authors:
Ron Jarmin,
John M. Abowd,
Ian M. Schmutte,
Jerome P. Reiter,
Nathan Goldschlag,
Victoria A. Velkoff,
Michael B. Hawes,
Robert Ashmead,
Ryan Cumings-Menon,
Sallie Ann Keller,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Pavel Zhuravlev
Working Paper Number:
CES-23-49
The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms leveled against differential privacy would be leveled against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near term, the counterfactual approach appears best suited for privacy-utility analysis.
-
The Case of the Missing Ethnicity: Indians without Tribes in the 21st Century
June 2011
Working Paper Number:
CES-11-17
Among American Indians and Alaska Natives, most aspects of ethnicity are tightly associated with the person's tribal origins. Language, history, foods, land, and traditions differ among the hundreds of tribes indigenous to the United States. Why did almost one million of them fail to respond to the tribal affiliation part of the Census 2000 race question? We investigate four hypotheses about why one-third of multiracial American Indians and one-sixth of single-race American Indians did not report a tribe: (1) survey item non-response, which undermines all fill-in-the-blank questions, (2) a non-salient tribal identity, (3) a genealogy-based affiliation, and (4) a mestizo identity, which does not require a tribe. We use multivariate logistic regression models and high-density restricted-use Census 2000 data. We find support for the first two hypotheses and note that the predictors and results differ substantially for single-race versus multiple-race American Indians.