This paper describes a novel database and an associated suicide event prediction model that surmount longstanding barriers in suicide risk factor research. The database commingles person-level records from the National Violent Death Reporting System (NVDRS) and the American Community Survey (ACS) to establish a case-control study sample that includes all identified suicide cases, while faithfully reflecting general population sociodemographics, in sixteen U.S. states during the years 2005-2011. It supports a statistical model of individual suicide risk that accommodates person-level factors and the moderation of these factors by their community rates. Named the United States Multi-Level Suicide Data Set (US-MSDS), the database was developed outside the RDC laboratory using publicly available ACS microdata, and reconstructed inside the laboratory using restricted access ACS microdata. Analyses of the latter version yielded findings that largely amplified but also extended those obtained from analyses of the former. This experience shows that the analytic precision achievable using restricted access ACS data can play an important role in conducting social research, although it also indicates that publicly available ACS data have considerable value in conducting preliminary analyses and preparing to use an RDC laboratory. The database development strategy may interest scientists investigating sociodemographic risk factors for other types of low-frequency mortality.
-
Mortality in a Multi-State Cohort of Former State Prisoners, 2010-2015
February 2022
Working Paper Number:
CES-22-06
Previous studies report that individuals who have been imprisoned have higher mortality rates than their demographic counterparts in the general population, particularly non-Hispanic white former prisoners. Most of these studies have been based on a single state's prison system, and the extent to which their findings can be generalized has not been established. In this study we explore the role that race/Hispanic origin, other demographic characteristics, and custodial/criminal history factors play in post-release mortality, including the timing of deaths. We also assess whether conditional release to community supervision or reimprisonment may explain the higher post-release mortality found among non-Hispanic whites. In the second part of the analysis, we estimate standardized mortality ratios (SMRs) by sex, age group, and race/Hispanic origin, using the U.S. general population as the reference. The data come from state prison releases from the Bureau of Justice Statistics' (BJS) National Corrections Reporting Program (NCRP). The NCRP records were linked to the Census Numident to identify deaths occurring within five years of prison release. We also linked NCRP records to previous decennial censuses and survey responses to obtain self-reported race and Hispanic origin if available. We found that non-Hispanic white former prisoners were more likely to die within five years after prison release and more likely to die in the initial weeks after release compared to racial minorities and Hispanics. Reimprisonment, age at release, and a history of multiple prison terms had a similar influence on the odds of dying across all race/Hispanic origin groups. Other factors, such as the type of release and the duration of the last term in prison, were associated with higher risks of mortality for some groups but not for others.
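A standardized mortality ratio of the kind mentioned in this abstract can be illustrated with a minimal sketch, assuming indirect standardization: observed deaths in the cohort divided by the deaths that would be expected if each stratum had experienced the reference population's death rate. The function name, the stratum labels, and every number below are hypothetical and are not taken from the paper.

```python
def standardized_mortality_ratio(observed_deaths, person_years, reference_rates):
    """Return observed deaths divided by expected deaths.

    person_years    -- cohort person-years at risk, keyed by stratum
    reference_rates -- reference-population deaths per person-year, same keys
    (Both dictionaries are illustrative assumptions, not the paper's data.)
    """
    # Expected deaths: apply each reference rate to the cohort's exposure.
    expected = sum(person_years[s] * reference_rates[s] for s in person_years)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected


# Hypothetical cohort split into two age strata.
example_person_years = {"18-34": 1000.0, "35-54": 2000.0}
example_reference_rates = {"18-34": 0.001, "35-54": 0.004}
example_smr = standardized_mortality_ratio(
    18, example_person_years, example_reference_rates
)
```

An SMR above 1 indicates more deaths in the cohort than the reference rates would predict; a value below 1 indicates fewer.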
-
Leaving Home: Modeling the Effect of Civic and Economic Structure on Individual Migration Patterns
June 2002
Working Paper Number:
CES-02-16
This research analyzes the effect of community structure upon individuals' probabilities of moving between 1985 and 1990. Using the full Census sample long form microdata for 1990, we re-allocate adult persons in 1990 to their 1985 county of residence. Then, using origin county macro-structural variables (derived from the Economic Census microdata) and individual characteristics (from Decennial Census microdata), we develop a two level hierarchical linear model. In level 1, we construct a logistic equation modeling individual probabilities of moving. In level 2, we model the contextual effects of origin community structure on these models. These contextual effects fall into two categories: 1) economic conditions that comprise the usual aggregate 'push' factors and 2) civic community factors that act to retain people in their community. Results specify the relationship between community context and individual migration patterns, and demonstrate effects of local economic structure and local civic structure on these individual probabilities. Most notably, we find that civic attributes of communities are associated with a propensity to stay in place, net of community economic factors and individual characteristics.
-
Who are the people in my neighborhood? The 'contextual fallacy' of measuring individual context with census geographies
February 2018
Working Paper Number:
CES-18-11
Scholars deploy census-based measures of neighborhood context throughout the social sciences and epidemiology. Decades of research confirm that variation in how individuals are aggregated into geographic units to create variables that control for social, economic or political contexts can dramatically alter analyses. While most researchers are aware of the problem, they have lacked the tools to determine its magnitude in the literature and in their own projects. By using confidential access to the complete 2010 U.S. Decennial Census, we are able to construct, for all persons in the US, individual-specific contexts, which we group according to the Census-assigned block, block group, and tract. We compare these individual-specific measures to the published statistics at each scale, and we then determine the magnitude of variation in context for an individual with respect to the published measures using a simple statistic, the standard deviation of individual context (SDIC). For three key measures (percent Black, percent Hispanic, and Entropy, a measure of ethno-racial diversity), we find that block-level Census statistics frequently do not capture the actual context of individuals within them. More problematic, we uncover systematic spatial patterns in the contextual variables at all three scales. Finally, we show that within-unit variation is greater in some parts of the country than in others. We publish county-level estimates of the SDIC statistics that enable scholars to assess whether mis-specification in context variables is likely to alter analytic findings when measured at any of the three common Census units.
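The abstract names the SDIC but does not give its formula. The sketch below shows one plausible reading, stated as an assumption: within a single Census unit, the root-mean-square deviation of each resident's individual-specific context value from the single value published for that unit. The function name and the sample values are hypothetical, and the paper's actual definition may differ.

```python
import math


def sdic(individual_contexts, published_value):
    """Root-mean-square deviation of individual contexts from a published value.

    ASSUMPTION: this reading of "standard deviation of individual context"
    is illustrative only; the abstract does not state the formula.
    """
    n = len(individual_contexts)
    if n == 0:
        raise ValueError("at least one individual context is required")
    squared = sum((x - published_value) ** 2 for x in individual_contexts)
    return math.sqrt(squared / n)


# Hypothetical block: the published share is 0.30, but the residents'
# individual-specific contexts range from 0.10 to 0.50.
example_sdic = sdic([0.10, 0.30, 0.50], 0.30)
```

Under this reading, a value near zero means the published statistic describes every resident's context well, and larger values mean the unit-level figure masks substantial variation among the people inside it.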
-
Federal-Local Partnerships on Immigration Law Enforcement: Are the Policies Effective in Reducing Violent Victimization?
April 2023
Working Paper Number:
CES-23-18
Our understanding of how immigration enforcement impacts crime has been informed by data from police crime statistics. This study complements existing research by using longitudinal multilevel data from the National Crime Victimization Survey (NCVS) for 2005-2014 to simultaneously assess the impact of the three predominant immigration policies that have been implemented in local communities. The results indicate that the activation of Secure Communities and 287(g) task force agreements significantly increased violent victimization risk among Latinos, whereas they showed no evident impact on victimization risk among non-Latino Whites and Blacks. The activation of 287(g) jail enforcement agreements and anti-detainer policies had no significant impact on violent victimization risk during the period. Contrary to their stated purpose of enhancing public safety, our results show that the Secure Communities program and 287(g) task force agreements did not reduce crime, but instead eroded security in American communities by increasing the likelihood that Latinos experienced violent victimization. These results support the Federal government's ending of 287(g) task force agreements and its more recent move to end the Secure Communities program. Additionally, the results of our study add to the evidence challenging claims that anti-detainer policies increase the risk of violence.
-
Shift or replenishment? Reassessing the prospect of stable Spanish bilingualism across contexts of ethnic change
June 2023
Working Paper Number:
CES-23-28
Much of the existing literature on Latinos' use of Spanish claims that a general pattern of intergenerational decline in the use of Spanish will produce an overall shift away from Spanish use in the U.S. (Rumbaut, Massey, and Bean 2006; Veltman 1983b, 1990). In contrast, recent works emphasize the importance of the social and linguistic context in reinforcing the use of Spanish as well as (pan)ethnic identities among U.S.-born Latinos (Linton 2004; Linton and Jiménez 2009; Stevens 1992). This literature suggests conditions under which Spanish-English bilingualism might become stable at the level of metropolitan areas; however, such conditions depend on how immigration shapes the context of language use for native-born Latinos. Given the declining levels of immigration from Latin America, will bilingualism subside in the U.S., or have certain communities created conditions in which bilingualism can be stable? Using geocoded data from restricted access versions of the Survey of Income and Program Participation (SIPP) and the American Community Survey (ACS), we model the probability of Spanish-English bilingualism among second- and third-generation Latinos using multilevel models with contextual measures of immigration and language use at both the neighborhood and metropolitan levels. We find evidence that U.S.-born Latinos are heavily influenced by the prevalence of Spanish use among U.S.-born Latinos at both the metropolitan and neighborhood levels. Further, the proportion of foreign-born Latinos has little effect on the native born, after controlling for Spanish use among U.S.-born Latinos. These results are a first step in understanding the link between ethnic or panethnic contexts and language practices, and also in producing a better characterization of stable bilingualism that can be tested quantitatively.
-
The Case of the Missing Ethnicity: Indians without Tribes in the 21st Century
June 2011
Working Paper Number:
CES-11-17
Among American Indians and Alaska Natives, most aspects of ethnicity are tightly associated with the person's tribal origins. Language, history, foods, land, and traditions differ among the hundreds of tribes indigenous to the United States. Why did almost one million of them fail to respond to the tribal affiliation part of the Census 2000 race question? We investigate four hypotheses about why one-third of multiracial American Indians and one-sixth of single-race American Indians did not report a tribe: (1) survey item non-response, which undermines all fill-in-the-blank questions, (2) a non-salient tribal identity, (3) a genealogy-based affiliation, and (4) mestizo identity, which does not require a tribe. We use multivariate logistic regression models and high-density restricted-use Census 2000 data. We find support for the first two hypotheses and note that the predictors and results differ substantially for single-race versus multiple-race American Indians.
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63R
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
-
Examining Racial Identity Responses Among People with Middle Eastern and North African Ancestry in the American Community Survey
March 2024
Working Paper Number:
CES-24-14
People with Middle Eastern and North African (MENA) backgrounds living in the United States are defined and classified as White by current Federal standards for race and ethnicity, yet many MENA people do not identify as White in surveys, such as those conducted by the U.S. Census Bureau. Instead, they often select 'Some Other Race', if it is provided, and write in MENA responses such as Arab, Iranian, or Middle Eastern. In processing survey data for public release, the Census Bureau classifies these responses as White in accordance with Federal guidance set by the U.S. Office of Management and Budget. Research that uses these edited public data relies on limited information on MENA people's racial identification. To address this limitation, we obtained unedited race responses in the nationally representative American Community Survey from 2005-2019 to better understand how people of MENA ancestry report their race. We also use these data to compare the demographic, cultural, socioeconomic, and contextual characteristics of MENA individuals who identify as White versus those who do not identify as White. We find that one in four MENA people do not select White alone as their racial identity, despite official guidance that defines 'White' as people having origins in any of the original peoples of Europe, the Middle East, or North Africa. A variety of individual and contextual factors are associated with this choice, and some of these factors operate differently for U.S.-born and foreign-born MENA people living in the United States.
-
Synthetic Data for Small Area Estimation in the American Community Survey
April 2013
Working Paper Number:
CES-13-19
Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
August 2025
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-25-57
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.