-
Comparison of Child Reporting in the American Community Survey and Federal Income Tax Returns Based on California Birth Records
September 2024
Working Paper Number:
CES-24-55
This paper takes advantage of administrative records from California, a state with a large child population and a significant historical undercount of children in Census Bureau data, dependent information in the Internal Revenue Service (IRS) Form 1040 records, and the American Community Survey to characterize undercounted children and compare child reporting. While IRS Form 1040 records offer potential utility for adjusting child undercounting in Census Bureau surveys, this analysis finds overlapping reporting issues among various demographic and economic groups. Specifically, older children, those of Non-Hispanic Black mothers and Hispanic mothers, children or parents with lower English proficiency, children whose mothers did not complete high school, and families with lower income-to-poverty ratio were less frequently reported in IRS 1040 records than other groups. Therefore, using IRS 1040 dependent records may have limitations for accurately representing populations with characteristics associated with the undercount of children in surveys.
View Full
Paper PDF
-
Estimating the Potential Impact of Combined Race and Ethnicity Reporting on Long-Term Earnings Statistics
September 2024
Working Paper Number:
CES-24-48
We use place of birth information from the Social Security Administration linked to earnings data from the Longitudinal Employer-Household Dynamics Program and detailed race and ethnicity data from the 2010 Census to study how long-term earnings differentials vary by place of birth for different self-identified race and ethnicity categories. We focus on foreign-born persons from countries that are heavily Hispanic and from countries in the Middle East and North Africa (MENA). We find substantial heterogeneity of long-term earnings differentials within country of birth, some of which will be difficult to detect when the reporting format changes from the current two-question version to the new single-question version because they depend on self-identifications that place the individual in two distinct categories within the single-question format, specifically, Hispanic and White or Black, and MENA and White or Black. We also study the USA-born children of these same immigrants. Long-term earnings differences for the 2nd generation also vary as a function of self-identified ethnicity and race in ways that changing to the single-question format could affect.
View Full
Paper PDF
-
Whose Neighborhood Now? Gentrification and Community Life in Low-Income Urban Neighborhoods
June 2024
Working Paper Number:
CES-24-29
Gentrification is a process of urban change that has wide-ranging social and political impacts, but previous studies provide divergent findings. Does gentrification leave residents feeling alienated, or does it bolster neighborhood social satisfaction? Politically, does urban change mobilize residents, or leave them disengaged? I assess a national, cross-sectional sample of about 17,500 respondents in lower-income urban neighborhoods, and use a structural equation modeling approach to model six latent variables pertaining to local social environment and political participation. Amongst the full sample, gentrification has a positive association with all six factors. However, this relationship depends upon respondents' level of income, length of residency, and racial identity. White residents and those with shorter length of residency report higher levels of social cohesion as gentrification increases, but there is no such association amongst racial minority groups and longer-term residents. This finding aligns with a perspective on gentrification as a racialized process, and demonstrates that gentrification-related amenities primarily serve the interests of white residents and newcomers. All groups, however, are more likely to participate in neighborhood politics as gentrification increases, drawing attention to the agency of local residents as they attempt to influence processes of urban change.
View Full
Paper PDF
-
Where Are Your Parents? Exploring Potential Bias in Administrative Records on Children
March 2024
Working Paper Number:
CES-24-18
This paper examines potential bias in the Census Household Composition Key's (CHCK) probabilistic parent-child linkages. By linking CHCK data to the American Community Survey (ACS), we reveal disparities in parent-child linkages among specific demographic groups and find that characteristics of children that can and cannot be linked to the CHCK vary considerably from the larger population. In particular, we find that children from low-income, less educated households and of Hispanic origin are less likely to be linked to a mother or a father in the CHCK. We also highlight some data considerations when using the CHCK.
View Full
Paper PDF
-
Examining Racial Identity Responses Among People with Middle Eastern and North African Ancestry in the American Community Survey
March 2024
Working Paper Number:
CES-24-14
People with Middle Eastern and North African (MENA) backgrounds living in the United States are defined and classified as White by current Federal standards for race and ethnicity, yet many MENA people do not identify as White in surveys, such as those conducted by the U.S. Census Bureau. Instead, they often select 'Some Other Race', if it is provided, and write in MENA responses such as Arab, Iranian, or Middle Eastern. In processing survey data for public release, the Census Bureau classifies these responses as White in accordance with Federal guidance set by the U.S. Office of Management and Budget. Research that uses these edited public data relies on limited information on MENA people's racial identification. To address this limitation, we obtained unedited race responses in the nationally representative American Community Survey from 2005-2019 to better understand how people of MENA ancestry report their race. We also use these data to compare the demographic, cultural, socioeconomic, and contextual characteristics of MENA individuals who identify as White versus those who do not identify as White. We find that one in four MENA people do not select White alone as their racial identity, despite official guidance that defines 'White' as people having origins in any of the original peoples of Europe, the Middle East, or North Africa. A variety of individual and contextual factors are associated with this choice, and some of these factors operate differently for U.S.-born and foreign-born MENA people living in the United States.
View Full
Paper PDF
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63R
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level'individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy. Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
View Full
Paper PDF
-
The 2010 Census Confidentiality Protections Failed, Here's How and Why
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63
Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
View Full
Paper PDF
-
Where to Build Affordable Housing?
Evaluating the Tradeoffs of Location
December 2023
Working Paper Number:
CES-23-62R
How does the location of affordable housing affect tenant welfare, the distribution of assistance, and broader societal objectives such as racial integration? Using administrative data on tenants of units funded by the Low-Income Housing Tax Credit (LIHTC), we first show that characteristics such as race and proxies for need vary widely across neighborhoods. Despite fixed eligibility requirements, LIHTC developments in more opportunity-rich neighborhoods house tenants who are higher income, more educated, and far less likely to be Black. To quantify the welfare implications, we build a residential choice model in which households choose from both market-rate and affordable housing options, where the latter must be rationed. While building affordable housing in higher-opportunity neighborhoods costs more, it also increases household welfare and reduces city-wide segregation. The gains in household welfare, however, accrue to more moderate-need, non-Black/Hispanic households at the expense of other households. This change in the distribution of assistance is primarily due to a 'crowding out' effect: households that only apply for assistance in higher-opportunity neighborhoods crowd out those willing to apply regardless of location. Finally, other policy levers'such as lowering the income limits used for means-testing'have only limited effects relative to the choice of location.
View Full
Paper PDF
-
Granular Income Inequality and Mobility using IDDA: Exploring Patterns across Race and Ethnicity
November 2023
Working Paper Number:
CES-23-55
Shifting earnings inequality among U.S. workers over the last five decades has been widely stud ied, but understanding how these shifts evolve across smaller groups has been difficult. Publicly available data sources typically only ensure representative data at high levels of aggregation, so they obscure many details of earnings distributions for smaller populations. We define and construct a set of granular statistics describing income distributions, income mobility and con ditional income growth for a large number of subnational groups in the U.S. for a two-decade period (1998-2019). In this paper, we use the resulting data to explore the evolution of income inequality and mobility for detailed groups defined by race and ethnicity. We find that patterns identified from the universe of tax filers and W-2 recipients that we observe differ in important ways from those that one might identify in public sources. The full set of statistics that we construct is available publicly as the Income Distributions and Dynamics in America, or IDDA, data set.
View Full
Paper PDF
-
Coverage of Children in the American Community Survey Based on California Birth Records
September 2023
Working Paper Number:
CES-23-46
The U.S. Census Bureau's American Community Survey (ACS) collects information on individuals and households. The ACS provides survey-based estimates of children drawn from a sample of the U.S. population. However, survey responses may not match administrative records, such as birth records. Birth records should provide a complete account of all births, along with child-parent relationships and demographic characteristics. California is a state that has both a large population of children and a high undercount for young children. This paper uses California as a case study to examine differences between reported versus unreported children in the ACS based on state birth records. Child reporting rates were lower for more recent data years, younger children, for Black and Hispanic mothers, and for more complex households. Child reporting rates were higher for more educated mothers and for households above the poverty line. Using mother's race and Hispanic ethnicity from the birth records combined with poverty indices from the ACS, this analysis also finds that child reporting does not uniformly vary with poverty status across all race and ethnicity groups. This research builds support for the utility of state birth records in analyzing the undercount of children.
View Full
Paper PDF