-
Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources
May 2024
Working Paper Number:
CES-24-26
The Best Race and Ethnicity Administrative Records Composite file ('Best Race file') is an composite file which combines Census, federal, and Third Party Data (TPD) sources and applies business rules to assign race and ethnicity values to person records. The first version of the Best Race administrative records composite was first constructed in 2015 and subsequently updated each year to include more recent vintages, when available, of the data sources originally included in the composite file. Where updates were available for data sources, the most recent information for persons was retained, and the business rules were reapplied to assign a single race and single Hispanic origin value to each person record. The majority of person records on the Best Race file have consistent race and ethnicity information across data sources. Where there are discrepancies in responses across data sources, we apply a series of business rules to assign a single race and ethnicity to each record. To improve the quality of the Best Race administrative records composite, we have begun revising the business rules which were developed several years ago. This paper discusses the original business rules as well as the implemented changes and their impact on the composite file.
View Full
Paper PDF
-
Mobility, Opportunity, and Volatility Statistics (MOVS):
Infrastructure Files and Public Use Data
April 2024
Working Paper Number:
CES-24-23
Federal statistical agencies and policymakers have identified a need for integrated systems of household and personal income statistics. This interest marks a recognition that aggregated measures of income, such as GDP or average income growth, tell an incomplete story that may conceal large gaps in well-being between different types of individuals and families. Until recently, longitudinal income data that are rich enough to calculate detailed income statistics and include demographic characteristics, such as race and ethnicity, have not been available. The Mobility, Opportunity, and Volatility Statistics project (MOVS) fills this gap in comprehensive income statistics. Using linked demographic and tax records on the population of U.S. working-age adults, the MOVS project defines households and calculates household income, applying an equivalence scale to create a personal income concept, and then traces the progress of individuals' incomes over time. We then output a set of intermediate statistics by race-ethnicity group, sex, year, base-year state of residence, and base-year income decile. We select the intermediate statistics most useful in developing more complex intragenerational income mobility measures, such as transition matrices, income growth curves, and variance-based volatility statistics. We provide these intermediate statistics as part of a publicly released data tool with downloadable flat files and accompanying documentation. This paper describes the data build process and the output files, including a brief analysis highlighting the structure and content of our main statistics.
View Full
Paper PDF
-
Tracking Firm Use of AI in Real Time: A Snapshot from the Business Trends and Outlook Survey
March 2024
Working Paper Number:
CES-24-16
Timely and accurate measurement of AI use by firms is both challenging and crucial for understanding the impacts of AI on the U.S. economy. We provide new, real-time estimates of current and expected future use of AI for business purposes based on the Business Trends and Outlook Survey for September 2023 to February 2024. During this period, bi-weekly estimates of AI use rate rose from 3.7% to 5.4%, with an expected rate of about 6.6% by early Fall 2024. The fraction of workers at businesses that use AI is higher, especially for large businesses and in the Information sector. AI use is higher in large firms but the relationship between AI use and firm size is non-monotonic. In contrast, AI use is higher in young firms although, on an employment-weighted basis, is U-shaped in firm age. Common uses of AI include marketing automation, virtual agents, and data/text analytics. AI users often utilize AI to substitute for worker tasks and equipment/software, but few report reductions in employment due to AI use. Many firms undergo organizational changes to accommodate AI, particularly by training staff, developing new workflows, and purchasing cloud services/storage. AI users also exhibit better overall performance and higher incidence of employment expansion compared to other businesses. The most common reason for non-adoption is the inapplicability of AI to the business.
View Full
Paper PDF
-
The Changing Nature of Pollution, Income, and Environmental Inequality in the United States
January 2024
Working Paper Number:
CES-24-04
This paper uses administrative tax records linked to Census demographic data and high-resolution measures of fine small particulate (PM2.5) exposure to study the evolution of the Black-White pollution exposure gap over the past 40 years. In doing so, we focus on the various ways in which income may have contributed to these changes using a statistical decomposition. We decompose the overall change in the Black-White PM2.5 exposure gap into (1) components that stem from rank-preserving compression in the overall pollution distribution and (2) changes that stem from a reordering of Black and White households within the pollution distribution. We find a significant narrowing of the Black-White PM2.5 exposure gap over this time period that is overwhelmingly driven by rank-preserving changes rather than positional changes. However, the relative positions of Black and White households at the upper end of the pollution distribution have meaningfully shifted in the most recent years.
View Full
Paper PDF
-
Incorporating Administrative Data in Survey Weights for the Basic Monthly Current Population Survey
January 2024
Working Paper Number:
CES-24-02
Response rates to the Current Population Survey (CPS) have declined over time, raising the potential for nonresponse bias in key population statistics. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we take two approaches. First, we use administrative data to build a non-parametric nonresponse adjustment step while leaving the calibration to population estimates unchanged. Second, we use administratively linked data in the calibration process, matching income data from the Internal Return Service and state agencies, demographic data from the Social Security Administration and the decennial census, and industry data from the Census Bureau's Business Register to both responding and nonresponding households. We use the matched data in the household nonresponse adjustment of the CPS weighting algorithm, which changes the weights of respondents to account for differential nonresponse rates among subpopulations.
After running the experimental weighting algorithm, we compare estimates of the unemployment rate and labor force participation rate between the experimental weights and the production weights. Before March 2020, estimates of the labor force participation rates using the experimental weights are 0.2 percentage points higher than the original estimates, with minimal effect on unemployment rate. After March 2020, the new labor force participation rates are similar, but the unemployment rate is about 0.2 percentage points higher in some months during the height of COVID-related interviewing restrictions. These results are suggestive that if there is any nonresponse bias present in the CPS, the magnitude is comparable to the typical margin of error of the unemployment rate estimate. Additionally, the results are overall similar across demographic groups and states, as well as using alternative weighting methodology. Finally, we discuss how our estimates compare to those from earlier papers that calculate estimates of bias in key CPS labor force statistics.
This paper is for research purposes only. No changes to production are being implemented at this time.
View Full
Paper PDF
-
Collaborative Micro-productivity Project: Establishment-Level Productivity Dataset, 1972-2020
December 2023
Working Paper Number:
CES-23-65
We describe the process for building the Collaborative Micro-productivity Project (CMP) microdata and calculating establishment-level productivity numbers. The documentation is for version 7 and the data cover the years 1972-2020. These data have been used in numerous research papers and are used to create the experimental public-use data product Dispersion Statistics on Productivity (DiSP).
View Full
Paper PDF
-
The 2010 Census Confidentiality Protections Failed, Here's How and Why
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodr�guez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63
Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
View Full
Paper PDF
-
An In-Depth Examination of Requirements for Disclosure Risk Assessment
October 2023
Authors:
Ron Jarmin,
John M. Abowd,
Ian M. Schmutte,
Jerome P. Reiter,
Nathan Goldschlag,
Michael B. Hawes,
Robert Ashmead,
Ryan Cumings-Menon,
Sallie Ann Keller,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodr�guez,
Victoria A. Velkoff,
Pavel Zhuravlev
Working Paper Number:
CES-23-49
The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms levied against differential privacy would be levied against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near-term, the counterfactual approach appears best-suited for privacy-utility analysis.
View Full
Paper PDF
-
Business Dynamics Statistics for Single-Unit Firms
December 2022
Working Paper Number:
CES-22-57
The Business Dynamics Statistics of Single Unit Firms (BDS-SU) is an experimental data product that provides information on employment and payroll dynamics for each quarter of the year at businesses that operate in one physical location. This paper describes the creation of the data tables and the value they add to the existing Business Dynamics Statistics (BDS) product. We then present some analysis of the published statistics to provide context for the numbers and demonstrate how they can be used to understand both national and local business conditions, with a particular focus on 2020 and the recession induced by the COVID-19 pandemic. We next examine how firms fared in this recession compared to the Great Recession that began in the fourth quarter of 2007. We also consider the heterogenous impact of the pandemic on various industries and areas of the country, showing which types of businesses in which locations were particularly hard hit. We examine business exit rates in some detail and consider why different metro areas experienced the pandemic in different ways. We also consider entry rates and look for evidence of a surge in new businesses as seen in other data sources. We finish by providing a preview of on-going research to match the BDS to worker demographics and show statistics on the relationship between the characteristics of the firm's workers and outcomes such as firm exit and net job creation.
View Full
Paper PDF
-
Improving Estimates of Neighborhood Change with Constant Tract Boundaries
May 2022
Working Paper Number:
CES-22-16
Social scientists routinely rely on methods of interpolation to adjust available data to their research needs. This study calls attention to the potential for substantial error in efforts to harmonize data to constant boundaries using standard approaches to areal and population interpolation. We compare estimates from a standard source (the Longitudinal Tract Data Base) to true values calculated by re-aggregating original 2000 census microdata to 2010 tract areas. We then demonstrate an alternative approach that allows the re-aggregated values to be publicly disclosed, using 'differential privacy' (DP) methods to inject random noise to protect confidentiality of the raw data. The DP estimates are considerably more accurate than the interpolated estimates. We also examine conditions under which interpolation is more susceptible to error. This study reveals cause for greater caution in the use of interpolated estimates from any source. Until and unless DP estimates can be publicly disclosed for a wide range of variables and years, research on neighborhood change should routinely examine data for signs of estimation error that may be substantial in a large share of tracts that experienced complex boundary changes.
View Full
Paper PDF