-
Non-Random Assignment of Individual Identifiers and Selection into Linked Data: Implications for Research
January 2026
Working Paper Number:
CES-26-06
The U.S. Census Bureau's Person Identification Validation System facilitates anonymous linkages between survey and administrative records by assigning Protected Identification Keys (PIKs) to person records. While PIK assignment is generally accurate, some person records are not successfully assigned a PIK, which can lead to sample selection bias in analyses of linked data. Using the American Community Survey (ACS) and the Current Population Survey Annual Social and Economic Supplement (CPS ASEC) between 2005 and 2022, we corroborate and extend existing findings on the drivers of PIK assignment, showing that the rate of PIK assignment varies widely across socio-demographic subgroups. Using earnings as a test case, we then show that limiting a survey sample of wage earners to person records with PIKs or successful linkages to W-2 wage records tends to overestimate self-reported wage earnings, on average, indicative of linkage-induced selection bias. In a validation exercise, we demonstrate that reweighting methods, such as inverse probability weighting or entropy balancing, can mitigate this bias.
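To illustrate the reweighting idea mentioned in the abstract, here is a minimal sketch of inverse probability weighting on toy data. It is not the paper's implementation: the field names (`group`, `linked`, `earnings`), the helper functions, and the cell-based estimate of the linkage probability are all assumptions chosen for illustration. Each linked record is weighted by the inverse of the linkage rate in its subgroup, so subgroups that are under-represented among linked records count for more.

```python
from collections import defaultdict

# Toy records (hypothetical): group A links only half the time and earns less.
RECORDS = (
    [{"group": "A", "linked": i < 2, "earnings": 10.0} for i in range(4)]
    + [{"group": "B", "linked": True, "earnings": 30.0} for _ in range(4)]
)

def linkage_rates(records):
    """Share of records with a successful link, computed within each group."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        if r["linked"]:
            linked[r["group"]] += 1
    return {g: linked[g] / totals[g] for g in totals}

def naive_mean(records):
    """Mean earnings using linked records only, with no reweighting."""
    vals = [r["earnings"] for r in records if r["linked"]]
    return sum(vals) / len(vals)

def ipw_mean(records):
    """Mean earnings over linked records, weighted by 1 / linkage rate."""
    rates = linkage_rates(records)
    num = den = 0.0
    for r in records:
        if not r["linked"]:
            continue  # unlinked records drop out; groups with rate 0 never reach the division
        w = 1.0 / rates[r["group"]]
        num += w * r["earnings"]
        den += w
    return num / den

full_mean = sum(r["earnings"] for r in RECORDS) / len(RECORDS)
print(full_mean, naive_mean(RECORDS), ipw_mean(RECORDS))
```

In this toy setup the linked-only mean overstates average earnings because the lower-earning group links less often, and the reweighted mean recovers the full-sample value. In practice the linkage probability would typically be modeled on many covariates rather than a single grouping variable.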
-
Manufacturing Dispersion: How Data Cleaning Choices Affect Measured Misallocation and Productivity Growth in the Annual Survey of Manufactures
September 2025
Working Paper Number:
CES-25-67
Measurement of dispersion of productivity levels and productivity growth rates across businesses is a key input for answering a variety of important economic questions, such as understanding the allocation of economic inputs across businesses and over time. While item nonresponse is a readily quantifiable issue, we show there is also misreporting by respondents in the Annual Survey of Manufactures (ASM). Aware of these measurement issues, the Census Bureau edits and imputes survey responses before tabulation and dissemination. However, edit and imputation methods that are suitable for publishing aggregate totals may not be suitable for estimating other measures from the microdata. We show that the methods used dramatically affect estimates of productivity dispersion, allocative efficiency, and aggregate productivity growth. Using a Bayesian approach for editing and imputation, we model the joint distributions of all variables needed to estimate these measures, and we quantify the degree of uncertainty in the estimates due to imputations for faulty or missing data.
-
Job Tasks, Worker Skills, and Productivity
September 2025
Authors:
John Haltiwanger,
Lucia Foster,
Cheryl Grim,
Zoltan Wolf,
Cindy Cunningham,
Sabrina Wulff Pabilonia,
Jay Stewart,
Cody Tuttle,
G. Jacob Blackwood,
Matthew Dey,
Rachel Nesbit
Working Paper Number:
CES-25-63
We present new empirical evidence suggesting that we can better understand productivity dispersion across businesses by accounting for differences in how tasks, skills, and occupations are organized. This aligns with growing attention to the task content of production. We link establishment-level data from the Bureau of Labor Statistics Occupational Employment and Wage Statistics survey with productivity data from the Census Bureau's manufacturing surveys. Our analysis reveals strong relationships between establishment productivity and task, skill, and occupation inputs. These relationships are highly nonlinear and vary by industry. When we account for these patterns, we can explain a substantial share of productivity dispersion across establishments.
-
Earnings Measurement Error, Nonresponse and Administrative Mismatch in the CPS
July 2025
Working Paper Number:
CES-25-48
Using the Current Population Survey Annual Social and Economic Supplement matched to Social Security Administration Detailed Earnings Records, we link observations across consecutive years to investigate a relationship between item nonresponse and measurement error in the earnings questions. Linking individuals across consecutive years allows us to observe switching from response to nonresponse and vice versa. We estimate OLS, IV, and finite mixture models that allow for various assumptions separately for men and women. We find that those who respond in both years of the survey exhibit less measurement error than those who respond in one year. Our findings suggest a trade-off between survey response and data quality that should be considered by survey designers, data collectors, and data users.
-
The Rise of Industrial AI in America: Microfoundations of the Productivity J-curve(s)
April 2025
Working Paper Number:
CES-25-27
We examine the prevalence and productivity dynamics of artificial intelligence (AI) in American manufacturing. Working with the Census Bureau to collect detailed large-scale data for 2017 and 2021, we focus on AI-related technologies with industrial applications. We find causal evidence of J-curve-shaped returns, where short-term performance losses precede longer-term gains. Consistent with costly adjustment taking place within core production processes, industrial AI use increases work-in-progress inventory, investment in industrial robots, and labor shedding, while harming productivity and profitability in the short run. These losses are unevenly distributed, concentrating among older businesses while being mitigated by growth-oriented business strategies and within-firm spillovers. Dynamics, however, matter: earlier (pre-2017) adopters exhibit stronger growth over time, conditional on survival. Notably, among older establishments, abandonment of structured production-management practices accounts for roughly one-third of these losses, revealing a specific channel through which intangible factors shape AI's impact. Taken together, these results provide novel evidence on the microfoundations of technology J-curves, identifying mechanisms and illuminating how and why they differ across firm types. These findings extend our understanding of modern General Purpose Technologies, explaining why their economic impact, exemplified here by AI, may initially disappoint, particularly in contexts dominated by older, established firms.
-
The Geography of Inventors and Local Knowledge Spillovers in R&D
October 2024
Working Paper Number:
CES-24-59
I causally estimate local knowledge spillovers in R&D and quantify their importance when implementing R&D policies. Using a new administrative panel on German inventors, I estimate these spillovers by isolating quasi-exogenous variation from the arrival of East German inventors across West Germany after the Reunification of Germany in 1990. Increasing the number of inventors by 1% increases inventor productivity by 0.4%. I build a spatial model of innovation, and show that these spillovers are crucial when reducing migration costs for inventors or implementing R&D subsidies to promote economic activity.
-
Empirical Distribution of the Plant-Level Components of Energy and Carbon Intensity at the Six-digit NAICS Level Using a Modified KAYA Identity
September 2024
Working Paper Number:
CES-24-46
Three basic pillars of industry-level decarbonization are energy efficiency, decarbonization of energy sources, and electrification. This paper provides estimates of a decomposition of these three components of carbon emissions by industry: energy intensity, carbon intensity of energy, and energy (fuel) mix. These estimates are constructed at the six-digit NAICS level from non-public, plant-level data collected by the Census Bureau. Four quintiles of the distribution of each of the three components are constructed, using multiple imputation (MI) to deal with non-reported energy variables in the Census data. MI allows the estimates to avoid non-reporting bias. MI also allows more six-digit NAICS to be estimated under Census non-disclosure rules, since dropping non-reported observations may have reduced the sample sizes unnecessarily. The estimates show wide variation in each of these three components of emissions (intensity) and provide a first empirical look into the plant-level variation that underlies carbon emissions.
-
Expanding the Frontier of Economic Statistics Using Big Data: A Case Study of Regional Employment
July 2024
Working Paper Number:
CES-24-37
Big data offers potentially enormous benefits for improving economic measurement, but it also presents challenges (e.g., lack of representativeness and instability), implying that its value is not always clear. We propose a framework for quantifying the usefulness of these data sources for specific applications, relative to existing official sources. We specifically weigh the potential benefits of additional granularity and timeliness, while examining the accuracy of any new or improved estimates relative to the accuracy of existing official statistics. We apply the methodology to employment estimates using data from a payroll processor, considering both the improvement of existing state-level estimates and the production of new, more timely county-level estimates. We find that incorporating payroll data can improve existing state-level estimates by 11% based on out-of-sample mean absolute error, although the improvement is considerably higher for smaller state-industry cells. We also produce new county-level estimates that could provide more timely granular estimates than previously available. We develop a novel test to determine if these new county-level estimates have errors consistent with official series. Given the level of granularity, we cannot reject the hypothesis that the new county estimates have an accuracy in line with official measures, implying an expansion of the existing frontier. We demonstrate the practical importance of these experimental estimates by investigating a hypothetical application during the COVID-19 pandemic, a period in which more timely and granular information could have assisted in implementing effective policies. Relative to existing estimates, we find that the alternative payroll data series could help identify areas of the country where employment was lagging. Moreover, we also demonstrate the value of a more timely series.
-
Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets
June 2024
Working Paper Number:
CES-24-27
This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third-party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, so not all records are assigned an identifier. This article is a tutorial for using twangRDC to generate nonresponse weights that account for non-linkage of person records across US Census Bureau datasets.
-
The Icing on the Cake: The Effects of Monetary Incentives on Income Data Quality in the SIPP
January 2024
Working Paper Number:
CES-24-03
Accurate measurement of key income variables plays a crucial role in economic research and policy decision-making. However, item nonresponse and measurement error in survey data can bias estimates, which can in turn lead to sub-optimal policy decisions and inefficient allocation of resources. While various studies have documented item nonresponse and measurement error in economic data, few have investigated interventions that could reduce them. We investigate the impact of monetary incentives on item nonresponse and measurement error for labor and investment income in the Survey of Income and Program Participation (SIPP). Our study utilizes a randomized incentive experiment in Waves 1 and 2 of the 2014 SIPP. We find that households receiving incentives had item nonresponse rates that are 1.3 percentage points lower for earnings and 1.5 percentage points lower for Social Security income. Measurement error was 6.31 percentage points lower at the intensive margin for interest income, and 16.48 percentage points lower for dividend income compared to non-incentive recipient households. These findings provide valuable insights for data producers and users and highlight the importance of implementing strategies to improve data quality in economic research.