-
Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets
June 2024
Working Paper Number:
CES-24-27
This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third-party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC package to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
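The weighting idea can be illustrated with a minimal sketch of inverse-probability weighting. This is not the twangRDC API (an R package whose functions are not shown here): the simulated data, the variable names, and the use of simple cell means in place of a gradient-boosted propensity model are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical binary covariate; records with x == 1 link more often.
x = rng.integers(0, 2, size=n)
p_link = np.where(x == 1, 0.9, 0.5)
linked = rng.random(n) < p_link

# Estimated linkage propensity. Cell means stand in for the
# gradient-boosted model that the package is described as using.
cell_rates = np.array([linked[x == g].mean() for g in (0, 1)])
p_hat = cell_rates[x]

# Linked records are weighted by the inverse of their propensity.
weights = 1.0 / p_hat[linked]

unweighted_mean = x[linked].mean()
weighted_mean = np.average(x[linked], weights=weights)
full_mean = x.mean()
```

In this toy setup the unweighted mean of the linked records overstates the share with `x == 1`, while the weighted mean matches the full-sample share.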
-
The Icing on the Cake: The Effects of Monetary Incentives on Income Data Quality in the SIPP
January 2024
Working Paper Number:
CES-24-03
Accurate measurement of key income variables plays a crucial role in economic research and policy decision-making. However, the presence of item nonresponse and measurement error in survey data can cause biased estimates. These biases can subsequently lead to sub-optimal policy decisions and inefficient allocation of resources. While there have been various studies documenting item nonresponse and measurement error in economic data, there have not been many studies investigating interventions that could reduce item nonresponse and measurement error. In our research, we investigate the impact of monetary incentives on reducing item nonresponse and measurement error for labor and investment income in the Survey of Income and Program Participation (SIPP). Our study utilizes a randomized incentive experiment in Waves 1 and 2 of the 2014 SIPP, which allows us to assess the effectiveness of incentives in reducing item nonresponse and measurement error. We find that households receiving incentives had item nonresponse rates that are 1.3 percentage points lower for earnings and 1.5 percentage points lower for Social Security income. Measurement error was 6.31 percentage points lower at the intensive margin for interest income, and 16.48 percentage points lower for dividend income compared to non-incentive recipient households. These findings provide valuable insights for data producers and users and highlight the importance of implementing strategies to improve data quality in economic research.
-
The 2010 Census Confidentiality Protections Failed, Here's How and Why
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63
Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
-
The Economic Geography of Lifecycle Human Capital Accumulation: The Competing Effects of Labor Markets and Childhood Environments
November 2023
Working Paper Number:
CES-23-54
We examine how place shapes the production of human capital across the lifecycle. We ask: do those places that most effectively produce human capital in childhood also have local labor markets that do so in adulthood? We begin by modeling wages across place as driven by 1) location-specific wage premiums, 2) adult human capital accumulation due to local labor market exposure, and 3) childhood human capital accumulation. We construct estimates of location wage premiums using AKM-style estimates of movers across US commuting zones and validate these estimates using evidence from plausibly exogenous out-migration from New Orleans in response to Hurricane Katrina. Next, we examine differential earnings trajectories among movers to construct estimates of human capital accumulation due to labor market exposure. We validate these estimates using wage changes of multi-time movers. Finally, we estimate the impact of place on childhood human capital production using age variation in moves during childhood. Crucially, our estimates of location wage premiums and adult human capital accumulation allow us to construct estimates of the causal effect of place during childhood that are not confounded by correlated labor market exposure. Using these estimates, we show there is a tradeoff between those places that most effectively produce human capital in childhood and the local labor markets that do so in adulthood. We find that each 1-rank increase in earnings due to adult labor market exposure trades off with a 0.43 rank decrease in earnings due to the local childhood environment. This pattern is closely linked to city size, as adult human capital accumulation generally increases with city size, while childhood human capital accumulation falls. These divergent trajectories are associated with differences in both the physical structure of cities and the nature of social interaction therein. There is no tradeoff present in the largest cities, which provide greater exposure to high-wage earners and higher levels of local investment. Finally, we examine how these patterns are reflected in local rents. Location wage premia are heavily capitalized into rents, but the determinants of lifecycle human capital accumulation are not.
-
Mixed-Effects Methods For Search and Matching Research
September 2023
Working Paper Number:
CES-23-43
We study mixed-effects methods for estimating equations containing person and firm effects. In economics such models are usually estimated using fixed-effects methods. Recent enhancements to those fixed-effects methods include corrections to the bias in estimating the covariance matrix of the person and firm effects, which we also consider.
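For orientation, an equation with person and firm effects is commonly written in the following form. The notation is an assumption chosen for illustration and is not taken from the paper.

```latex
% y_{it}: outcome (e.g., log earnings) of person i in period t
% \theta_i: person effect;  \psi_{J(i,t)}: effect of the firm J(i,t) employing i at t
y_{it} = x_{it}'\beta + \theta_i + \psi_{J(i,t)} + \varepsilon_{it}
```

Fixed-effects methods treat \theta_i and \psi_j as parameters to be estimated, whereas mixed-effects methods treat them as random draws from a distribution whose variance components are estimated.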
-
Noncitizen Coverage and Its Effects on U.S. Population Statistics
August 2023
Working Paper Number:
CES-23-42
We produce population estimates with the same reference date, April 1, 2020, as the 2020 Census of Population and Housing by combining 31 types of administrative record (AR) and third-party sources, including several new to the Census Bureau with a focus on noncitizens. Our AR census national population estimate is higher than other Census Bureau official estimates: 1.8% greater than the 2020 Demographic Analysis high estimate, 3.0% more than the 2020 Census count, and 3.6% higher than the vintage-2020 Population Estimates Program estimate. Our analysis suggests that inclusion of more noncitizens, especially those with unknown legal status, explains the higher AR census estimate. About 19.8% of AR census noncitizens have addresses that cannot be linked to an address in the 2020 Census collection universe, compared to 5.7% of citizens, raising the possibility that the 2020 Census did not collect data for a significant fraction of noncitizens residing in the United States under the residency criteria used for the census. We show differences in estimates by age, sex, Hispanic origin, geography, and socioeconomic characteristics symptomatic of the differences in noncitizen coverage.
-
Unionization, Employer Opposition, and Establishment Closure
July 2023
Working Paper Number:
CES-23-35
We study the effect of private-sector unionization on establishment employment and survival. Specifically, we analyze National Labor Relations Board union elections from 1981-2005 using administrative Census data. Our empirical strategy extends standard difference-in-differences techniques with regression discontinuity extrapolation methods. This allows us to avoid biases from only comparing close elections and to estimate treatment effects that include larger margin-of-victory elections. Using this strategy, we show that unionization decreases an establishment's employment and likelihood of survival, particularly in manufacturing and other blue-collar and industrial sectors. We hypothesize that two reasons for these effects are firms' ability to avoid working with new unions and employers' opposition to unions. We find that the negative effects are significantly larger for elections at multi-establishment firms. Additionally, after a successful union election at one establishment, employment increases at the firms' other establishments. Both pieces of evidence are consistent with firms avoiding new unions by shifting production from unionized establishments to other establishments. Finally, we find larger declines in employment and survival following elections where managers or owners were likely more opposed to the union. This evidence supports new reasons for the negative effects of unionization we document.
-
Fatal Errors: The Mortality Value of Accurate Weather Forecasts
June 2023
Working Paper Number:
CES-23-30
We provide the first revealed preference estimates of the benefits of routine weather forecasts. The benefits come from how people use advance information to reduce mortality from heat and cold. Theoretically, more accurate forecasts reduce mortality if and only if mortality risk is convex in forecast errors. We test for such convexity using data on the universe of mortality events and weather forecasts for a twelve-year period in the U.S. Results show that erroneously mild forecasts increase mortality whereas erroneously extreme forecasts do not reduce mortality. Making forecasts 50% more accurate would save 2,200 lives per year. The public would be willing to pay $112 billion to make forecasts 50% more accurate over the remainder of the century, of which $22 billion reflects how forecasts facilitate adaptation to climate change.
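One direction of the convexity condition can be sketched with Jensen's inequality. The notation is an assumption for illustration and not the paper's: m(e) is mortality risk as a function of the forecast error e, errors have mean zero, and a more accurate forecast shrinks every error by a factor \lambda in [0, 1).

```latex
% Convexity of m gives, for each error e:
m(\lambda e) \le \lambda\, m(e) + (1-\lambda)\, m(0)
% Jensen's inequality with E[e] = 0 gives m(0) = m(E[e]) \le E[m(e)], so
E[m(\lambda e)] \le \lambda\, E[m(e)] + (1-\lambda)\, m(0) \le E[m(e)]
```

Under these assumptions, shrinking forecast errors cannot raise expected mortality when m is convex. If m were concave, the inequalities would reverse.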
-
Where Have All the "Creative Talents" Gone? Employment Dynamics of US Inventors
April 2023
Working Paper Number:
CES-23-17
How are inventors allocated in the US economy and does that allocation affect innovative capacity? To answer these questions, we first build a model where an inventor with a new idea has the possibility to work for an entrant or incumbent firm. Strategic considerations encourage the incumbent to hire the inventor, offering higher wages, and then not implement her idea. We then combine data on 760 thousand U.S. inventors with the LEHD data. We find that when an inventor is hired by an incumbent, their earnings increase by 12.6 percent and their innovative output declines by 6 to 11 percent.
-
Building the Census Bureau Index of Economic Activity (IDEA)
March 2023
Working Paper Number:
CES-23-15
The Census Bureau Index of Economic Activity (IDEA) is constructed from 15 of the Census Bureau's primary monthly economic time series. The index is intended to provide a single time series reflecting, to the extent possible, the variation over time in the whole set of component series. The component series provide monthly measures of activity in retail and wholesale trade, manufacturing, construction, international trade, and business formations. Most of the input series are Principal Federal Economic Indicators. The index is constructed by applying the method of principal components analysis (PCA) to the time series of monthly growth rates of the seasonally adjusted component series, after standardizing the growth rates to series with mean zero and variance 1. Similar PCA approaches have been used for the construction of other economic indices, including the Chicago Fed National Activity Index issued by the Federal Reserve Bank of Chicago, and the Weekly Economic Index issued by the Federal Reserve Bank of New York. While the IDEA is constructed from time series of monthly data, it is calculated and published every business day, and so is updated whenever a new monthly value is released for any of its component series. Since release dates of data values for a given month vary across the component series, with slight variations in the monthly release date for any one component series, updates to the index are frequent. It is unavoidably the case that, at almost all updates, some of the component series lack observations for the current (most recent) data month. To address this situation, component series that are one month behind are predicted (nowcast) for the current index month, using a multivariate autoregressive time series model. This report discusses the input series to the index, the construction of the index by PCA, and the nowcasting procedure used. The report then examines some properties of the index and its relation to quarterly U.S. Gross Domestic Product and to some monthly non-Census Bureau economic indicators.
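The PCA step described above can be sketched as follows. This is a minimal illustration and not the Census Bureau's implementation: the function name `pca_index`, the use of log differences as growth rates, the sign convention, and the simulated input data are all assumptions, and the nowcasting step is omitted.

```python
import numpy as np

def pca_index(levels):
    """Return (index, loadings) from a (T, k) array of series levels.

    The index is the first principal component of the standardized
    monthly growth rates of the k component series.
    """
    # Monthly growth rates, taken here as log differences (an assumption).
    growth = np.diff(np.log(levels), axis=0)
    # Standardize each series to mean zero and variance one.
    z = (growth - growth.mean(axis=0)) / growth.std(axis=0)
    # Eigen-decomposition of the correlation matrix of the growth rates.
    corr = np.cov(z, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1]                # vector for the largest eigenvalue
    # Fix the arbitrary sign so that the loadings sum to a positive number.
    if loadings.sum() < 0:
        loadings = -loadings
    return z @ loadings, loadings

# Simulated example: 15 series sharing one common monthly factor.
rng = np.random.default_rng(0)
T, k = 120, 15
common = rng.normal(size=T)
sim_growth = 0.01 * (common[:, None] + 0.5 * rng.normal(size=(T, k)))
levels = 100 * np.exp(np.cumsum(sim_growth, axis=0))

index, loadings = pca_index(levels)
```

Because the first growth rate is lost to differencing, `index` has one fewer observation than `levels`. In this simulation it tracks the common factor closely, which is the sense in which a single series can summarize the variation in the whole set.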