Statistical agencies frequently publish microdata that have been altered to protect confidentiality. Such data retain utility for many types of broad analyses but can yield biased or insufficiently precise results in others. Research access to de-identified versions of the restricted-use data with little or no alteration is often possible, albeit costly and time-consuming. We investigate the advantages and disadvantages of public-use and restricted-use data from the American Community Survey (ACS) in constructing a wage index. The public-use data used were Public Use Microdata Samples, while the restricted-use data were accessed via a Federal Statistical Research Data Center. We discuss the advantages and disadvantages of each data source and compare estimated CWIs and standard errors at the state and labor market levels.
-
SYNTHETIC DATA FOR SMALL AREA ESTIMATION IN THE AMERICAN COMMUNITY SURVEY
April 2013
Working Paper Number:
CES-13-19
Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
-
Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in On The Map
January 2017
Working Paper Number:
CES-17-71
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.
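As a rough illustration of the Rubin (1987) total variability mentioned above, the sketch below applies the textbook multiple-imputation combining rules to a set of implicate estimates. It is not the paper's special formulas, which are adapted to the QWI and LODES disclosure avoidance system; the function name `rubin_combine` and its inputs are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Combine m implicate estimates with the textbook Rubin (1987) rules.

    Assumption: each implicate supplies one point estimate and one
    within-implicate variance. The paper's disclosure-consistent formulas
    are not reproduced here.
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two implicates with matching variances")
    q_bar = mean(estimates)          # combined point estimate
    w_bar = mean(within_variances)   # average within-implicate variance
    b = variance(estimates)          # between-implicate variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b  # total variance
    return q_bar, w_bar, b, total
```

The between-implicate term is inflated by 1 + 1/m to account for using a finite number of implicates.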
-
LOOKING BACK ON THREE YEARS OF USING THE SYNTHETIC LBD BETA
February 2014
Working Paper Number:
CES-14-11
Distributions of business data are typically much more skewed than those for household or individual data and public knowledge of the underlying units is greater. As a result, national statistical offices (NSOs) rarely release establishment or firm-level business microdata due to the risk to respondent confidentiality. One potential approach for overcoming these risks is to release synthetic data where the establishment data are simulated from statistical models designed to mimic the distributions of the real underlying microdata. The US Census Bureau's Center for Economic Studies, in collaboration with Duke University, the National Institute of Statistical Sciences, and Cornell University, made available a synthetic public use file for the Longitudinal Business Database (LBD) comprising more than 20 million records for all business establishments with paid employees dating back to 1976. The resulting product, dubbed the SynLBD, was released in 2010 and is the first-ever comprehensive business microdata set publicly released in the United States, including data on establishments' employment and payroll, birth and death years, and industrial classification. This paper documents the scope of projects that have requested and used the SynLBD.
-
Using Small-Area Estimation (SAE) to Estimate Prevalence of Child Health Outcomes at the Census Regional-, State-, and County-Levels
November 2022
Working Paper Number:
CES-22-48
In this study, we implement small-area estimation to assess the prevalence of child health outcomes at the county, state, and regional levels, using national survey data.
-
Who Values Human Capitalists' Human Capital? Healthcare Spending and Physician Earnings
July 2020
Working Paper Number:
CES-20-23
Is government guiding the invisible hand at the top of the labor market? We study this question among physicians, the most common occupation among the top one percent of income earners, and whose billings comprise one-fifth of healthcare spending. We use a novel linkage of population-wide tax records with the administrative registry of all physicians in the U.S. to study the characteristics of these high earnings, and the influence of government payments in particular. We find a major role for government on the margin, with half of direct changes to government reimbursement rates flowing directly into physicians' incomes. These policies move physicians' relative and absolute incomes more than any reasonable changes to marginal tax rates. At the same time, the overall level of physician earnings can largely be explained by labor market fundamentals of long work and training hours. Competing occupations also pay well and provide a natural lower bound for physician earnings. We conclude that government plays a major role in determining the value of physicians' human capital, but it is unrealistic to use this power to reduce healthcare spending substantially.
-
A METHOD OF CORRECTING FOR MISREPORTING APPLIED TO THE FOOD STAMP PROGRAM
May 2013
Working Paper Number:
CES-13-28
Survey misreporting is known to be pervasive and to bias common statistical analyses. In this paper, I first use administrative data on SNAP receipt and amounts linked to American Community Survey data from New York State to show that survey data can misrepresent the program in important ways. For example, more than 1.4 billion dollars received are not reported in New York State alone. 46 percent of dollars received by households with annual income above the poverty line are not reported in the survey data, while only 19 percent are missing below the poverty line. Standard corrections for measurement error cannot remove these biases. I then develop a method to obtain consistent estimates by combining parameter estimates from the linked data with publicly available data. This conditional density method recovers the correct estimates using public use data only, which solves the problem that access to linked administrative data is usually restricted. I examine the degree to which this approach can be used to extrapolate across time and geography, in order to solve the problem that validation data is often based on a convenience sample. I present evidence from within New York State that the extent of heterogeneity is small enough to make extrapolation work well across both time and geography. Extrapolation to the entire U.S. yields substantive differences to survey data and reduces deviations from official aggregates by a factor of 4 to 9 compared to survey aggregates.
-
Re-examining Regional Income Convergence: A Distributional Approach
February 2023
Working Paper Number:
CES-23-05
We re-examine recent trends in regional income convergence, considering the full distribution of income rather than focusing on the mean. Measuring similarity by comparing each percentile of state distributions to the corresponding percentile of the national distribution, we find that state incomes have become less similar (i.e., they have diverged) within the top 20 percent of the income distribution since 1969. The top percentile alone accounts for more than half of aggregate divergence across states over this period by our measure, and the top five percentiles combine to account for 93 percent. Divergence in top incomes across states appears to be driven largely by changes in top incomes among White people, while top incomes among Black people have experienced relatively little divergence.
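The percentile-by-percentile comparison described above can be sketched as follows. The abstract does not state the exact distance used, so the absolute log gap here is an assumption, as are the function name `percentile_gaps` and the choice of the "inclusive" quantile method; incomes are assumed to be strictly positive.

```python
import math
from statistics import quantiles


def percentile_gaps(state_incomes, national_incomes):
    """Absolute log gap between state and national income at percentiles 1..99.

    Assumptions: all incomes are positive, and the log gap stands in for the
    paper's similarity measure, which the abstract does not specify.
    """
    state_q = quantiles(state_incomes, n=100, method="inclusive")
    national_q = quantiles(national_incomes, n=100, method="inclusive")
    return [abs(math.log(s) - math.log(n)) for s, n in zip(state_q, national_q)]
```

Summing or averaging the gaps over a range of percentiles, for example the top 20, would give one possible summary of how far a state sits from the national distribution in that range.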
-
Receipt of Public and Private Food Assistance Across the Rural-Urban Continuum Before and During the COVID-19 Pandemic: Analysis of Current Population Survey Data
August 2025
Working Paper Number:
CES-25-51
Background: The nutrition safety net in the United States is critical to supporting food security among households in need. Food assistance in the United States includes both government-funded food programs and private community-based providers who distribute food to households in need. The COVID-19 pandemic affected experiences of food security and use of private and public food assistance resources. However, these effects may have differed for households residing in urban versus rural areas. We explored receipt of Supplemental Nutrition Assistance Program (SNAP) benefits or food from community-based emergency food providers across a detailed measure of the rural-urban continuum before and during the COVID-19 pandemic.
Methods: We linked restricted-use Current Population Survey Food Security Supplement data to census-tract-level United States Department of Agriculture Rural-Urban Commuting Area codes to estimate prevalence of self-reported SNAP participation and receipt of emergency food support across temporal (2015-2019 versus 2020-2021) and socio-spatial (urban, large rural city/town, small rural town, or isolated rural town/area) dimensions. We report prevalences as point estimates with 95% confidence intervals, all weighted for national representation.
Results: The weighted prevalence of self-reported SNAP participation was 8.9% (8.7-9.2%) in 2015-2019 and 9.1% (8.5-9.5%) in 2020-2021 in urban areas, 11.4% (10.8-12.2%) in 2015-2019 and 11.6% (10.5-12.9%) in 2020-2021 in large rural towns/cities, 13.4% (12.3-14.6%) in 2015-2019 and 12.3% (10.5-14.5%) in 2020-2021 in small rural towns, and 9.7% (8.6-10.9%) in 2015-2019 and 10.9% (8.8-13.4%) in 2020-2021 in isolated rural towns. The weighted prevalence of self-reported receipt of emergency food was 4.9% (4.8-5.1%) in 2015-2019 and 6.2% (5.8-6.5%) in 2020-2021 in urban areas, 6.8% (6.2-7.4%) in 2015-2019 and 7.6% (6.6-8.6%) in 2020-2021 in large rural towns/cities, 8.1% (7.3-9.1%) in 2015-2019 and 7.1% (5.7-8.8%) in 2020-2021 in small rural towns, and 6.8% (5.9-7.7%) in 2015-2019 and 8.5% (6.7-10.6%) in 2020-2021 in isolated rural towns.
Conclusion: Households in rural communities use public and private food assistance at higher rates than households in urban areas, but there is variation across communities depending on the level of rurality.
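For readers unfamiliar with weighted prevalence, the point estimates reported above are of the following general form: the weighted share of households reporting receipt. This is a minimal sketch with a hypothetical function name and made-up inputs; it omits the 95% confidence intervals, which depend on the survey's design-based variance method and are not described in the abstract.

```python
def weighted_prevalence(indicators, weights):
    """Weighted share of units with indicator == 1 (or True).

    Assumption: `weights` are survey weights for national representation.
    Confidence intervals are deliberately left out of this sketch.
    """
    if len(indicators) != len(weights) or not weights:
        raise ValueError("indicators and weights must be non-empty and aligned")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for y, w in zip(indicators, weights) if y) / total
```

With four hypothetical households weighted 100, 300, 500, and 100, of which the first and last report receipt, the weighted prevalence is 200 / 1000, or 20 percent.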
-
LEHD Data Documentation LEHD-OVERVIEW-S2008-rev1
December 2011
Working Paper Number:
CES-11-43
-
Estimating the Local Productivity Spillovers from Science
January 2017
Working Paper Number:
CES-17-56
We estimate the local productivity spillovers from science by relating wages and real estate prices across metros to measures of scientific activity in those metros. We address three fundamental challenges: (1) factor input adjustments, using wages and real estate prices, along with Shephard's Lemma, to estimate changes in metros' productivity, which must equal changes in unit production cost; (2) unobserved differences in metros/causality, using as an instrument a share shift index that exploits historic variation in the mix of research in metros interacted with trends in federal funding for specific fields; (3) unobserved differences in workers, using data on the states in which people are born. Our estimates show a strong positive relationship between wages and scientific research and a weak positive relationship for real estate prices. Overall, we estimate a high rate of return to research.
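The share shift index described in (2) is commonly built as a sum, over research fields, of a metro's historic field share times the national funding trend for that field. The sketch below follows that generic construction; the function name, the field labels, and the exact definitions of shares and growth are assumptions, since the abstract does not give the paper's specification.

```python
def shift_share_index(base_shares, field_growth):
    """Generic shift-share index per metro.

    base_shares: {metro: {field: historic share of the metro's research}}
    field_growth: {field: national growth in federal funding for the field}
    Both structures are hypothetical stand-ins for the paper's data.
    """
    return {
        metro: sum(share * field_growth[field] for field, share in shares.items())
        for metro, shares in base_shares.items()
    }
```

The index varies across metros only through their historic research mix, which is what makes it a candidate instrument for current scientific activity.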