Statistical agencies frequently publish microdata that have been altered to protect confidentiality. Such data retain utility for many types of broad analyses but can yield biased or insufficiently precise results in others. Research access to de-identified versions of the restricted-use data with little or no alteration is often possible, albeit costly and time-consuming. We investigate the advantages and disadvantages of public-use and restricted-use data from the American Community Survey (ACS) in constructing a wage index. The public-use data used were Public Use Microdata Samples, while the restricted-use data were accessed via a Federal Statistical Research Data Center. We discuss the advantages and disadvantages of each data source and compare estimated CWIs and standard errors at the state and labor market levels.
-
School Equalization in the Shadow of Jim Crow: Causes and Consequences of Resource Disparity in Mississippi circa 1940
May 2024
Working Paper Number:
CES-24-25
A school finance equalization program established in Mississippi in 1920 failed to help many of the state's Black students, an outcome that was typical in the segregated U.S. South (Horace Mann Bond, 1934). In majority-Black school districts, local decision-makers overwhelmingly favored white schools when allotting funds from the state's preexisting per capita fund, and the resulting high expenditures on white students rendered these districts ineligible for the equalization program. Thus, while Black students residing in majority-white districts benefitted from increased spending and standards for Black schools, those in majority-Black districts continued to experience extremely low (and even worsening) school funding. We model the processes that led the so-called equalization policy to create disparities in schooling resources for Black students, and estimate effects on Black children using both a neighboring-counties design and an IV strategy. We find that local educational spending had large impacts on Black enrollment rates, as reported in the 1940 census, with Black educational attainment increasing in marginal spending. Finally, we link the 1940 and 2000 censuses to show that Black children exposed to higher levels of school expenditures had significantly more completed schooling and higher income late in life.
-
LOOKING BACK ON THREE YEARS OF USING THE SYNTHETIC LBD BETA
February 2014
Working Paper Number:
CES-14-11
Distributions of business data are typically much more skewed than those for household or individual data, and public knowledge of the underlying units is greater. As a result, national statistical offices (NSOs) rarely release establishment or firm-level business microdata due to the risk to respondent confidentiality. One potential approach for overcoming these risks is to release synthetic data where the establishment data are simulated from statistical models designed to mimic the distributions of the real underlying microdata. The US Census Bureau's Center for Economic Studies, in collaboration with Duke University, the National Institute of Statistical Sciences, and Cornell University, made available a synthetic public use file for the Longitudinal Business Database (LBD) comprising more than 20 million records for all business establishments with paid employees dating back to 1976. The resulting product, dubbed the SynLBD, was released in 2010 and is the first-ever comprehensive business microdata set publicly released in the United States, including data on establishments' employment and payroll, birth and death years, and industrial classification. This paper documents the scope of projects that have requested and used the SynLBD.
-
Who Values Human Capitalists' Human Capital? Healthcare Spending and Physician Earnings
July 2020
Working Paper Number:
CES-20-23
Is government guiding the invisible hand at the top of the labor market? We study this question among physicians, the most common occupation among the top one percent of income earners, and whose billings comprise one-fifth of healthcare spending. We use a novel linkage of population-wide tax records with the administrative registry of all physicians in the U.S. to study the characteristics of these high earnings, and the influence of government payments in particular. We find a major role for government on the margin, with half of direct changes to government reimbursement rates flowing directly into physicians' incomes. These policies move physicians' relative and absolute incomes more than any reasonable changes to marginal tax rates. At the same time, the overall level of physician earnings can largely be explained by labor market fundamentals of long work and training hours. Competing occupations also pay well and provide a natural lower bound for physician earnings. We conclude that government plays a major role in determining the value of physicians' human capital, but it is unrealistic to use this power to reduce healthcare spending substantially.
-
The Racial and Ethnic Composition of Local Government Employees in Large Metro Areas, 1960-2010
August 2013
Working Paper Number:
CES-13-38
This study uses census microdata from 1960 to 2010 to look at how the racial and ethnic composition of local government employees has reflected the diversity of the general population in the 100 largest metro areas over the last half century. Historically, one route to upward social mobility has been employment in local government. This study uses microdata that predates key immigration and civil rights legislation of the 1960s through to the present to examine changes in the racial and ethnic composition of local government employees and in the general population. For this study, local government employees have been divided into high- and low-wage occupations. These data indicate that local workforces have grown more diverse over time, though representation across different racial and ethnic groups and geographic areas is uneven. African-Americans were underrepresented in high-wage local government employment and overrepresented in low-wage jobs in the early years of this study, particularly in the South, but have since become proportionally represented in high-wage jobs on a national level. In contrast, the most recent data indicate that Hispanic and other races are underrepresented in this employment group, particularly in the West. Though the numbers of Hispanic and Asian high-wage local government employees are increasing, it appears that it will take several years for those groups to achieve proportional representation throughout the United States.
-
Design Comparison of LODES and ACS Commuting Data Products
October 2014
Working Paper Number:
CES-14-38
The Census Bureau produces two complementary data products, the American Community Survey (ACS) commuting and workplace data and the Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES), which can be used to answer spatial, economic, and demographic questions relating to workplaces and home-to-work flows. The products are complementary in the sense that they measure similar activities, but each has important unique characteristics that provide information the other cannot. As a result of questions from data users, the Census Bureau has created this document to highlight the major design differences between these two data products. This report guides users on the relative advantages of each data product for various analyses and helps explain differences that may arise when using the products.
As an overview, these two data products are sourced from different inputs, cover different populations and time periods, are subject to different sets of edits and imputations, are released under different confidentiality protection mechanisms, and are tabulated at different geographic and characteristic levels. As a general rule, the two data products should not be expected to match exactly for arbitrary queries and may differ substantially for some queries.
Within this document, we compare the two data products by the design elements that were deemed most likely to contribute to differences in tabulated data. These elements are: Collection, Coverage, Geographic and Longitudinal Scope, Job Definition and Reference Period, Job and Worker Characteristics, Location Definitions (Workplace and Residence), Completeness of Geographic Information and Edits/Imputations, Geographic Tabulation Levels, Control Totals, Confidentiality Protection and Suppression, and Related Public-Use Data Products.
An in-depth data analysis, in aggregate or with the microdata, between the two data products will be the subject of a future technical report. The Census Bureau has begun a pilot project to integrate ACS microdata with LEHD administrative data to develop an enhanced frame of employment status, place of work, and commuting. The Census Bureau will publish quality metrics for person match rates, residence and workplace match rates, and commute distance comparisons.
-
Estimating the Local Productivity Spillovers from Science
January 2017
Working Paper Number:
CES-17-56
We estimate the local productivity spillovers from science by relating wages and real estate prices across metros to measures of scientific activity in those metros. We address three fundamental challenges: (1) factor input adjustments, using wages and real estate prices, along with Shephard's Lemma, to estimate changes in metros' productivity, which must equal changes in unit production cost; (2) unobserved differences in metros/causality, using a share shift index that exploits historic variation in the mix of research in metros interacted with trends in federal funding for specific fields as an instrument; (3) unobserved differences in workers, using data on the states in which people are born. Our estimates show a strong positive relationship between wages and scientific research and a weak positive relationship for real estate prices. Overall, we estimate a high rate of return to research.
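The step in (1) can be sketched as a short derivation. This is an illustrative reading under stated assumptions, not the paper's own specification: a traded good with price normalized to one, constant returns to scale, two inputs (labor at wage $w$ and real estate at price $r$), Hicks-neutral productivity $A$, and zero profits. The symbols $c$, $A$, $\theta_L$, and $\theta_R$ are introduced here for illustration.

```latex
% Zero profit with output price 1: unit cost equals productivity.
\[
  c(w, r) = A
\]
% Log-differentiate both sides.
\[
  d\ln A
  = \frac{w\, c_w(w, r)}{c(w, r)}\, d\ln w
  + \frac{r\, c_r(w, r)}{c(w, r)}\, d\ln r
\]
% Shephard's Lemma: c_w and c_r are the cost-minimizing input demands
% per unit of output, so the coefficients are cost shares.
\[
  d\ln A = \theta_L\, d\ln w + \theta_R\, d\ln r,
  \qquad \theta_L + \theta_R = 1
\]
```

Under these assumptions, a change in a metro's productivity can be read off from observed changes in wages and real estate prices, weighted by the cost shares of labor and real estate.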
-
SYNTHETIC DATA FOR SMALL AREA ESTIMATION IN THE AMERICAN COMMUNITY SURVEY
April 2013
Working Paper Number:
CES-13-19
Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
-
Developing a Residence Candidate File for Use With Employer-Employee Matched Data
January 2017
Working Paper Number:
CES-17-40
This paper describes the Longitudinal Employer-Household Dynamics (LEHD) program's ongoing efforts to use administrative records in a predictive model that describes residence locations for workers. This project was motivated by the discontinuation of a residence file produced elsewhere at the U.S. Census Bureau. The goal of the Residence Candidate File (RCF) process is to provide the LEHD Infrastructure Files with residence information that maintains currency with the changing state of administrative sources and represents uncertainty in location as a probability distribution. The discontinued file provided only a single residence per person/year, even when contributing administrative data may have contained multiple residences. This paper describes the motivation for the project, our methodology, the administrative data sources, the model estimation and validation results, and the file specifications. We find that the best prediction of the person-place model provides similar, but superior, accuracy compared with previous methods and performs well for workers in the LEHD jobs frame. We outline possibilities for further improvement in sources and modeling as well as recommendations on how to use the preference weights in downstream processing.
-
Using the P90/P10 Index to Measure U.S. Inequality Trends with Current Population Survey Data: A View From Inside the Census Bureau Vaults
June 2007
Working Paper Number:
CES-07-17
The March Current Population Survey (CPS) is the primary data source for estimation of levels and trends in labor earnings and income inequality in the USA. Time-inconsistency problems related to top coding in these data have led many researchers to use the ratio of the 90th and 10th percentiles of these distributions (P90/P10) rather than a more traditional summary measure of inequality. With access to public use and restricted-access internal CPS data, and bounding methods, we show that using P90/P10 does not completely obviate time-inconsistency problems, especially for household income inequality trends. Using internal data, we create consistent cell mean values for all top-coded public use values that, when used with public use data, closely track inequality trends in labor earnings and household income using internal data. But estimates of longer-term inequality trends with these corrected data based on P90/P10 differ from those based on the Gini coefficient. The choice of inequality measure matters.
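As a rough illustration of why top coding interacts differently with P90/P10 and the Gini coefficient, the sketch below uses made-up incomes and hypothetical caps; it is not the paper's data or its bounding method. The function names and the cap values are assumptions chosen for the example. A cap above the 90th percentile leaves P90/P10 untouched while lowering the Gini coefficient, whereas a cap that bites below the 90th percentile distorts P90/P10 as well.

```python
def percentile(values, q):
    """Percentile q (0-100) of a list, using linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def p90_p10(values):
    """Ratio of the 90th to the 10th percentile."""
    return percentile(values, 90) / percentile(values, 10)


def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted values."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1) / n


def top_code(values, cap):
    """Replace every value above the cap with the cap itself."""
    return [min(v, cap) for v in values]


# Hypothetical incomes (in thousands): 1..95 plus five very high earners.
incomes = [float(i) for i in range(1, 96)] + [200.0, 300.0, 400.0, 500.0, 1000.0]

high_cap = top_code(incomes, 150.0)  # cap lies above the 90th percentile
low_cap = top_code(incomes, 80.0)    # cap lies below the 90th percentile

print("P90/P10 raw, high cap, low cap:",
      p90_p10(incomes), p90_p10(high_cap), p90_p10(low_cap))
print("Gini    raw, high cap, low cap:",
      gini(incomes), gini(high_cap), gini(low_cap))
```

In this toy setting the high cap changes only the Gini coefficient, which is the usual argument for preferring P90/P10 with top-coded data. The low cap shows one way the ratio can still be affected when censoring reaches far enough into the distribution.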
-
Using Small-Area Estimation (SAE) to Estimate Prevalence of Child Health Outcomes at the Census Regional-, State-, and County-Levels
November 2022
Working Paper Number:
CES-22-48
In this study, we implement small-area estimation to assess the prevalence of child health outcomes at the county, state, and regional levels, using national survey data.