-
Disclosure Limitation and Confidentiality Protection in Linked Data
January 2018
Working Paper Number:
CES-18-07
Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkages in the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
-
Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
January 2017
Authors:
Lars Vilhuber,
John M. Abowd,
Daniel Weinberg,
Jerome P. Reiter,
Matthew D. Shapiro,
Robert F. Belli,
Noel Cressie,
David C. Folch,
Scott H. Holan,
Margaret C. Levenstein,
Kristen M. Olson,
Jolene Smyth,
Leen-Kiat Soh,
Bruce D. Spencer,
Seth E. Spielman,
Christopher K. Wikle
Working Paper Number:
CES-17-59R
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN's research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.
-
Sorting Between and Within Industries: A Testable Model of Assortative Matching
January 2017
Working Paper Number:
CES-17-43
We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting: more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated.
-
Evaluating Race and Hispanic Origin Responses of Medicaid Participants Using Census Data
April 2015
Working Paper Number:
carra-2015-01
Health and health care disparities associated with race or Hispanic origin are complex and continue to challenge researchers and policy makers. With the intention of improving the measurement and monitoring of these disparities, provisions of the Patient Protection and Affordable Care Act (ACA) of 2010 require states to collect, report and analyze data on demographic characteristics of applicants and participants in Medicaid and other federally supported programs. By linking Medicaid records to data from the 2010 Census, the American Community Survey, and Census 2000, this new large-scale study examines and documents the extent to which pre-ACA Medicaid administrative records match self-reported race and Hispanic origin in Census data. Linked records allow comparisons between individuals with matching and non-matching race and Hispanic origin data across several demographic, socioeconomic and neighborhood characteristics, such as age, gender, language proficiency, education and Census tract variables. Identification of the groups most likely to have non-matching and missing race and Hispanic origin data in Medicaid relative to Census data can inform strategies to improve the quality of demographic data collected from Medicaid populations.
-
SYNTHETIC DATA FOR SMALL AREA ESTIMATION IN THE AMERICAN COMMUNITY SURVEY
April 2013
Working Paper Number:
CES-13-19
Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and providing access to restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contain more geographical detail than is currently released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
-
More than a Million New American Indians in 2000: Who are They?
March 2013
Working Paper Number:
CES-13-02
Over a million people reported their race as American Indian in the 2000 U.S. Census but did not report that race in the 1990 Census. We investigate three questions related to this extraordinary population change: (1) Which subgroups of American Indians had the greatest numerical growth? (2) Which subgroups had the greatest proportional increase? And (3) is it plausible that all 'new' American Indians reported multiple races in 2000? We use full-count and high-density decennial U.S. census data; adjust for birth, death, and immigration; decompose on age, gender, Latino origin, education, and birth state; and compare the observed American Indian subgroup sizes in 2000 to the sizes expected based on 1990 counts. The largest numerical increases were among non-Latino youth (ages 10-19), non-Latino adult women, and adults with no college degree. Latinos, highly educated adults, and women had the largest proportional gains, perhaps indicating that 'American Indian' has special appeal in these groups. We also find evidence that a substantial number of new American Indians reported only American Indian race in 2000, rather than a multiple-race response. This research is relevant to social theorists, race scholars, community members, program evaluators, and the Census Bureau.
-
Errors in Survey Reporting and Imputation and Their Effects on Estimates of Food Stamp Program Participation
April 2011
Working Paper Number:
CES-11-14
Benefit receipt in major household surveys is often underreported. This misreporting leads to biased estimates of the economic circumstances of disadvantaged populations, program takeup, the distributional effects of government programs, and other program effects. We use administrative data on Food Stamp Program (FSP) participation matched to American Community Survey (ACS) and Current Population Survey (CPS) household data. We show that nearly thirty-five percent of true recipient households do not report receipt in the ACS and fifty percent do not report receipt in the CPS. Misreporting, both false negatives and false positives, varies with individual characteristics, leading to complicated biases in FSP analyses. We then directly examine the determinants of program receipt using our combined administrative and survey data. The combined data allow us to examine accurate participation using individual characteristics missing in administrative data. Our results differ from conventional estimates using only survey data, as such estimates understate participation by single parents, non-whites, low income households, and other groups. To evaluate the use of Census Bureau imputed ACS and CPS data, we also examine whether our estimates using survey data alone are closer to those using the accurate combined data when imputed survey observations are excluded. Interestingly, excluding the imputed observations leads to worse ACS estimates, but has less effect on the CPS estimates.
-
A Formal Test of Assortative Matching in the Labor Market
November 2009
Working Paper Number:
CES-09-40
We estimate a structural model of job assignment in the presence of coordination frictions due to Shimer (2005). The coordination friction model places restrictions on the joint distribution of worker and firm effects from a linear decomposition of log labor earnings. These restrictions permit estimation of the unobservable ability and productivity differences between workers and their employers as well as the way workers sort into jobs on the basis of these unobservable factors. The estimation is performed on matched employer-employee data from the LEHD program of the U.S. Census Bureau. The estimated correlation between worker and firm effects from the earnings decomposition is close to zero, a finding that is often interpreted as evidence that there is no sorting by comparative advantage in the labor market. Our estimates suggest that this finding actually results from a lack of sufficient heterogeneity in the workforce and available jobs. Workers do sort into jobs on the basis of productive differences, but the effects of sorting are not visible because of the composition of workers and employers.
-
The Center for Economic Studies 1982-2007: A Brief History
October 2009
Working Paper Number:
CES-09-35
More than half a century ago, visionaries representing both the Census Bureau and the external research community laid the foundation for the Center for Economic Studies (CES) and the Research Data Center (RDC) system. They saw a clear need for a system meeting the inextricably related requirements of providing more and better information from existing Census Bureau data collections while preserving respondent confidentiality and privacy. CES opened in 1982 to house new longitudinal business databases, develop them further, and make them available to qualified researchers. CES and the RDC system evolved to meet the designers' requirements. Research at CES and the RDCs meets the commitments of the Census Bureau (and, recently, of other agencies) to preserving confidentiality while contributing paradigm-shifting fundamental research in a range of disciplines and up-to-the-minute critical tools for decision-makers.
-
NEW DATA FOR DYNAMIC ANALYSIS: THE LONGITUDINAL ESTABLISHMENT AND ENTERPRISE MICRODATA (LEEM) FILE
December 1999
Working Paper Number:
CES-99-18
Until now, research on U.S. business activities over time has been hindered by the lack of accurate and comprehensive longitudinal data. The new Longitudinal Establishment and Enterprise Microdata (LEEM) file contains tremendously rich data that open up numerous possibilities for dynamic analyses of businesses in the U.S. economy. It is the first nationwide high-quality longitudinal database that covers the majority of employer businesses from all sectors of the economy. Due to the confidential nature of these data, the file is located at the Center for Economic Studies in the U.S. Bureau of the Census. To access the data, researchers must submit an acceptable proposal to CES and become sworn Census researchers. This paper describes the LEEM file, the variables contained on the file, and current uses of the data.