-
Measuring Income of the Aged in Household Surveys: Evidence from Linked Administrative Records
June 2024
Working Paper Number:
CES-24-32
Research has shown that household survey estimates of retirement income (defined benefit pensions and defined contribution account withdrawals) suffer from substantial underreporting which biases downward measures of financial well-being among the aged. Using data from both the redesigned 2016 Current Population Survey Annual Social and Economic Supplement (CPS ASEC) and the Health and Retirement Study (HRS), each matched with administrative records, we examine to what extent underreporting of retirement income affects key statistics such as reliance on Social Security benefits and poverty among the aged. We find that underreporting of retirement income is still prevalent in the CPS ASEC. While the HRS does a better job than the CPS ASEC in terms of capturing retirement income, it still falls considerably short compared to administrative records. Consequently, the relative importance of Social Security income remains overstated in household surveys: 53 percent of elderly beneficiaries in the CPS ASEC and 49 percent in the HRS rely on Social Security for the majority of their incomes compared to 42 percent in the linked administrative data. The poverty rate for those aged 65 and over is also overstated: 8.8 percent in the CPS ASEC and 7.4 percent in the HRS compared to 6.4 percent in the linked administrative data. Our results illustrate the effects of using alternative data sources in producing key statistics from the Social Security Administration's Income of the Aged publication.
-
When and Why Does Nonresponse Occur? Comparing the Determinants of Initial Unit Nonresponse and Panel Attrition
September 2023
Working Paper Number:
CES-23-44
Though unit nonresponse threatens data quality in both cross-sectional and panel surveys, little is understood about how initial nonresponse and later panel attrition may be theoretically or empirically distinct phenomena. This study advances current knowledge of the determinants of both unit nonresponse and panel attrition within the context of the U.S. Census Bureau's Survey of Income and Program Participation (SIPP) panel survey, which I link with high-quality federal administrative records, paradata, and geographic data. By exploiting the SIPP's interpenetrated sampling design and relying on cross-classified random effects modeling, this study quantifies the relative effects of sample household, interviewer, and place characteristics on baseline nonresponse and later attrition, addressing a critical gap in the literature. Given the reliance on successful record linkages between survey sample households and federal administrative data in the nonresponse research, this study also undertakes an explicitly spatial analysis of the place-based characteristics associated with successful record linkages in the U.S.
-
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning
November 2021
Working Paper Number:
CES-21-35
This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
-
Optimal Probabilistic Record Linkage: Best Practice for Linking Employers in Survey and Administrative Data
March 2019
Working Paper Number:
CES-19-08
This paper illustrates an application of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across firms is highly asymmetric. To address these difficulties, this paper uses a supervised machine learning model to probabilistically link survey respondents in the Health and Retirement Study (HRS) with employers and establishments in the Census Business Register (BR) to create a new data source which we call the CenHRS. Multiple imputation is used to propagate uncertainty from the linkage step into subsequent analyses of the linked data. The linked data reveal new evidence that survey respondents' misreporting and selective nonresponse about employer characteristics are systematically correlated with wages.
-
Disclosure Limitation and Confidentiality Protection in Linked Data
January 2018
Working Paper Number:
CES-18-07
Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkages in the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
-
Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
January 2017
Authors:
Lars Vilhuber,
John M. Abowd,
Daniel Weinberg,
Jerome P. Reiter,
Matthew D. Shapiro,
Robert F. Belli,
Noel Cressie,
David C. Folch,
Scott H. Holan,
Margaret C. Levenstein,
Kristen M. Olson,
Jolene Smyth,
Leen-Kiat Soh,
Bruce D. Spencer,
Seth E. Spielman,
Christopher K. Wikle
Working Paper Number:
CES-17-59R
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN's research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.
-
The New Lifecycle of Women's Employment: Disappearing Humps, Sagging Middle, Expanding Tops
November 2016
Working Paper Number:
carra-2016-07
The new lifecycle of women's employment is initially high and flat, with a dip in the middle and a phasing out that is more prolonged than for previous cohorts. The hump is gone, the middle is sagging a bit, and the top has greatly expanded. We explore the increase in cumulative work experience for women from the 1930s to the 1970s birth cohorts using the SIPP and the HRS. We investigate the changing labor force impact of a birth event across cohorts and by education and also the impact of taking leave or quitting. We find greatly increased labor force experience across cohorts, far less time out after a birth and greater labor force recovery for those who take paid or unpaid leave. More work experience across the lifecycle is related to the increased employment of women in their older ages.
-
Evaluating the Use of Commercial Data to Improve Survey Estimates of Property Taxes
August 2016
Working Paper Number:
carra-2016-06
While commercial data sources offer promise to statistical agencies for use in production of official statistics, challenges can arise as the data are not collected for statistical purposes. This paper evaluates the use of 2008-2010 property tax data from CoreLogic, Inc. (CoreLogic), aggregated from county and township governments from around the country, to improve 2010 American Community Survey (ACS) estimates of property tax amounts for single-family homes. Particularly, the research evaluates the potential to use CoreLogic to reduce respondent burden, to study survey response error and to improve adjustments for survey nonresponse. The research found that the coverage of the CoreLogic data varies between counties as does the correspondence between ACS and CoreLogic property taxes. This geographic variation implies that different approaches toward using CoreLogic are needed in different areas of the country. Further, large differences between CoreLogic and ACS property taxes in certain counties seem to be due to conceptual differences between what is collected in the two data sources. The research examines three counties, Clark County, NV, Philadelphia County, PA and St. Louis County, MO, and compares how estimates would change with different approaches using the CoreLogic data. Mean county property tax estimates are highly sensitive to whether ACS or CoreLogic data are used to construct estimates. Using CoreLogic data in imputation modeling for nonresponse adjustment of ACS estimates modestly improves the predictive power of imputation models, although estimates of county property taxes and property taxes by mortgage status are not very sensitive to the imputation method.
-
The Shifting Job Tenure Distribution
January 2016
Working Paper Number:
CES-16-12R
There has been a shift in the U.S. job tenure distribution toward longer-duration jobs since 2000. This change is apparent both in the tenure supplements to the Current Population Survey and in matched employer-employee data. A substantial portion of this shift can be accounted for by the ageing of the workforce and the decline in the entry rate of new employer businesses. This shift is accounted for more by declines in the hiring rate, which are concentrated in the labor market downturns associated with the 2001 and 2007-2009 recessions, rather than declines in separation rates. The increase in average real earnings since 2007 is less than what would be predicted by the shift toward longer-tenure jobs because of declines in tenure-held-constant real earnings. Regression estimates of the returns to job tenure provide no evidence that the shift in the job tenure distribution is being driven by better matches between workers and employers.
-
Does the Retirement Consumption Puzzle Differ Across the Distribution?
March 2011
Working Paper Number:
CES-11-09R
Previous research has repeatedly found a puzzling one-time drop in the mean and median of consumption at retirement, contrary to the predictions of the life-cycle hypothesis. However, very little is known as to whether these effects vary across the consumption distribution. This study expands upon the previous work by examining changes in the consumption distribution between the non-retired and the retired using quantile regression techniques on pseudo-cohorts from the cross-sectional data of the 1990-2007 Consumer Expenditure Survey. The results indicate that there are insignificant changes between these groups at the lower end of the consumption distribution, while there are significant decreases at the higher end of this distribution. In addition, these changes in the distribution are gradually larger in magnitude when moving from the lower end to the higher end, which is found using several different measures of consumption. Work-related expenditures are instead shown to decrease uniformly across the consumption distribution. This evidence reveals that there is a progressive distributional component to the retirement consumption puzzle.