While commercial data sources offer promise to statistical agencies for use in production of official statistics, challenges can arise as the data are not collected for statistical purposes. This paper evaluates the use of 2008-2010 property tax data from CoreLogic, Inc. (CoreLogic), aggregated from county and township governments from around the country, to improve 2010 American Community Survey (ACS) estimates of property tax amounts for single-family homes. Particularly, the research evaluates the potential to use CoreLogic to reduce respondent burden, to study survey response error and to improve adjustments for survey nonresponse. The research found that the coverage of the CoreLogic data varies between counties as does the correspondence between ACS and CoreLogic property taxes. This geographic variation implies that different approaches toward using CoreLogic are needed in different areas of the country. Further, large differences between CoreLogic and ACS property taxes in certain counties seem to be due to conceptual differences between what is collected in the two data sources. The research examines three counties, Clark County, NV, Philadelphia County, PA and St. Louis County, MO, and compares how estimates would change with different approaches using the CoreLogic data. Mean county property tax estimates are highly sensitive to whether ACS or CoreLogic data are used to construct estimates. Using CoreLogic data in imputation modeling for nonresponse adjustment of ACS estimates modestly improves the predictive power of imputation models, although estimates of county property taxes and property taxes by mortgage status are not very sensitive to the imputation method.
-
SYNTHETIC DATA FOR SMALL AREA ESTIMATION IN THE AMERICAN COMMUNITY SURVEY
April 2013
Working Paper Number:
CES-13-19
Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
View Full
Paper PDF
-
Exploring Administrative Records Use for Race and Hispanic Origin Item Non-Response
December 2014
Working Paper Number:
carra-2014-16
Race and Hispanic origin data are required to produce official statistics in the United States. Data collected through the American Community Survey and decennial census address missing data through traditional imputation methods, often relying on information from neighbors. These methods work well if neighbors share similar characteristics, however, the shape and patterns of neighborhoods in the United States are changing. Administrative records may provide more accurate data compared to traditional imputation methods for missing race and Hispanic origin responses. This paper first describes the characteristics of persons with missing demographic data, then assesses the coverage of administrative records data for respondents who do not answer race and Hispanic origin questions in Census data. The paper also discusses the distributional impact of using administrative records race and Hispanic origin data to complete missing responses in a decennial census or survey context.
View Full
Paper PDF
-
Comparing the 2019 American Housing Survey to Contemporary Sources of Property Tax Records: Implications for Survey Efficiency and Quality
June 2022
Working Paper Number:
CES-22-22
Given rising nonresponse rates and concerns about respondent burden, government statistical agencies have been exploring ways to supplement household survey data collection with administrative records and other sources of third-party data. This paper evaluates the potential of property tax assessment records to improve housing surveys by comparing these records to responses from the 2019 American Housing Survey. Leveraging the U.S. Census Bureau's linkage infrastructure, we compute the fraction of AHS housing units that could be matched to a unique property parcel (coverage rate), as well as the extent to which survey and property tax data contain the same information (agreement rate). We analyze heterogeneity in coverage and agreement across states, housing characteristics, and 11 AHS items of interest to housing researchers. Our results suggest that partial replacement of AHS data with property data, targeted toward certain survey items or single-family detached homes, could reduce respondent burden without altering data quality. Further research into partial-replacement designs is needed and should proceed on an item-by-item basis. Our work can guide this research as well as those who wish to conduct independent research with property tax records that is representative of the U.S. housing stock.
View Full
Paper PDF
-
The Impact of Household Surveys on 2020 Census Self-Response
July 2022
Working Paper Number:
CES-22-24
Households who were sampled in 2019 for the American Community Survey (ACS) had lower self-response rates to the 2020 Census. The magnitude varied from -1.5 percentage point for household sampled in January 2019 to -15.1 percent point for households sampled in December 2019. Similar effects are found for the Current Population Survey (CPS) as well.
View Full
Paper PDF
-
When and Why Does Nonresponse Occur? Comparing the Determinants of Initial Unit Nonresponse and Panel Attrition
September 2023
Working Paper Number:
CES-23-44
Though unit nonresponse threatens data quality in both cross-sectional and panel surveys, little is understood about how initial nonresponse and later panel attrition may be theoretically or empirically distinct phenomena. This study advances current knowledge of the determinants of both unit nonresponse and panel attrition within the context of the U.S. Census Bureau's Survey of Income and Program Participation (SIPP) panel survey, which I link with high-quality federal administrative records, paradata, and geographic data. By exploiting the SIPP's interpenetrated sampling design and relying on cross-classified random effects modeling, this study quantifies the relative effects of sample household, interviewer, and place characteristics on baseline nonresponse and later attrition, addressing a critical gap in the literature. Given the reliance on successful record linkages between survey sample households and federal administrative data in the nonresponse research, this study also undertakes an explicitly spatial analysis of the place-based characteristics associated with successful record linkages in the U.S.
View Full
Paper PDF
-
Coverage and Agreement of Administrative Records and 2010 American Community Survey Demographic Data
November 2014
Working Paper Number:
carra-2014-14
The U.S. Census Bureau is researching possible uses of administrative records in decennial census and survey operations. The 2010 Census Match Study and American Community Survey (ACS) Match Study represent recent efforts by the Census Bureau to evaluate the extent to which administrative records provide data on persons and addresses in the 2010 Census and 2010 ACS. The 2010 Census Match Study also examines demographic response data collected in administrative records. Building on this analysis, we match data from the 2010 ACS to federal administrative records and third party data as well as to previous census data and examine administrative records coverage and agreement of ACS age, sex, race, and Hispanic origin responses. We find high levels of coverage and agreement for sex and age responses and variable coverage and agreement across race and Hispanic origin groups. These results are similar to findings from the 2010 Census Match Study.
View Full
Paper PDF
-
Earnings Through the Stages: Using Tax Data to Test for Sources of Error in CPS ASEC Earnings and Inequality Measures
September 2024
Working Paper Number:
CES-24-52
In this paper, I explore the impact of generalized coverage error, item non-response bias, and measurement error on measures of earnings and earnings inequality in the CPS ASEC. I match addresses selected for the CPS ASEC to administrative data from 1040 tax returns. I then compare earnings statistics in the tax data for wage and salary earnings in samples corresponding to seven stages of the CPS ASEC survey production process. I also compare the statistics using the actual survey responses. The statistics I examine include mean earnings, the Gini coefficient, percentile earnings shares, and shares of the survey weight for a range of percentiles. I examine how the accuracy of the statistics calculated using the survey data is affected by including imputed responses for both those who did not respond to the full CPS ASEC and those who did not respond to the earnings question. I find that generalized coverage error and item nonresponse bias are dominated by measurement error, and that an important aspect of measurement error is households reporting no wage and salary earnings in the CPS ASEC when there are such earnings in the tax data. I find that the CPS ASEC sample misses earnings at the high end of the distribution from the initial selection stage and that the final survey weights exacerbate this.
View Full
Paper PDF
-
Errors in Survey Reporting and Imputation and Their Effects on Estimates of Food Stamp Program Participation
April 2011
Working Paper Number:
CES-11-14
Benefit receipt in major household surveys is often underreported. This misreporting leads to biased estimates of the economic circumstances of disadvantaged populations, program takeup, and the distributional effects of government programs, and other program effects. We use administrative data on Food Stamp Program (FSP) participation matched to American Community Survey (ACS) and Current Population Survey (CPS) household data. We show that nearly thirty-five percent of true recipient households do not report receipt in the ACS and fifty percent do not report receipt in the CPS. Misreporting, both false negatives and false positives, varies with individual characteristics, leading to complicated biases in FSP analyses. We then directly examine the determinants of program receipt using our combined administrative and survey data. The combined data allow us to examine accurate participation using individual characteristics missing in administrative data. Our results differ from conventional estimates using only survey data, as such estimates understate participation by single parents, non-whites, low income households, and other groups. To evaluate the use of Census Bureau imputed ACS and CPS data, we also examine whether our estimates using survey data alone are closer to those using the accurate combined data when imputed survey observations are excluded. Interestingly, excluding the imputed observations leads to worse ACS estimates, but has less effect on the CPS estimates.
View Full
Paper PDF
-
Estimating the U.S. Citizen Voting-Age Population (CVAP) Using Blended Survey Data, Administrative Record Data, and Modeling: Technical Report
April 2023
Authors:
J. David Brown,
Danielle H. Sandler,
Lawrence Warren,
Moises Yi,
Misty L. Heggeness,
Joseph L. Schafer,
Matthew Spence,
Marta Murray-Close,
Carl Lieberman,
Genevieve Denoeux,
Lauren Medina
Working Paper Number:
CES-23-21
This report develops a method using administrative records (AR) to fill in responses for nonresponding American Community Survey (ACS) housing units rather than adjusting survey weights to account for selection of a subset of nonresponding housing units for follow-up interviews and for nonresponse bias. The method also inserts AR and modeling in place of edits and imputations for ACS survey citizenship item nonresponses. We produce Citizen Voting-Age Population (CVAP) tabulations using this enhanced CVAP method and compare them to published estimates. The enhanced CVAP method produces a 0.74 percentage point lower citizen share, and it is 3.05 percentage points lower for voting-age Hispanics. The latter result can be partly explained by omissions of voting-age Hispanic noncitizens with unknown legal status from ACS household responses. Weight adjustments may be less effective at addressing nonresponse bias under those conditions.
View Full
Paper PDF
-
An Economist's Primer on Survey Samples
September 2000
Working Paper Number:
CES-00-15
Survey data underlie most empirical work in economics, yet economists typically have little familiarity with survey sample design and its effects on inference. This paper describes how sample designs depart from the simple random sampling model implicit in most econometrics textbooks, points out where the effects of this departure are likely to be greatest, and describes the relationship between design-based estimators developed by survey statisticians and related econometric methods for regression. Its intent is to provide empirical economists with enough background in survey methods to make informed use of design-based estimators. It emphasizes surveys of households (the source of most public-use files), but also considers how surveys of businesses differ. Examples from the National Longitudinal Survey of Youth of 1979 and the Current Population Survey illustrate practical aspects of design-based estimation.
View Full
Paper PDF