Papers Containing Keyword(s): 'discrepancy'
The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.
Frequently Occurring Concepts within this Search
John M. Abowd - 3
Viewing papers 1 through 10 of 12
-
Working Paper: Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources
May 2024
Working Paper Number:
CES-24-26
The Best Race and Ethnicity Administrative Records Composite file ('Best Race file') is a composite file which combines Census, federal, and Third Party Data (TPD) sources and applies business rules to assign race and ethnicity values to person records. The first version of the Best Race administrative records composite was constructed in 2015 and has been updated each year since to include more recent vintages, when available, of the data sources originally included in the composite file. Where updates were available for a data source, the most recent information for each person was retained, and the business rules were reapplied to assign a single race and a single Hispanic origin value to each person record. The majority of person records on the Best Race file have consistent race and ethnicity information across data sources. Where there are discrepancies in responses across data sources, we apply a series of business rules to assign a single race and ethnicity to each record. To improve the quality of the Best Race administrative records composite, we have begun revising the business rules, which were developed several years ago. This paper discusses the original business rules as well as the implemented changes and their impact on the composite file.

View Full Paper PDF
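The paper does not reproduce its full rule set here; as a hypothetical sketch, a precedence-style resolution of discrepant reports might look like the following, where the source hierarchy, rule order, and source names are illustrative assumptions, not the Best Race file's actual business rules:

```python
# Hypothetical sketch of precedence-style business rules for resolving
# discrepant race reports across linked sources. The source hierarchy
# below is an illustrative assumption, not the Census Bureau's rule set.
from collections import Counter

SOURCE_PRIORITY = ["census_self_response", "federal_admin", "third_party"]

def assign_race(reports):
    """reports: dict mapping source name -> reported race value (or None)."""
    values = [v for v in reports.values() if v is not None]
    if not values:
        return None
    # Rule 1: if all non-missing sources agree, use that value.
    if len(set(values)) == 1:
        return values[0]
    # Rule 2: otherwise prefer the highest-priority source with a response.
    for source in SOURCE_PRIORITY:
        if reports.get(source):
            return reports[source]
    # Rule 3: fall back to the most frequently reported value.
    return Counter(values).most_common(1)[0][0]
```

The actual composite assigns Hispanic origin separately and handles multiple-race responses, which this sketch omits.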
-
Working Paper: Self-Employment Income Reporting on Surveys
April 2023
Working Paper Number:
CES-23-19
We examine the relation between administrative income data and survey reports for self-employed and wage-earning respondents from 2000 to 2015. The self-employed report 40 percent more wages and self-employment income in the survey than in tax administrative records; this estimate nets out differences between these two sources that are also shared by wage-earners. We provide evidence that differential reporting incentives are an important explanation of the larger self-employed gap by exploiting a well-known artifact: self-employed respondents exhibit substantial bunching at the first EITC kink in their administrative records. We do not observe the same behavior in their survey responses, even after accounting for survey measurement concerns.

View Full Paper PDF
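The bunching comparison can be illustrated with a simple excess-mass check: compute the share of reported incomes falling in a narrow window around the kink in each data source and compare them. The kink location and window width below are placeholders, not the EITC schedule values the paper uses:

```python
# Illustrative bunching check: the fraction of reported incomes within a
# narrow window of a kink point. Comparing this share between administrative
# and survey reports (relative to neighboring bins) is the idea the paper
# exploits; kink and window values here are hypothetical placeholders.
def bunching_share(incomes, kink, window=500):
    """Fraction of observations reporting income within +/- window of the kink."""
    near = sum(1 for y in incomes if abs(y - kink) <= window)
    return near / len(incomes)
```

A markedly larger share near the kink in the administrative records than in the survey responses is the signature of differential reporting the paper documents.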
-
Working Paper: Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database
March 2023
Working Paper Number:
CES-23-10
Retail health clinics (RHCs) are a relatively new type of health care setting, and understanding the role they play as a source of ambulatory care in the United States is important. To better understand these settings, a joint project by the Census Bureau and the National Center for Health Statistics used data science techniques to link data on RHCs from the Convenient Care Association, the County Business Patterns Business Register, and the National Plan and Provider Enumeration System to create the Linked RHC (LiRHC, pronounced 'lyric') database of locations throughout the United States during the years 2018 to 2020. The matching methodology used to perform this linkage is described, as well as the benchmarking, match statistics, and manual review and quality checks used to assess the resulting matched data. The large majority (81%) of matches received quality scores at or above 75/100, and most matches were linked in the first two (of eight) matching passes, indicating high confidence in the final linked dataset. The LiRHC database contains 2,000 RHCs; 97% of these clinics were in metropolitan statistical areas, and 950 were in the South region of the United States. Through this collaborative effort, the Census Bureau and the National Center for Health Statistics strive to understand how RHCs can potentially impact population health as well as the access and provision of health care services across the nation.

View Full Paper PDF
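Multi-pass linkage of this kind can be sketched as a cascade of matching rules applied in order of strictness, with the pass number retained as a rough confidence signal (records linked in earlier, stricter passes are more trusted). The field names and rules below are hypothetical, not the LiRHC passes:

```python
# Hypothetical sketch of cascading multi-pass record linkage: each pass
# applies a progressively looser matching rule, and a record links on the
# first pass that succeeds. The two passes shown are illustrative only.
def link_record(record, reference, passes):
    """passes: ordered list of (name, match_fn) pairs;
    match_fn(record, candidate) -> bool. Returns (match, pass_num, pass_name)."""
    for pass_num, (name, match_fn) in enumerate(passes, start=1):
        for candidate in reference:
            if match_fn(record, candidate):
                return candidate, pass_num, name
    return None, None, None

# Illustrative passes: exact name+address first, then name+ZIP as fallback.
PASSES = [
    ("exact name and address", lambda r, c: r["name"] == c["name"] and r["addr"] == c["addr"]),
    ("name and ZIP", lambda r, c: r["name"] == c["name"] and r["zip"] == c["zip"]),
]
```

Quality review then concentrates on records linked only in the later, looser passes.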
-
Working Paper: Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin-Destination Employment Statistics in OnTheMap
January 2017
Working Paper Number:
CES-17-71
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one, or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability, but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.

View Full Paper PDF
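The Rubin (1987) combining rules underlying this evaluation take a simple form: with m implicates, the total variance of an estimate is the average within-implicate variance plus (1 + 1/m) times the between-implicate variance. A minimal sketch, omitting the design-based and disclosure-avoidance components the paper's special formulas add:

```python
# Rubin's (1987) combining rules for m completed-data implicates.
# Total variance = average within-implicate variance + (1 + 1/m) * between-
# implicate variance. Sketch only; the paper's formulas also incorporate
# design-based sampling variability and finite population corrections.
def rubin_combine(estimates, within_variances):
    """estimates: per-implicate point estimates; within_variances: their
    per-implicate variances. Returns (combined estimate, total variance)."""
    m = len(estimates)
    q_bar = sum(estimates) / m                              # combined estimate
    w_bar = sum(within_variances) / m                       # within component
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between component
    return q_bar, w_bar + (1 + 1 / m) * b
```

The between component b is what the multiple threads of the edit and imputation models make estimable.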
-
Working Paper: A Comparison of Training Modules for Administrative Records Use in Nonresponse Followup Operations: The 2010 Census and the American Community Survey
January 2017
Working Paper Number:
CES-17-47
While modeling work in preparation for the 2020 Census has shown that administrative records can be predictive of Nonresponse Followup (NRFU) enumeration outcomes, there is scope to examine the robustness of the models by using more recent training data. The models deployed for workload removal from the 2015 and 2016 Census Tests were based on associations of the 2010 Census with administrative records. Training the same models with more recent data from the American Community Survey (ACS) can identify any changes in parameter associations over time that might reduce the accuracy of model predictions. Furthermore, more recent training data would allow for the incorporation of new administrative record sources not available in 2010. However, differences in ACS methodology and the smaller sample size may limit its applicability. This paper replicates earlier results and examines model predictions based on the ACS in comparison with NRFU outcomes. The evaluation consists of a comparison of predicted counts and household compositions with actual 2015 NRFU outcomes. The main finding is an overall validation of the methodology using independent data.

View Full Paper PDF
-
Working Paper: Response Error & the Medicaid Undercount in the CPS
December 2016
Working Paper Number:
carra-2016-11
The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) is an important source for estimates of the uninsured population. Previous research has shown that survey estimates produce an undercount of beneficiaries compared to Medicaid enrollment records. We extend past work by examining the Medicaid undercount in the 2007-2011 CPS ASEC compared to enrollment data from the Medicaid Statistical Information System for calendar years 2006-2010. By linking individuals across datasets, we analyze two types of response error regarding Medicaid enrollment - false negative error and false positive error. We use regression analysis to identify factors associated with these two types of response error in the 2011 CPS ASEC. We find that the Medicaid undercount was between 22 and 31 percent from 2007 to 2011. In 2011, the false negative rate was 40 percent, and 27 percent of Medicaid reports in CPS ASEC were false positives. False negative error is associated with the duration of enrollment in Medicaid, enrollment in Medicare and private insurance, and Medicaid enrollment in the survey year. False positive error is associated with enrollment in Medicare and shared Medicaid coverage in the household. We discuss implications for survey reports of health insurance coverage and for estimating the uninsured population.

View Full Paper PDF
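The two error rates have straightforward definitions on linked data: the false negative rate is the share of record-confirmed enrollees who do not report Medicaid in the survey, and the false positive rate is the share of survey Medicaid reports with no matching enrollment record. A minimal sketch over linked person-level pairs:

```python
# Response-error rates from linked survey-enrollment pairs.
# False negative rate: enrolled in the records but not reported in the survey,
# as a share of enrollees. False positive rate: reported in the survey but
# not enrolled, as a share of survey reports.
def response_error_rates(pairs):
    """pairs: iterable of (reported_in_survey, enrolled_in_records) booleans."""
    survey_among_enrolled = [s for s, e in pairs if e]
    enrolled_among_reports = [e for s, e in pairs if s]
    fn_rate = sum(1 for s in survey_among_enrolled if not s) / len(survey_among_enrolled)
    fp_rate = sum(1 for e in enrolled_among_reports if not e) / len(enrolled_among_reports)
    return fn_rate, fp_rate
```

Note the two rates have different denominators (enrollees vs. reporters), which is why they need not sum to anything in particular.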
-
Working Paper: Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data
July 2011
Working Paper Number:
CES-11-20
We quantify sources of variation in annual job earnings data collected by the Survey of Income and Program Participation (SIPP) to determine how much of the variation is the result of measurement error. Jobs reported in the SIPP are linked to jobs reported in an administrative database, the Detailed Earnings Records (DER) drawn from the Social Security Administration's Master Earnings File, a universe file of all earnings reported on W-2 tax forms. As a result of the match, each job potentially has two earnings observations per year: survey and administrative. Unlike previous validation studies, both of these earnings measures are viewed as noisy measures of some underlying true amount of annual earnings. While the existence of survey error resulting from respondent mistakes or misinterpretation is widely accepted, the idea that administrative data are also error-prone is new. Possible sources of employer reporting error, employee under-reporting of compensation such as tips, and general differences between how earnings may be reported on tax forms and in surveys, necessitates the discarding of the assumption that administrative data are a true measure of the quantity that the survey was designed to collect. In addition, errors in matching SIPP and DER jobs, a necessary task in any use of administrative data, also contribute to measurement error in both earnings variables. We begin by comparing SIPP and DER earnings for different demographic and education groups of SIPP respondents. We also calculate different measures of changes in earnings for individuals switching jobs. We estimate a standard earnings equation model using SIPP and DER earnings and compare the resulting coefficients. 
Finally, exploiting the presence of individuals with multiple jobs and shared employers over time, we estimate an econometric model that includes random person and firm effects, a common error component shared by SIPP and DER earnings, and two independent error components that represent the variation unique to each earnings measure. We compare the variance components from this model and consider how the DER and SIPP differ across unobservable components.

View Full Paper PDF
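The key identification idea for two noisy measures of the same quantity can be sketched in a simplified form: if each measure is a common signal plus independent noise, the covariance of the two measures recovers the shared variance, and each measure's excess variance is its unique error component. This sketch omits the paper's person and firm random effects:

```python
# Simplified variance decomposition for two noisy measures of one quantity:
# y1 = signal + u1, y2 = signal + u2, with u1, u2 mutually independent and
# independent of the signal. Then cov(y1, y2) estimates the shared variance
# (signal plus any common error component), and var(yk) - cov(y1, y2)
# estimates the variance unique to measure k. Illustrative only; the paper's
# model also includes random person and firm effects.
def unique_error_variances(y1, y2):
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / (n - 1)
    var1 = sum((a - m1) ** 2 for a in y1) / (n - 1)
    var2 = sum((b - m2) ** 2 for b in y2) / (n - 1)
    # (shared variance, unique-to-y1 variance, unique-to-y2 variance)
    return cov, var1 - cov, var2 - cov
```

Under these assumptions, neither measure needs to be treated as truth, which is the paper's departure from earlier validation studies.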
-
Working Paper: Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database
February 2011
Working Paper Number:
CES-11-04
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the Synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.

View Full Paper PDF
-
Working Paper: Comparing Measures of Earnings Instability Based on Survey and Administrative Reports
August 2010
Working Paper Number:
CES-10-15
In Celik, Juhn, McCue, and Thompson (2009), we found that estimated levels of earnings instability based on data from the Current Population Survey (CPS) and the Survey of Income and Program Participation (SIPP) were reasonably close to each other and to others' estimates from the Panel Study of Income Dynamics (PSID), but estimates from unemployment insurance (UI) earnings were much larger. Given that the UI data are from administrative records, which are often posited to be more accurate than survey reports, this raises concerns that measures based on survey data understate true earnings instability. To address this, we use links between survey samples from the SIPP and UI earnings records in the LEHD database to identify sources of differences in work history and earnings information. Substantial work has been done comparing earnings levels from administrative records to those collected in the SIPP and CPS, but our understanding of earnings instability would benefit from further examination of differences across sources in the properties of changes in earnings. We first compare characteristics of the overall and matched samples to address issues of selection in the matching process. We then compare earnings levels and jobs in the SIPP and LEHD data to identify differences between them. Finally, we begin to examine how such differences affect estimates of earnings instability. Our preliminary findings suggest that differences in earnings changes for those in the lower tail of the earnings distribution account for much of the difference in instability estimates.

View Full Paper PDF
-
Working Paper: Exploring Differences in Employment between Household and Establishment Data
April 2009
Working Paper Number:
CES-09-09
Using a large data set that links individual Current Population Survey (CPS) records to employer-reported administrative data, we document substantial discrepancies in basic measures of employment status that persist even after controlling for known definitional differences between the two data sources. We hypothesize that reporting discrepancies should be most prevalent for marginal workers and marginal jobs, and find systematic associations between the incidence of reporting discrepancies and observable person and job characteristics that are consistent with this hypothesis. The paper discusses the implications of the reported findings for both micro and macro labor market analysis.

View Full Paper PDF