Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources
May 2024
Working Paper Number:
CES-24-26
Abstract
Document Tags and Keywords
Keywords
Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant keywords.
By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the
text, highlighting the most significant topics and trends. This approach not only enhances searchability but
also provides connections that go beyond potentially domain-specific author-defined keywords.
data, statistical, database, census data, ethnicity, hispanic, asian, imputation, discrepancy, record, federal, matching, population, racial, race, indian, native, enrollment, datasets, race census
Tags
Tags are automatically generated using a pretrained language model from spaCy, which supports
several tasks, including entity tagging.
The model labels words and phrases by entity type,
including "organizations." By filtering for frequent words and phrases labeled as "organizations", papers are
identified as containing references to specific institutions, datasets, and other organizations.
Current Population Survey, Housing and Urban Development, American Community Survey, Protected Identification Key, Census Bureau Disclosure Review Board, 2010 Census, Indian Health Service, Person Validation System, Indian Housing Information Center, Person Identification Validation System, Some Other Race
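The frequency filter described above can be sketched in a few lines. This is only an illustration, not the site's actual pipeline: it assumes the entity tagger has already produced (text, label) pairs and that organizations carry the spaCy-style label `ORG`. The function name `frequent_org_tags`, the `min_count` threshold, and the sample entities are hypothetical.

```python
from collections import Counter


def frequent_org_tags(entities, min_count=2):
    """Return organization-labeled phrases seen at least min_count times.

    entities: iterable of (text, label) pairs, as an entity tagger might emit.
    """
    # Count only phrases whose label marks them as organizations.
    counts = Counter(text for text, label in entities if label == "ORG")
    # most_common() orders by descending count; ties keep first-seen order.
    return [text for text, n in counts.most_common() if n >= min_count]


# Hypothetical tagger output for one paper (illustrative values only).
entities = [
    ("American Community Survey", "ORG"),
    ("Indian Health Service", "ORG"),
    ("American Community Survey", "ORG"),
    ("2010", "DATE"),
    ("Indian Health Service", "ORG"),
    ("American Community Survey", "ORG"),
    ("Census Bureau", "ORG"),
]
tags = frequent_org_tags(entities)
print(tags)
```

A frequency threshold of this kind keeps phrases that recur in a paper and drops one-off mentions. It does not correct mislabeled entities, which may explain why phrases such as "Some Other Race" can appear among the tags.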
Similar Working Papers
Similarity between working papers is determined by an unsupervised neural
network model
known as Doc2Vec.
Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the
capture of semantic meaning in a way that relates to the context of words within the document. The model learns to
associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as
document classification, clustering, and similarity detection by preserving the order and structure of words. The
document vectors are compared using cosine similarity/distance to determine the most similar working papers.
Papers identified with 🔥 are in the top 20% of similarity.
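The comparison step can be illustrated with a small sketch, assuming document vectors are already available (for example, from a trained Doc2Vec model). The two-dimensional vectors, the paper names, and the helper `rank_similar` are hypothetical. Flagging the top 20% of the ranked candidates is one plausible reading of the marker rule, not a confirmed description of how the site computes it.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_similar(target, others, top_share=0.2):
    """Rank documents by similarity to target and flag the top share.

    others: dict mapping a document name to its vector.
    Returns a list of (name, similarity, flagged) in descending similarity.
    """
    scored = sorted(
        ((cosine_similarity(target, vec), name) for name, vec in others.items()),
        reverse=True,
    )
    # Flag at least one document; round the share up to a whole count.
    n_flagged = max(1, math.ceil(top_share * len(scored)))
    return [(name, score, i < n_flagged) for i, (score, name) in enumerate(scored)]


# Hypothetical toy vectors; real document vectors have many more dimensions.
target = [1.0, 0.0]
others = {
    "A": [1.0, 0.0],
    "B": [1.0, 1.0],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
    "E": [2.0, 1.0],
}
ranking = rank_similar(target, others)
for name, score, flagged in ranking:
    print(name, round(score, 3), "🔥" if flagged else "")
```

Cosine similarity depends only on the direction of the vectors, so two documents of very different length can still score as close matches.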
The 10 most similar working papers to the working paper 'Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources' are listed below in order of similarity.
-
Working Paper: When Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources: Exploring Methods to Assign Responses 🔥
December 2015
Working Paper Number:
carra-2015-08
The U.S. Census Bureau is researching uses of administrative records and third party data in survey and decennial census operations. One potential use of administrative records is to utilize these data when race and Hispanic origin responses are missing. When federal and third party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across sources. We explore different methods to assign one race and one Hispanic response when these responses are discrepant. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data by demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records and third party data compared to the 2010 Census. Minority groups and individuals ages 0-17 are more likely to have missing race or Hispanic origin data in administrative records and third party data. Larger households tend to have more missing race data in administrative records and third party data than smaller households.
-
Working Paper: Coverage and Agreement of Administrative Records and 2010 American Community Survey Demographic Data 🔥
November 2014
Working Paper Number:
carra-2014-14
The U.S. Census Bureau is researching possible uses of administrative records in decennial census and survey operations. The 2010 Census Match Study and American Community Survey (ACS) Match Study represent recent efforts by the Census Bureau to evaluate the extent to which administrative records provide data on persons and addresses in the 2010 Census and 2010 ACS. The 2010 Census Match Study also examines demographic response data collected in administrative records. Building on this analysis, we match data from the 2010 ACS to federal administrative records and third party data as well as to previous census data and examine administrative records coverage and agreement of ACS age, sex, race, and Hispanic origin responses. We find high levels of coverage and agreement for sex and age responses and variable coverage and agreement across race and Hispanic origin groups. These results are similar to findings from the 2010 Census Match Study.
-
Working Paper: Evaluating Race and Hispanic Origin Responses of Medicaid Participants Using Census Data
April 2015
Working Paper Number:
carra-2015-01
Health and health care disparities associated with race or Hispanic origin are complex and continue to challenge researchers and policy makers. With the intention of improving the measurement and monitoring of these disparities, provisions of the Patient Protection and Affordable Care Act (ACA) of 2010 require states to collect, report and analyze data on demographic characteristics of applicants and participants in Medicaid and other federally supported programs. By linking Medicaid records to 2010 Census, American Community Survey, and Census 2000, this new large-scale study examines and documents the extent to which pre-ACA Medicaid administrative records match self-reported race and Hispanic origin in Census data. Linked records allow comparisons between individuals with matching and non-matching race and Hispanic origin data across several demographic, socioeconomic and neighborhood characteristics, such as age, gender, language proficiency, education and Census tract variables. Identification of the groups most likely to have non-matching and missing race and Hispanic origin data in Medicaid relative to Census data can inform strategies to improve the quality of demographic data collected from Medicaid populations.
-
Working Paper: Reporting of Indian Health Service Coverage in the American Community Survey
May 2018
Working Paper Number:
carra-2018-04
Response error in surveys affects the quality of data which are relied on for numerous research and policy purposes. We use linked survey and administrative records data to examine reporting of a particular item in the American Community Survey (ACS) - health coverage among American Indians and Alaska Natives (AIANs) through the Indian Health Service (IHS). We compare responses to the IHS portion of the 2014 ACS health insurance question to whether or not individuals are in the 2014 IHS Patient Registration data. We evaluate the extent to which individuals misreport their IHS coverage in the ACS as well as the characteristics associated with misreporting. We also assess whether the ACS estimates of AIANs with IHS coverage represent an undercount. Our results will be of interest to researchers who rely on survey responses in general and specifically the ACS health insurance question. Moreover, our analysis contributes to the literature on using administrative records to measure components of survey error.
-
Working Paper: Estimating the U.S. Citizen Voting-Age Population (CVAP) Using Blended Survey Data, Administrative Record Data, and Modeling: Technical Report
April 2023
Working Paper Number:
CES-23-21
This report develops a method using administrative records (AR) to fill in responses for nonresponding American Community Survey (ACS) housing units rather than adjusting survey weights to account for selection of a subset of nonresponding housing units for follow-up interviews and for nonresponse bias. The method also inserts AR and modeling in place of edits and imputations for ACS survey citizenship item nonresponses. We produce Citizen Voting-Age Population (CVAP) tabulations using this enhanced CVAP method and compare them to published estimates. The enhanced CVAP method produces a 0.74 percentage point lower citizen share, and it is 3.05 percentage points lower for voting-age Hispanics. The latter result can be partly explained by omissions of voting-age Hispanic noncitizens with unknown legal status from ACS household responses. Weight adjustments may be less effective at addressing nonresponse bias under those conditions.
-
Working Paper: Exploring Administrative Records Use for Race and Hispanic Origin Item Non-Response
December 2014
Working Paper Number:
carra-2014-16
Race and Hispanic origin data are required to produce official statistics in the United States. Data collected through the American Community Survey and decennial census address missing data through traditional imputation methods, often relying on information from neighbors. These methods work well if neighbors share similar characteristics, however, the shape and patterns of neighborhoods in the United States are changing. Administrative records may provide more accurate data compared to traditional imputation methods for missing race and Hispanic origin responses. This paper first describes the characteristics of persons with missing demographic data, then assesses the coverage of administrative records data for respondents who do not answer race and Hispanic origin questions in Census data. The paper also discusses the distributional impact of using administrative records race and Hispanic origin data to complete missing responses in a decennial census or survey context.
-
Working Paper: Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report
October 2020
Working Paper Number:
CES-20-33
This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
-
Working Paper: Understanding the Quality of Alternative Citizenship Data Sources for the 2020 Census
August 2018
Working Paper Number:
CES-18-38R
This paper examines the quality of citizenship data in self-reported survey responses compared to administrative records and evaluates options for constructing an accurate count of resident U.S. citizens. Person-level discrepancies between survey-collected citizenship data and administrative records are more pervasive than previously reported in studies comparing survey and administrative data aggregates. Our results imply that survey-sourced citizenship data produce significantly lower estimates of the noncitizen share of the population than would be produced from currently available administrative records; both the survey-sourced and administrative data have shortcomings that could contribute to this difference. Our evidence is consistent with noncitizen respondents misreporting their own citizenship status and failing to report that of other household members. At the same time, currently available administrative records may miss some naturalizations and capture others with a delay. The evidence in this paper also suggests that adding a citizenship question to the 2020 Census would lead to lower self-response rates in households potentially containing noncitizens, resulting in higher fieldwork costs and a lower-quality population count.
-
Working Paper: Response Error & the Medicaid undercount in the CPS
December 2016
Working Paper Number:
carra-2016-11
The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) is an important source for estimates of the uninsured population. Previous research has shown that survey estimates produce an undercount of beneficiaries compared to Medicaid enrollment records. We extend past work by examining the Medicaid undercount in the 2007-2011 CPS ASEC compared to enrollment data from the Medicaid Statistical Information System for calendar years 2006-2010. By linking individuals across datasets, we analyze two types of response error regarding Medicaid enrollment - false negative error and false positive error. We use regression analysis to identify factors associated with these two types of response error in the 2011 CPS ASEC. We find that the Medicaid undercount was between 22 and 31 percent from 2007 to 2011. In 2011, the false negative rate was 40 percent, and 27 percent of Medicaid reports in CPS ASEC were false positives. False negative error is associated with the duration of enrollment in Medicaid, enrollment in Medicare and private insurance, and Medicaid enrollment in the survey year. False positive error is associated with enrollment in Medicare and shared Medicaid coverage in the household. We discuss implications for survey reports of health insurance coverage and for estimating the uninsured population.
-
Working Paper: Where Are Your Parents? Exploring Potential Bias in Administrative Records on Children
March 2024
Working Paper Number:
CES-24-18
This paper examines potential bias in the Census Household Composition Key's (CHCK) probabilistic parent-child linkages. By linking CHCK data to the American Community Survey (ACS), we reveal disparities in parent-child linkages among specific demographic groups and find that characteristics of children that can and cannot be linked to the CHCK vary considerably from the larger population. In particular, we find that children from low-income, less educated households and of Hispanic origin are less likely to be linked to a mother or a father in the CHCK. We also highlight some data considerations when using the CHCK.