CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'record'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

Internal Revenue Service - 24

Social Security Administration - 22

Protected Identification Key - 22

American Community Survey - 21

Center for Economic Studies - 18

Service Annual Survey - 17

Social Security Number - 15

Person Validation System - 14

National Science Foundation - 13

2010 Census - 13

Census Bureau Disclosure Review Board - 12

North American Industry Classification System - 12

Personally Identifiable Information - 11

Research Data Center - 11

Person Identification Validation System - 10

Social Security - 10

Longitudinal Business Database - 10

Indian Health Service - 9

Master Address File - 9

Standard Industrial Classification - 9

Administrative Records - 9

Center for Administrative Records Research and Applications - 9

Longitudinal Employer Household Dynamics - 8

Current Population Survey - 8

County Business Patterns - 8

Employer Identification Numbers - 8

Business Register - 8

Decennial Census - 7

Department of Housing and Urban Development - 7

Indian Housing Information Center - 7

Housing and Urban Development - 7

Some Other Race - 7

Bureau of Labor Statistics - 7

Economic Census - 7

Federal Statistical Research Data Center - 7

Disclosure Review Board - 6

Quarterly Workforce Indicators - 6

Individual Taxpayer Identification Numbers - 6

Survey of Income and Program Participation - 6

Business Dynamics Statistics - 6

SSA Numident - 6

National Opinion Research Center - 6

Computer Assisted Telephone Interviews and Computer Assisted Personal Interviews - 5

Computer Assisted Personal Interview - 5

CATI - 5

Quarterly Census of Employment and Wages - 5

Census Bureau Person Identification Validation System - 5

Census Numident - 5

Census Bureau Business Register - 5

Annual Survey of Manufactures - 5

MAFID - 5

Cornell University - 5

Medicaid Services - 5

Postal Service - 4

Census Bureau Master Address File - 4

Standard Statistical Establishment List - 4

Company Organization Survey - 4

Centers for Medicare - 4

Chicago Census Research Data Center - 4

Unemployment Insurance - 4

Center for Administrative Records Research - 4

Census of Manufactures - 4

PIKed - 4

Sloan Foundation - 3

Supplemental Nutrition Assistance Program - 3

Census Edited File - 3

Census Household Composition Key - 3

University of Chicago - 3

National Center for Health Statistics - 3

Office of Management and Budget - 3

1940 Census - 3

Department of Economics - 3

Alfred P Sloan Foundation - 3

University of Michigan - 3

COVID-19 - 3

Metropolitan Statistical Area - 3

Longitudinal Research Database - 3

Minnesota Population Center - 3

Local Employment Dynamics - 3

Duke University - 3

data - 36

survey - 26

datasets - 24

respondent - 21

microdata - 20

census bureau - 20

census data - 17

matching - 17

data census - 16

agency - 15

database - 15

report - 13

population - 12

statistical - 12

imputation - 11

records census - 10

irs - 10

census records - 10

linkage - 10

matched - 10

identifier - 9

disclosure - 8

federal - 8

use census - 8

census use - 8

census research - 8

ethnicity - 7

hispanic - 7

estimating - 7

coverage - 7

ssa - 7

department - 7

quarterly - 7

information - 7

confidentiality - 6

privacy - 6

publicly - 6

employed - 6

census survey - 6

citizen - 6

business data - 6

aggregate - 6

sector - 6

firms census - 6

census file - 6

1040 - 5

enrollment - 5

employee - 5

filing - 5

census employment - 5

census linked - 5

residence - 5

payroll - 5

enterprise - 5

longitudinal - 5

analysis - 5

associate - 5

survey data - 5

public - 4

minority - 4

ethnic - 4

job - 4

incorporated - 4

employment data - 4

workforce - 4

race - 4

sampling - 4

discrepancy - 4

race census - 4

linked census - 4

resident - 4

census responses - 4

census 2020 - 4

sample - 4

reporting - 4

employ - 4

employment statistics - 4

researcher - 4

research - 4

2010 census - 4

statistical disclosure - 4

model - 4

industrial - 4

statistical agencies - 4

surveys censuses - 3

tenure - 3

assessed - 3

native - 3

state - 3

migration - 3

migrant - 3

medicare - 3

medicaid - 3

recession - 3

establishments data - 3

manufacturing - 3

statistician - 3

financial - 3

demography - 3

inference - 3

ancestry - 3

econometric - 3

earnings - 3

Viewing papers 1 through 10 of 51


  • Working Paper

    The Privacy-Protected Gridded Environmental Impacts Frame

    December 2024

    Working Paper Number:

    CES-24-74

    This paper introduces the Gridded Environmental Impacts Frame (Gridded EIF), a novel privacy-protected dataset derived from the U.S. Census Bureau's confidential Environmental Impacts Frame (EIF) microdata infrastructure. The EIF combines comprehensive administrative records and survey data on the U.S. population with high-resolution geospatial information on environmental hazards. While access to the EIF is restricted due to the confidential nature of the underlying data, the Gridded EIF offers a broader research community the opportunity to glean insights from the data while preserving confidentiality. We describe the data and privacy protection process, and offer guidance on appropriate usage, presenting practical applications.
  • Working Paper

    Comparison of Child Reporting in the American Community Survey and Federal Income Tax Returns Based on California Birth Records

    September 2024

    Authors: Gloria G. Aldana

    Working Paper Number:

    CES-24-55

    This paper takes advantage of administrative records from California, a state with a large child population and a significant historical undercount of children in Census Bureau data, dependent information in the Internal Revenue Service (IRS) Form 1040 records, and the American Community Survey to characterize undercounted children and compare child reporting. While IRS Form 1040 records offer potential utility for adjusting child undercounting in Census Bureau surveys, this analysis finds overlapping reporting issues among various demographic and economic groups. Specifically, older children, those of Non-Hispanic Black mothers and Hispanic mothers, children or parents with lower English proficiency, children whose mothers did not complete high school, and families with lower income-to-poverty ratios were less frequently reported in IRS 1040 records than other groups. Therefore, using IRS 1040 dependent records may have limitations for accurately representing populations with characteristics associated with the undercount of children in surveys.
  • Working Paper

    Revisions to the LEHD Establishment Imputation Procedure and Applications to Administrative Job Frame

    September 2024

    Working Paper Number:

    CES-24-51

    The Census Bureau is developing a 'job frame' to provide detailed job-level employment data across the U.S. through linked administrative records such as unemployment insurance and IRS W-2 filings. This working paper summarizes the research conducted by the job frame development team on modifying and extending the LEHD Unit-to-Worker (U2W) imputation procedure for the job frame prototype. It provides a conceptual overview of the U2W imputation method, highlighting key challenges and tradeoffs in its current application. The paper then presents four imputation methodologies and evaluates their performance in areas such as establishment assignment accuracy, establishment size matching, and job separation rates. The results show that all methodologies perform similarly in assigning workers to the correct establishment. Non-spell-based methodologies excel in matching establishment sizes, while spell-based methodologies perform better in accurately tracking separation rates.
  • Working Paper

    Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

    June 2024

    Working Paper Number:

    CES-24-27

    This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC package to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
  • Working Paper

    Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources

    May 2024

    Authors: James M. Noon

    Working Paper Number:

    CES-24-26

    The Best Race and Ethnicity Administrative Records Composite file ('Best Race file') is a composite file that combines Census, federal, and Third Party Data (TPD) sources and applies business rules to assign race and ethnicity values to person records. The first version of the Best Race administrative records composite was constructed in 2015 and subsequently updated each year to include more recent vintages, when available, of the data sources originally included in the composite file. Where updates were available for data sources, the most recent information for persons was retained, and the business rules were reapplied to assign a single race and single Hispanic origin value to each person record. The majority of person records on the Best Race file have consistent race and ethnicity information across data sources. Where there are discrepancies in responses across data sources, we apply a series of business rules to assign a single race and ethnicity to each record. To improve the quality of the Best Race administrative records composite, we have begun revising the business rules, which were developed several years ago. This paper discusses the original business rules as well as the implemented changes and their impact on the composite file.
  • Working Paper

    Producing U.S. Population Statistics Using Multiple Administrative Sources

    November 2023

    Working Paper Number:

    CES-23-58

    We identify several challenges encountered when constructing U.S. administrative record-based (AR-based) population estimates for 2020. Though the AR estimates are higher than the 2020 Census at the national level, they are over 15 percent lower in 5 percent of counties, suggesting that locational accuracy can be improved. Other challenges include how to achieve comprehensive coverage, maintain consistent coverage across time, filter out nonresidents and people not alive on the reference date, uncover missing links across person and address records, and predict demographic characteristics when multiple ones are reported or when they are missing. We discuss several ways of addressing these issues, e.g., building in redundancy with more sources, linking children to their parents' addresses, and conducting additional record linkage for people without Social Security Numbers and for addresses not initially linked to the Census Bureau's Master Address File. We discuss modeling to predict lower levels of geography for people lacking those geocodes, the probability that a person is a U.S. resident on the reference date, the probability that an address is the person's residence on the reference date, and the probability a person is in each demographic characteristic category. Regression results illustrate how many of these challenges and solutions affect the AR county population estimates.
  • Working Paper

    Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database

    March 2023

    Working Paper Number:

    CES-23-10

    Retail health clinics (RHCs) are a relatively new type of health care setting, and understanding the role they play as a source of ambulatory care in the United States is important. To better understand these settings, a joint project by the Census Bureau and National Center for Health Statistics used data science techniques to link together data on RHCs from Convenient Care Association, County Business Patterns Business Register, and National Plan and Provider Enumeration System to create the Linked RHC (LiRHC, pronounced 'lyric') database of locations throughout the United States during the years 2018 to 2020. The matching methodology used to perform this linkage is described, as well as the benchmarking, match statistics, and manual review and quality checks used to assess the resulting matched data. The large majority (81%) of matches received quality scores at or above 75/100, and most matches were linked in the first two (of eight) matching passes, indicating high confidence in the final linked dataset. The LiRHC database contained 2,000 RHCs; 97% of these clinics were in metropolitan statistical areas and 950 were in the South region of the United States. Through this collaborative effort, the Census Bureau and National Center for Health Statistics strive to understand how RHCs can potentially impact population health as well as the access and provision of health care services across the nation.
  • Working Paper

    Full Report of the Comparisons of Administrative Record Rosters to Census Self-Responses and NRFU Household Member Responses

    March 2023

    Working Paper Number:

    CES-23-08

    One of the U.S. Census Bureau's innovations in the 2020 U.S. Census was the use of administrative records (AR) to create household rosters for enumerating some addresses when a self response was not available but high-quality ARs were. The goal was to reduce the cost of fieldwork during the Nonresponse Followup operation (NRFU). The original plan had NRFU beginning in mid-May and continuing through late July 2020. However, the COVID-19 pandemic forced the delay of NRFU and caused the Internal Revenue Service to postpone the income tax filing deadline, resulting in an interruption in the delivery of ARs to the U.S. Census Bureau. The delays were not anticipated when U.S. Census Bureau staff conducted the research on AR enumeration with the 2010 Census data in preparation for the 2020 Census or during the fine tuning of plans for using ARs during the 2018 End-to-End Census Test. These circumstances raised questions about whether the quality of the AR household rosters was high enough for use in enumeration. To aid in investigating the concern about the quality of the AR rosters, our analyses compared AR rosters to self-response rosters and NRFU household member responses at addresses where both ARs and a self-response were available.
  • Working Paper

    Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning

    November 2021

    Working Paper Number:

    CES-21-35

    This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents' workplace characteristics.
  • Working Paper

    Developing Content for the Management and Organizational Practices Survey-Hospitals (MOPS-HP)

    September 2021

    Working Paper Number:

    CES-21-25

    No nationally representative U.S. hospital data exist on management practices, which have been shown to be related to both clinical and financial performance using past data collected in the World Management Survey (WMS). This paper describes the U.S. Census Bureau's development of content for the Management and Organizational Practices Survey-Hospitals (MOPS-HP) that is similar to data collected in the MOPS conducted for the manufacturing sector in 2010 and 2015 and the 2009 WMS. Findings from cognitive testing interviews with 18 chief nursing officers and 13 chief financial officers at 30 different hospitals across 7 states and the District of Columbia led to using industry-tested terminology, to confirming chief nursing officers as MOPS-HP respondents and their ability to provide recall data, and to eliminating questions that tested poorly. Hospital data collected in the MOPS-HP would be the first nationally representative data on management practices with queries on clinical key performance indicators, financial and hospital-wide patient care goals, addressing patient care problems, clinical team interactions and staffing, standardized clinical protocols, and incentives for medical record documentation. The MOPS-HP's purpose is not to collect COVID-19 pandemic information; however, data measuring hospital management practices prior to and during the COVID-19 pandemic are a byproduct of the survey's one-year recall period (2019 and 2020).