CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'assessed'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

Internal Revenue Service - 15

Current Population Survey - 14

Social Security Administration - 12

American Community Survey - 12

Protected Identification Key - 11

Person Validation System - 10

Social Security - 9

Census Bureau Disclosure Review Board - 8

Employer Identification Numbers - 8

Social Security Number - 8

Center for Economic Studies - 7

Survey of Income and Program Participation - 7

Master Address File - 7

2010 Census - 7

Personally Identifiable Information - 7

Housing and Urban Development - 6

Supplemental Nutrition Assistance Program - 6

Decennial Census - 6

Computer Assisted Personal Interview - 6

Disclosure Review Board - 6

Department of Housing and Urban Development - 5

Business Register - 5

W-2 - 5

Research Data Center - 5

Census Bureau Person Identification Validation System - 4

Metropolitan Statistical Area - 4

Temporary Assistance for Needy Families - 4

Social Science Research Institute - 4

Service Annual Survey - 4

Longitudinal Employer Household Dynamics - 4

Census Bureau Business Register - 3

Disability Insurance - 3

American Housing Survey - 3

MAF-ARF - 3

Bureau of Labor Statistics - 3

Some Other Race - 3

North American Industry Classification System - 3

National Center for Health Statistics - 3

Data Management System - 3

Person Identification Validation System - 3

Individual Taxpayer Identification Numbers - 3

Administrative Records - 3

University of Chicago - 3

Chicago Census Research Data Center - 3

National Opinion Research Center - 3

Center for Administrative Records Research and Applications - 3

Viewing papers 1 through 10 of 21


  • Working Paper

    The Design of Sampling Strata for the National Household Food Acquisition and Purchase Survey

    February 2025

    Working Paper Number:

    CES-25-13

    The National Household Food Acquisition and Purchase Survey (FoodAPS), sponsored by the United States Department of Agriculture's (USDA) Economic Research Service (ERS) and Food and Nutrition Service (FNS), examines the food purchasing behavior of various subgroups of the U.S. population. These subgroups include participants in the Supplemental Nutrition Assistance Program (SNAP) and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), as well as households who are eligible for but don't participate in these programs. Participants in these social protection programs constitute small proportions of the U.S. population; obtaining an adequate number of such participants in a survey would be challenging absent stratified sampling to target SNAP and WIC participating households. This document describes how the U.S. Census Bureau (which is planning to conduct future versions of the FoodAPS survey on behalf of USDA) created sampling strata to flag the FoodAPS targeted subpopulations using machine learning applications in linked survey and administrative data. We describe the data, modeling techniques, and how well the sampling flags target low-income households and households receiving WIC and SNAP benefits. We additionally situate these efforts in the nascent literature on the use of big data and machine learning for the improvement of survey efficiency.
    View Full Paper PDF
  • Working Paper

    Incorporating Administrative Data in Survey Weights for the 2018-2022 Survey of Income and Program Participation

    October 2024

    Working Paper Number:

    CES-24-58

    Response rates to the Survey of Income and Program Participation (SIPP) have declined over time, raising the potential for nonresponse bias in survey estimates. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we modify various parts of the SIPP weighting algorithm to incorporate such data. We create these new weights for the 2018 through 2022 SIPP panels and examine how the new weights affect survey estimates. Our results show that before weighting adjustments, SIPP respondents in these panels have higher socioeconomic status than the general population. Existing weighting procedures reduce many of these differences. Comparing SIPP estimates between the production weights and the administrative data-based weights yields changes that are not uniform across the joint income and program participation distribution. Unlike other Census Bureau household surveys, there is no large increase in nonresponse bias in SIPP due to the COVID-19 Pandemic. In summary, the magnitude and sign of nonresponse bias in SIPP is complicated, and the existing weighting procedures may change the sign of nonresponse bias for households with certain incomes and program benefit statuses.
    View Full Paper PDF
  • Working Paper

    Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

    June 2024

    Working Paper Number:

    CES-24-27

    This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC package to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
    View Full Paper PDF
  • Working Paper

    Incorporating Administrative Data in Survey Weights for the Basic Monthly Current Population Survey

    January 2024

    Working Paper Number:

    CES-24-02

    Response rates to the Current Population Survey (CPS) have declined over time, raising the potential for nonresponse bias in key population statistics. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we take two approaches. First, we use administrative data to build a non-parametric nonresponse adjustment step while leaving the calibration to population estimates unchanged. Second, we use administratively linked data in the calibration process, matching income data from the Internal Revenue Service and state agencies, demographic data from the Social Security Administration and the decennial census, and industry data from the Census Bureau's Business Register to both responding and nonresponding households. We use the matched data in the household nonresponse adjustment of the CPS weighting algorithm, which changes the weights of respondents to account for differential nonresponse rates among subpopulations. After running the experimental weighting algorithm, we compare estimates of the unemployment rate and labor force participation rate between the experimental weights and the production weights. Before March 2020, estimates of the labor force participation rates using the experimental weights are 0.2 percentage points higher than the original estimates, with minimal effect on unemployment rate. After March 2020, the new labor force participation rates are similar, but the unemployment rate is about 0.2 percentage points higher in some months during the height of COVID-related interviewing restrictions. These results are suggestive that if there is any nonresponse bias present in the CPS, the magnitude is comparable to the typical margin of error of the unemployment rate estimate. Additionally, the results are overall similar across demographic groups and states, as well as using alternative weighting methodology. Finally, we discuss how our estimates compare to those from earlier papers that calculate estimates of bias in key CPS labor force statistics. This paper is for research purposes only. No changes to production are being implemented at this time.
    View Full Paper PDF
  • Working Paper

    The 2010 Census Confidentiality Protections Failed, Here's How and Why

    December 2023

    Working Paper Number:

    CES-23-63

    Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
    View Full Paper PDF
  • Working Paper

    Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database

    March 2023

    Working Paper Number:

    CES-23-10

    Retail health clinics (RHCs) are a relatively new type of health care setting and understanding the role they play as a source of ambulatory care in the United States is important. To better understand these settings, a joint project by the Census Bureau and National Center for Health Statistics used data science techniques to link together data on RHCs from Convenient Care Association, County Business Patterns Business Register, and National Plan and Provider Enumeration System to create the Linked RHC (LiRHC, pronounced 'lyric') database of locations throughout the United States during the years 2018 to 2020. The matching methodology used to perform this linkage is described, as well as the benchmarking, match statistics, and manual review and quality checks used to assess the resulting matched data. The large majority (81%) of matches received quality scores at or above 75/100, and most matches were linked in the first two (of eight) matching passes, indicating high confidence in the final linked dataset. The LiRHC database contained 2,000 RHCs and found that 97% of these clinics were in metropolitan statistical areas and 950 were in the South region of the United States. Through this collaborative effort, the Census Bureau and National Center for Health Statistics strive to understand how RHCs can potentially impact population health as well as the access and provision of health care services across the nation.
    View Full Paper PDF
  • Working Paper

    National Experimental Wellbeing Statistics - Version 1

    February 2023

    Working Paper Number:

    CES-23-04

    This is the U.S. Census Bureau's first release of the National Experimental Wellbeing Statistics (NEWS) project. The NEWS project aims to produce the best possible estimates of income and poverty given all available survey and administrative data. We link survey, decennial census, administrative, and third-party data to address measurement error in income and poverty statistics. We estimate improved (pre-tax money) income and poverty statistics for 2018 by addressing several possible sources of bias documented in prior research. We address biases from 1) unit nonresponse through improved weights, 2) missing income information in both survey and administrative data through improved imputation, and 3) misreporting by combining or replacing survey responses with administrative information. Reducing survey error substantially affects key measures of well-being: We estimate median household income is 6.3 percent higher than in survey estimates, and poverty is 1.1 percentage points lower. These changes are driven by subpopulations for which survey error is particularly relevant. For householders aged 65 and over, median household income is 27.3 percent higher and poverty is 3.3 percentage points lower than in survey estimates. We do not find a significant impact on median household income for householders under 65 or on child poverty. Finally, we discuss plans for future releases: addressing other potential sources of bias, releasing additional years of statistics, extending the income concepts measured, and including smaller geographies such as state and county.
    View Full Paper PDF
  • Working Paper

    Introducing the Medical Expenditure Panel Survey-Insurance Component with Administrative Records (MEPS-ICAR): Description, Data Construction Methodology, and Quality Assessment

    August 2022

    Working Paper Number:

    CES-22-29

    This report introduces a new dataset, the Medical Expenditure Panel Survey-Insurance Component with Administrative Records (MEPS-ICAR), consisting of MEPS-IC survey data on establishments and their health insurance benefits packages linked to Decennial Census data and administrative tax records on MEPS-IC establishments' workforces. These data include new measures of the characteristics of MEPS-IC establishments' parent firms, employee turnover, the full distribution of MEPS-IC workers' personal and family incomes, the geographic locations where those workers live, and improved workforce demographic detail. Next, this report details the methods used for producing the MEPS-ICAR. Broadly, the linking process begins by matching establishments' parent firms to their workforces using identifiers appearing in tax records. The linking process concludes by matching establishments to their own workforces by identifying the subset of their parent firm's workforce that best matches the expected size, total payroll, and residential geographic distribution of the establishment's workforce. Finally, this report presents statistics characterizing the match rate and the MEPS-ICAR data itself. Key results include that match rates are consistently high (exceeding 90%) across nearly all data subgroups and that the matched data exhibit a reasonable distribution of employment, payroll, and worker commute distances relative to expectations and external benchmarks. Notably, employment measures derived from tax records, but not used in the match itself, correspond with high fidelity to the employment levels that establishments report in the MEPS-IC. Cumulatively, the construction of the MEPS-ICAR significantly expands the capabilities of the MEPS-IC and presents many opportunities for analysts.
    View Full Paper PDF
  • Working Paper

    Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report

    October 2020

    Working Paper Number:

    CES-20-33

    This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
    View Full Paper PDF
  • Working Paper

    Nonemployer Statistics by Demographics (NES-D): Exploring Longitudinal Consistency and Sub-national Estimates

    December 2019

    Working Paper Number:

    CES-19-34

    Until recently, the quinquennial Survey of Business Owners (SBO) was the only source of information for U.S. employer and nonemployer businesses by owner demographic characteristics such as race, ethnicity, sex and veteran status. Now, however, the Nonemployer Statistics by Demographics series (NES-D) will replace the SBO's nonemployer component with reliable and more frequent (annual) business demographic estimates with no additional respondent burden, and at lower imputation rates and costs. NES-D is not a survey; rather, it exploits existing administrative and census records to assign demographic characteristics to the universe of approximately 25 million (as of 2016) nonemployer businesses. Although only in the second year of its research phase, NES-D is rapidly moving towards production, with a planned prototype or experimental version release of 2017 nonemployer data in 2020, followed by annual releases of the series. After the first year of research, we released a working paper (Luque et al., 2019) that assessed the viability of estimating nonemployer demographics exclusively with administrative records (AR) and census data. That paper used one year of data (2015) to produce preliminary tabulations of business counts at the national level. This year we expand that research in multiple ways by: i) examining the longitudinal consistency of administrative and census records coverage, and of our AR-based demographics estimates, ii) evaluating further coverage from additional data sources, iii) exploring estimates at the sub-national level, iv) exploring estimates by industrial sector, v) examining demographics estimates of business receipts as well as of counts, and vi) implementing imputation of missing demographic values. Our current results are consistent with the main findings in Luque et al. (2019), and show that high coverage and demographic assignment rates are not the exception, but the norm. Specifically, we find that AR coverage rates are high and stable over time for each of the three years we examine, 2014-2016. We are able to identify owners for approximately 99 percent of nonemployer businesses (excluding C-corporations), 92 to 93 percent of identified nonemployer owners have no missing demographics, and only about 1 percent are missing three or more demographic characteristics in each of the three years. We also find that our demographics estimates are stable over time, with expected small annual changes that are consistent with underlying population trends in the U.S. Due to data limitations, these results do not include C-corporations, which represent only 2 percent of nonemployer businesses and 4 percent of receipts. Without added respondent burden and at lower imputation rates and costs, NES-D will provide high-quality business demographics estimates at a higher frequency (annual vs. every 5 years) than the SBO.
    View Full Paper PDF