CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'census data'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

American Community Survey - 42

Internal Revenue Service - 34

Protected Identification Key - 34

Current Population Survey - 31

Social Security Number - 30

Social Security Administration - 30

2010 Census - 29

Decennial Census - 27

Center for Economic Studies - 25

Census Bureau Disclosure Review Board - 21

Master Address File - 20

Bureau of Labor Statistics - 19

Longitudinal Employer Household Dynamics - 19

Disclosure Review Board - 19

Person Validation System - 19

Research Data Center - 19

Survey of Income and Program Participation - 17

Service Annual Survey - 17

Social Security - 16

Cornell University - 15

National Science Foundation - 15

North American Industry Classification System - 14

Business Register - 14

Employer Identification Numbers - 13

Personally Identifiable Information - 13

Standard Statistical Establishment List - 13

Standard Industrial Classification - 13

1940 Census - 12

Economic Census - 12

Housing and Urban Development - 12

Person Identification Validation System - 12

Administrative Records - 12

Federal Statistical Research Data Center - 12

MAFID - 11

Metropolitan Statistical Area - 11

Some Other Race - 10

SSA Numident - 10

American Housing Survey - 10

Longitudinal Business Database - 10

Department of Housing and Urban Development - 9

Supplemental Nutrition Assistance Program - 9

Census Numident - 9

Office of Management and Budget - 9

Individual Taxpayer Identification Numbers - 9

Alfred P Sloan Foundation - 9

Federal Tax Information - 9

Quarterly Workforce Indicators - 8

Quarterly Census of Employment and Wages - 8

Census Bureau Business Register - 8

Indian Health Service - 8

Annual Survey of Manufactures - 8

Census Edited File - 7

County Business Patterns - 7

Medicaid Services - 7

Computer Assisted Personal Interview - 7

American Economic Association - 7

Ordinary Least Squares - 7

National Opinion Research Center - 7

Unemployment Insurance - 7

Census Bureau Person Identification Validation System - 6

Core Based Statistical Area - 6

Cornell Institute for Social and Economic Research - 6

Business Dynamics Statistics - 6

Bureau of Economic Analysis - 6

Center for Administrative Records Research and Applications - 6

LEHD Program - 5

Employment History File - 5

Employer Characteristics File - 5

Individual Characteristics File - 5

Local Employment Dynamics - 5

Centers for Medicare - 5

Data Management System - 5

Temporary Assistance for Needy Families - 5

Indian Housing Information Center - 5

Statistics Canada - 5

Special Sworn Status - 5

Business Employment Dynamics - 5

PIKed - 5

Census 2000 - 5

Business Master File - 5

Business Register Bridge - 5

American Statistical Association - 5

Federal Reserve Bank - 5

Financial, Insurance and Real Estate Industries - 5

CDF - 4

Composite Person Record - 4

MAF-ARF - 4

Cumulative Density Function - 4

W-2 - 4

Social Science Research Institute - 4

National Longitudinal Survey of Youth - 4

Postal Service - 4

Department of Homeland Security - 4

Urban Institute - 4

University of Maryland - 4

Bureau of Labor - 4

Sloan Foundation - 4

Successor Predecessor File - 4

National Institute on Aging - 4

National Center for Health Statistics - 4

Securities and Exchange Commission - 4

National Bureau of Economic Research - 4

Establishment Micro Properties - 4

Agency for Healthcare Research and Quality - 4

Company Organization Survey - 3

Office of Personnel Management - 3

Department of Agriculture - 3

Census Bureau Master Address File - 3

Adjusted Gross Income - 3

Master Beneficiary Record - 3

Disability Insurance - 3

Census Household Composition Key - 3

General Education Development - 3

New England County Metropolitan - 3

Public Use Micro Sample - 3

Computer Assisted Telephone Interviews and Computer Assisted Personal Interviews - 3

Health and Retirement Study - 3

CATI - 3

Department of Justice - 3

Citizenship and Immigration Services - 3

Yale University - 3

Department of Health and Human Services - 3

National Institutes of Health - 3

Geographic Information Systems - 3

Small Business Administration - 3

Longitudinal Research Database - 3

Harvard University - 3

Journal of Labor Economics - 3

North American Industry Classi - 3

Chicago Census Research Data Center - 3

Census of Manufactures - 3

Economic Research Service - 3

Minnesota Population Center - 3

Organization for Economic Cooperation and Development - 3

Department of Labor - 3

General Accounting Office - 3

Permanent Plant Number - 3

Medical Expenditure Panel Survey - 3

survey - 43

population - 37

census bureau - 36

data census - 32

respondent - 31

data - 26

use census - 21

resident - 20

agency - 18

statistical - 17

record - 17

ethnicity - 15

census survey - 15

datasets - 15

microdata - 15

citizen - 15

census research - 15

report - 14

economic census - 14

hispanic - 13

research census - 13

residence - 13

residential - 12

housing - 12

census use - 12

minority - 11

disparity - 11

neighborhood - 11

information census - 10

payroll - 10

estimating - 10

census records - 10

assessed - 9

workforce - 9

2010 census - 9

census linked - 9

matching - 9

database - 9

disadvantaged - 9

census responses - 8

employed - 8

irs - 8

poverty - 8

metropolitan - 8

census years - 8

records census - 8

immigrant - 8

socioeconomic - 8

longitudinal - 8

work census - 7

employ - 7

censuses surveys - 7

linked census - 7

imputation - 7

employee - 7

census file - 7

percentile - 6

census employment - 6

provided census - 6

sampling - 6

household surveys - 6

coverage - 6

race - 6

linkage - 6

racial - 6

enrollment - 6

race census - 6

employer household - 6

ethnic - 6

migration - 6

expenditure - 6

employment data - 5

employment statistics - 5

employee data - 5

medicaid - 5

prevalence - 5

urban - 5

geographic - 5

identifier - 5

federal - 5

disclosure - 5

family - 5

immigration - 5

confidentiality - 5

rural - 5

enterprise - 5

statistician - 5

longitudinal employer - 5

ancestry - 5

migrant - 5

census business - 5

labor - 5

aging - 5

decade - 4

ssa - 4

survey households - 4

urbanization - 4

district - 4

native - 4

bias - 4

census household - 4

tax - 4

unemployed - 4

survey income - 4

analysis - 4

information - 4

state - 4

privacy - 4

quarterly - 4

recession - 4

researcher - 4

research - 4

employment dynamics - 4

revenue - 4

aggregate - 4

matched - 4

white - 4

census disclosure - 3

census 2020 - 3

eligible - 3

population survey - 3

country - 3

city - 3

geography - 3

urbanized - 3

impact - 3

environmental - 3

amenity - 3

intergenerational - 3

black - 3

estimator - 3

citizenship - 3

1040 - 3

segregation - 3

child - 3

assessing - 3

yearly - 3

business data - 3

businesses census - 3

geographically - 3

suburb - 3

community - 3

workplace - 3

worker - 3

clerical - 3

surveys censuses - 3

residing - 3

firms census - 3

study - 3

demography - 3

migrating - 3

suburbanization - 3

associate - 3

econometric - 3

Viewing papers 1 through 10 of 73


  • Working Paper

    A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census

    August 2025

    Working Paper Number:

    CES-25-57

    For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
    Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
  • Working Paper

    LODES Design and Methodology Report: Methodology Version 7

    August 2025

    Working Paper Number:

    CES-25-52

    The purpose of this report is to document the important features of Version 7 of the LEHD Origin-Destination Employment Statistics (LODES) processing system. This includes data sources, data processing methodology, confidentiality protection methodology, some quality measures, and a high-level description of the published data. The intended audience for this document includes LODES data users, Local Employment Dynamics (LED) Partnership members, U.S. Census Bureau management, program quality auditors, and current and future research and development staff members.
  • Working Paper

    The Design of Sampling Strata for the National Household Food Acquisition and Purchase Survey

    February 2025

    Working Paper Number:

    CES-25-13

    The National Household Food Acquisition and Purchase Survey (FoodAPS), sponsored by the United States Department of Agriculture's (USDA) Economic Research Service (ERS) and Food and Nutrition Service (FNS), examines the food purchasing behavior of various subgroups of the U.S. population. These subgroups include participants in the Supplemental Nutrition Assistance Program (SNAP) and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), as well as households who are eligible for but don't participate in these programs. Participants in these social protection programs constitute small proportions of the U.S. population; obtaining an adequate number of such participants in a survey would be challenging absent stratified sampling to target SNAP and WIC participating households. This document describes how the U.S. Census Bureau (which is planning to conduct future versions of the FoodAPS survey on behalf of USDA) created sampling strata to flag the FoodAPS targeted subpopulations using machine learning applications in linked survey and administrative data. We describe the data, modeling techniques, and how well the sampling flags target low-income households and households receiving WIC and SNAP benefits. We additionally situate these efforts in the nascent literature on the use of big data and machine learning for the improvement of survey efficiency.
  • Working Paper

    Applying Current Core Based Statistical Area Standards to Historical Census Data, 1940-2020

    January 2025

    Authors: Todd Gardner

    Working Paper Number:

    CES-25-10

    In the middle of the twentieth century, the Bureau of the Budget, in conjunction with the Census Bureau and other federal statistical agencies, introduced a widely used unit of statistical geography, the county-based Standard Metropolitan Area. Metropolitan definitions since then have been generally regarded as comparable, but methodological changes have resulted in comparability issues, particularly among the largest and most complex metro areas. With the 2000 census came an effort to simplify the rules for defining metro areas. This study attempts to gather all available historical geographic and commuting data to apply the current rules for defining metro areas to create comparable statistical geography covering the period from 1940 to 2020. The changes that accompanied the 2000 census also brought a new category, "Micropolitan Statistical Areas," which established a metro hierarchy. This research expands on this approach, using a more elaborate hierarchy based on the size of urban cores. The areas as delineated in this paper provide a consistent set of statistical geography that can be used in a wide variety of applications.
  • Working Paper

    The Census Historical Environmental Impacts Frame

    October 2024

    Working Paper Number:

    CES-24-66

    The Census Bureau's Environmental Impacts Frame (EIF) is a microdata infrastructure that combines individual-level information on residence, demographics, and economic characteristics with environmental amenities and hazards from 1999 through the present day. To better understand the long-run consequences and intergenerational effects of exposure to a changing environment, we expand the EIF by extending it backward to 1940. The Historical Environmental Impacts Frame (HEIF) combines the Census Bureau's historical administrative data, publicly available 1940 address information from the 1940 Decennial Census, and historical environmental data. This paper discusses the creation of the HEIF as well as the unique challenges that arise with using the Census Bureau's historical administrative data.
  • Working Paper

    Nonresponse and Coverage Bias in the Household Pulse Survey: Evidence from Administrative Data

    October 2024

    Working Paper Number:

    CES-24-60

    The Household Pulse Survey (HPS) conducted by the U.S. Census Bureau is a unique survey that provided timely data on the effects of the COVID-19 Pandemic on American households and continues to provide data on other emergent social and economic issues. Because the survey has a response rate in the single digits and only has an online response mode, there are concerns about nonresponse and coverage bias. In this paper, we match administrative data from government agencies and third-party data to HPS respondents to examine how representative they are of the U.S. population. For comparison, we create a benchmark of American Community Survey (ACS) respondents and nonrespondents and include the ACS respondents as another point of reference. Overall, we find that the HPS is less representative of the U.S. population than the ACS. However, performance varies across administrative variables, and the existing weighting adjustments appear to greatly improve the representativeness of the HPS. Additionally, we look at household characteristics by their email domain to examine the effects on coverage from limiting email messages in 2023 to addresses from the contact frame with at least 90% deliverability rates, finding no clear change in the representativeness of the HPS afterwards.
  • Working Paper

    Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

    June 2024

    Working Paper Number:

    CES-24-27

    This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC package to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
  • Working Paper

    Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources

    May 2024

    Authors: James M. Noon

    Working Paper Number:

    CES-24-26

    The Best Race and Ethnicity Administrative Records Composite file ('Best Race file') is a composite file which combines Census, federal, and Third Party Data (TPD) sources and applies business rules to assign race and ethnicity values to person records. The first version of the Best Race administrative records composite was constructed in 2015 and subsequently updated each year to include more recent vintages, when available, of the data sources originally included in the composite file. Where updates were available for data sources, the most recent information for persons was retained, and the business rules were reapplied to assign a single race and single Hispanic origin value to each person record. The majority of person records on the Best Race file have consistent race and ethnicity information across data sources. Where there are discrepancies in responses across data sources, we apply a series of business rules to assign a single race and ethnicity to each record. To improve the quality of the Best Race administrative records composite, we have begun revising the business rules which were developed several years ago. This paper discusses the original business rules as well as the implemented changes and their impact on the composite file.
  • Working Paper

    Where Are Your Parents? Exploring Potential Bias in Administrative Records on Children

    March 2024

    Working Paper Number:

    CES-24-18

    This paper examines potential bias in the Census Household Composition Key's (CHCK) probabilistic parent-child linkages. By linking CHCK data to the American Community Survey (ACS), we reveal disparities in parent-child linkages among specific demographic groups and find that characteristics of children that can and cannot be linked to the CHCK vary considerably from the larger population. In particular, we find that children from low-income, less educated households and of Hispanic origin are less likely to be linked to a mother or a father in the CHCK. We also highlight some data considerations when using the CHCK.
  • Working Paper

    The Changing Nature of Pollution, Income, and Environmental Inequality in the United States

    January 2024

    Working Paper Number:

    CES-24-04

    This paper uses administrative tax records linked to Census demographic data and high-resolution measures of fine small particulate (PM2.5) exposure to study the evolution of the Black-White pollution exposure gap over the past 40 years. In doing so, we focus on the various ways in which income may have contributed to these changes using a statistical decomposition. We decompose the overall change in the Black-White PM2.5 exposure gap into (1) components that stem from rank-preserving compression in the overall pollution distribution and (2) changes that stem from a reordering of Black and White households within the pollution distribution. We find a significant narrowing of the Black-White PM2.5 exposure gap over this time period that is overwhelmingly driven by rank-preserving changes rather than positional changes. However, the relative positions of Black and White households at the upper end of the pollution distribution have meaningfully shifted in the most recent years.
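
Illustrative note on the reconstruction idea summarized in the CES-25-57 abstract above: the sketch below is a toy, not the paper's method. The three-person block, the category labels, and the function names are all hypothetical, and brute-force enumeration stands in for whatever solver the authors used (the abstract does not say). It only shows the general point that each additional published table shrinks the set of microdata consistent with the tabulations, sometimes down to a single candidate.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical category sets for a toy census block (not real Census codes).
SEXES = ("F", "M")
AGES = ("0-17", "18-64", "65+")
RACES = ("A", "B")


def tabulate(records, fields):
    """Count records by the chosen field positions (0=sex, 1=age, 2=race)."""
    return Counter(tuple(r[i] for i in fields) for r in records)


def reconstruct(n, tables):
    """Return every multiset of n records that reproduces all given tables."""
    universe = list(product(SEXES, AGES, RACES))
    return [
        cand
        for cand in combinations_with_replacement(universe, n)
        if all(tabulate(cand, f) == t for f, t in tables.items())
    ]


# Made-up "confidential" records for one tiny block.
confidential = [("F", "0-17", "A"), ("F", "65+", "B"), ("M", "18-64", "A")]

# "Published" tables: first only one-way counts, then two cross-tabulations too.
one_way = {f: tabulate(confidential, f) for f in [(0,), (1,), (2,)]}
two_way = {
    **one_way,
    (0, 1): tabulate(confidential, (0, 1)),  # sex by age
    (1, 2): tabulate(confidential, (1, 2)),  # age by race
}

few_tables = reconstruct(len(confidential), one_way)
more_tables = reconstruct(len(confidential), two_way)
print(len(few_tables), len(more_tables))  # prints "9 1"
```

With only the one-way counts, nine different sets of records fit the tables; adding the two cross-tabulations leaves exactly one, which matches the made-up confidential records.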