CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'data'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

National Science Foundation - 36

Internal Revenue Service - 36

American Community Survey - 35

Center for Economic Studies - 35

Social Security Administration - 29

Service Annual Survey - 29

Research Data Center - 27

Current Population Survey - 24

Protected Identification Key - 22

Bureau of Labor Statistics - 22

Longitudinal Employer Household Dynamics - 20

North American Industry Classification System - 20

Cornell University - 20

Survey of Income and Program Participation - 19

Census Bureau Disclosure Review Board - 18

2010 Census - 18

Decennial Census - 17

Economic Census - 17

Social Security Number - 16

Person Validation System - 16

Master Address File - 16

Business Register - 16

Longitudinal Business Database - 16

Social Security - 15

Employer Identification Numbers - 14

Standard Industrial Classification - 14

Quarterly Workforce Indicators - 13

Disclosure Review Board - 12

Center for Administrative Records Research and Applications - 12

Special Sworn Status - 12

Person Identification Validation System - 11

Personally Identifiable Information - 11

Administrative Records - 11

Bureau of Economic Analysis - 11

Housing and Urban Development - 10

Census Bureau Business Register - 10

Alfred P Sloan Foundation - 10

Annual Survey of Manufactures - 10

Longitudinal Research Database - 10

National Opinion Research Center - 10

Department of Housing and Urban Development - 9

Indian Health Service - 9

National Center for Health Statistics - 9

Standard Statistical Establishment List - 9

County Business Patterns - 9

Business Dynamics Statistics - 9

Chicago Census Research Data Center - 9

MAFID - 8

SSA Numident - 8

Federal Statistical Research Data Center - 8

Computer Assisted Personal Interview - 7

Statistics Canada - 7

Quarterly Census of Employment and Wages - 7

Metropolitan Statistical Area - 7

Duke University - 7

American Statistical Association - 7

Public Use Micro Sample - 7

Census Bureau Master Address File - 6

Individual Taxpayer Identification Numbers - 6

Indian Housing Information Center - 6

Agency for Healthcare Research and Quality - 6

American Housing Survey - 6

Company Organization Survey - 6

Unemployment Insurance - 6

Medicaid Services - 6

Census of Manufactures - 6

Postal Service - 6

LEHD Program - 6

Supplemental Nutrition Assistance Program - 5

Sloan Foundation - 5

Census Numident - 5

Census Bureau Person Identification Validation System - 5

Some Other Race - 5

National Institute on Aging - 5

University of Michigan - 5

Small Business Administration - 5

Office of Management and Budget - 5

Cornell Institute for Social and Economic Research - 5

PIKed - 5

University of Chicago - 5

American Economic Association - 5

Federal Reserve Bank - 5

National Bureau of Economic Research - 5

Local Employment Dynamics - 5

Permanent Plant Number - 5

Journal of Economic Literature - 5

Ordinary Least Squares - 4

1940 Census - 4

W-2 - 4

Census Edited File - 4

National Institutes of Health - 4

Health and Retirement Study - 4

National Longitudinal Survey of Youth - 4

Census of Manufacturing Firms - 4

Probability Density Function - 4

Minnesota Population Center - 4

Center for Administrative Records Research - 4

Organization for Economic Cooperation and Development - 4

Characteristics of Business Owners - 4

Total Factor Productivity - 4

Federal Insurance Contribution Act - 3

Social and Economic Supplement - 3

ASEC - 3

Adjusted Gross Income - 3

Temporary Assistance for Needy Families - 3

Geographic Information Systems - 3

Department of Economics - 3

COVID-19 - 3

National Income and Product Accounts - 3

Bureau of Labor - 3

Centers for Medicare - 3

Census Bureau Longitudinal Business Database - 3

Centers for Disease Control and Prevention - 3

Employer Characteristics File - 3

Department of Health and Human Services - 3

National Research Council - 3

Computer Assisted Telephone Interviews and Computer Assisted Personal Interviews - 3

CATI - 3

Census Bureau Center for Economic Studies - 3

Census 2000 - 3

Office of Personnel Management - 3

Census Bureau Business Dynamics Statistics - 3

COMPUSTAT - 3

Securities and Exchange Commission - 3

survey - 53

respondent - 44

statistical - 43

microdata - 41

datasets - 38

census bureau - 36

record - 36

agency - 35

data census - 31

census data - 26

estimating - 23

population - 23

database - 23

report - 22

disclosure - 19

analysis - 19

statistician - 17

confidentiality - 16

matching - 16

research - 16

survey data - 15

information - 15

privacy - 15

imputation - 15

researcher - 15

aggregate - 14

use census - 12

census survey - 12

census research - 12

statistical agencies - 12

payroll - 11

study - 11

estimation - 10

earnings - 10

sampling - 10

sample - 10

coverage - 10

public - 10

records census - 10

linkage - 10

employee - 10

workforce - 10

research census - 10

publicly - 9

resident - 9

identifier - 9

census records - 9

quarterly - 9

economic census - 9

business data - 9

matched - 9

economist - 9

sector - 9

2010 census - 8

assessed - 8

federal - 8

statistical disclosure - 8

employed - 8

enterprise - 8

longitudinal - 8

employment data - 8

employee data - 8

ssa - 7

census years - 7

residential - 7

residence - 7

household surveys - 7

reporting - 7

census use - 7

aggregation - 7

inference - 7

associate - 7

econometric - 7

estimator - 6

enrollment - 6

irs - 6

income data - 6

ethnicity - 6

race census - 6

census employment - 6

department - 6

work census - 6

information census - 6

recession - 6

surveys censuses - 6

percentile - 6

censuses surveys - 6

sale - 6

expenditure - 6

employ - 6

model - 6

census file - 6

industrial - 6

minority - 5

salary - 5

census linked - 5

citizen - 5

provided census - 5

race - 5

state - 5

housing - 5

assessing - 5

housing survey - 5

establishments data - 5

market - 5

analyst - 5

social - 5

worker - 5

manufacturing - 5

macroeconomic - 5

average - 4

survey income - 4

population survey - 4

census disclosure - 4

income individuals - 4

tax - 4

taxpayer - 4

geographic - 4

linked census - 4

survey households - 4

hispanic - 4

census 2020 - 4

home - 4

imputation model - 4

incorporated - 4

policymakers - 4

gdp - 4

employment statistics - 4

establishment - 4

investment - 4

labor - 4

trend - 4

earner - 3

household income - 3

1040 - 3

environmental - 3

impact - 3

disparity - 3

discrepancy - 3

racial - 3

empirical - 3

classification - 3

prevalence - 3

apartment - 3

unobserved - 3

organizational - 3

acquisition - 3

economic statistics - 3

classifying - 3

employer household - 3

imputed - 3

ancestry - 3

ethnic - 3

bias - 3

census responses - 3

worker demographics - 3

production - 3

manufacturer - 3

inventory - 3

employment dynamics - 3

workforce indicators - 3

classified - 3

measures employment - 3

employment measures - 3

company - 3

Viewing papers 1 through 10 of 94


  • Working Paper

    Earnings Measurement Error, Nonresponse and Administrative Mismatch in the CPS

    July 2025

    Working Paper Number:

    CES-25-48

    Using the Current Population Survey Annual Social and Economic Supplement matched to Social Security Administration Detailed Earnings Records, we link observations across consecutive years to investigate the relationship between item nonresponse and measurement error in the earnings questions. Linking individuals across consecutive years allows us to observe switching from response to nonresponse and vice versa. We estimate OLS, IV, and finite mixture models under various assumptions, separately for men and women. We find that those who respond in both years of the survey exhibit less measurement error than those who respond in only one year. Our findings suggest a trade-off between survey response and data quality that survey designers, data collectors, and data users should consider.
  • Working Paper

    Potential Bias When Using Administrative Data to Measure the Family Income of School-Aged Children

    January 2025

    Working Paper Number:

    CES-25-03

    Researchers and practitioners increasingly rely on administrative data sources to measure family income. However, administrative data sources are often incomplete in their coverage of the population, giving rise to potential bias in family income measures, particularly if the coverage deficiencies are not well understood. We focus on the school-aged child population because of its particular importance to research and policy, and because of the unique challenges of linking children to family income information. We find that two of the most significant administrative sources of family income information that permit linking children to parents (IRS Form 1040 and SNAP participation records) usefully complement each other, potentially reducing coverage bias when used together. In a case study on how best to measure economic disadvantage rates in the public school student population, we demonstrate the sensitivity of family income statistics to assumptions about individuals who do not appear in administrative data sources.
  • Working Paper

    The Privacy-Protected Gridded Environmental Impacts Frame

    December 2024

    Working Paper Number:

    CES-24-74

    This paper introduces the Gridded Environmental Impacts Frame (Gridded EIF), a novel privacy-protected dataset derived from the U.S. Census Bureau's confidential Environmental Impacts Frame (EIF) microdata infrastructure. The EIF combines comprehensive administrative records and survey data on the U.S. population with high-resolution geospatial information on environmental hazards. While access to the EIF is restricted because of the confidential nature of the underlying data, the Gridded EIF gives a broader research community the opportunity to glean insights from the data while preserving confidentiality. We describe the data and the privacy protection process, offer guidance on appropriate use, and present practical applications.
  • Working Paper

    The Census Historical Environmental Impacts Frame

    October 2024

    Working Paper Number:

    CES-24-66

    The Census Bureau's Environmental Impacts Frame (EIF) is a microdata infrastructure that combines individual-level information on residence, demographics, and economic characteristics with environmental amenities and hazards from 1999 through the present day. To better understand the long-run consequences and intergenerational effects of exposure to a changing environment, we extend the EIF backward to 1940. The Historical Environmental Impacts Frame (HEIF) combines the Census Bureau's historical administrative data, publicly available address information from the 1940 Decennial Census, and historical environmental data. This paper discusses the creation of the HEIF as well as the unique challenges that arise when using the Census Bureau's historical administrative data.
  • Working Paper

    Nonresponse and Coverage Bias in the Household Pulse Survey: Evidence from Administrative Data

    October 2024

    Working Paper Number:

    CES-24-60

    The Household Pulse Survey (HPS), conducted by the U.S. Census Bureau, is a unique survey that provided timely data on the effects of the COVID-19 pandemic on American households and continues to provide data on other emerging social and economic issues. Because the survey has a response rate in the single digits and offers only an online response mode, there are concerns about nonresponse and coverage bias. In this paper, we match administrative data from government agencies and third-party data to HPS respondents to examine how representative they are of the U.S. population. For comparison, we create a benchmark of American Community Survey (ACS) respondents and nonrespondents and include the ACS respondents as another point of reference. Overall, we find that the HPS is less representative of the U.S. population than the ACS. However, performance varies across administrative variables, and the existing weighting adjustments appear to greatly improve the representativeness of the HPS. Additionally, we examine household characteristics by email domain to assess how coverage was affected by limiting email messages in 2023 to addresses from the contact frame with deliverability rates of at least 90%, and we find no clear change in the representativeness of the HPS afterward.
  • Working Paper

    Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

    June 2024

    Working Paper Number:

    CES-24-27

    This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third-party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, so not all records are assigned an identifier. This article is a tutorial on using the twangRDC package to generate nonresponse weights that account for non-linkage of person records across US Census Bureau datasets.
  • Working Paper

    Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources

    May 2024

    Authors: James M. Noon

    Working Paper Number:

    CES-24-26

    The Best Race and Ethnicity Administrative Records Composite file ('Best Race file') is a composite file that combines Census, federal, and Third Party Data (TPD) sources and applies business rules to assign race and ethnicity values to person records. The Best Race administrative records composite was first constructed in 2015 and subsequently updated each year to include more recent vintages, when available, of the data sources originally included in the composite file. Where updates were available for data sources, the most recent information for persons was retained, and the business rules were reapplied to assign a single race and a single Hispanic origin value to each person record. The majority of person records on the Best Race file have consistent race and ethnicity information across data sources. Where responses are discrepant across data sources, we apply a series of business rules to assign a single race and ethnicity to each record. To improve the quality of the Best Race administrative records composite, we have begun revising the business rules, which were developed several years ago. This paper discusses the original business rules as well as the implemented changes and their impact on the composite file.
  • Working Paper

    An In-Depth Examination of Requirements for Disclosure Risk Assessment

    October 2023

    Working Paper Number:

    CES-23-49

    The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies and identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms leveled against differential privacy would be leveled against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near term, the counterfactual approach appears best suited for privacy-utility analysis.
  • Working Paper

    Some Open Questions on Multiple-Source Extensions of Adaptive-Survey Design Concepts and Methods

    February 2023

    Working Paper Number:

    CES-23-03

    Adaptive survey design is a framework for making data-driven decisions about survey data collection operations. This paper discusses open questions related to extending adaptive principles and capabilities to data captured from multiple sources. Here, the concept of 'design' encompasses the focused allocation of resources required to produce high-quality statistical information in a sustainable and cost-effective way. This conceptual framework leads to a discussion of six groups of issues: (i) the goals for improvement through adaptation; (ii) the design features that are available for adaptation; (iii) the auxiliary data that may be available for informing adaptation; (iv) the decision rules that could guide adaptation; (v) the systems necessary to operationalize adaptation; and (vi) the quality, cost, and risk profiles of the proposed adaptations (and how to evaluate them). A multiple-data-source environment creates significant opportunities but also introduces complexities that make producing high-quality statistical information more challenging.
  • Working Paper

    The Impact of Household Surveys on 2020 Census Self-Response

    July 2022

    Working Paper Number:

    CES-22-24

    Households that were sampled in 2019 for the American Community Survey (ACS) had lower self-response rates to the 2020 Census. The reduction ranged from 1.5 percentage points for households sampled in January 2019 to 15.1 percentage points for households sampled in December 2019. Similar effects are found for the Current Population Survey (CPS).