CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'statistical'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

National Science Foundation - 39

Center for Economic Studies - 36

Internal Revenue Service - 32

Cornell University - 31

Bureau of Labor Statistics - 30

American Community Survey - 27

Current Population Survey - 26

Social Security Administration - 23

North American Industry Classification System - 23

Census Bureau Disclosure Review Board - 21

Longitudinal Employer Household Dynamics - 21

Survey of Income and Program Participation - 20

Standard Industrial Classification - 18

Research Data Center - 18

Service Annual Survey - 17

Longitudinal Business Database - 16

Bureau of Economic Analysis - 15

Alfred P Sloan Foundation - 15

Longitudinal Research Database - 15

Employer Identification Numbers - 14

Economic Census - 14

Annual Survey of Manufactures - 13

Quarterly Workforce Indicators - 13

Business Register - 13

Social Security Number - 12

Ordinary Least Squares - 12

Federal Statistical Research Data Center - 12

County Business Patterns - 12

Metropolitan Statistical Area - 11

2010 Census - 11

Quarterly Census of Employment and Wages - 11

Special Sworn Status - 11

Protected Identification Key - 10

Disclosure Review Board - 10

Decennial Census - 10

Census of Manufactures - 10

Total Factor Productivity - 10

Social Security - 9

Office of Management and Budget - 9

Business Dynamics Statistics - 9

Unemployment Insurance - 9

National Longitudinal Survey of Youth - 8

Statistics Canada - 8

National Center for Health Statistics - 8

Cornell Institute for Social and Economic Research - 8

LEHD Program - 8

National Bureau of Economic Research - 7

Standard Statistical Establishment List - 7

Public Use Micro Sample - 7

Duke University - 7

American Statistical Association - 7

Chicago Census Research Data Center - 7

Census Bureau Longitudinal Business Database - 7

Detailed Earnings Records - 6

Person Validation System - 6

Master Address File - 6

Federal Reserve Bank - 6

Local Employment Dynamics - 6

PSID - 6

1940 Census - 5

Census Edited File - 5

Characteristics of Business Owners - 5

Housing and Urban Development - 5

Person Identification Validation System - 5

W-2 - 5

Computer Assisted Personal Interview - 5

National Academy of Sciences - 5

Personally Identifiable Information - 5

Census of Manufacturing Firms - 5

National Institutes of Health - 5

Department of Commerce - 5

Sloan Foundation - 5

Department of Labor - 5

Permanent Plant Number - 5

Department of Economics - 4

United States Census Bureau - 4

Some Other Race - 4

Comparison of Putative Reidentification Rates - 4

Social and Economic Supplement - 4

Financial, Insurance and Real Estate Industries - 4

Health and Retirement Study - 4

Census Bureau Business Register - 4

American Economic Association - 4

Small Business Administration - 4

Individual Characteristics File - 4

National Health Interview Survey - 4

Bureau of Labor - 4

National Institute on Aging - 4

Summary Earnings Records - 4

Company Organization Survey - 4

Journal of Economic Literature - 4

MAFID - 3

Federal Insurance Contribution Act - 3

Postal Service - 3

Department of Housing and Urban Development - 3

Supplemental Nutrition Assistance Program - 3

Agency for Healthcare Research and Quality - 3

Urban Institute - 3

American Housing Survey - 3

Temporary Assistance for Needy Families - 3

Department of Agriculture - 3

University of Michigan - 3

Employer Characteristics File - 3

Centers for Disease Control and Prevention - 3

North American Industry Classi - 3

Securities and Exchange Commission - 3

Cobb-Douglas - 3

Review of Economics and Statistics - 3

Organization for Economic Cooperation and Development - 3

University of Maryland - 3

survey - 44

data - 43

respondent - 37

estimating - 31

agency - 27

microdata - 27

population - 26

report - 25

census bureau - 25

statistician - 23

datasets - 20

data census - 18

census data - 17

estimation - 17

aggregate - 17

analysis - 16

disclosure - 16

economist - 16

earnings - 15

confidentiality - 15

percentile - 14

database - 14

privacy - 14

researcher - 14

use census - 12

record - 12

public - 12

econometric - 12

statistical agencies - 12

longitudinal - 12

estimator - 11

payroll - 11

employed - 11

workforce - 11

quarterly - 11

research - 11

salary - 10

aggregation - 10

information - 10

employ - 10

economic census - 9

labor - 9

imputation - 9

statistical disclosure - 9

publicly - 9

recession - 9

study - 9

employee - 9

sector - 8

federal - 8

manufacturing - 8

expenditure - 8

market - 8

resident - 8

macroeconomic - 8

inference - 8

sample - 7

census disclosure - 7

survey data - 7

sampling - 7

census survey - 7

socioeconomic - 7

labor statistics - 7

research census - 7

enterprise - 7

sale - 7

censuses surveys - 7

census research - 7

employee data - 7

assessed - 6

trend - 6

census employment - 6

production - 6

company - 6

analyst - 6

empirical - 6

establishment - 6

industrial - 6

minority - 5

disadvantaged - 5

average - 5

2010 census - 5

prevalence - 5

discrepancy - 5

gdp - 5

poverty - 5

income data - 5

revenue - 5

employment statistics - 5

census years - 5

social - 5

reporting - 5

model - 5

incorporated - 5

business data - 5

yearly - 5

aging - 5

growth - 5

ethnicity - 4

hispanic - 4

census responses - 4

ssa - 4

survey income - 4

demand - 4

household surveys - 4

enrollment - 4

mobility - 4

income year - 4

citizen - 4

regression - 4

policymakers - 4

assessing - 4

employment data - 4

metropolitan - 4

tenure - 4

measure - 4

employer household - 4

corporation - 4

bias - 3

decade - 3

population survey - 3

matching - 3

racial - 3

intergenerational - 3

residence - 3

eligibility - 3

regressing - 3

irs - 3

information census - 3

corporate - 3

productivity measures - 3

unobserved - 3

competitor - 3

surveys censuses - 3

economic statistics - 3

department - 3

linked census - 3

employment count - 3

imputation model - 3

work census - 3

regressors - 3

coverage - 3

produce - 3

family - 3

state - 3

establishments data - 3

longitudinal employer - 3

workforce indicators - 3

poorer - 3

worker - 3

merger - 3

productivity growth - 3

classified - 3

industrial classification - 3

classification - 3

classifying - 3

Viewing papers 1 through 10 of 92


  • Working Paper

    Revisiting the Unintended Consequences of Ban the Box

    August 2025

    Working Paper Number:

    CES-25-58

    Ban-the-Box (BTB) policies intend to help formerly incarcerated individuals find employment by delaying when employers can ask about criminal records. We revisit the finding in Doleac and Hansen (2020) that BTB causes statistical discrimination against minority men. We correct miscoded BTB laws and show that estimates from the Current Population Survey (CPS) remain quantitatively similar, while those from the American Community Survey (ACS) now fail to reject the null hypothesis of no effect of BTB on employment. In contrast to the published estimates, these ACS results are statistically significantly different from the CPS results, indicating a lack of robustness across datasets. We do not find evidence that these differences are due to sample composition or survey weights. There is limited evidence that these divergent results are explained by the different frequencies of these surveys. Differences in sample sizes may also lead to different estimates; the ACS has a much larger sample and more statistical power to detect effects near the corrected CPS estimates.
    View Full Paper PDF
  • Working Paper

    A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census

    August 2025

    Working Paper Number:

    CES-25-57

    For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act.
    View Full Paper PDF
  • Working Paper

    Earnings Measurement Error, Nonresponse and Administrative Mismatch in the CPS

    July 2025

    Working Paper Number:

    CES-25-48

    Using the Current Population Survey Annual Social and Economic Supplement matched to Social Security Administration Detailed Earnings Records, we link observations across consecutive years to investigate a relationship between item nonresponse and measurement error in the earnings questions. Linking individuals across consecutive years allows us to observe switching from response to nonresponse and vice versa. We estimate OLS, IV, and finite mixture models that allow for various assumptions separately for men and women. We find that those who respond in both years of the survey exhibit less measurement error than those who respond in one year. Our findings suggest a trade-off between survey response and data quality that should be considered by survey designers, data collectors, and data users.
    View Full Paper PDF
  • Working Paper

    Tapping Business and Household Surveys to Sharpen Our View of Work from Home

    June 2025

    Working Paper Number:

    CES-25-36

    Timely business-level measures of work from home (WFH) are scarce for the U.S. economy. We review prior survey-based efforts to quantify the incidence and character of WFH and describe new questions that we developed and fielded for the Business Trends and Outlook Survey (BTOS). Drawing on more than 150,000 firm-level responses to the BTOS, we obtain four main findings. First, nearly a third of businesses have employees who work from home, with tremendous variation across sectors. The share of businesses with WFH employees is nearly ten times larger in the Information sector than in Accommodation and Food Services. Second, employees work from home about 1 day per week, on average, and businesses expect similar WFH levels in five years. Third, feasibility aside, businesses' largest concern with WFH relates to productivity. Seven percent of businesses find that onsite work is more productive, while two percent find that WFH is more productive. Fourth, there is a low level of tracking and monitoring of WFH activities, with 70% of firms reporting they do not track employee days in the office and 75% reporting they do not monitor employees when they work from home. These lessons serve as a starting point for enhancing WFH-related content in the American Community Survey and other household surveys.
    View Full Paper PDF
  • Working Paper

    Revisiting Methods to Assign Responses when Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources

    May 2024

    Authors: James M. Noon

    Working Paper Number:

    CES-24-26

    The Best Race and Ethnicity Administrative Records Composite file ('Best Race file') is a composite file that combines Census, federal, and Third Party Data (TPD) sources and applies business rules to assign race and ethnicity values to person records. The first version of the Best Race administrative records composite was constructed in 2015 and subsequently updated each year to include more recent vintages, when available, of the data sources originally included in the composite file. Where updates were available for data sources, the most recent information for persons was retained, and the business rules were reapplied to assign a single race and single Hispanic origin value to each person record. The majority of person records on the Best Race file have consistent race and ethnicity information across data sources. Where there are discrepancies in responses across data sources, we apply a series of business rules to assign a single race and ethnicity to each record. To improve the quality of the Best Race administrative records composite, we have begun revising the business rules, which were developed several years ago. This paper discusses the original business rules as well as the implemented changes and their impact on the composite file.
    View Full Paper PDF
  • Working Paper

    Mobility, Opportunity, and Volatility Statistics (MOVS): Infrastructure Files and Public Use Data

    April 2024

    Working Paper Number:

    CES-24-23

    Federal statistical agencies and policymakers have identified a need for integrated systems of household and personal income statistics. This interest marks a recognition that aggregated measures of income, such as GDP or average income growth, tell an incomplete story that may conceal large gaps in well-being between different types of individuals and families. Until recently, longitudinal income data that are rich enough to calculate detailed income statistics and include demographic characteristics, such as race and ethnicity, have not been available. The Mobility, Opportunity, and Volatility Statistics project (MOVS) fills this gap in comprehensive income statistics. Using linked demographic and tax records on the population of U.S. working-age adults, the MOVS project defines households and calculates household income, applying an equivalence scale to create a personal income concept, and then traces the progress of individuals' incomes over time. We then output a set of intermediate statistics by race-ethnicity group, sex, year, base-year state of residence, and base-year income decile. We select the intermediate statistics most useful in developing more complex intragenerational income mobility measures, such as transition matrices, income growth curves, and variance-based volatility statistics. We provide these intermediate statistics as part of a publicly released data tool with downloadable flat files and accompanying documentation. This paper describes the data build process and the output files, including a brief analysis highlighting the structure and content of our main statistics.
    View Full Paper PDF
  • Working Paper

    The Changing Nature of Pollution, Income, and Environmental Inequality in the United States

    January 2024

    Working Paper Number:

    CES-24-04

    This paper uses administrative tax records linked to Census demographic data and high-resolution measures of fine small particulate (PM2.5) exposure to study the evolution of the Black-White pollution exposure gap over the past 40 years. In doing so, we focus on the various ways in which income may have contributed to these changes using a statistical decomposition. We decompose the overall change in the Black-White PM2.5 exposure gap into (1) components that stem from rank-preserving compression in the overall pollution distribution and (2) changes that stem from a reordering of Black and White households within the pollution distribution. We find a significant narrowing of the Black-White PM2.5 exposure gap over this time period that is overwhelmingly driven by rank-preserving changes rather than positional changes. However, the relative positions of Black and White households at the upper end of the pollution distribution have meaningfully shifted in the most recent years.
    View Full Paper PDF
  • Working Paper

    Incorporating Administrative Data in Survey Weights for the Basic Monthly Current Population Survey

    January 2024

    Working Paper Number:

    CES-24-02

    Response rates to the Current Population Survey (CPS) have declined over time, raising the potential for nonresponse bias in key population statistics. A potential solution is to leverage administrative data from government agencies and third-party data providers when constructing survey weights. In this paper, we take two approaches. First, we use administrative data to build a non-parametric nonresponse adjustment step while leaving the calibration to population estimates unchanged. Second, we use administratively linked data in the calibration process, matching income data from the Internal Revenue Service and state agencies, demographic data from the Social Security Administration and the decennial census, and industry data from the Census Bureau's Business Register to both responding and nonresponding households. We use the matched data in the household nonresponse adjustment of the CPS weighting algorithm, which changes the weights of respondents to account for differential nonresponse rates among subpopulations. After running the experimental weighting algorithm, we compare estimates of the unemployment rate and labor force participation rate between the experimental weights and the production weights. Before March 2020, estimates of the labor force participation rates using the experimental weights are 0.2 percentage points higher than the original estimates, with minimal effect on the unemployment rate. After March 2020, the new labor force participation rates are similar, but the unemployment rate is about 0.2 percentage points higher in some months during the height of COVID-related interviewing restrictions. These results suggest that if there is any nonresponse bias present in the CPS, the magnitude is comparable to the typical margin of error of the unemployment rate estimate. Additionally, the results are overall similar across demographic groups and states, as well as when using alternative weighting methodology.
Finally, we discuss how our estimates compare to those from earlier papers that calculate estimates of bias in key CPS labor force statistics. This paper is for research purposes only. No changes to production are being implemented at this time.
    View Full Paper PDF
  • Working Paper

    A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report

    December 2023

    Working Paper Number:

    CES-23-63R

    For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
    View Full Paper PDF
  • Working Paper

    Collaborative Micro-productivity Project: Establishment-Level Productivity Dataset, 1972-2020

    December 2023

    Working Paper Number:

    CES-23-65

    We describe the process for building the Collaborative Micro-productivity Project (CMP) microdata and calculating establishment-level productivity numbers. The documentation is for version 7 and the data cover the years 1972-2020. These data have been used in numerous research papers and are used to create the experimental public-use data product Dispersion Statistics on Productivity (DiSP).
    View Full Paper PDF