CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'data'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

National Science Foundation - 36

Internal Revenue Service - 36

American Community Survey - 35

Center for Economic Studies - 35

Social Security Administration - 29

Service Annual Survey - 29

Research Data Center - 27

Current Population Survey - 24

Protected Identification Key - 22

Bureau of Labor Statistics - 22

Longitudinal Employer Household Dynamics - 20

North American Industry Classification System - 20

Cornell University - 20

Survey of Income and Program Participation - 19

Census Bureau Disclosure Review Board - 18

2010 Census - 18

Decennial Census - 17

Economic Census - 17

Social Security Number - 16

Person Validation System - 16

Master Address File - 16

Business Register - 16

Longitudinal Business Database - 16

Social Security - 15

Employer Identification Numbers - 14

Standard Industrial Classification - 14

Quarterly Workforce Indicators - 13

Disclosure Review Board - 12

Center for Administrative Records Research and Applications - 12

Special Sworn Status - 12

Person Identification Validation System - 11

Personally Identifiable Information - 11

Administrative Records - 11

Bureau of Economic Analysis - 11

Housing and Urban Development - 10

Census Bureau Business Register - 10

Alfred P Sloan Foundation - 10

Annual Survey of Manufactures - 10

Longitudinal Research Database - 10

National Opinion Research Center - 10

Department of Housing and Urban Development - 9

Indian Health Service - 9

National Center for Health Statistics - 9

Standard Statistical Establishment List - 9

County Business Patterns - 9

Business Dynamics Statistics - 9

Chicago Census Research Data Center - 9

MAFID - 8

SSA Numident - 8

Federal Statistical Research Data Center - 8

Computer Assisted Personal Interview - 7

Statistics Canada - 7

Quarterly Census of Employment and Wages - 7

Metropolitan Statistical Area - 7

Duke University - 7

American Statistical Association - 7

Public Use Micro Sample - 7

Census Bureau Master Address File - 6

Individual Taxpayer Identification Numbers - 6

Indian Housing Information Center - 6

Agency for Healthcare Research and Quality - 6

American Housing Survey - 6

Company Organization Survey - 6

DOB - 6

Unemployment Insurance - 6

Medicaid Services - 6

Census of Manufactures - 6

Postal Service - 6

LEHD Program - 6

Supplemental Nutrition Assistance Program - 5

Sloan Foundation - 5

Census Numident - 5

Census Bureau Person Identification Validation System - 5

Some Other Race - 5

National Institute on Aging - 5

University of Michigan - 5

Small Business Administration - 5

Office of Management and Budget - 5

Cornell Institute for Social and Economic Research - 5

PIKed - 5

University of Chicago - 5

American Economic Association - 5

Federal Reserve Bank - 5

National Bureau of Economic Research - 5

Local Employment Dynamics - 5

Permanent Plant Number - 5

Journal of Economic Literature - 5

Ordinary Least Squares - 4

1940 Census - 4

W-2 - 4

Census Edited File - 4

National Institutes of Health - 4

Health and Retirement Study - 4

National Longitudinal Survey of Youth - 4

Census of Manufacturing Firms - 4

Probability Density Function - 4

Minnesota Population Center - 4

Center for Administrative Records Research - 4

Organization for Economic Cooperation and Development - 4

Characteristics of Business Owners - 4

Total Factor Productivity - 4

Federal Insurance Contribution Act - 3

Social and Economic Supplement - 3

ASEC - 3

Adjusted Gross Income - 3

Temporary Assistance for Needy Families - 3

Geographic Information Systems - 3

Department of Economics - 3

COVID-19 - 3

National Income and Product Accounts - 3

Bureau of Labor - 3

Centers for Medicare - 3

Census Bureau Longitudinal Business Database - 3

Centers for Disease Control and Prevention - 3

Employer Characteristics File - 3

Department of Health and Human Services - 3

National Research Council - 3

Computer Assisted Telephone Interviews and Computer Assisted Personal Interviews - 3

CATI - 3

Census Bureau Center for Economic Studies - 3

Census 2000 - 3

Office of Personnel Management - 3

Census Bureau Business Dynamics Statistics - 3

COMPUSTAT - 3

Securities and Exchange Commission - 3

survey - 53

respondent - 44

statistical - 43

microdata - 41

datasets - 38

census bureau - 36

record - 36

agency - 35

data census - 31

census data - 26

estimating - 23

population - 23

database - 23

report - 22

disclosure - 19

analysis - 19

statistician - 17

confidentiality - 16

matching - 16

research - 16

survey data - 15

information - 15

privacy - 15

imputation - 15

researcher - 15

aggregate - 14

use census - 12

census survey - 12

census research - 12

statistical agencies - 12

payroll - 11

study - 11

estimation - 10

earnings - 10

sampling - 10

sample - 10

coverage - 10

public - 10

records census - 10

linkage - 10

employee - 10

workforce - 10

research census - 10

publicly - 9

resident - 9

identifier - 9

census records - 9

quarterly - 9

economic census - 9

business data - 9

matched - 9

economist - 9

sector - 9

2010 census - 8

assessed - 8

federal - 8

statistical disclosure - 8

employed - 8

enterprise - 8

longitudinal - 8

employment data - 8

employee data - 8

ssa - 7

census years - 7

residential - 7

residence - 7

household surveys - 7

reporting - 7

census use - 7

aggregation - 7

inference - 7

associate - 7

econometric - 7

estimator - 6

enrollment - 6

irs - 6

income data - 6

ethnicity - 6

race census - 6

census employment - 6

department - 6

work census - 6

information census - 6

recession - 6

surveys censuses - 6

percentile - 6

censuses surveys - 6

sale - 6

expenditure - 6

employ - 6

model - 6

census file - 6

industrial - 6

minority - 5

salary - 5

census linked - 5

citizen - 5

provided census - 5

race - 5

state - 5

housing - 5

assessing - 5

housing survey - 5

establishments data - 5

market - 5

analyst - 5

social - 5

worker - 5

manufacturing - 5

macroeconomic - 5

average - 4

survey income - 4

population survey - 4

census disclosure - 4

income individuals - 4

tax - 4

taxpayer - 4

geographic - 4

linked census - 4

survey households - 4

hispanic - 4

census 2020 - 4

home - 4

individuals census - 4

imputation model - 4

incorporated - 4

policymakers - 4

gdp - 4

employment statistics - 4

establishment - 4

investment - 4

labor - 4

trend - 4

earner - 3

household income - 3

1040 - 3

environmental - 3

impact - 3

disparity - 3

discrepancy - 3

racial - 3

empirical - 3

classification - 3

prevalence - 3

apartment - 3

unobserved - 3

organizational - 3

acquisition - 3

economic statistics - 3

classifying - 3

employer household - 3

imputed - 3

ancestry - 3

ethnic - 3

bias - 3

census responses - 3

worker demographics - 3

production - 3

manufacturer - 3

inventory - 3

employment dynamics - 3

workforce indicators - 3

classified - 3

measures employment - 3

employment measures - 3

firm data - 3

company - 3

Viewing papers 61 through 70 of 94


  • Working Paper

    LOOKING BACK ON THREE YEARS OF USING THE SYNTHETIC LBD BETA

    February 2014

    Working Paper Number:

    CES-14-11

    Distributions of business data are typically much more skewed than those for household or individual data, and public knowledge of the underlying units is greater. As a result, national statistical offices (NSOs) rarely release establishment or firm-level business microdata due to the risk to respondent confidentiality. One potential approach for overcoming these risks is to release synthetic data where the establishment data are simulated from statistical models designed to mimic the distributions of the real underlying microdata. The US Census Bureau's Center for Economic Studies, in collaboration with Duke University, the National Institute of Statistical Sciences, and Cornell University, made available a synthetic public use file for the Longitudinal Business Database (LBD) comprising more than 20 million records for all business establishments with paid employees dating back to 1976. The resulting product, dubbed the SynLBD, was released in 2010 and is the first-ever comprehensive business microdata set publicly released in the United States, including data on establishments' employment and payroll, birth and death years, and industrial classification. This paper documents the scope of projects that have requested and used the SynLBD.
    View Full Paper PDF
  • Working Paper

    EXPANDING THE ROLE OF SYNTHETIC DATA AT THE U.S. CENSUS BUREAU

    February 2014

    Working Paper Number:

    CES-14-10

    National statistical offices (NSOs) create official statistics from data collected from survey respondents, government administrative records and other sources. The raw source data is usually considered to be confidential. In the case of the U.S. Census Bureau, confidentiality of survey and administrative records microdata is mandated by statute, and this mandate to protect confidentiality is often at odds with the needs of users to extract as much information from the data as possible. Traditional disclosure protection techniques result in official data products that do not fully utilize the information content of the underlying microdata. Typically, these products take the form of simple aggregate tabulations. In a few cases anonymized public-use micro samples are made available, but these face a growing risk of re-identification by the increasing amounts of information about individuals and firms available in the public domain. One approach for overcoming these risks is to release products based on synthetic data where values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata. We discuss recent Census Bureau work to develop and deploy such products. We discuss the benefits and challenges involved with extending the scope of synthetic data products in official statistics.
    View Full Paper PDF
  • Working Paper

    A METHOD OF CORRECTING FOR MISREPORTING APPLIED TO THE FOOD STAMP PROGRAM

    May 2013

    Authors: Nikolas Mittag

    Working Paper Number:

    CES-13-28

    Survey misreporting is known to be pervasive and to bias common statistical analyses. In this paper, I first use administrative data on SNAP receipt and amounts linked to American Community Survey data from New York State to show that survey data can misrepresent the program in important ways. For example, more than 1.4 billion dollars received are not reported in New York State alone. 46 percent of dollars received by households with annual income above the poverty line are not reported in the survey data, while only 19 percent are missing below the poverty line. Standard corrections for measurement error cannot remove these biases. I then develop a method to obtain consistent estimates by combining parameter estimates from the linked data with publicly available data. This conditional density method recovers the correct estimates using public use data only, which solves the problem that access to linked administrative data is usually restricted. I examine the degree to which this approach can be used to extrapolate across time and geography, in order to solve the problem that validation data is often based on a convenience sample. I present evidence from within New York State that the extent of heterogeneity is small enough to make extrapolation work well across both time and geography. Extrapolation to the entire U.S. yields substantive differences to survey data and reduces deviations from official aggregates by a factor of 4 to 9 compared to survey aggregates.
    View Full Paper PDF
  • Working Paper

    SYNTHETIC DATA FOR SMALL AREA ESTIMATION IN THE AMERICAN COMMUNITY SURVEY

    April 2013

    Working Paper Number:

    CES-13-19

    Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
    View Full Paper PDF
  • Working Paper

    Dynamically Consistent Noise Infusion and Partially Synthetic Data as Confidentiality Protection Measures for Related Time Series

    July 2012

    Working Paper Number:

    CES-12-13

    The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure avoidance methods are required to protect the information about individuals and businesses that contribute to the underlying data. The QWI disclosure avoidance mechanism we describe here relies heavily on the use of noise infusion through a permanent multiplicative noise distortion factor, used for magnitudes, counts, differences and ratios. There is minimal suppression and no complementary suppressions. To our knowledge, the release in 2003 of the QWI was the first large-scale use of noise infusion in any official statistical product. We show that the released statistics are analytically valid along several critical dimensions: measures are unbiased and time series properties are preserved. We provide an analysis of the degree to which confidentiality is protected. Furthermore, we show how the judicious use of synthetic data, injected into the tabulation process, can completely eliminate suppressions, maintain analytical validity, and increase the protection of the underlying confidential data.
    View Full Paper PDF
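
The multiplicative noise infusion described in this abstract can be illustrated with a minimal Python sketch. It is not the QWI mechanism itself: the uniform draw, the bounds `a` and `b`, the seed, and the establishment names and counts are all assumptions chosen for illustration. The sketch only shows the general idea of giving each establishment one fixed factor that is bounded away from 1 and reusing that factor in every tabulation.

```python
import random


def draw_noise_factor(rng, a=0.05, b=0.15):
    """Draw a factor whose distance from 1 lies between a and b (assumed bounds)."""
    magnitude = rng.uniform(a, b)   # how far the factor sits from 1
    sign = rng.choice([-1, 1])      # distort upward or downward
    return 1.0 + sign * magnitude


def assign_permanent_factors(establishment_ids, seed=0, a=0.05, b=0.15):
    """Assign each establishment one factor; a fixed seed keeps it permanent."""
    rng = random.Random(seed)
    return {eid: draw_noise_factor(rng, a, b) for eid in establishment_ids}


def noisy_total(employment, factors):
    """Tabulate an aggregate from the distorted establishment-level values."""
    return sum(employment[eid] * factors[eid] for eid in employment)


# Hypothetical establishment-level employment counts.
employment = {"est_A": 120, "est_B": 45, "est_C": 300}
factors = assign_permanent_factors(employment)
total = noisy_total(employment, factors)
```

Because the factors are fixed per establishment, repeated tabulations of the same cell return the same distorted value, so averaging over releases does not recover the true total.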
  • Working Paper

    Newly Recovered Microdata on U.S. Manufacturing Plants from the 1950s and 1960s: Some Early Glimpses

    September 2011

    Working Paper Number:

    CES-11-29

    Longitudinally-linked microdata on U.S. manufacturing plants are currently available to researchers for 1963, 1967, and 1972-2009. In this paper, we provide a first look at recently recovered manufacturing microdata files from the 1950s and 1960s. We describe their origins and background, discuss their contents, and begin to explore their sample coverage. We also begin to examine whether the available establishment identifier(s) allow record linking. Our preliminary analyses suggest that longitudinally-linked Annual Survey of Manufactures microdata from the mid-1950s through the present (containing 16 years of additional data) appear possible though challenging. While a great deal of work remains, we see tremendous value in extending the manufacturing microdata series back in time. With these data, new lines of research become possible and many others can be revisited.
    View Full Paper PDF
  • Working Paper

    Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database

    February 2011

    Working Paper Number:

    CES-11-04

    In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.
    View Full Paper PDF
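
The general idea of simulating released values from a fitted statistical model can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the method used to build the synthetic Longitudinal Business Database: it assumes a single variable (employment), a lognormal model, and hypothetical input values, whereas the abstract describes models designed to mimic the distributions of the full microdata.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Estimate the mean and standard deviation of the logged values."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)


def synthesize(values, seed=0):
    """Replace every observed value with a draw from the fitted model."""
    mu, sigma = fit_lognormal(values)
    rng = random.Random(seed)
    # Round to whole employees and keep at least one paid employee.
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in values]


# Hypothetical, skewed establishment employment counts.
observed = [3, 5, 8, 12, 20, 45, 110, 600]
synthetic = synthesize(observed)
```

No observed value is carried into the synthetic list directly. Whether such a file is analytically valid, or safe to release, depends on how well the model captures the real distribution and on a separate disclosure-risk assessment, neither of which this sketch attempts.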
  • Working Paper

    The Center for Economic Studies 1982-2007: A Brief History

    October 2009

    Authors: B.K. Atrostic

    Working Paper Number:

    CES-09-35

    More than half a century ago, visionaries representing both the Census Bureau and the external research community laid the foundation for the Center for Economic Studies (CES) and the Research Data Center (RDC) system. They saw a clear need for a system meeting the inextricably related requirements of providing more and better information from existing Census Bureau data collections while preserving respondent confidentiality and privacy. CES opened in 1982 to house new longitudinal business databases, develop them further, and make them available to qualified researchers. CES and the RDC system evolved to meet the designers' requirements. Research at CES and the RDCs meets the commitments of the Census Bureau (and, recently, of other agencies) to preserving confidentiality while contributing paradigm-shifting fundamental research in a range of disciplines and up-to-the-minute critical tools for decision-makers.
    View Full Paper PDF
  • Working Paper

    Resolving the Tension Between Access and Confidentiality: Past Experience and Future Plans at the U.S. Census Bureau

    September 2009

    Working Paper Number:

    CES-09-33

    This paper provides an historical context for access to U.S. Federal statistical data with a primary focus on the U.S. Census Bureau. We review the various modes used by the Census Bureau to make data available to users, and highlight the costs and benefits associated with each. We highlight some of the specific improvements underway or under consideration at the Census Bureau to better serve its data users, as well as discuss the broad strategies employed by statistical agencies to respond to the challenges of data access.
    View Full Paper PDF
  • Working Paper

    Concording U.S. Harmonized System Categories Over Time

    May 2009

    Working Paper Number:

    CES-09-11

    This paper: outlines an algorithm for concording U.S. ten-digit Harmonized System export and import codes over time; describes the concordances we construct for 1989 to 2004; and provides Stata code that can be used to construct similar concordances for arbitrary beginning and ending years from 1989 to 2007.
    View Full Paper PDF