CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'record'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

Internal Revenue Service - 24

Social Security Administration - 22

Protected Identification Key - 22

American Community Survey - 21

Center for Economic Studies - 18

Service Annual Survey - 17

Social Security Number - 15

Person Validation System - 14

National Science Foundation - 13

2010 Census - 13

Census Bureau Disclosure Review Board - 12

North American Industry Classification System - 12

Personally Identifiable Information - 11

Research Data Center - 11

Person Identification Validation System - 10

Social Security - 10

Longitudinal Business Database - 10

Indian Health Service - 9

Master Address File - 9

Standard Industrial Classification - 9

Administrative Records - 9

Center for Administrative Records Research and Applications - 9

Longitudinal Employer Household Dynamics - 8

Current Population Survey - 8

County Business Patterns - 8

Employer Identification Numbers - 8

Business Register - 8

Decennial Census - 7

Department of Housing and Urban Development - 7

Indian Housing Information Center - 7

Housing and Urban Development - 7

Some Other Race - 7

Bureau of Labor Statistics - 7

Economic Census - 7

Federal Statistical Research Data Center - 7

Disclosure Review Board - 6

Quarterly Workforce Indicators - 6

Individual Taxpayer Identification Numbers - 6

Survey of Income and Program Participation - 6

Business Dynamics Statistics - 6

SSA Numident - 6

National Opinion Research Center - 6

Computer Assisted Telephone Interviews and Computer Assisted Personal Interviews - 5

Computer Assisted Personal Interview - 5

CATI - 5

Quarterly Census of Employment and Wages - 5

Census Bureau Person Identification Validation System - 5

Census Numident - 5

Census Bureau Business Register - 5

Annual Survey of Manufactures - 5

MAFID - 5

Cornell University - 5

Medicaid Services - 5

Postal Service - 4

Census Bureau Master Address File - 4

Standard Statistical Establishment List - 4

Company Organization Survey - 4

Centers for Medicare - 4

Chicago Census Research Data Center - 4

Unemployment Insurance - 4

Center for Administrative Records Research - 4

Census of Manufactures - 4

PIKed - 4

Sloan Foundation - 3

Supplemental Nutrition Assistance Program - 3

Census Edited File - 3

Census Household Composition Key - 3

University of Chicago - 3

National Center for Health Statistics - 3

Office of Management and Budget - 3

1940 Census - 3

Department of Economics - 3

Alfred P Sloan Foundation - 3

University of Michigan - 3

COVID-19 - 3

Metropolitan Statistical Area - 3

Longitudinal Research Database - 3

Minnesota Population Center - 3

Local Employment Dynamics - 3

Duke University - 3

data - 36

survey - 26

datasets - 24

respondent - 21

microdata - 20

census bureau - 20

census data - 17

matching - 17

data census - 16

agency - 15

database - 15

report - 13

population - 12

statistical - 12

imputation - 11

records census - 10

irs - 10

census records - 10

linkage - 10

matched - 10

identifier - 9

disclosure - 8

federal - 8

use census - 8

census use - 8

census research - 8

ethnicity - 7

hispanic - 7

estimating - 7

coverage - 7

ssa - 7

department - 7

quarterly - 7

information - 7

confidentiality - 6

privacy - 6

publicly - 6

employed - 6

census survey - 6

citizen - 6

business data - 6

aggregate - 6

sector - 6

firms census - 6

census file - 6

1040 - 5

enrollment - 5

employee - 5

filing - 5

census employment - 5

census linked - 5

residence - 5

payroll - 5

enterprise - 5

longitudinal - 5

analysis - 5

associate - 5

survey data - 5

public - 4

minority - 4

ethnic - 4

job - 4

incorporated - 4

employment data - 4

workforce - 4

race - 4

sampling - 4

discrepancy - 4

race census - 4

linked census - 4

resident - 4

census responses - 4

census 2020 - 4

sample - 4

reporting - 4

employ - 4

employment statistics - 4

researcher - 4

research - 4

2010 census - 4

statistical disclosure - 4

model - 4

industrial - 4

statistical agencies - 4

surveys censuses - 3

tenure - 3

assessed - 3

native - 3

state - 3

migration - 3

migrant - 3

medicare - 3

medicaid - 3

recession - 3

establishments data - 3

manufacturing - 3

statistician - 3

financial - 3

demography - 3

inference - 3

ancestry - 3

econometric - 3

earnings - 3

Viewing papers 41 through 50 of 51


  • Working Paper

    IMPROVING THE SYNTHETIC LONGITUDINAL BUSINESS DATABASE

    February 2014

    Working Paper Number:

    CES-14-12

    In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version (now available for public use) of the U.S. Census Bureau's Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This article describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
    View Full Paper PDF
  • Working Paper

    LOOKING BACK ON THREE YEARS OF USING THE SYNTHETIC LBD BETA

    February 2014

    Working Paper Number:

    CES-14-11

    Distributions of business data are typically much more skewed than those for household or individual data, and public knowledge of the underlying units is greater. As a result, national statistical offices (NSOs) rarely release establishment- or firm-level business microdata due to the risk to respondent confidentiality. One potential approach for overcoming these risks is to release synthetic data, where the establishment data are simulated from statistical models designed to mimic the distributions of the real underlying microdata. The US Census Bureau's Center for Economic Studies, in collaboration with Duke University, the National Institute of Statistical Sciences, and Cornell University, made available a synthetic public-use file for the Longitudinal Business Database (LBD) comprising more than 20 million records for all business establishments with paid employees dating back to 1976. The resulting product, dubbed the SynLBD, was released in 2010 and is the first-ever comprehensive business microdata set publicly released in the United States, including data on establishments' employment and payroll, birth and death years, and industrial classification. This paper documents the scope of projects that have requested and used the SynLBD.
    View Full Paper PDF
  • Working Paper

    EXPANDING THE ROLE OF SYNTHETIC DATA AT THE U.S. CENSUS BUREAU

    February 2014

    Working Paper Number:

    CES-14-10

    National statistical offices (NSOs) create official statistics from data collected from survey respondents, government administrative records, and other sources. The raw source data are usually considered confidential. In the case of the U.S. Census Bureau, confidentiality of survey and administrative records microdata is mandated by statute, and this mandate to protect confidentiality is often at odds with users' need to extract as much information from the data as possible. Traditional disclosure protection techniques result in official data products that do not fully use the information content of the underlying microdata. Typically, these products take the form of simple aggregate tabulations. In a few cases, anonymized public-use micro samples are made available, but these face a growing risk of re-identification as more information about individuals and firms becomes available in the public domain. One approach for overcoming these risks is to release products based on synthetic data, where values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata. We discuss recent Census Bureau work to develop and deploy such products, as well as the benefits and challenges of extending the scope of synthetic data products in official statistics.
    View Full Paper PDF
  • Working Paper

    A COMPARISON OF PERSON-REPORTED INDUSTRY TO EMPLOYER-REPORTED INDUSTRY IN SURVEY AND ADMINISTRATIVE DATA

    September 2013

    Working Paper Number:

    CES-13-47

    The Census Bureau collects industry information through surveys and administrative data and creates associated public-use statistics. In this paper, we compare person-reported industry in the American Community Survey (ACS) to employer-reported industry from the Quarterly Census of Employment and Wages (QCEW) that is part of the Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. This research provides necessary information on the use of administrative data as a supplement to survey data industry information, and the findings will be useful for anyone using industry information from either source. Our project is part of a larger effort to compare information on jobs from household survey data to employer-reported information. This research is the first to compare ACS job data to firm-based administrative data. We find an overall industry sector match rate of 75 percent, and a 61 percent match rate at the 4-digit Census Industry Code (CIC) level. Industry match rates vary by sector and by whether industry sector is classified using ACS or LEHD industry information. The educational services and health care and social assistance sectors have among the highest match rates. The management of companies and enterprises sector has the lowest match rate, using either ACS-reported or LEHD-reported sector. For individuals with imputed industry data, the industry sector match rate is only 14 percent. Our findings suggest that the industry distribution and the sample in a particular industry sector will differ depending on whether ACS or LEHD data are used.
    View Full Paper PDF
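The overall and per-sector match rates in the abstract above are shares of linked jobs whose two industry codes agree. A minimal Python sketch, assuming a hypothetical list of (ACS-reported sector, LEHD-reported sector) pairs for jobs that have already been linked; the function name and the 2-digit NAICS-style codes are illustrative and not taken from the paper. The hypothetical `by` argument reflects the abstract's point that sector-level rates depend on which source classifies the job.

```python
from collections import Counter


def sector_match_rates(pairs, by="acs"):
    """Share of linked jobs whose ACS and LEHD sectors agree (hypothetical helper).

    pairs: list of (acs_sector, lehd_sector) tuples, one per linked job.
    by: which source assigns each job to a sector for the per-sector rates.
    Returns (overall_rate, {sector: rate}).
    """
    if not pairs:
        raise ValueError("need at least one linked job")
    if by not in ("acs", "lehd"):
        raise ValueError("by must be 'acs' or 'lehd'")
    column = 0 if by == "acs" else 1
    totals, matches = Counter(), Counter()
    for acs_sector, lehd_sector in pairs:
        sector = (acs_sector, lehd_sector)[column]  # sector from the chosen source
        totals[sector] += 1
        if acs_sector == lehd_sector:
            matches[sector] += 1
    overall = sum(matches.values()) / len(pairs)
    return overall, {s: matches[s] / totals[s] for s in totals}


# Hypothetical linked jobs: (ACS-reported sector, LEHD-reported sector).
jobs = [("62", "62"), ("62", "62"), ("61", "61"),
        ("55", "52"), ("52", "55"), ("62", "56")]
overall, by_acs = sector_match_rates(jobs, by="acs")
_, by_lehd = sector_match_rates(jobs, by="lehd")
print(overall)  # 0.5
```

In this toy data, sector 62 matches for 2 of 3 jobs when the ACS assigns the sector but for 2 of 2 when LEHD does, while the overall rate is the same either way.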
  • Working Paper

    An Analysis of Sample Selection and the Reliability of Using Short-term Earnings Averages in SIPP-SSA Matched Data

    December 2011

    Working Paper Number:

    CES-11-39

    In this paper, we document the extent to which the sample of the Survey of Income and Program Participation (SIPP) that is matched to the Social Security Administration's administrative earnings records is nationally representative. We conclude that the match bias is small, so selection is not a serious concern. The matched sample over-represents individuals who are wealthy, who have financial assets, or who have received a government transfer, and under-represents individuals who attrited from the SIPP. We use this matched sample to examine the relationship between short-term averages of SIPP earnings and average lifetime earnings from the administrative records. Our estimates suggest that using short averages of earnings may understate the effects of permanent income on particular outcomes of interest.
    View Full Paper PDF
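One standard mechanism behind the abstract's last point is classical errors-in-variables attenuation: using a noisy proxy for a regressor biases its least-squares slope toward zero. The toy simulation below illustrates only that general mechanism, not the paper's SIPP-SSA estimates; the unit variances, the 3-year versus 40-year windows, and the true effect of 1.0 are all hypothetical assumptions. Under these assumptions the expected slope for a k-year average is k/(k+1), so roughly 0.75 for 3 years and 40/41 for 40 years.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n_people=10_000, short_years=3, long_years=40,
                         true_effect=1.0, seed=0):
    """Slopes when permanent earnings are proxied by short vs long averages (toy model)."""
    rng = random.Random(seed)
    short_avg, long_avg, outcome = [], [], []
    for _ in range(n_people):
        permanent = rng.gauss(0.0, 1.0)  # hypothetical permanent (log) earnings
        # Each year's earnings = permanent component + unit-variance transitory shock.
        yearly = [permanent + rng.gauss(0.0, 1.0) for _ in range(long_years)]
        short_avg.append(sum(yearly[:short_years]) / short_years)
        long_avg.append(sum(yearly) / long_years)
        # The outcome depends only on permanent earnings, plus noise.
        outcome.append(true_effect * permanent + rng.gauss(0.0, 1.0))
    return ols_slope(short_avg, outcome), ols_slope(long_avg, outcome)


short_slope, long_slope = simulate_attenuation()
print(f"short-average slope: {short_slope:.3f}, long-average slope: {long_slope:.3f}")
```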
  • Working Paper

    Newly Recovered Microdata on U.S. Manufacturing Plants from the 1950s and 1960s: Some Early Glimpses

    September 2011

    Working Paper Number:

    CES-11-29

    Longitudinally linked microdata on U.S. manufacturing plants are currently available to researchers for 1963, 1967, and 1972-2009. In this paper, we provide a first look at recently recovered manufacturing microdata files from the 1950s and 1960s. We describe their origins and background, discuss their contents, and begin to explore their sample coverage. We also begin to examine whether the available establishment identifier(s) allow record linking. Our preliminary analyses suggest that constructing longitudinally linked Annual Survey of Manufactures microdata from the mid-1950s through the present, which would add 16 years of data, appears possible though challenging. While a great deal of work remains, we see tremendous value in extending the manufacturing microdata series back in time. With these data, new lines of research become possible and many others can be revisited.
    View Full Paper PDF
  • Working Paper

    Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database

    February 2011

    Working Paper Number:

    CES-11-04

    In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.
    View Full Paper PDF
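As a rough illustration of releasing records "simulated from statistical models designed to mimic the distributions of the underlying real microdata," the sketch below fits two toy models to hypothetical confidential records (industry shares, then a lognormal employment model within each industry) and draws synthetic records from them. This is a simplified assumption, not the SynLBD synthesis procedure, which covers many more variables; the function names, industry codes, and data are hypothetical.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def fit_models(records):
    """Fit toy models to confidential (industry, employment) records.

    Returns industry shares and, per industry, the mean and standard deviation
    of log employment (a lognormal model). Employment must be positive.
    """
    n = len(records)
    counts = Counter(ind for ind, _ in records)
    industry_shares = {ind: c / n for ind, c in counts.items()}
    log_emp = defaultdict(list)
    for ind, emp in records:
        log_emp[ind].append(math.log(emp))
    emp_params = {ind: (statistics.fmean(v), statistics.pstdev(v))
                  for ind, v in log_emp.items()}
    return industry_shares, emp_params


def synthesize(industry_shares, emp_params, n, seed=0):
    """Draw n synthetic records: industry first, then employment given industry."""
    rng = random.Random(seed)
    industries = list(industry_shares)
    weights = [industry_shares[i] for i in industries]
    synthetic = []
    for _ in range(n):
        ind = rng.choices(industries, weights=weights)[0]
        mu, sigma = emp_params[ind]
        emp = max(1, round(rng.lognormvariate(mu, sigma)))  # at least one employee
        synthetic.append((ind, emp))
    return synthetic


# Hypothetical confidential establishments: (industry code, employment).
confidential = [("31", 120), ("31", 45), ("31", 300),
                ("44", 8), ("44", 15), ("44", 3), ("54", 22)]
shares, params = fit_models(confidential)
synthetic = synthesize(shares, params, n=1000)
```

Because industry "54" has a single record, its fitted model has zero variance and every synthetic record in it reproduces that establishment's employment exactly, a small example of why synthesis has to be paired with a disclosure-risk assessment like the one the abstract describes.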
  • Working Paper

    Concording U.S. Harmonized System Categories Over Time

    May 2009

    Working Paper Number:

    CES-09-11

    This paper outlines an algorithm for concording U.S. ten-digit Harmonized System export and import codes over time, describes the concordances we construct for 1989 to 2004, and provides Stata code that can be used to construct similar concordances for arbitrary beginning and ending years from 1989 to 2007.
    View Full Paper PDF
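The abstract does not spell out its algorithm, so the following is only a generic sketch of one way to make codes comparable over time, not the paper's method (which it distributes as Stata code). It treats each observed code change between years as a link and uses a union-find structure to group all codes connected by links into one "family" that can be tracked consistently across years. The function name, ten-digit codes, and links are hypothetical.

```python
def code_families(links):
    """Group codes connected by year-to-year concordance links (hypothetical helper).

    links: iterable of (old_code, new_code) pairs, one per observed code change.
    Returns a dict mapping every code seen to a family id, chosen as the
    smallest code in its family so that ids are deterministic.
    """
    parent = {}

    def find(code):
        parent.setdefault(code, code)
        while parent[code] != code:
            parent[code] = parent[parent[code]]  # path halving
            code = parent[code]
        return code

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a  # the smaller code becomes the family root

    for old, new in links:
        union(old, new)
    return {code: find(code) for code in list(parent)}


# Hypothetical changes: one code splits in two, then one piece merges with another code.
links = [
    ("0101100000", "0101110000"),
    ("0101100000", "0101190000"),
    ("0101190000", "0101300000"),
    ("0101200000", "0101300000"),
    ("0202000000", "0202100000"),
]
families = code_families(links)
```

Aggregating trade values to family ids rather than raw codes yields series that stay comparable across years, at the cost of coarser categories when many codes are chained together.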
  • Working Paper

    Distribution Preserving Statistical Disclosure Limitation

    September 2006

    Working Paper Number:

    tp-2006-04

    One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.
    View Full Paper PDF
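Analysts of multiply-imputed, partially synthetic data typically compute an estimate on each of the m released datasets and then combine the results. The sketch below implements the combining rules commonly used for partially synthetic data (Reiter 2003): the point estimate is the average q_bar of the per-dataset estimates, and its variance is T_p = u_bar + b_m / m, where u_bar is the average within-dataset variance and b_m is the between-dataset variance of the estimates. These rules come from the broader literature and are not the distribution-preserving method this paper proposes; the function name and example numbers are hypothetical.

```python
import statistics


def combine_partially_synthetic(estimates, variances):
    """Combine results from m partially synthetic datasets.

    estimates: point estimate q_l computed on each synthetic dataset.
    variances: estimated variance u_l of each q_l.
    Returns (q_bar, T_p), where T_p = u_bar + b_m / m.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two datasets and one variance per estimate")
    q_bar = statistics.fmean(estimates)   # combined point estimate
    b_m = statistics.variance(estimates)  # between-dataset variance (divisor m - 1)
    u_bar = statistics.fmean(variances)   # average within-dataset variance
    return q_bar, u_bar + b_m / m


# Hypothetical estimates of a mean from m = 3 synthetic datasets.
q_bar, t_p = combine_partially_synthetic([10.0, 12.0, 11.0], [0.5, 0.7, 0.6])
print(q_bar, round(t_p, 4))  # 11.0 0.9333
```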
  • Working Paper

    NEW DATA FOR DYNAMIC ANALYSIS: THE LONGITUDINAL ESTABLISHMENT AND ENTERPRISE MICRODATA (LEEM) FILE

    December 1999

    Authors: Alicia Robb

    Working Paper Number:

    CES-99-18

    Until now, research on U.S. business activities over time has been hindered by the lack of accurate and comprehensive longitudinal data. The new Longitudinal Establishment and Enterprise Microdata (LEEM) are tremendously rich data that open up numerous possibilities for dynamic analyses of businesses in the U.S. economy. It is the first nationwide high-quality longitudinal database that covers the majority of employer businesses from all sectors of the economy. Due to the confidential nature of these data, the file is located at the Center for Economic Studies in the U.S. Bureau of the Census. To access the data, researchers must submit an acceptable proposal to CES and become sworn Census researchers. This paper describes the LEEM file, the variables contained on the file, and current uses of the data.
    View Full Paper PDF