CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'data'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

National Science Foundation - 36

Internal Revenue Service - 36

American Community Survey - 35

Center for Economic Studies - 35

Social Security Administration - 29

Service Annual Survey - 29

Research Data Center - 27

Current Population Survey - 24

Protected Identification Key - 22

Bureau of Labor Statistics - 22

Longitudinal Employer Household Dynamics - 20

North American Industry Classification System - 20

Cornell University - 20

Survey of Income and Program Participation - 19

Census Bureau Disclosure Review Board - 18

2010 Census - 18

Decennial Census - 17

Economic Census - 17

Social Security Number - 16

Person Validation System - 16

Master Address File - 16

Business Register - 16

Longitudinal Business Database - 16

Social Security - 15

Employer Identification Numbers - 14

Standard Industrial Classification - 14

Quarterly Workforce Indicators - 13

Disclosure Review Board - 12

Center for Administrative Records Research and Applications - 12

Special Sworn Status - 12

Person Identification Validation System - 11

Personally Identifiable Information - 11

Administrative Records - 11

Bureau of Economic Analysis - 11

Housing and Urban Development - 10

Census Bureau Business Register - 10

Alfred P Sloan Foundation - 10

Annual Survey of Manufactures - 10

Longitudinal Research Database - 10

National Opinion Research Center - 10

Department of Housing and Urban Development - 9

Indian Health Service - 9

National Center for Health Statistics - 9

Standard Statistical Establishment List - 9

County Business Patterns - 9

Business Dynamics Statistics - 9

Chicago Census Research Data Center - 9

MAFID - 8

SSA Numident - 8

Federal Statistical Research Data Center - 8

Computer Assisted Personal Interview - 7

Statistics Canada - 7

Quarterly Census of Employment and Wages - 7

Metropolitan Statistical Area - 7

Duke University - 7

American Statistical Association - 7

Public Use Micro Sample - 7

Census Bureau Master Address File - 6

Individual Taxpayer Identification Numbers - 6

Indian Housing Information Center - 6

Agency for Healthcare Research and Quality - 6

American Housing Survey - 6

Company Organization Survey - 6

DOB - 6

Unemployment Insurance - 6

Medicaid Services - 6

Census of Manufactures - 6

Postal Service - 6

LEHD Program - 6

Supplemental Nutrition Assistance Program - 5

Sloan Foundation - 5

Census Numident - 5

Census Bureau Person Identification Validation System - 5

Some Other Race - 5

National Institute on Aging - 5

University of Michigan - 5

Small Business Administration - 5

Office of Management and Budget - 5

Cornell Institute for Social and Economic Research - 5

PIKed - 5

University of Chicago - 5

American Economic Association - 5

Federal Reserve Bank - 5

National Bureau of Economic Research - 5

Local Employment Dynamics - 5

Permanent Plant Number - 5

Journal of Economic Literature - 5

Ordinary Least Squares - 4

1940 Census - 4

W-2 - 4

Census Edited File - 4

National Institutes of Health - 4

Health and Retirement Study - 4

National Longitudinal Survey of Youth - 4

Census of Manufacturing Firms - 4

Probability Density Function - 4

Minnesota Population Center - 4

Center for Administrative Records Research - 4

Organization for Economic Cooperation and Development - 4

Characteristics of Business Owners - 4

Total Factor Productivity - 4

Federal Insurance Contribution Act - 3

Social and Economic Supplement - 3

ASEC - 3

Adjusted Gross Income - 3

Temporary Assistance for Needy Families - 3

Geographic Information Systems - 3

Department of Economics - 3

COVID-19 - 3

National Income and Product Accounts - 3

Bureau of Labor - 3

Centers for Medicare - 3

Census Bureau Longitudinal Business Database - 3

Centers for Disease Control and Prevention - 3

Employer Characteristics File - 3

Department of Health and Human Services - 3

National Research Council - 3

Computer Assisted Telephone Interviews and Computer Assisted Personal Interviews - 3

CATI - 3

Census Bureau Center for Economic Studies - 3

Census 2000 - 3

Office of Personnel Management - 3

Census Bureau Business Dynamics Statistics - 3

COMPUSTAT - 3

Securities and Exchange Commission - 3

survey - 53

respondent - 44

statistical - 43

microdata - 41

datasets - 38

census bureau - 36

record - 36

agency - 35

data census - 31

census data - 26

estimating - 23

population - 23

database - 23

report - 22

disclosure - 19

analysis - 19

statistician - 17

confidentiality - 16

matching - 16

research - 16

survey data - 15

information - 15

privacy - 15

imputation - 15

researcher - 15

aggregate - 14

use census - 12

census survey - 12

census research - 12

statistical agencies - 12

payroll - 11

study - 11

estimation - 10

earnings - 10

sampling - 10

sample - 10

coverage - 10

public - 10

records census - 10

linkage - 10

employee - 10

workforce - 10

research census - 10

publicly - 9

resident - 9

identifier - 9

census records - 9

quarterly - 9

economic census - 9

business data - 9

matched - 9

economist - 9

sector - 9

2010 census - 8

assessed - 8

federal - 8

statistical disclosure - 8

employed - 8

enterprise - 8

longitudinal - 8

employment data - 8

employee data - 8

ssa - 7

census years - 7

residential - 7

residence - 7

household surveys - 7

reporting - 7

census use - 7

aggregation - 7

inference - 7

associate - 7

econometric - 7

estimator - 6

enrollment - 6

irs - 6

income data - 6

ethnicity - 6

race census - 6

census employment - 6

department - 6

work census - 6

information census - 6

recession - 6

surveys censuses - 6

percentile - 6

censuses surveys - 6

sale - 6

expenditure - 6

employ - 6

model - 6

census file - 6

industrial - 6

minority - 5

salary - 5

census linked - 5

citizen - 5

provided census - 5

race - 5

state - 5

housing - 5

assessing - 5

housing survey - 5

establishments data - 5

market - 5

analyst - 5

social - 5

worker - 5

manufacturing - 5

macroeconomic - 5

average - 4

survey income - 4

population survey - 4

census disclosure - 4

income individuals - 4

tax - 4

taxpayer - 4

geographic - 4

linked census - 4

survey households - 4

hispanic - 4

census 2020 - 4

home - 4

individuals census - 4

imputation model - 4

incorporated - 4

policymakers - 4

gdp - 4

employment statistics - 4

establishment - 4

investment - 4

labor - 4

trend - 4

earner - 3

household income - 3

1040 - 3

environmental - 3

impact - 3

disparity - 3

discrepancy - 3

racial - 3

empirical - 3

classification - 3

prevalence - 3

apartment - 3

unobserved - 3

organizational - 3

acquisition - 3

economic statistics - 3

classifying - 3

employer household - 3

imputed - 3

ancestry - 3

ethnic - 3

bias - 3

census responses - 3

worker demographics - 3

production - 3

manufacturer - 3

inventory - 3

employment dynamics - 3

workforce indicators - 3

classified - 3

measures employment - 3

employment measures - 3

firm data - 3

company - 3

Viewing papers 51 through 60 of 94


  • Working Paper

    Person Matching in Historical Files using the Census Bureau's Person Validation System

    September 2014

    Working Paper Number:

    carra-2014-11

    The recent release of the 1940 Census manuscripts enables the creation of longitudinal data spanning the whole of the twentieth century. Linked historical and contemporary data would allow unprecedented analyses of the causes and consequences of health, demographic, and economic change. The Census Bureau is uniquely equipped to provide high quality linkages of person records across datasets. This paper summarizes the linkage techniques employed by the Census Bureau and discusses utilization of these techniques to append protected identification keys to the 1940 Census.
    View Full Paper PDF
  • Working Paper

    Evaluation of Commercial School and Teacher Lists to Enhance Survey Frames

    July 2014

    Working Paper Number:

    carra-2014-07

    This report summarizes the potential for teacher lists obtained from commercial vendors for enhancing sampling frames for the National Teacher and Principal Survey (NTPS). We investigate three separate vendor lists, and compare coverage rates across a range of school and teacher characteristics. Across all vendors, coverage rates are higher for regular, non-charter schools. Vendor A stands out as having higher coverage rates than the other two, and we recommend further evaluating Vendor A's teacher lists during the upcoming 2014-2015 NTPS Field Test.
    View Full Paper PDF
  • Working Paper

    2010 American Community Survey Match Study

    July 2014

    Working Paper Number:

    carra-2014-03

    Using administrative records data from federal government agencies and commercial sources, the 2010 ACS Match Study measures administrative records coverage of 2010 ACS addresses, persons, and persons at addresses at different levels of geography as well as by demographic characteristics and response mode. The 2010 ACS Match Study represents a continuation of the research undertaken in the 2010 Census Match Study, the first national-level evaluation of administrative records data coverage. Preliminary results indicate that administrative records provide substantial coverage for addresses and persons in the 2010 ACS (92.7 and 92.1 percent respectively), and less extensive, though still substantial, coverage for person-address pairs (74.3 percent). In addition, some variation in address, person and/or person-address coverage is found across demographic and response mode groups. This research informs future uses of administrative records in survey and decennial census operations to address the increasing costs of data collection and declining response rates.
    View Full Paper PDF
  • Working Paper

    Estimating Record Linkage False Match Rate for the Person Identification Validation System

    July 2014

    Working Paper Number:

    carra-2014-02

    The Census Bureau Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. This paper presents a method to measure the false match rate in PVS following the approach of Belin and Rubin (1995). The Belin and Rubin methodology requires truth data to estimate a mixture model. The parameters from the mixture model are used to obtain point estimates of the false match rate for each of the PVS search modules. The truth data requirement is satisfied by the unique access the Census Bureau has to high quality name, date of birth, address and Social Security Number (SSN) data. Truth data are quickly created for the Belin and Rubin model and do not involve a clerical review process. These truth data are used to create estimates for the Belin and Rubin parameters, making the approach more feasible. Both observed and modeled false match rates are computed for all search modules in federal administrative records data and commercial data.
    View Full Paper PDF
  • Working Paper

    The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software

    July 2014

    Working Paper Number:

    carra-2014-01

    The Census Bureau's Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. The PVS matches incoming files to reference files created with data from the Social Security Administration (SSA) Numerical Identification file, and SSA data with addresses obtained from federal files. This paper describes the PVS methodology from editing input data to creating the final file.
    View Full Paper PDF
  • Working Paper

    Comparison of Survey, Federal, and Commercial Address Data Quality

    June 2014

    Authors: Quentin Brummet

    Working Paper Number:

    carra-2014-06

    This report summarizes matching of survey, commercial, and administrative records housing units to the Census Bureau Master Address File (MAF). We document overall MAF match rates in each data set and evaluate differences in match rates across a variety of housing characteristics. Results show that over 90 percent of records in survey data from the American Housing Survey (AHS) match to the MAF. Commercial data from CoreLogic matches at much lower rates, in part due to missing address information and poor match rates for multi-unit buildings. MAF match rates for administrative records from the Department of Housing and Urban Development are also high, and open the possibility of using this information in surveys such as the AHS.
    View Full Paper PDF
  • Working Paper

    The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey

    April 2014

    Working Paper Number:

    carra-2014-08

    Record linkage across survey and administrative records sources can greatly enrich data and improve their quality. The linkage can reduce respondent burden and nonresponse follow-up costs. This is particularly important in an era of declining survey response rates and tight budgets. Record linkage also creates statistical bias, however. The U.S. Census Bureau links person records through its Person Identification Validation System (PVS), assigning each record a Protected Identification Key (PIK). It is not possible to reliably assign a PIK to every record, either due to insufficient identifying information or because the information does not uniquely match any of the administrative records used in the person validation process. Non-random ability to assign a PIK can potentially inject bias into statistics using linked data. This paper studies the nature of this bias using the 2009 and 2010 American Community Survey (ACS). The ACS is well-suited for this analysis, as it contains a rich set of person characteristics that can describe the bias. We estimate probit models for whether a record is assigned a PIK. The results suggest that young children, minorities, residents of group quarters, immigrants, recent movers, low-income individuals, and non-employed individuals are less likely to receive a PIK in the 2009 ACS. Changes to the PVS process in 2010 significantly addressed the young children deficit, attenuated the other biases, and increased the validated records share from 88.1 to 92.6 percent (person-weighted).
    View Full Paper PDF
  • Working Paper

    FIRM AGE AND SIZE IN THE LONGITUDINAL EMPLOYER-HOUSEHOLD DYNAMICS DATA

    March 2014

    Working Paper Number:

    CES-14-16

    The Census Bureau's Quarterly Workforce Indicators (QWI) and OnTheMap now provide detailed workforce statistics by employer age and size. These data allow a first look at the demographics of workers at small and young businesses as well as detailed analysis of how hiring, turnover, job creation/destruction vary throughout a firm's lifespan. Both the QWI and OnTheMap are tabulated from the Longitudinal Employer-Household Dynamics (LEHD) linked employer-employee data. Firm age and size information was added to the LEHD data through integration of Business Dynamics Statistics (BDS) microdata into the LEHD jobs frame. This paper describes how these two new firm characteristics were added to the microdata and how they are tabulated in QWI and OnTheMap.
    View Full Paper PDF
  • Working Paper

    A FIRST STEP TOWARDS A GERMAN SYNLBD: CONSTRUCTING A GERMAN LONGITUDINAL BUSINESS DATABASE

    February 2014

    Working Paper Number:

    CES-14-13

    One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
    View Full Paper PDF
  • Working Paper

    IMPROVING THE SYNTHETIC LONGITUDINAL BUSINESS DATABASE

    February 2014

    Working Paper Number:

    CES-14-12

    In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. Agencies potentially can manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version (now available for public use) of the U.S. Census Bureau's Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. While the synthetic LBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding features. This article describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
    View Full Paper PDF