CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'data'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

National Science Foundation - 36

Internal Revenue Service - 36

American Community Survey - 35

Center for Economic Studies - 35

Social Security Administration - 29

Service Annual Survey - 29

Research Data Center - 27

Current Population Survey - 24

Protected Identification Key - 22

Bureau of Labor Statistics - 22

Longitudinal Employer Household Dynamics - 20

North American Industry Classification System - 20

Cornell University - 20

Survey of Income and Program Participation - 19

Census Bureau Disclosure Review Board - 18

2010 Census - 18

Decennial Census - 17

Economic Census - 17

Social Security Number - 16

Person Validation System - 16

Master Address File - 16

Business Register - 16

Longitudinal Business Database - 16

Social Security - 15

Employer Identification Numbers - 14

Standard Industrial Classification - 14

Quarterly Workforce Indicators - 13

Disclosure Review Board - 12

Center for Administrative Records Research and Applications - 12

Special Sworn Status - 12

Person Identification Validation System - 11

Personally Identifiable Information - 11

Administrative Records - 11

Bureau of Economic Analysis - 11

Housing and Urban Development - 10

Census Bureau Business Register - 10

Alfred P Sloan Foundation - 10

Annual Survey of Manufactures - 10

Longitudinal Research Database - 10

National Opinion Research Center - 10

Department of Housing and Urban Development - 9

Indian Health Service - 9

National Center for Health Statistics - 9

Standard Statistical Establishment List - 9

County Business Patterns - 9

Business Dynamics Statistics - 9

Chicago Census Research Data Center - 9

MAFID - 8

SSA Numident - 8

Federal Statistical Research Data Center - 8

Computer Assisted Personal Interview - 7

Statistics Canada - 7

Quarterly Census of Employment and Wages - 7

Metropolitan Statistical Area - 7

Duke University - 7

American Statistical Association - 7

Public Use Micro Sample - 7

Census Bureau Master Address File - 6

Individual Taxpayer Identification Numbers - 6

Indian Housing Information Center - 6

Agency for Healthcare Research and Quality - 6

American Housing Survey - 6

Company Organization Survey - 6

DOB - 6

Unemployment Insurance - 6

Medicaid Services - 6

Census of Manufactures - 6

Postal Service - 6

LEHD Program - 6

Supplemental Nutrition Assistance Program - 5

Sloan Foundation - 5

Census Numident - 5

Census Bureau Person Identification Validation System - 5

Some Other Race - 5

National Institute on Aging - 5

University of Michigan - 5

Small Business Administration - 5

Office of Management and Budget - 5

Cornell Institute for Social and Economic Research - 5

PIKed - 5

University of Chicago - 5

American Economic Association - 5

Federal Reserve Bank - 5

National Bureau of Economic Research - 5

Local Employment Dynamics - 5

Permanent Plant Number - 5

Journal of Economic Literature - 5

Ordinary Least Squares - 4

1940 Census - 4

W-2 - 4

Census Edited File - 4

National Institutes of Health - 4

Health and Retirement Study - 4

National Longitudinal Survey of Youth - 4

Census of Manufacturing Firms - 4

Probability Density Function - 4

Minnesota Population Center - 4

Center for Administrative Records Research - 4

Organization for Economic Cooperation and Development - 4

Characteristics of Business Owners - 4

Total Factor Productivity - 4

Federal Insurance Contribution Act - 3

Social and Economic Supplement - 3

ASEC - 3

Adjusted Gross Income - 3

Temporary Assistance for Needy Families - 3

Geographic Information Systems - 3

Department of Economics - 3

COVID-19 - 3

National Income and Product Accounts - 3

Bureau of Labor - 3

Centers for Medicare - 3

Census Bureau Longitudinal Business Database - 3

Centers for Disease Control and Prevention - 3

Employer Characteristics File - 3

Department of Health and Human Services - 3

National Research Council - 3

Computer Assisted Telephone Interviews and Computer Assisted Personal Interviews - 3

CATI - 3

Census Bureau Center for Economic Studies - 3

Census 2000 - 3

Office of Personnel Management - 3

Census Bureau Business Dynamics Statistics - 3

COMPUSTAT - 3

Securities and Exchange Commission - 3

survey - 53

respondent - 44

statistical - 43

microdata - 41

datasets - 38

census bureau - 36

record - 36

agency - 35

data census - 31

census data - 26

estimating - 23

population - 23

database - 23

report - 22

disclosure - 19

analysis - 19

statistician - 17

confidentiality - 16

matching - 16

research - 16

survey data - 15

information - 15

privacy - 15

imputation - 15

researcher - 15

aggregate - 14

use census - 12

census survey - 12

census research - 12

statistical agencies - 12

payroll - 11

study - 11

estimation - 10

earnings - 10

sampling - 10

sample - 10

coverage - 10

public - 10

records census - 10

linkage - 10

employee - 10

workforce - 10

research census - 10

publicly - 9

resident - 9

identifier - 9

census records - 9

quarterly - 9

economic census - 9

business data - 9

matched - 9

economist - 9

sector - 9

2010 census - 8

assessed - 8

federal - 8

statistical disclosure - 8

employed - 8

enterprise - 8

longitudinal - 8

employment data - 8

employee data - 8

ssa - 7

census years - 7

residential - 7

residence - 7

household surveys - 7

reporting - 7

census use - 7

aggregation - 7

inference - 7

associate - 7

econometric - 7

estimator - 6

enrollment - 6

irs - 6

income data - 6

ethnicity - 6

race census - 6

census employment - 6

department - 6

work census - 6

information census - 6

recession - 6

surveys censuses - 6

percentile - 6

censuses surveys - 6

sale - 6

expenditure - 6

employ - 6

model - 6

census file - 6

industrial - 6

minority - 5

salary - 5

census linked - 5

citizen - 5

provided census - 5

race - 5

state - 5

housing - 5

assessing - 5

housing survey - 5

establishments data - 5

market - 5

analyst - 5

social - 5

worker - 5

manufacturing - 5

macroeconomic - 5

average - 4

survey income - 4

population survey - 4

census disclosure - 4

income individuals - 4

tax - 4

taxpayer - 4

geographic - 4

linked census - 4

survey households - 4

hispanic - 4

census 2020 - 4

home - 4

individuals census - 4

imputation model - 4

incorporated - 4

policymakers - 4

gdp - 4

employment statistics - 4

establishment - 4

investment - 4

labor - 4

trend - 4

earner - 3

household income - 3

1040 - 3

environmental - 3

impact - 3

disparity - 3

discrepancy - 3

racial - 3

empirical - 3

classification - 3

prevalence - 3

apartment - 3

unobserved - 3

organizational - 3

acquisition - 3

economic statistics - 3

classifying - 3

employer household - 3

imputed - 3

ancestry - 3

ethnic - 3

bias - 3

census responses - 3

worker demographics - 3

production - 3

manufacturer - 3

inventory - 3

employment dynamics - 3

workforce indicators - 3

classified - 3

measures employment - 3

employment measures - 3

firm data - 3

company - 3

Viewing papers 31 through 40 of 94


  • Working Paper

    Developing a Residence Candidate File for Use With Employer-Employee Matched Data

    January 2017

    Working Paper Number:

    CES-17-40

    This paper describes the Longitudinal Employer-Household Dynamics (LEHD) program's ongoing efforts to use administrative records in a predictive model that describes residence locations for workers. This project was motivated by the discontinuation of a residence file produced elsewhere at the U.S. Census Bureau. The goal of the Residence Candidate File (RCF) process is to provide the LEHD Infrastructure Files with residence information that maintains currency with the changing state of administrative sources and represents uncertainty in location as a probability distribution. The discontinued file provided only a single residence per person/year, even when contributing administrative data may have contained multiple residences. This paper describes the motivation for the project, our methodology, the administrative data sources, the model estimation and validation results, and the file specifications. We find that the best prediction of the person-place model provides similar, but superior, accuracy compared with previous methods and performs well for workers in the LEHD jobs frame. We outline possibilities for further improvement in sources and modeling as well as recommendations on how to use the preference weights in downstream processing.
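The abstract describes representing residence uncertainty as a probability distribution over candidate locations, with preference weights for downstream processing. The minimal Python sketch below illustrates that idea under stated assumptions. It takes hypothetical (person, address, score) rows produced by some person-place model; the paper's actual model, scores, and file layout are not reproduced here. It normalizes the scores for each person into probabilities and also extracts a single best candidate:

```python
from collections import defaultdict

def residence_distribution(candidates):
    """Turn (person_id, address_id, score) rows into per-person probability
    distributions over candidate residences. Scores are assumed to be
    nonnegative outputs of some person-place model (hypothetical here)."""
    by_person = defaultdict(dict)
    for person, address, score in candidates:
        if score < 0:
            raise ValueError("scores must be nonnegative")
        # Accumulate scores if the same address appears from several sources.
        by_person[person][address] = by_person[person].get(address, 0.0) + score
    dist = {}
    for person, scores in by_person.items():
        total = sum(scores.values())
        if total == 0:
            # No information: spread probability evenly across candidates.
            dist[person] = {a: 1 / len(scores) for a in scores}
        else:
            dist[person] = {a: s / total for a, s in scores.items()}
    return dist

def best_residence(dist, person):
    """Single best prediction: the highest-probability candidate."""
    return max(dist[person].items(), key=lambda kv: kv[1])[0]

# Hypothetical example rows (not LEHD data).
rows = [("p1", "addrA", 3.0), ("p1", "addrB", 1.0), ("p2", "addrC", 0.0)]
dist = residence_distribution(rows)
print(dist["p1"], best_residence(dist, "p1"))
```

Keeping the full distribution, rather than only the best candidate, preserves the multiple-residence information that a one-residence-per-person-year file discards.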
  • Working Paper

    Examining Multi-Level Correlates of Suicide by Merging NVDRS and ACS Data

    January 2017

    Working Paper Number:

    CES-17-25

    This paper describes a novel database and an associated suicide event prediction model that surmount longstanding barriers in suicide risk factor research. The database commingles person-level records from the National Violent Death Reporting System (NVDRS) and the American Community Survey (ACS) to establish a case-control study sample that includes all identified suicide cases, while faithfully reflecting general population sociodemographics, in sixteen U.S. states during the years 2005 to 2011. It supports a statistical model of individual suicide risk that accommodates person-level factors and the moderation of these factors by their community rates. Named the United States Multi-Level Suicide Data Set (US-MSDS), the database was developed outside the RDC laboratory using publicly available ACS microdata, and reconstructed inside the laboratory using restricted access ACS microdata. Analyses of the latter version yielded findings that largely amplified but also extended those obtained from analyses of the former. This experience shows that the analytic precision achievable using restricted access ACS data can play an important role in conducting social research, although it also indicates that publicly available ACS data have considerable value in conducting preliminary analyses and preparing to use an RDC laboratory. The database development strategy may interest scientists investigating sociodemographic risk factors for other types of low-frequency mortality.
  • Working Paper

    R&D, Attrition and Multiple Imputation in BRDIS

    January 2017

    Working Paper Number:

    CES-17-13

    Multiple imputation in business establishment surveys like BRDIS, an annual business survey in which some companies are sampled every year or multiple years, may enhance the estimates of total R&D in addition to helping researchers estimate models with subpopulations of small sample size. Considering a panel of BRDIS companies from 2008 to 2013 linked to LBD data, this paper uses the conclusions drawn from missing-data visualization and other explorations to develop a multiple imputation strategy appropriate for addressing item nonresponse in R&D expenditures. Because survey design characteristics are behind much of the item and unit nonresponse, multiple imputation of missing data in BRDIS changes the estimates of total R&D significantly and alters the conclusions reached by models of the determinants of R&D investment obtained with complete case analysis.
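After an analysis is run on each of the m completed (imputed) datasets, the results are commonly pooled with Rubin's combining rules. The following is a minimal sketch of those rules, not the paper's specific BRDIS procedure. The numbers are hypothetical and for illustration only:

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's combining rules.

    estimates: point estimates (e.g., total R&D) from each imputed dataset
    variances: their estimated sampling variances (within-imputation)
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b      # total variance
    return q_bar, t

# Hypothetical results from m = 5 imputed datasets (illustrative numbers only).
est = [10.2, 10.8, 9.9, 10.5, 10.1]
var = [0.40, 0.38, 0.42, 0.39, 0.41]
q, t = pool_rubin(est, var)
print(f"pooled estimate = {q:.2f}, standard error = {math.sqrt(t):.3f}")
```

The between-imputation term `b` is what complete case analysis ignores. It carries the extra uncertainty due to missing R&D values into the standard error.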
  • Working Paper

    Public-Use vs. Restricted-Use: An Analysis Using the American Community Survey

    January 2017

    Working Paper Number:

    CES-17-12

    Statistical agencies frequently publish microdata that have been altered to protect confidentiality. Such data retain utility for many types of broad analyses but can yield biased or insufficiently precise results in others. Research access to de-identified versions of the restricted-use data with little or no alteration is often possible, albeit costly and time-consuming. We investigate the advantages and disadvantages of public-use and restricted-use data from the American Community Survey (ACS) in constructing a comparable wage index (CWI). The public-use data used were Public Use Microdata Samples, while the restricted-use data were accessed via a Federal Statistical Research Data Center. We discuss the advantages and disadvantages of each data source and compare estimated CWIs and standard errors at the state and labor market levels.
  • Working Paper

    Evaluating the Use of Commercial Data to Improve Survey Estimates of Property Taxes

    August 2016

    Working Paper Number:

    carra-2016-06

    While commercial data sources offer promise to statistical agencies for use in production of official statistics, challenges can arise because the data are not collected for statistical purposes. This paper evaluates the use of 2008-2010 property tax data from CoreLogic, Inc. (CoreLogic), aggregated from county and township governments from around the country, to improve 2010 American Community Survey (ACS) estimates of property tax amounts for single-family homes. In particular, the research evaluates the potential to use CoreLogic to reduce respondent burden, to study survey response error, and to improve adjustments for survey nonresponse. The research found that the coverage of the CoreLogic data varies between counties, as does the correspondence between ACS and CoreLogic property taxes. This geographic variation implies that different approaches toward using CoreLogic are needed in different areas of the country. Further, large differences between CoreLogic and ACS property taxes in certain counties seem to be due to conceptual differences between what is collected in the two data sources. The research examines three counties (Clark County, NV; Philadelphia County, PA; and St. Louis County, MO) and compares how estimates would change with different approaches using the CoreLogic data. Mean county property tax estimates are highly sensitive to whether ACS or CoreLogic data are used to construct estimates. Using CoreLogic data in imputation modeling for nonresponse adjustment of ACS estimates modestly improves the predictive power of imputation models, although estimates of county property taxes and property taxes by mortgage status are not very sensitive to the imputation method.
  • Working Paper

    Playing with Matches: An Assessment of Accuracy in Linked Historical Data

    June 2016

    Working Paper Number:

    carra-2016-05

    This paper evaluates linkage quality achieved by various record linkage techniques used in historical demography. I create benchmark, or truth, data by linking the 2005 Current Population Survey Annual Social and Economic Supplement to the Social Security Administration's Numeric Identification System by Social Security Number. By comparing simulated linkages to the benchmark data, I examine the value added (in terms of number and quality of links) from incorporating text-string comparators, adjusting age, and using a probabilistic matching algorithm. I find that text-string comparators and probabilistic approaches are useful for increasing the linkage rate, but use of text-string comparators may decrease accuracy in some cases. Overall, probabilistic matching offers the best balance between linkage rates and accuracy.
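Probabilistic matching of the kind compared in this paper is often formulated in the Fellegi-Sunter style. Each field contributes a log-likelihood-ratio weight depending on whether it agrees. The summed weight is then compared with link and non-link thresholds. The sketch below is a minimal illustration, not the author's implementation. The field names, m- and u-probabilities, and thresholds are all hypothetical. Python's `difflib.SequenceMatcher` ratio is swapped in as a simple stand-in for the text-string comparators (such as Jaro-Winkler) used in record linkage:

```python
import math
from difflib import SequenceMatcher

# Hypothetical m- and u-probabilities per field: m = P(agree | true match),
# u = P(agree | non-match). Real values would be estimated (e.g., via EM).
FIELD_PARAMS = {
    "first_name": (0.95, 0.02),
    "last_name": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "state": (0.85, 0.10),
}

def field_agrees(field, a, b, threshold=0.85):
    """Exact agreement for numeric/code fields; fuzzy string similarity for names.

    difflib's ratio stands in here for record-linkage string comparators
    such as Jaro-Winkler.
    """
    if field in ("first_name", "last_name"):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    return a == b

def match_weight(rec_a, rec_b):
    """Sum of Fellegi-Sunter log2 likelihood-ratio weights across fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if field_agrees(field, rec_a[field], rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def classify(weight, upper=8.0, lower=0.0):
    """Link above `upper`, reject below `lower`, otherwise send to clerical review."""
    if weight >= upper:
        return "link"
    if weight < lower:
        return "non-link"
    return "review"

a = {"first_name": "Johann", "last_name": "Schmidt", "birth_year": 1885, "state": "PA"}
b = {"first_name": "John", "last_name": "Schmidt", "birth_year": 1885, "state": "PA"}
w = match_weight(a, b)
print(round(w, 2), classify(w))
```

In this example the first names fall below the similarity threshold. The record pair is still linked because the other three fields agree. This illustrates the trade-off the paper measures: looser string comparison raises the linkage rate but can admit false matches.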
  • Working Paper

    Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics

    February 2016

    Working Paper Number:

    CES-16-10

    We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).
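Here is a minimal sketch of the blending idea: publish tabulations from observed microdata, but substitute synthetic-data tabulations only in cells flagged as sensitive. The sensitivity rule (fewer than three contributing establishments), the cell keys, and the employment values are hypothetical assumptions. The paper's actual algorithms and protection criteria are not reproduced here:

```python
from collections import defaultdict

def tabulate(records):
    """Sum employment and count contributing establishments per cell."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cell, employment in records:
        totals[cell] += employment
        counts[cell] += 1
    return totals, counts

def blended_table(observed, synthetic, min_contributors=3):
    """Publish observed cell totals, but fall back to synthetic totals in
    cells flagged as sensitive (here: fewer than `min_contributors`
    observed establishments, a hypothetical rule). The set of published
    cells is taken from the observed data."""
    obs_tot, obs_cnt = tabulate(observed)
    syn_tot, _ = tabulate(synthetic)
    table = {}
    for cell in obs_tot:
        if obs_cnt[cell] < min_contributors:
            table[cell] = (syn_tot.get(cell, 0.0), "synthetic")
        else:
            table[cell] = (obs_tot[cell], "observed")
    return table

# Hypothetical (industry, state) cells with establishment employment.
observed = [(("31", "PA"), 120), (("31", "PA"), 80), (("31", "PA"), 45),
            (("54", "DE"), 900)]
synthetic = [(("31", "PA"), 110), (("31", "PA"), 95), (("31", "PA"), 50),
             (("54", "DE"), 640), (("54", "DE"), 210)]
table = blended_table(observed, synthetic)
print(table)
```

Unlike suppression, every cell receives a value. Only the flagged cell reports a synthetic total.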
  • Working Paper

    The Management and Organizational Practices Survey (MOPS): Cognitive Testing*

    January 2016

    Working Paper Number:

    CES-16-53

    All Census Bureau surveys must meet quality standards before they can be sent to the public for data collection. This paper outlines the pretesting process that was used to ensure that the Management and Organizational Practices Survey (MOPS) met those standards. The MOPS is the first large survey of management practices at U.S. manufacturing establishments. The first wave of the MOPS, issued for reference year 2010, was subject to internal expert review and two rounds of cognitive interviews. The results of this pretesting were used to make significant changes to the MOPS instrument and ensure that quality data was collected. The second wave of the MOPS, featuring new questions on data-driven decision making (DDD) and uncertainty and issued for reference year 2015, was subject to two rounds of cognitive interviews and a round of usability testing. This paper illustrates the effort undertaken by the Census Bureau to ensure that all surveys released into the field are of high quality and provides insight into how respondents interpret the MOPS questionnaire for those looking to use the MOPS data.
  • Working Paper

    Simultaneous Edit-Imputation for Continuous Microdata

    December 2015

    Working Paper Number:

    CES-15-44

    Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods to identify the variables to change followed by hot deck imputation. We present an approach that fully integrates editing and imputation for continuous microdata under linear constraints. Our approach relies on a Bayesian hierarchical model that includes (i) a flexible joint probability model for the underlying true values of the data with support only on the set of values that satisfy all editing constraints, (ii) a model for latent indicators of the variables that are in error, and (iii) a model for the reported responses for variables in error. We illustrate the potential advantages of the Bayesian editing approach over existing approaches using simulation studies. We apply the model to edit faulty data from the 2007 U.S. Census of Manufactures. Supplementary materials for this article are available online.
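The two kinds of linear constraints named in the abstract can be checked mechanically. These are balance edits (components summing to a total) and ratio edits (ratios bounded by expert-specified constants). The sketch below covers only that edit-detection step, not the Bayesian hierarchical edit-imputation model the paper proposes. The record fields, bounds, and units are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical establishment fields (not the actual Census of Manufactures layout).
    total_shipments: float
    shipments_a: float
    shipments_b: float
    employees: float
    payroll: float

def edit_failures(r, tol=1e-6, payroll_per_emp=(5.0, 500.0)):
    """Return names of edit constraints the record violates.

    Balance edit: components must sum to the total.
    Ratio edit: payroll per employee must lie within expert-set bounds
    (bounds and units here are hypothetical).
    """
    failures = []
    if abs(r.shipments_a + r.shipments_b - r.total_shipments) > tol:
        failures.append("balance: components != total")
    if r.employees <= 0:
        failures.append("ratio: employees must be positive")
    else:
        lo, hi = payroll_per_emp
        ratio = r.payroll / r.employees
        if not (lo <= ratio <= hi):
            failures.append("ratio: payroll per employee out of bounds")
    return failures

good = Record(total_shipments=100, shipments_a=60, shipments_b=40, employees=10, payroll=500)
bad = Record(total_shipments=100, shipments_a=60, shipments_b=30, employees=10, payroll=10000)
print(edit_failures(good), edit_failures(bad))
```

A failed check shows only that some value is wrong, not which one. Deciding which variables to replace (error localization) and what to replace them with is the part the paper's joint model addresses.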
  • Working Paper

    When Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources: Exploring Methods to Assign Responses

    December 2015

    Working Paper Number:

    carra-2015-08

    The U.S. Census Bureau is researching uses of administrative records and third party data in survey and decennial census operations. One potential use of administrative records is to utilize these data when race and Hispanic origin responses are missing. When federal and third party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across sources. We explore different methods to assign one race and one Hispanic response when these responses are discrepant. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data by demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records and third party data compared to the 2010 Census. Minority groups and individuals ages 0-17 are more likely to have missing race or Hispanic origin data in administrative records and third party data. Larger households tend to have more missing race data in administrative records and third party data than smaller households.
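The paper explores several methods for assigning one response when sources disagree. The sketch below shows just one simple possibility, under stated assumptions, and is not necessarily any of the paper's methods. It takes the most frequent non-missing response across sources and breaks ties with a fixed source priority. The source names and priority order are hypothetical:

```python
from collections import Counter

# Hypothetical source priority used only to break ties (lower = preferred).
SOURCE_PRIORITY = {"federal_a": 0, "federal_b": 1, "third_party": 2}

def assign_response(responses):
    """Pick one response from {source: value} pairs, ignoring missing values.

    Rule (one of many possible): take the modal value; break ties by the
    highest-priority source reporting a tied value. Returns None if all
    sources are missing.
    """
    observed = {s: v for s, v in responses.items() if v is not None}
    if not observed:
        return None
    counts = Counter(observed.values())
    top = max(counts.values())
    tied = {v for v, c in counts.items() if c == top}
    best_source = min(
        (s for s, v in observed.items() if v in tied),
        key=lambda s: SOURCE_PRIORITY.get(s, len(SOURCE_PRIORITY)),
    )
    return observed[best_source]

print(assign_response({"federal_a": "White", "federal_b": "AIAN", "third_party": "AIAN"}))
print(assign_response({"federal_a": "White", "third_party": "Black"}))
print(assign_response({"federal_a": None, "third_party": None}))
```

A modal rule favors agreement across sources, while the priority list encodes which source is trusted most when they split evenly.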