Papers Containing Keyword(s): 'inference'
Viewing papers 1 through 10 of 16
-
Working Paper: What Caused Racial Disparities in Particulate Exposure to Fall? New Evidence from the Clean Air Act and Satellite-Based Measures of Air Quality
January 2020
Working Paper Number:
CES-20-02
Racial differences in exposure to ambient air pollution have declined significantly in the United States over the past 20 years. This project links restricted-access Census Bureau microdata to newly available, spatially continuous, high-resolution measures of ambient particulate pollution (PM2.5) to examine the underlying causes and consequences of differences in black-white pollution exposures. We begin by decomposing differences in pollution exposure into components explained by observable population characteristics (e.g., income) versus those that remain unexplained. We then use quantile regression methods to show that a significant portion of the 'unexplained' convergence in black-white pollution exposure can be attributed to differential impacts of the Clean Air Act (CAA) in non-Hispanic African American and non-Hispanic white communities. Areas with larger black populations saw greater CAA-related declines in PM2.5 exposure. We show that the CAA has been the single largest contributor to racial convergence in PM2.5 pollution exposure in the U.S. since 2000, accounting for over 60 percent of the reduction.
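The decomposition step described in the abstract can be sketched as a toy Oaxaca-Blinder-style calculation. Everything below is hypothetical: invented data, a single covariate (income), and plain OLS rather than the paper's quantile methods or actual microdata.

```python
# Toy decomposition of a mean pollution-exposure gap between two groups into a
# component explained by an observable covariate (income) and a residual
# "unexplained" component. Illustrative only; not the paper's estimator.

def ols_fit(x, y):
    """One-variable OLS; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    b = cov / var
    return my - b * mx, b

def mean(v):
    return sum(v) / len(v)

# Hypothetical group data: PM2.5 exposure and income for groups A and B.
income_a = [30, 40, 50, 60, 70]
pm_a     = [14, 13, 12, 11, 10]
income_b = [50, 60, 70, 80, 90]
pm_b     = [11, 10,  9,  8,  7]

a0, a1 = ols_fit(income_a, pm_a)   # group A exposure equation
b0, b1 = ols_fit(income_b, pm_b)   # group B exposure equation

gap = mean(pm_a) - mean(pm_b)      # raw exposure gap: 3 units here

# Using group B's coefficients as the reference structure:
explained   = b1 * (mean(income_a) - mean(income_b))   # due to income differences
unexplained = (a0 - b0) + (a1 - b1) * mean(income_a)   # due to differing coefficients

print(gap, explained, unexplained)
```

With these numbers the raw gap of 3 splits into 2 explained by income and 1 unexplained; the paper's contribution is showing how much of the unexplained part is attributable to the CAA.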
-
Working Paper: The Need to Account for Complex Sampling Features when Analyzing Establishment Survey Data: An Illustration using the 2013 Business Research and Development and Innovation Survey (BRDIS)
January 2017
Working Paper Number:
CES-17-62
The importance of correctly accounting for complex sampling features when generating finite population inferences based on complex sample survey data sets has now been clearly established in a variety of fields, including those in both statistical and non-statistical domains. Unfortunately, recent studies of analytic error have suggested that many secondary analysts of survey data do not ultimately account for these sampling features when analyzing their data, for a variety of possible reasons (e.g., poor documentation, or a data producer not providing the information in a public-use data set). The research in this area has focused exclusively on analyses of household survey data and individual respondents. No research to date has considered how analysts are approaching the data collected in establishment surveys, and whether published articles advancing science based on analyses of establishment behaviors and outcomes are correctly accounting for complex sampling features. This article presents alternative analyses of real data from the 2013 Business Research and Development and Innovation Survey (BRDIS), and shows that a failure to account for the complex design features of the sample underlying these data can lead to substantial differences in inferences about the target population of establishments for the BRDIS.
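A minimal illustration of the failure mode the abstract describes: the same sample of establishments analyzed naively and with the design accounted for. The strata, population sizes, and values below are invented; the BRDIS design is far more complex than simple stratification.

```python
# Sketch of why sampling design matters: large establishments are heavily
# oversampled, so ignoring the design badly overstates mean R&D spending.

# Two hypothetical strata: population counts and the sampled values drawn.
population_sizes = {"large": 100, "small": 900}
samples = {
    "large": [50, 60, 55, 65],   # oversampled stratum, high spending
    "small": [5, 6, 4, 5],
}

# (a) Naive mean, ignoring that "large" establishments were oversampled:
all_values = [v for vals in samples.values() for v in vals]
naive_mean = sum(all_values) / len(all_values)

# (b) Design-based (stratified) mean, weighting each stratum by its
# population share:
N = sum(population_sizes.values())
strat_mean = sum(
    (population_sizes[h] / N) * (sum(vals) / len(vals))
    for h, vals in samples.items()
)

print(naive_mean, strat_mean)
```

Here the naive mean is 31.25 while the design-based mean is 10.25, the kind of "substantial difference in inferences" the paper documents for the BRDIS.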
-
Working Paper: File Matching with Faulty Continuous Matching Variables
January 2017
Working Paper Number:
CES-17-45
We present LFCMV, a Bayesian file linking methodology designed to link records using continuous matching variables in situations where we do not expect values of these matching variables to agree exactly across matched pairs. The method involves a linking model for the distance between the matching variables of records in one file and the matching variables of their linked records in the second. This linking model is conditional on a vector indicating the links. We specify a mixture model for the distance component of the linking model, as this latent structure allows the distance between matching variables in linked pairs to vary across types of linked pairs. Finally, we specify a model for the linking vector. We describe the Gibbs sampling algorithm for sampling from the posterior distribution of this linkage model and use artificial data to illustrate model performance. We also introduce a linking application using public survey information and data from the U.S. Census of Manufactures and use LFCMV to link the records.
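The core idea, that continuous matching variables disagree across files and links are driven by distances rather than exact agreement, can be shown with a deliberately simplified, non-Bayesian toy. All records and the tolerance are invented; the LFCMV method itself places a mixture model on these distances and samples the link vector with a Gibbs sampler, which this sketch does not attempt.

```python
# Toy distance-based linkage: each record in file A is linked to its nearest
# neighbor in file B, provided the distance is plausibly small. The matching
# variables disagree slightly across files, so exact matching would fail.
import math

file_a = {"a1": (10.0, 4.9), "a2": (20.3, 8.1)}
file_b = {"b1": (10.2, 5.0), "b2": (19.9, 8.0), "b3": (55.0, 1.0)}

def dist(u, v):
    """Euclidean distance between two matching-variable vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

links = {}
for ra, va in file_a.items():
    rb = min(file_b, key=lambda r: dist(va, file_b[r]))
    if dist(va, file_b[rb]) < 1.0:   # tolerance: refuse implausible links
        links[ra] = rb

print(links)   # a1 -> b1, a2 -> b2
```

The Bayesian formulation replaces the hard tolerance with a probability model over distances, so uncertainty about the links propagates into downstream analyses.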
-
Working Paper: Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics
February 2016
Working Paper Number:
CES-16-10
We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).
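The blending idea reduces to a simple rule at publication time: release the observed value where a cell is safe, and substitute the synthetic value only where it is sensitive. The cells, counts, and the "fewer than 3 contributors" rule below are hypothetical stand-ins; the paper's disclosure rules and synthesis models are more involved.

```python
# Minimal sketch of blended tabulation: observed values for safe cells,
# synthetic values for sensitive ones (instead of suppressing them outright).

# cell id -> (contributing establishment count, observed total, synthetic total)
cells = {
    "NAICS-311": (25, 1400.0, 1350.0),
    "NAICS-312": (2,    90.0,  110.0),   # sensitive: too few contributors
    "NAICS-313": (11,  640.0,  600.0),
}

MIN_CONTRIBUTORS = 3   # hypothetical sensitivity threshold

published = {
    cell: (obs if n >= MIN_CONTRIBUTORS else syn)
    for cell, (n, obs, syn) in cells.items()
}

print(published)
```

Unlike suppression, every cell gets a value, so table margins remain additive and usable.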
-
Working Paper: The Consequences of Long-Term Unemployment: Evidence from Matched Employer-Employee Data
January 2016
Working Paper Number:
CES-16-40
It is well known that the long-term unemployed fare worse in the labor market than the short-term unemployed, but less clear why this is so. One potential explanation is that the long-term unemployed are 'bad apples' who had poorer prospects from the outset of their spells (heterogeneity). Another is that their bad outcomes are a consequence of the extended unemployment they have experienced (state dependence). We use Current Population Survey (CPS) data on unemployed individuals linked to wage records for the same people to distinguish between these competing explanations. For each person in our sample, we have wage record data that cover the period from 20 quarters before to 11 quarters after the quarter in which the person is observed in the CPS. This gives us rich information about prior and subsequent work histories not available to previous researchers that we use to control for individual heterogeneity that might be affecting subsequent labor market outcomes. Even with these controls in place, we find that unemployment duration has a strongly negative effect on the likelihood of subsequent employment. This finding is inconsistent with the heterogeneity ('bad apple') explanation for why the long-term unemployed fare worse than the short-term unemployed. We also find that longer unemployment durations are associated with lower subsequent earnings, though this is mainly attributable to the long-term unemployed having a lower likelihood of subsequent employment rather than to their having lower earnings once a job is found.
-
Working Paper: Introduction of Head Start and Maternal Labor Supply: Evidence from a Regression Discontinuity Design
January 2016
Working Paper Number:
CES-16-35
I use the non-public 1970 decennial census to investigate the effect of the Head Start program on maternal labor supply and schooling in its early years. I exploit a discontinuity in county-level Head Start funding beginning in the late 1960s to explore differences in county-level maternal employment and maternal schooling. The results provide suggestive evidence that greater availability of Head Start led to an increase in children's nursery school enrollment and a decrease in maternal labor supply. In addition, the ITT estimates imply a relatively large, negative effect of enrollment on maternal labor supply. However, the estimates are somewhat sensitive to the addition of covariates, and the standard errors are too large to draw firm inferences.
-
Working Paper: Simultaneous Edit-Imputation for Continuous Microdata
December 2015
Working Paper Number:
CES-15-44
Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods to identify the variables to change followed by hot deck imputation. We present an approach that fully integrates editing and imputation for continuous microdata under linear constraints. Our approach relies on a Bayesian hierarchical model that includes (i) a flexible joint probability model for the underlying true values of the data with support only on the set of values that satisfy all editing constraints, (ii) a model for latent indicators of the variables that are in error, and (iii) a model for the reported responses for variables in error. We illustrate the potential advantages of the Bayesian editing approach over existing approaches using simulation studies. We apply the model to edit faulty data from the 2007 U.S. Census of Manufactures. Supplementary materials for this article are available online.
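The error-localization step that traditional pipelines perform before imputation can be sketched for the simplest case, a single sum constraint. The record and field names below are hypothetical; the point is only to show what "identify the variables to change" means before the paper's Bayesian model folds it into one step.

```python
# Toy error localization: components must sum to the reported total. The record
# violates this edit, and for each field we compute the value that would
# restore consistency if that one field alone were changed.

record = {"materials": 40.0, "labor": 25.0, "energy": 10.0, "total": 80.0}
components = ["materials", "labor", "energy"]

def violates(rec):
    """True if the sum-to-total edit fails."""
    return abs(sum(rec[c] for c in components) - rec["total"]) > 1e-9

repairs = {}
if violates(record):
    # With a single sum constraint, changing any one field can restore
    # consistency; in realistic systems many edits interact, and choosing the
    # minimal set of fields to change is the optimization problem that the
    # paper's integrated Bayesian model replaces.
    for f in components:
        repairs[f] = record["total"] - sum(record[c] for c in components if c != f)
    repairs["total"] = sum(record[c] for c in components)

print(repairs)
```

The traditional pipeline would then pick one of these candidate repairs (e.g., by minimal change) and impute; the paper instead models which fields are in error and their true values jointly.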
-
Working Paper: Estimation and Inference in Regression Discontinuity Designs with Clustered Sampling
August 2015
Working Paper Number:
carra-2015-06
Regression Discontinuity (RD) designs have become popular in empirical studies due to their attractive properties for estimating causal effects under transparent assumptions. Nonetheless, most popular procedures assume i.i.d. data, which is not reasonable in many common applications. To relax this assumption, we derive the properties of traditional non-parametric estimators in a setting that incorporates potential clustering at the level of the running variable, and propose an accompanying optimal-MSE bandwidth selection rule. Simulation results demonstrate that falsely assuming data are i.i.d. when selecting the bandwidth may lead to the choice of bandwidths that are too small relative to the optimal-MSE bandwidth. Finally, we apply our procedure using person-level microdata that exhibit clustering at the census tract level to analyze the impact of the Low-Income Housing Tax Credit program on neighborhood characteristics and low-income housing supply.
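The estimator that the bandwidth choice feeds can be sketched directly: local linear fits on each side of the cutoff with a triangular kernel. The data below are invented, and the bandwidth is fixed by hand; the paper's contribution, selecting that bandwidth optimally under clustering, is not implemented here.

```python
# Sharp-RD point estimate via separate local linear fits, triangular kernel,
# fixed bandwidth h. Illustrative sketch only.

def wls_line(points, weights):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / sw
    my = sum(w * y for (_, y), w in zip(points, weights)) / sw
    num = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights))
    den = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights))
    b = num / den
    return my - b * mx, b

def rd_estimate(data, cutoff, h):
    """Estimated jump in E[y|x] at the cutoff."""
    tri = lambda x: max(0.0, 1 - abs(x - cutoff) / h)   # triangular kernel
    left  = [(x, y) for x, y in data if cutoff - h < x < cutoff]
    right = [(x, y) for x, y in data if cutoff <= x < cutoff + h]
    a_l, b_l = wls_line(left,  [tri(x) for x, _ in left])
    a_r, b_r = wls_line(right, [tri(x) for x, _ in right])
    return (a_r + b_r * cutoff) - (a_l + b_l * cutoff)

# Hypothetical data: y = x with a jump of 2 at x = 0.
data = [(x / 10, x / 10 + (2 if x >= 0 else 0)) for x in range(-10, 11)]
print(rd_estimate(data, cutoff=0.0, h=1.0))   # close to 2
```

When observations cluster on the running variable (e.g., people within census tracts), effective sample sizes shrink, which is why an i.i.d.-based bandwidth rule tends to choose h too small.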
-
Working Paper: Using Imputation Techniques to Evaluate Stopping Rules in Adaptive Survey Design
October 2014
Working Paper Number:
CES-14-40
Adaptive Design methods for social surveys utilize the information from the data as it is collected to make decisions about the sampling design. In some cases, the decision is either to continue or stop the data collection. We evaluate this decision by proposing measures to compare the collected data with follow-up samples. The options are assessed by imputation of the nonrespondents under different missingness scenarios, including Missing Not at Random. The variation in the utility measures is compared to the cost induced by the follow-up sample sizes. We apply the proposed method to the 2007 U.S. Census of Manufactures.
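The evaluation idea can be sketched in a few lines: impute the nonrespondents under different missingness scenarios and see how much the estimate moves. The values and the 20% MNAR shift below are illustrative assumptions, not from the paper.

```python
# Sketch of scenario-based imputation for a stop/continue decision. Under MAR,
# nonrespondents are imputed at the respondent mean; under a hypothetical MNAR
# scenario they are assumed to report systematically less.

respondents  = [100.0, 120.0, 80.0, 110.0]
n_nonrespond = 6

resp_mean = sum(respondents) / len(respondents)

def estimate(imputed_value):
    """Overall mean after imputing every nonrespondent at imputed_value."""
    total = sum(respondents) + n_nonrespond * imputed_value
    return total / (len(respondents) + n_nonrespond)

mar_estimate  = estimate(resp_mean)          # MAR: nonrespondents resemble respondents
mnar_estimate = estimate(0.8 * resp_mean)    # MNAR: nonrespondents report 20% less

# The spread between scenarios is one utility signal: if it is small relative
# to the cost of a follow-up sample, stopping collection may be defensible.
print(mar_estimate, mnar_estimate)
```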
-
Working Paper: A First Step Towards a German SynLBD: Constructing a German Longitudinal Business Database
February 2014
Working Paper Number:
CES-14-13
One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue that many lessons in this evolving field were learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper is to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets, similar to the SynLBD at Cornell, to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.