The Need to Account for Complex Sampling Features when Analyzing Establishment Survey Data: An Illustration using the 2013 Business Research and Development and Innovation Survey (BRDIS)
January 2017
Working Paper Number:
CES-17-62
Abstract
Document Tags and Keywords
Keywords
Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to produce contextually relevant keywords. By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting its most significant topics and trends. This approach not only enhances searchability but also surfaces connections that go beyond potentially domain-specific author-defined keywords:
analysis,
estimating,
data,
statistical,
survey data,
survey,
study,
respondent,
research,
establishment,
innovation,
population,
sampling,
sample,
inference,
housing survey
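The extraction step described above can be sketched in miniature: embed the document and the candidate phrases, then rank the phrases by cosine similarity to the document vector. The hand-coded three-dimensional vectors below are illustrative stand-ins for real BERT embeddings, not output from KeyBERT itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins for BERT embeddings of a document and candidate phrases.
doc_vec = [0.8, 0.6, 0.1]
candidates = {
    "survey data":   [0.7, 0.7, 0.0],
    "sampling":      [0.9, 0.4, 0.2],
    "housing stock": [0.1, 0.2, 0.9],
}

# Rank candidate phrases by similarity to the document vector,
# as KeyBERT does when selecting keywords.
ranked = sorted(candidates, key=lambda p: cosine(candidates[p], doc_vec),
                reverse=True)
print(ranked)  # most document-relevant phrase first
```

In KeyBERT itself, the `extract_keywords` method handles candidate generation, embedding, and ranking in a single call.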
Tags
Tags are automatically generated using a pretrained language model from spaCy, which performs several tasks, including named-entity recognition. The model labels words and phrases by entity type, including "organizations." By filtering for frequent words and phrases labeled as "organizations," papers are identified that reference specific institutions, datasets, and other organizations:
Service Annual Survey,
National Science Foundation,
Alfred P Sloan Foundation,
International Trade Research Report,
Business Research and Development and Innovation Survey,
National Center for Science and Engineering Statistics
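The tagging pipeline described above can be sketched as a filter-and-count over named-entity output. The entity list below is hand-written to stand in for the spans a spaCy model would return; it is illustrative only.

```python
from collections import Counter

# Hand-written stand-in for the (span text, entity label) pairs a
# pretrained spaCy NER model would produce for a paper's text.
entities = [
    ("National Science Foundation", "ORG"),
    ("BRDIS", "ORG"),
    ("2013", "DATE"),
    ("National Science Foundation", "ORG"),
    ("United States", "GPE"),
]

# Keep only spans labeled as organizations, then count mentions.
org_counts = Counter(text for text, label in entities if label == "ORG")

# Tag the organizations mentioned frequently (here: more than once).
tags = [org for org, n in org_counts.items() if n > 1]
print(tags)
```

With spaCy installed, the `entities` list would instead come from `[(ent.text, ent.label_) for ent in nlp(text).ents]` after loading a pretrained pipeline such as `en_core_web_sm`.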
Similar Working Papers
Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec. Doc2Vec represents entire documents as fixed-length vectors, capturing semantic meaning from the contexts in which words appear within a document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection. The document vectors are compared using cosine similarity to determine the most similar working papers.
Papers identified with 🔥 are in the top 20% of similarity.
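The comparison step can be sketched directly: once each paper has a fixed-length vector, finding similar papers is a cosine-similarity sort. The three-dimensional vectors below are toy values paired with paper numbers from this page, not actual Doc2Vec output, which would come from a trained gensim model with far higher dimensionality.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins for Doc2Vec document vectors.
query_vec = [0.9, 0.1, 0.3]          # vector for the query paper
papers = {
    "CES-17-13": [0.8, 0.2, 0.3],
    "CES-00-15": [0.7, 0.1, 0.5],
    "CES-24-17": [0.1, 0.9, 0.2],
}

# Rank candidate papers by cosine similarity to the query paper.
ranked = sorted(papers, key=lambda p: cosine(papers[p], query_vec),
                reverse=True)
print(ranked)  # most similar paper first
```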
The 10 most similar working papers to the working paper 'The Need to Account for Complex Sampling Features when Analyzing Establishment Survey Data: An Illustration using the 2013 Business Research and Development and Innovation Survey (BRDIS)' are listed below in order of similarity.
-
Working Paper: Synthetic Data for Small Area Estimation in the American Community Survey
April 2013
Working Paper Number:
CES-13-19
Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
-
Working Paper: Grassroots Design Meets Grassroots Innovation: Rural Design Orientation and Firm Performance
March 2024
Working Paper Number:
CES-24-17
The study of grassroots design (applying structured, creative processes to the usability or aesthetics of a product without input from professional design consultancies) remains underinvestigated. If design comprises a mediation between people and technology whereby technologies are made more accessible or more likely to delight, then the process by which new grassroots inventions are transformed into innovations valued in markets cannot be fully understood. This paper uses U.S. data on the design orientation of respondents in the 2014 Rural Establishment Innovation Survey linked to longitudinal data on the same firms to examine the association between design, innovation, and employment and payroll growth. Findings from the research will inform questions to be investigated in the recently collected 2022 Annual Business Survey (ABS), which for the first time contains a Design module.
-
Working Paper: An In-Depth Examination of Requirements for Disclosure Risk Assessment
October 2023
Working Paper Number:
CES-23-49
The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms levied against differential privacy would be levied against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near-term, the counterfactual approach appears best-suited for privacy-utility analysis.
-
Working Paper: R&D, Attrition and Multiple Imputation in BRDIS
January 2017
Working Paper Number:
CES-17-13
Multiple imputation in business establishment surveys like BRDIS, an annual business survey in which some companies are sampled every year or multiple years, may enhance the estimates of total R&D in addition to helping researchers estimate models with subpopulations of small sample size. Considering a panel of BRDIS companies throughout the years 2008 to 2013 linked to LBD data, this paper uses the conclusions obtained with missing data visualization and other explorations to come up with a strategy to conduct multiple imputation appropriate to address the item nonresponse in R&D expenditures. Because survey design characteristics are behind much of the item and unit nonresponse, multiple imputation of missing data in BRDIS changes the estimates of total R&D significantly and alters the conclusions reached by models of the determinants of R&D investment obtained with complete case analysis.
-
Working Paper: An Economist's Primer on Survey Samples
September 2000
Working Paper Number:
CES-00-15
Survey data underlie most empirical work in economics, yet economists typically have little familiarity with survey sample design and its effects on inference. This paper describes how sample designs depart from the simple random sampling model implicit in most econometrics textbooks, points out where the effects of this departure are likely to be greatest, and describes the relationship between design-based estimators developed by survey statisticians and related econometric methods for regression. Its intent is to provide empirical economists with enough background in survey methods to make informed use of design-based estimators. It emphasizes surveys of households (the source of most public-use files), but also considers how surveys of businesses differ. Examples from the National Longitudinal Survey of Youth of 1979 and the Current Population Survey illustrate practical aspects of design-based estimation.
-
Working Paper: Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
January 2017
Working Paper Number:
CES-17-59R
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN's research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.
-
Working Paper: Improving Estimates of Neighborhood Change with Constant Tract Boundaries
May 2022
Working Paper Number:
CES-22-16
Social scientists routinely rely on methods of interpolation to adjust available data to their research needs. This study calls attention to the potential for substantial error in efforts to harmonize data to constant boundaries using standard approaches to areal and population interpolation. We compare estimates from a standard source (the Longitudinal Tract Data Base) to true values calculated by re-aggregating original 2000 census microdata to 2010 tract areas. We then demonstrate an alternative approach that allows the re-aggregated values to be publicly disclosed, using "differential privacy" (DP) methods to inject random noise to protect confidentiality of the raw data. The DP estimates are considerably more accurate than the interpolated estimates. We also examine conditions under which interpolation is more susceptible to error. This study reveals cause for greater caution in the use of interpolated estimates from any source. Until and unless DP estimates can be publicly disclosed for a wide range of variables and years, research on neighborhood change should routinely examine data for signs of estimation error that may be substantial in a large share of tracts that experienced complex boundary changes.
-
Working Paper: Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets
June 2024
Working Paper Number:
CES-24-27
This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
-
Working Paper: Some Open Questions on Multiple-Source Extensions of Adaptive-Survey Design Concepts and Methods
February 2023
Working Paper Number:
CES-23-03
Adaptive survey design is a framework for making data-driven decisions about survey data collection operations. This paper discusses open questions related to the extension of adaptive principles and capabilities when capturing data from multiple data sources. Here, the concept of "design" encompasses the focused allocation of resources required for the production of high-quality statistical information in a sustainable and cost-effective way. This conceptual framework leads to a discussion of six groups of issues including: (i) the goals for improvement through adaptation; (ii) the design features that are available for adaptation; (iii) the auxiliary data that may be available for informing adaptation; (iv) the decision rules that could guide adaptation; (v) the necessary systems to operationalize adaptation; and (vi) the quality, cost, and risk profiles of the proposed adaptations (and how to evaluate them). A multiple data source environment creates significant opportunities, but also introduces complexities that are a challenge in the production of high-quality statistical information.