CREAT: Census Research Exploration and Analysis Tool

Large Plant Data in the LRD: Selection of a Sample for Estimation

March 1999

Working Paper Number: CES-99-06

Abstract

This paper describes preliminary work with the LRD during our tenure at the Census Bureau as participants in the ASA/NSF/Census Research Program. The objectives of the work described here were twofold. First, we wanted to examine the suitability of these data for the calculation of plant-level productivity indexes, following procedures typically implemented with time series data. Second, we wanted to select a small number of 2-digit industry groups that would be well suited to the estimation of production functions and systems of factor share equations and factor demand forecasting equations with system-wide techniques. This description of our initial work may be useful to other researchers who are interested in the LRD for the analysis of productivity growth and/or the estimation of systems of factor equations, either because the specific results reported in this memo suggest that the data are of good quality, or because the nature of the tasks undertaken provides insight into issues that arise in the analysis of longitudinal establishment data.

Document Tags and Keywords

Keywords

Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant keywords.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach not only enhances searchability but also surfaces connections that go beyond potentially domain-specific, author-defined keywords.

estimating, econometric, estimation, statistical, report, data census, census research, productivity growth, employed, employ, labor, efficiency, statistician, empirical, recession, expenditure, analysis productivity, economic census, tenure, salary, productivity dynamics, census years, labor statistics, census bureau

Tags

Tags are automatically generated using a pretrained language model from spaCy, which handles several tasks, including entity tagging.

The model labels words and phrases by entity type, including "organizations." By filtering for frequent words and phrases labeled as "organizations", the process identifies papers that contain references to specific institutions, datasets, and other organizations.

Department of Commerce, Bureau of Labor Statistics, National Science Foundation, Standard Industrial Classification, Census of Manufactures, Longitudinal Research Database, Annual Survey of Manufactures, Internal Revenue Service, Yale University, American Statistical Association, Center for Economic Studies, Columbia University, Bureau of Economic Analysis, Review of Economics and Statistics, Permanent Plant Number, Census Bureau Longitudinal Business Database, Cambridge University Press, Department of Labor, BLS Handbook of Methods
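The filtering step described for tags can be illustrated with a minimal sketch. It assumes the entity tagger has already produced (phrase, label) pairs and that organizations carry the label "ORG"; the function name `frequent_organizations`, the `min_count` threshold, and the sample data are all hypothetical and are not taken from the site's actual pipeline.

```python
from collections import Counter


def frequent_organizations(entities, min_count=2):
    """Return phrases labeled ORG that occur at least min_count times,
    ordered from most to least frequent."""
    counts = Counter(text for text, label in entities if label == "ORG")
    return [phrase for phrase, n in counts.most_common() if n >= min_count]


# Hypothetical (phrase, label) pairs standing in for an entity tagger's output.
sample_entities = [
    ("Census Bureau", "ORG"),
    ("Longitudinal Research Database", "ORG"),
    ("Census Bureau", "ORG"),
    ("March 1999", "DATE"),
    ("Longitudinal Research Database", "ORG"),
    ("Census Bureau", "ORG"),
    ("Yale University", "ORG"),
]

tags = frequent_organizations(sample_entities, min_count=2)
print(tags)
```

Here "Yale University" is dropped because it appears only once, and "March 1999" is dropped because it is not labeled as an organization. The real threshold used to decide what counts as "frequent" is not stated on this page.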

Similar Working Papers

Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.

Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the capture of semantic meaning in a way that relates to the context of words within the document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection by preserving the order and structure of words. The document vectors are compared using cosine similarity/distance to determine the most similar working papers. Papers identified with 🔥 are in the top 20% of similarity.
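The comparison step can be sketched in a few lines, assuming the document vectors have already been learned. The toy two-dimensional vectors, the paper names, and the helper `rank_similar` are hypothetical; real Doc2Vec vectors have many more dimensions, and this is not the site's actual implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_similar(target, candidates, top_n=10):
    """Return the top_n (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical document vectors, for illustration only.
target_vector = [1.0, 0.0]
candidate_vectors = {
    "paper_a": [2.0, 0.0],   # same direction as the target
    "paper_b": [1.0, 1.0],   # 45 degrees away
    "paper_c": [0.0, 1.0],   # orthogonal
    "paper_d": [-1.0, 0.0],  # opposite direction
}

top_two = rank_similar(target_vector, candidate_vectors, top_n=2)
print([name for name, _ in top_two])
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why `paper_a` scores as a perfect match even though its vector is twice as long as the target.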

The 10 most similar working papers to the working paper 'Large Plant Data in the LRD: Selection of a Sample for Estimation' are listed below in order of similarity.