Testing the Advantages of Using Product Level Data to Create Linkages Across Industrial Coding Systems
October 1993
Working Paper Number:
CES-93-14
Abstract
Document Tags and Keywords
Keywords
Keywords are generated automatically with KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant terms.
KeyBERT analyzes the content of each working paper and identifies the terms and phrases that best capture its main topics. This approach improves searchability and also surfaces connections beyond author-defined keywords, which may be domain-specific.
production,
manufacturing,
statistical,
industrial,
commerce,
product,
commodity,
sector,
classified,
industrial classification,
classification,
classifying
Tags
Tags are generated automatically with a pretrained language model from spaCy, which performs several tasks, including entity tagging.
The model labels words and phrases by entity type, including "organizations." Filtering for frequent words and phrases labeled as "organizations" identifies papers that reference specific institutions, datasets, and other organizations.
Department of Commerce,
Standard Industrial Classification,
Bureau of Labor Statistics,
Longitudinal Research Database,
Center for Economic Studies,
Bureau of Economic Analysis,
Insurance Information Institute,
North American Free Trade Agreement
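The filtering step described for tags can be sketched as follows. This is only an illustration, not the site's actual pipeline: it assumes the entities have already been extracted as (text, label) pairs, uses spaCy's "ORG" label for organizations, and introduces a hypothetical helper `frequent_organizations` with a hypothetical `min_count` threshold. The sample entity list is invented for the example.

```python
from collections import Counter

def frequent_organizations(entities, min_count=2):
    """Return organization names seen at least min_count times, most frequent first.

    entities: iterable of (text, label) pairs, assumed to come from an
    entity tagger run beforehand. min_count is a hypothetical threshold.
    """
    counts = Counter(text for text, label in entities if label == "ORG")
    return [name for name, n in counts.most_common() if n >= min_count]

# Invented sample of tagged entities, for illustration only.
entities = [
    ("Bureau of Labor Statistics", "ORG"),
    ("1993", "DATE"),
    ("Center for Economic Studies", "ORG"),
    ("Bureau of Labor Statistics", "ORG"),
    ("Center for Economic Studies", "ORG"),
    ("Bureau of Labor Statistics", "ORG"),
    ("Department of Commerce", "ORG"),
]

tags = frequent_organizations(entities)
```

With this sample, names tagged only once and entities with other labels are dropped, so only the repeated organizations remain as tags.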
Similar Working Papers
Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.
Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the
capture of semantic meaning in a way that relates to the context of words within the document. The model learns to
associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as
document classification, clustering, and similarity detection by preserving the order and structure of words. The
document vectors are compared using cosine similarity/distance to determine the most similar working papers.
Papers identified with 🔥 are in the top 20% of similarity.
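The comparison step can be sketched in a few lines. This is a minimal illustration that assumes the document vectors have already been produced by a trained model; the identifiers `paper-a` through `paper-d`, the three-dimensional vectors, and the helper names `cosine_similarity` and `most_similar` are all hypothetical and not taken from the site.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def most_similar(target, vectors, top_n=10):
    """Rank every other document by cosine similarity to the target."""
    scores = [
        (doc_id, cosine_similarity(vectors[target], vec))
        for doc_id, vec in vectors.items()
        if doc_id != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical document vectors; real Doc2Vec vectors are much longer.
doc_vectors = {
    "paper-a": [1.0, 0.0, 0.0],
    "paper-b": [0.9, 0.1, 0.0],
    "paper-c": [0.0, 1.0, 0.0],
    "paper-d": [0.5, 0.5, 0.0],
}

ranked = most_similar("paper-a", doc_vectors, top_n=3)
```

Here `ranked` lists the other papers from most to least similar to `paper-a`. Cosine similarity depends only on the direction of the vectors, so documents of different lengths can be compared on the same scale.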
The 10 most similar working papers to the working paper 'Testing the Advantages of Using Product Level Data to Create Linkages Across Industrial Coding Systems' are listed below in order of similarity.
-
Working Paper
Manufacturing Establishments Reclassified Into New Industries: The Effect Of Survey Design Rules
November 1992
Working Paper Number:
CES-92-14
Establishment reclassification occurs when an establishment classified in one industry in one year is reclassified into another industry in another year. Because of survey design rules at the Census Bureau these reclassifications occur systematically over time, and affect the industry-level time series of output and employment. The evidence shows that reclassified establishments occur most often in two distinct years over the life of a sample panel. Switches are not only numerous in these years, they also contribute significantly to measured industry change in industry output and employment. The problem is that reclassifications are not necessarily processed in the year that they occur. The survey rules restrict most change to certain years. The effect of these rules is evidenced by looking at the variance across industry growth rates which increases greatly in these two years. Whatever the reason for reclassifying an establishment, the way the switches are processed raises the possibility of measurement errors in the industry level statistics. Researchers and policymakers relying upon observations in annual changes in industry statistics should be aware of these systematic discontinuities, discrepancies and potential data distortions.
-
Working Paper
The Role of Industry Classification in the Estimation of Research and Development Expenditures
November 2014
Working Paper Number:
CES-14-45
This paper uses data from the National Science Foundation's surveys on business research and development (R&D) expenditures that have been linked with data from the Census Bureau's Longitudinal Business Database to produce consistent NAICS-based R&D time-series data based on the main product produced by the firm for 1976 to 2008. The results show that R&D spending has shifted away from domestic manufacturing industries in recent years. This is due in part to a shift in U.S. payrolls away from manufacturing establishments for R&D-performing firms. These findings support the notion of an increasingly fragmented production system for R&D-intensive manufacturing firms, whereby U.S. firms control output and provide intellectual property inputs in the form of R&D, but production takes place outside of the firms' U.S. establishments.
-
Working Paper
Allocation of Company Research and Development Expenditures to Industries Using a Tobit Model
November 2015
Working Paper Number:
CES-15-42
This paper uses Census microdata and a regression-based approach to assign multi-division firms' pre-2008 Research and Development (R&D) expenditures to more than one industry. Since multi-division firms conduct R&D in more than one industry, assigning R&D to corresponding industries provides a more accurate representation of where R&D actually takes place and provides a consistent time-series with the National Science Foundation R&D by line of business information. Firm R&D is allocated to industries on the basis of observed industry payroll, as befits the historic importance of payroll in Census assignments of firms to industry. The results demonstrate that the method of assigning R&D to industries on the basis of payroll works well in earlier years, but becomes less effective over time as firms outsource their manufacturing function.
-
Working Paper
Business Dynamics Statistics of High Tech Industries
January 2016
Working Paper Number:
CES-16-55
Modern market economies are characterized by the reallocation of resources from less productive, less valuable activities to more productive, more valuable ones. Businesses in the High Technology sector play a particularly important role in this reallocation by introducing new products and services that impact the entire economy. Tracking the performance of this sector is therefore of primary importance, especially in light of recent evidence that suggests a slowdown in business dynamism in High Tech industries. The Census Bureau produces the Business Dynamics Statistics (BDS), a suite of data products that track job creation, job destruction, startups, and exits by firm and establishment characteristics including sector, firm age, and firm size. In this paper we describe the methodologies used to produce a new extension to the BDS focused on businesses in High Technology industries.
-
Working Paper
Collaborative Micro-productivity Project: Establishment-Level Productivity Dataset, 1972-2020
December 2023
Working Paper Number:
CES-23-65
We describe the process for building the Collaborative Micro-productivity Project (CMP) microdata and calculating establishment-level productivity numbers. The documentation is for version 7 and the data cover the years 1972-2020. These data have been used in numerous research papers and are used to create the experimental public-use data product Dispersion Statistics on Productivity (DiSP).
-
Working Paper
A FIRST STEP TOWARDS A GERMAN SYNLBD: CONSTRUCTING A GERMAN LONGITUDINAL BUSINESS DATABASE
February 2014
Working Paper Number:
CES-14-13
One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
-
Working Paper
RECOVERING THE ITEM-LEVEL EDIT AND IMPUTATION FLAGS IN THE 1977-1997 CENSUSES OF MANUFACTURES
September 2014
Working Paper Number:
CES-14-37
As part of processing the Census of Manufactures, the Census Bureau edits some data items and imputes for missing data and some data that is deemed erroneous. Until recently it was difficult for researchers using the plant-level microdata to determine which data items were changed or imputed during the editing and imputation process, because the edit/imputation processing flags were not available to researchers. This paper describes the process of reconstructing the edit/imputation flags for variables in the 1977, 1982, 1987, 1992, and 1997 Censuses of Manufactures using recently recovered Census Bureau files. The paper also reports summary statistics for the percentage of cases that are imputed for key variables. Excluding plants with fewer than 5 employees, imputation rates for several key variables range from 8% to 54% for the manufacturing sector as a whole, and from 1% to 72% at the 2-digit SIC industry level.
-
Working Paper
NEW DATA FOR DYNAMIC ANALYSIS: THE LONGITUDINAL ESTABLISHMENT AND ENTERPRISE MICRODATA (LEEM) FILE
December 1999
Working Paper Number:
CES-99-18
Until now, research on U.S. business activities over time has been hindered by the lack of accurate and comprehensive longitudinal data. The new Longitudinal Establishment and Enterprise Microdata (LEEM) are tremendously rich data that open up numerous possibilities for dynamic analyses of businesses in the U.S. economy. It is the first nationwide high-quality longitudinal database that covers the majority of employer businesses from all sectors of the economy. Due to the confidential nature of these data, the file is located at the Center for Economic Studies in the U.S. Bureau of the Census. To access the data, researchers must submit an acceptable proposal to CES and become sworn Census researchers. This paper describes the LEEM file, the variables contained on the file, and current uses of the data.
-
Working Paper
The Effects of Industry Classification Changes on US Employment Composition
June 2018
Working Paper Number:
CES-18-28
This paper documents the extent to which compositional changes in US employment from 1976 to 2009 are due to changes in the industry classification scheme used to categorize economic activity. In 1997, US statistical agencies began implementation of a change from the Standard Industrial Classification System (SIC) to the North American Industrial Classification System (NAICS). NAICS was designed to provide a consistent classification scheme that consolidated declining or obsolete industries and added categories for new industries. Under NAICS, many activities previously classified as Manufacturing, Wholesale Trade, or Retail Trade were re-classified into the Services sector. This re-classification resulted in a significant shift of measured activities across sectors without any change in underlying economic activity. Using a newly developed establishment-level database of employment activity that is consistently classified on a NAICS basis, this paper shows that the change from SIC to NAICS increased the share of Services employment by approximately 36 percent. 7.6 percent of US manufacturing employment, equal to approximately 1.4 million jobs, was reclassified to services. Retail trade and wholesale trade also experienced a significant reclassification of activities in the transition.
-
Working Paper
Concording U.S. Harmonized System Categories Over Time
May 2009
Working Paper Number:
CES-09-11
This paper: outlines an algorithm for concording U.S. ten-digit Harmonized System export and import codes over time; describes the concordances we construct for 1989 to 2004; and provides Stata code that can be used to construct similar concordances for arbitrary beginning and ending years from 1989 to 2007.