The classification and aggregation of manufacturing data is vital for the analysis and reporting of economic activity. Most organizations and researchers use the Standard Industrial Classification (SIC) system for this purpose. This is, however, not the only option. Our paper examines an alternative classification based on clustering activity using production technologies. While this approach yields results which are similar to the SIC, there are important differences between the two classifications in terms of the specific industrial categories and the amount of information lost through aggregation.
-
Price Dispersion in U.S. Manufacturing
October 1989
Working Paper Number:
CES-89-07
This paper addresses the question of whether products in the U.S. Manufacturing sector sell at a single (common) price, or whether prices vary across producers. The question of price dispersion is important for two reasons. First, if prices vary across producers, the standard method of using industry price deflators leads to errors in measuring real output at the firm or establishment level. These errors in turn lead to biased estimates of the production function and productivity growth equation as shown in Abbott (1988). Second, if prices vary across producers, it suggests that producers do not take prices as given but use price as a competitive variable. This has several implications for how economists model competitive behavior.
-
Primary Versus Secondary Production Techniques in U.S. Manufacturing
October 1994
Working Paper Number:
CES-94-12
In this paper we discuss and analyze a classical economic puzzle: whether differences in factor intensities reflect patterns of specialization or the co-existence of alternative techniques to produce output. We use observations on a large cross-section of U.S. manufacturing plants from the Census of Manufactures, including those that make goods primary to other industries, to study differences in production techniques. We find that in most cases material requirements do not depend on whether goods are made as primary products or as secondary products, which suggests that differences in factor intensities usually reflect patterns of specialization. A few cases where secondary production techniques do differ notably are discussed in more detail. However, overall the regression results support the neoclassical assumption that a single, best-practice technique is chosen for making each product.
-
Exploring New Ways to Classify Industries for Energy Analysis and Modeling
November 2022
Working Paper Number:
CES-22-49
Combustion, other emitting processes and fossil energy use outside the power sector have become urgent concerns given the United States' commitment to achieving net-zero greenhouse gas emissions by 2050. Industry is an important end user of energy and relies on fossil fuels used directly for process heating and as feedstocks for a diverse range of applications. Fuel and energy use by industry is heterogeneous, meaning even a single product group can vary broadly in its production routes and associated energy use. In the United States, the North American Industry Classification System (NAICS) serves as the standard for statistical data collection and reporting. In turn, data based on NAICS are the foundation of most United States energy modeling. Thus, the effectiveness of NAICS at representing energy use is a limiting condition for current expansive planning to improve energy efficiency and alternatives to fossil fuels in industry. Facility-level data could be used to build more detail into heterogeneous sectors, and thus supplement data reported at NAICS code levels by the Bureau of the Census and the U.S. Energy Information Administration, but such data are scarce. This work explores alternative classification schemes for industry based on energy use characteristics and validates an approach to estimate facility-level energy use from publicly available greenhouse gas emissions data from the U.S. Environmental Protection Agency (EPA). The approaches in this study can facilitate understanding of current, as well as possible future, energy demand.
First, current approaches to the construction of industrial taxonomies are summarized along with their usefulness for industrial energy modeling. Unsupervised machine learning techniques are then used to detect clusters in data reported from the U.S. Department of Energy's Industrial Assessment Center program. Clusters of Industrial Assessment Center data show similar levels of correlation between energy use and explanatory variables as three-digit NAICS codes. Interestingly, the clusters each include a large cross section of NAICS codes, which lends additional support to the idea that NAICS may not be particularly suited for correlation between energy use and the variables studied. Fewer clusters are needed for the same level of correlation as shown in NAICS codes. Initial assessment shows a reasonable level of separation using support vector machines with higher than 80% accuracy, so machine learning approaches may be promising for further analysis. The IAC data is focused on smaller and medium-sized facilities and is biased toward higher energy users for a given facility type.
Cladistics, an approach for classification developed in biology, is adapted to energy and process characteristics of industries. Cladistics applied to industrial systems seeks to understand the progression of organizations and technology as a type of evolution, wherein traits are inherited from previous systems but evolve due to the emergence of inventions and variations and a selection process driven by adaptation to pressures and favorable outcomes. A cladogram is presented for evolutionary directions in the iron and steel sector. Cladograms are a promising tool for constructing scenarios and summarizing directions of sectoral innovation.
The cladogram of iron and steel is based on the drivers of energy use in the sector. Phylogenetic inference is similar to machine learning approaches as it is based on a machine-led search of the solution space, therefore avoiding some of the subjectivity of other classification systems. Our prototype approach for constructing an industry cladogram is based on process characteristics according to the innovation framework derived from Schumpeter to capture evolution in a given sector. The resulting cladogram represents a snapshot in time based on detailed study of process characteristics. This work could be an important tool for the design of scenarios for more detailed modeling. Cladograms reveal groupings of emerging or dominant processes and their implications in a way that may be helpful for policymakers and entrepreneurs, allowing them to see the larger picture, other good ideas, or competitors. Constructing a cladogram could be a good first step to analysis of many industries (e.g. nitrogenous fertilizer production, ethyl alcohol manufacturing), to understand their heterogeneity, emerging trends, and coherent groupings of related innovations.
Finally, validation is performed for facility-level energy estimates from the EPA Greenhouse Gas Reporting Program. Facility-level data availability continues to be a major challenge for industrial modeling. The method outlined by McMillan et al. (2016) and McMillan and Ruth (2019) allows facility-level energy use to be estimated from mandatory greenhouse gas reporting. The validation provided here is an important step toward further use of this data for industrial energy modeling.
-
Price Dispersion In U.S. Manufacturing: Implications For The Aggregation Of Products And Firms
March 1992
Working Paper Number:
CES-92-03
This paper addresses the question of whether products in the U.S. Manufacturing sector sell at a single (common) price, or whether prices vary across producers. Price dispersion is interesting for at least two reasons. First, if output prices vary across producers, standard methods of using industry price deflators lead to errors in measuring real output at the industry, firm, and establishment level which may bias estimates of the production function and productivity growth. Second, price dispersion suggests product heterogeneity which, if consumers do not have identical preferences, could lead to market segmentation and price in excess of marginal cost, thus making the current (competitive) characterization of the Manufacturing sector inappropriate and invalidating many empirical studies. In the course of examining these issues, the paper develops a robust measure of price dispersion as well as new quantitative methods for testing whether observed price differences are the result of differences in product quality. Our results indicate that price dispersion is widespread throughout manufacturing and that for at least one industry, Hydraulic Cement, it is not the result of differences in product quality.
-
The Extent and Nature of Establishment Level Diversification in Sixteen U.S. Manufacturing Industries
August 1990
Working Paper Number:
CES-90-08
This paper examines the heterogeneity of establishments in sixteen manufacturing industries. Basic statistical measures are used to decompose product diversification at the establishment level into industry, firm, and establishment effects. The industry effect is the weakest; nearly all the observed heterogeneity is establishment specific. Product diversification at the establishment level is idiosyncratic to the firm. Establishments within a firm exhibit a significant degree of homogeneity, although the grouping of products differs across firms. With few exceptions, economies of scope and scale in production appear to play a minor role in the establishment's mix of outputs.
-
Grouped Variation in Factor Shares: An Application to Misallocation
August 2022
Working Paper Number:
CES-22-33
A striking feature of micro-level plant data is the presence of significant variation in factor cost shares across plants within an industry. We develop a methodology to decompose cost shares into idiosyncratic and group-specific components. In particular, we carry out a cluster analysis to recover the number and membership of groups using breaks in the dispersion of factor cost shares across plants. We apply our methodology to Chilean plant-level data and find that group-specific variation accounts for approximately one-third of the variation in factor shares across firms. We also study the implications of these groups in cost shares on the gains from eliminating misallocation. We place bounds on their importance and find that ignoring them can overstate the gains from eliminating misallocation by up to one-third.
-
Testing the Advantages of Using Product Level Data to Create Linkages Across Industrial Coding Systems
October 1993
Working Paper Number:
CES-93-14
After the major revision of the U.S. Standard Industrial Classification (SIC) system in 1987, the problem arose of how to evaluate industrial performance over time. The revision resulted in the creation of new industries, the combination of old industries, and the remixing of other industries to better reflect the present U.S. economy. A method had to be developed to make the old and new sets of industries comparable over time. Ryten (1991) argues for performing the conversion at the "most micro level," the product level. Linking industries should be accomplished by reclassifying product data of each establishment to a standard system, reassigning the primary activity of the establishment, reaggregating the data to the industry level, and then making the desired statistical comparison (Ryten, 1991). This paper discusses linking the data at the very micro, product level, and at the more macro, industry level. The results suggest that with complete product information the product level conversion is preferable for most industries in manufacturing because it recognizes that establishments may switch their primary industry because of the conversion. For some industries, especially those having no substantial changes in SIC codes over time, the conversion at the industry level is fairly accurate. A small group of industries lacks the complete 1982 product information needed to link the 1982 product codes to the 1987 codes. For these industries, the time series of statistics has to rely on the industry concordance.
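The product-level conversion described above (reclassify each establishment's products, reassign its primary activity, then reaggregate to the industry level) can be illustrated with a minimal sketch. Everything here is hypothetical: the product codes, the concordance, the shipment values, and the rule that the first character of a product code names its industry are assumptions made for illustration, not the paper's data or coding system.

```python
from collections import defaultdict

# Hypothetical concordance: old product code -> new product code.
PRODUCT_CONCORDANCE = {"A1": "X1", "A2": "Y1", "B1": "Y2"}


def industry_of(product_code):
    # Assumption for this sketch: the first character of a product
    # code identifies the industry the product is primary to.
    return product_code[0]


def convert_establishment(shipments_by_old_product):
    """Reclassify one establishment's products and reassign its primary industry."""
    # Step 1: reclassify product data to the new system.
    new_shipments = defaultdict(float)
    for old_code, value in shipments_by_old_product.items():
        new_shipments[PRODUCT_CONCORDANCE[old_code]] += value
    # Step 2: the primary industry is the one with the largest shipments.
    by_industry = defaultdict(float)
    for code, value in new_shipments.items():
        by_industry[industry_of(code)] += value
    primary = max(by_industry, key=by_industry.get)
    return primary, sum(new_shipments.values())


def reaggregate(establishments):
    """Step 3: total establishment shipments by reassigned primary industry."""
    totals = defaultdict(float)
    for shipments in establishments.values():
        primary, total = convert_establishment(shipments)
        totals[primary] += total
    return dict(totals)


# Both plants belong to old industry "A", but land in different new industries.
establishments = {
    "plant_1": {"A1": 70.0, "A2": 30.0},
    "plant_2": {"A1": 40.0, "A2": 60.0},
}
industry_totals = reaggregate(establishments)
print(industry_totals)
```

In this toy case an industry-level concordance would have to move old industry "A" as a block, whereas the product-level route lets plant_2 switch its primary industry, which is the effect the abstract highlights.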
-
The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software
July 2014
Working Paper Number:
carra-2014-01
The Census Bureau's Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. The PVS matches incoming files to reference files created with data from the Social Security Administration (SSA) Numerical Identification file, and SSA data with addresses obtained from federal files. This paper describes the PVS methodology from editing input data to creating the final file.
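As a rough illustration of what probabilistic matching involves, the sketch below scores candidate pairs with Fellegi-Sunter style agreement weights, a standard record linkage technique. It is not the PVS implementation: the field names, the m- and u-probabilities, the threshold, and the `person_id` key are all assumptions chosen for the example.

```python
import math

# Hypothetical (m, u) probabilities per field:
# m = P(fields agree | same person), u = P(fields agree | different people).
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "zip_code": (0.90, 0.05),
}


def match_weight(record, candidate):
    """Sum log2 likelihood ratios over fields (agreement adds, disagreement subtracts)."""
    weight = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if record.get(field) == candidate.get(field):
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


def best_match(record, reference, threshold=5.0):
    """Return the identifier of the highest-scoring reference record, or None."""
    scored = [(match_weight(record, ref), ref["person_id"]) for ref in reference]
    if not scored:
        return None
    weight, person_id = max(scored)
    # Only accept the link if the evidence clears the (hypothetical) threshold.
    return person_id if weight >= threshold else None


reference = [
    {"person_id": "P001", "last_name": "RIVERA", "birth_date": "1970-02-14", "zip_code": "60601"},
    {"person_id": "P002", "last_name": "RIVERA", "birth_date": "1985-09-30", "zip_code": "60601"},
]
incoming = {"last_name": "RIVERA", "birth_date": "1970-02-14", "zip_code": "60614"}
stranger = {"last_name": "CHEN", "birth_date": "1990-01-01", "zip_code": "10001"}

print(best_match(incoming, reference))
print(best_match(stranger, reference))
```

The point of the weighting is that a disagreement on one weakly identifying field (here the ZIP code) does not block a link when strongly identifying fields agree.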
-
Measuring The Trade Balance In Advanced Technology Products
January 1989
Working Paper Number:
CES-89-01
Because of the dramatic decline in the United States Trade Balance since the early 1970's, many economists and policy makers have become increasingly concerned about the ability of U.S. manufacturers to compete with foreign producers. Initially concern was limited to a few basic industries such as shoes, clothing, and steel; but more recently foreign producers have been effectively competing with U.S. manufacturers in automobiles, electronics, and other consumer products. It now seems that foreign producers are even challenging the dominance of America in high technology industries. The most recent publication from the International Trade Administration shows that the U.S. Trade Balance in high technology industries fell from a $24 billion surplus in 1982, to a $2.6 billion deficit in 1986, before rebounding to a $591 million surplus in 1987. As part of the efforts of the U.S. Census Bureau to provide policy makers and other interested parties with the most complete and accurate information possible, we recently completed a review of the methodology and data used to construct trade statistics in the area of high technology trade. Our findings suggest that the statistics presented by the International Trade Administration, although technically correct, do not provide an accurate picture of international trade in high or advanced technology products because of the level of aggregation used in their construction. The ITA statistics are based on the Department of Commerce's DOC3 definition of high technology industries. The DOC3 definition requires that each product classified in a high tech industry be designated high tech. As a result, many products which would not individually be considered high tech are included in the statistics. 
After developing a disaggregate, product-based measure of international trade in Advanced Technology Products (ATP), we find that although the trade balance in these products did decline over the 1982-1987 period, the decline is much smaller (about $5 billion) than reported by ITA (approximately $24 billion). This paper discusses the methodology used to define the ATP measure, contrasts it with the DOC3 measure, and provides a comparison of the resulting statistics. After discussing alternative approaches to identifying advanced technology products, Section 2 describes the advanced technologies in the classification. (Appendix A provides definitions and examples of the products which embody these technologies. In addition, Appendix B, available on request, provides a comprehensive list of Advanced Technology Products by technology grouping.) Having described the ATPs, Section 3 examines annual trade statistics for ATP products in 1982, 1986, and 1987, and compares these statistics with equivalent ones based on the DOC3 measure. The differences between the two measures over the 1982-87 period stem from changes in the balance of trade of items included in the DOC3 measure but excluded by the Census ATP measure; i.e., the differences are due to changes in the trade balance of "low tech" products which are produced in "high tech" industries. This finding corroborates a principal argument for construction of the ATP measure, that the weakness of the DOC3 measure of high technology trade is the level of aggregation used in its construction. It also suggests that at the level of individual products the high technology sectors of the economy continue to enjoy a strong comparative advantage and are surprisingly healthy. Nonetheless, some areas of weakness are identified, such as low tech products in high tech industries.
(Appendix C supplements this material by providing a detailed listing of traded products included in and excluded from the Advanced Technology definition for each DOC3 high tech commodity grouping. These tables enable the reader to directly assess the Census classification.)
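The aggregation issue can be made concrete with a small sketch that computes a trade balance two ways: by industry membership, as the abstract describes for DOC3, and product by product, in the spirit of the ATP measure. The industries, products, flags, and dollar values below are invented for illustration and are not taken from the paper or from ITA or Census statistics.

```python
# Hypothetical records: (industry, product, is_advanced_tech, exports, imports),
# with values in billions of dollars.
TRADE = [
    ("computers", "supercomputers", True, 12.0, 4.0),
    ("computers", "keyboards", False, 1.0, 6.0),
    ("textiles", "shirts", False, 2.0, 9.0),
]

# Industry-based definition: every product of a listed industry counts as high tech.
HIGH_TECH_INDUSTRIES = {"computers"}


def industry_based_balance(records):
    """Exports minus imports for all products made in high tech industries."""
    return sum(
        exports - imports
        for industry, _product, _adv, exports, imports in records
        if industry in HIGH_TECH_INDUSTRIES
    )


def product_based_balance(records):
    """Exports minus imports for products individually flagged as advanced technology."""
    return sum(
        exports - imports
        for _industry, _product, adv, exports, imports in records
        if adv
    )


doc3_style = industry_based_balance(TRADE)
atp_style = product_based_balance(TRADE)
print(doc3_style, atp_style)
```

In this toy data the gap between the two figures comes entirely from the "low tech" keyboard trade inside a "high tech" industry, which mirrors the mechanism the abstract identifies.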
-
CONSTRUCTION OF REGIONAL INPUT-OUTPUT TABLES FROM ESTABLISHMENT-LEVEL MICRODATA: ILLINOIS, 1982
August 1993
Working Paper Number:
CES-93-12
This paper presents a new method for use in the construction of hybrid regional input-output tables, based primarily on individual returns from the Census of Manufactures. Using this method, input-output tables can be completed at a fraction of the cost and time involved in the completion of a full survey table. Special attention is paid to secondary production, a problem often ignored by input-output analysts. A new method to handle secondary production is presented. The method reallocates the amount of secondary production and its associated inputs, on an establishment basis, based on the assumption that the input structure for any given commodity is determined not by the industry in which the commodity was produced, but by the commodity itself -- the commodity-based technology assumption. A biproportional adjustment technique is used to perform the reallocations.
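The abstract does not spell out the biproportional adjustment it uses. A common form is the RAS procedure (iterative proportional fitting), sketched below under that assumption: a seed matrix is alternately rescaled by rows and by columns until it reproduces known row and column totals. The seed values and targets are hypothetical.

```python
def ras(matrix, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a non-negative matrix so its row and column sums match the targets."""
    m = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(m[i])
            if s > 0:
                m[i] = [v * target / s for v in m[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in m)
            if s > 0:
                for row in m:
                    row[j] *= target / s
        # Columns now match; stop once the rows still match as well.
        if all(abs(sum(m[i]) - t) <= tol for i, t in enumerate(row_targets)):
            return m
    raise RuntimeError("RAS did not converge")


# Hypothetical seed flows and control totals (row and column targets both sum to 100).
seed = [[10.0, 20.0], [30.0, 40.0]]
row_targets = [40.0, 60.0]
col_targets = [45.0, 55.0]
balanced = ras(seed, row_targets, col_targets)
for row in balanced:
    print([round(v, 3) for v in row])
```

The row and column targets must share the same grand total for the iteration to settle; the adjusted matrix keeps the zero pattern of the seed while honoring both sets of margins.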