The classification and aggregation of manufacturing data is vital for the analysis and reporting of economic activity. Most organizations and researchers use the Standard Industrial Classification (SIC) system for this purpose. It is, however, not the only option. Our paper examines an alternative classification based on clustering activity using production technologies. While this approach yields results similar to those of the SIC, there are important differences between the two classifications in terms of the specific industrial categories and the amount of information lost through aggregation.
-
Exploring New Ways to Classify Industries for Energy Analysis and Modeling
November 2022
Working Paper Number:
CES-22-49
Combustion, other emitting processes and fossil energy use outside the power sector have become urgent concerns given the United States' commitment to achieving net-zero greenhouse gas emissions by 2050. Industry is an important end user of energy and relies on fossil fuels used directly for process heating and as feedstocks for a diverse range of applications. Fuel and energy use by industry is heterogeneous, meaning even a single product group can vary broadly in its production routes and associated energy use. In the United States, the North American Industry Classification System (NAICS) serves as the standard for statistical data collection and reporting. In turn, data based on NAICS are the foundation of most United States energy modeling. Thus, the effectiveness of NAICS at representing energy use is a limiting condition for current
expansive planning to improve energy efficiency and alternatives to fossil fuels in industry. Facility-level data could be used to build more detail into heterogeneous sectors, supplementing data reported by the Census Bureau and the U.S. Energy Information Administration at NAICS code levels, but such data are scarce. This work explores alternative classification schemes for industry based on energy use characteristics and validates an approach to estimate facility-level energy use from publicly available greenhouse gas emissions data from the U.S. Environmental Protection Agency (EPA). The approaches in this study can facilitate understanding of current, as well as possible future, energy demand.
First, current approaches to the construction of industrial taxonomies are summarized along with their usefulness for industrial energy modeling. Unsupervised machine learning techniques are then used to detect clusters in data reported from the U.S. Department of Energy's Industrial Assessment Center (IAC) program. Clusters of IAC data show similar levels of correlation between energy use and explanatory variables as three-digit NAICS codes. Interestingly, each cluster includes a large cross section of NAICS codes, which lends additional support to the idea that NAICS may not be particularly suited for correlating energy use with the variables studied. Fewer clusters are needed for the same level of correlation as shown in NAICS codes. Initial assessment shows a reasonable level of separation using support vector machines, with higher than 80% accuracy, so machine learning approaches may be promising for further analysis. The IAC data are focused on small and medium-sized facilities and are biased toward higher energy users for a given facility type.

Cladistics, a classification approach developed in biology, is adapted to the energy and process characteristics of industries. Applied to industrial systems, cladistics seeks to understand the progression of organizations and technology as a type of evolution, wherein traits are inherited from previous systems but evolve through the emergence of inventions and variations and a selection process driven by adaptation to pressures and favorable outcomes. A cladogram is presented for evolutionary directions in the iron and steel sector. Cladograms are a promising tool for constructing scenarios and summarizing directions of sectoral innovation.
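The clustering step described above can be sketched with a minimal k-means implementation. This is an illustrative reconstruction, not the paper's actual pipeline: the synthetic features here merely stand in for IAC facility variables (e.g. log energy use, employment, floor area), and the initialization and distance choices are our own assumptions.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain k-means with a deterministic farthest-point initialization."""
    # Initialize: first observation, then the point farthest from the
    # centroids chosen so far.
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute means.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic stand-in for facility features: two well-separated groups
# of 50 facilities each, three features per facility.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
labels, centroids = kmeans(X, k=2)
```

A classifier such as a support vector machine could then be fit on the cluster labels to test separability, as the paper reports doing.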
The cladogram of iron and steel is based on the drivers of energy use in the sector. Phylogenetic inference is similar to machine learning approaches in that it is based on a machine-led search of the solution space, thereby avoiding some of the subjectivity of other classification systems. Our prototype approach for constructing an industry cladogram is based on process characteristics, according to an innovation framework derived from Schumpeter, to capture evolution in a given sector. The resulting cladogram represents a snapshot in time based on detailed study of process characteristics. This work could become an important tool for the design of scenarios for more detailed modeling. Cladograms reveal groupings of emerging or dominant processes and their implications in a way that may be helpful for policymakers and entrepreneurs, allowing them to see the larger picture, other good ideas, or competitors. Constructing a cladogram could be a good first step in the analysis of many industries (e.g. nitrogenous fertilizer production, ethyl alcohol manufacturing), helping to reveal their heterogeneity, emerging trends, and coherent groupings of related innovations.
Finally, validation is performed for facility-level energy estimates from the EPA Greenhouse Gas Reporting Program. Facility-level data availability continues to be a major challenge for industrial modeling. The method outlined in McMillan et al. (2016) and McMillan and Ruth (2019) allows estimation of facility-level energy use from mandatory greenhouse gas reporting. The validation provided here is an important step toward further use of these data for industrial energy modeling.
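The core of the estimation idea is to divide a facility's reported combustion emissions by a fuel-specific emission factor to back out fuel energy use. A minimal sketch, assuming a single known fuel per facility; the emission factors below are illustrative defaults (approximately the EPA 40 CFR Part 98 Table C-1 values), not the calibrated inputs used in the cited papers:

```python
# Illustrative default CO2 emission factors in kg CO2 per MMBtu of fuel
# (approximate Table C-1 values; treat these as assumptions).
EMISSION_FACTORS = {"natural_gas": 53.06, "bituminous_coal": 93.28}

def estimate_energy_mmbtu(co2_kg, fuel):
    """Back out fuel energy use (MMBtu) from reported combustion CO2,
    assuming all emissions come from one known fuel."""
    return co2_kg / EMISSION_FACTORS[fuel]

# A facility reporting 1,000,000 kg of CO2 from natural gas combustion:
energy = estimate_energy_mmbtu(1_000_000, "natural_gas")
```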
-
Primary Versus Secondary Production Techniques in U.S. Manufacturing
October 1994
Working Paper Number:
CES-94-12
In this paper we discuss and analyze a classical economic puzzle: whether differences in factor intensities reflect patterns of specialization or the co-existence of alternative techniques to produce output. We use observations on a large cross-section of U.S. manufacturing plants from the Census of Manufactures, including those that make goods primary to other industries, to study differences in production techniques. We find that in most cases material requirements do not depend on whether goods are made as primary products or as secondary products, which suggests that differences in factor intensities usually reflect patterns of specialization. A few cases where secondary production techniques do differ notably are discussed in more detail. However, overall the regression results support the neoclassical assumption that a single, best-practice technique is chosen for making each product.
-
Price Dispersion in U.S. Manufacturing
October 1989
Working Paper Number:
CES-89-07
This paper addresses the question of whether products in the U.S. Manufacturing sector sell at a single (common) price, or whether prices vary across producers. The question of price dispersion is important for two reasons. First, if prices vary across producers, the standard method of using industry price deflators leads to errors in measuring real output at the firm or establishment level. These errors in turn lead to biased estimates of the production function and productivity growth equation as shown in Abbott (1988). Second, if prices vary across producers, it suggests that producers do not take prices as given but use price as a competitive variable. This has several implications for how economists model competitive behavior.
-
CONSTRUCTION OF REGIONAL INPUT-OUTPUT TABLES FROM ESTABLISHMENT-LEVEL MICRODATA: ILLINOIS, 1982
August 1993
Working Paper Number:
CES-93-12
This paper presents a new method for the construction of hybrid regional input-output tables, based primarily on individual returns from the Census of Manufactures. Using this method, input-output tables can be completed at a fraction of the cost and time involved in completing a full survey table. Special attention is paid to secondary production, a problem often ignored by input-output analysts, and a new method to handle it is presented. The method reallocates the amount of secondary production and its associated inputs on an establishment basis, under the assumption that the input structure for any given commodity is determined not by the industry in which the commodity was produced but by the commodity itself -- the commodity-based technology assumption. A biproportional adjustment technique is used to perform the reallocations.
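The biproportional adjustment mentioned above is commonly implemented as the RAS algorithm: alternately rescale the rows and columns of a nonnegative matrix until its margins match target totals. A minimal sketch (the matrix and margins are made-up numbers, and the target row and column totals must sum to the same grand total for the iteration to converge):

```python
import numpy as np

def ras(A, row_targets, col_targets, iters=1000, tol=1e-10):
    """Biproportional (RAS) adjustment: alternately rescale rows and
    columns of a nonnegative matrix until its margins hit the targets."""
    X = A.astype(float).copy()
    for _ in range(iters):
        X *= (row_targets / X.sum(axis=1))[:, None]   # match row totals
        X *= (col_targets / X.sum(axis=0))[None, :]   # match column totals
        if np.allclose(X.sum(axis=1), row_targets, atol=tol):
            break
    return X

# Made-up 2x2 flow matrix and target margins (both margins sum to 30).
A = np.array([[10., 5.], [3., 12.]])
X = ras(A, row_targets=np.array([20., 10.]), col_targets=np.array([14., 16.]))
```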
-
Price Dispersion In U.S. Manufacturing: Implications For The Aggregation Of Products And Firms
March 1992
Working Paper Number:
CES-92-03
This paper addresses the question of whether products in the U.S. Manufacturing sector sell at a single (common) price, or whether prices vary across producers. Price dispersion is interesting for at least two reasons. First, if output prices vary across producers, standard methods of using industry price deflators lead to errors in measuring real output at the industry, firm, and establishment level which may bias estimates of the production function and productivity growth. Second, price dispersion suggests product heterogeneity which, if consumers do not have identical preferences, could lead to market segmentation and price in excess of marginal cost, thus making the current (competitive) characterization of the Manufacturing sector inappropriate and invalidating many empirical studies. In the course of examining these issues, the paper develops a robust measure of price dispersion as well as new quantitative methods for testing whether observed price differences are the result of differences in product quality. Our results indicate that price dispersion is widespread throughout manufacturing and that for at least one industry, Hydraulic Cement, it is not the result of differences in product quality.
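As one simple illustration, the coefficient of variation of plant-level prices for a single product is zero under a common price and positive under dispersion. This is a deliberately simple stand-in for exposition, not the robust dispersion measure developed in the paper, and the unit values are hypothetical:

```python
import statistics

def price_dispersion(prices):
    """Coefficient of variation of producer prices for one product:
    zero when every plant sells at a common price."""
    mean = statistics.mean(prices)
    return statistics.pstdev(prices) / mean

# Hypothetical unit values for a single product class across five plants.
cv = price_dispersion([52.0, 48.0, 55.0, 45.0, 50.0])
```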
-
Multiple Classification Systems For Economic Data: Can A Thousand Flowers Bloom? And Should They?
December 1991
Working Paper Number:
CES-91-08
It is universally accepted that the statistical system should provide flexibility -- possibilities for generating multiple groupings of data to satisfy multiple objectives -- if it is to satisfy users. Yet in practice, this goal has not been achieved. This paper discusses the feasibility of providing flexibility in the statistical system to accommodate multiple uses of the industrial data now primarily examined within the Standard Industrial Classification (SIC) system. In one sense, the question of feasibility is almost trivial. With today's computer technology, vast amounts of data can be manipulated and stored at very low cost, and reconfigurations of the basic data are very inexpensive compared to the cost of collecting the data. But flexibility in the statistical system implies more than the technical ability to regroup data. It requires that the basic data be sufficiently detailed to support user needs and be processed and maintained in a fashion that makes a variety of aggregation rules possible. For this to happen, statistical agencies must recognize the need for high quality microdata and build this into their planning processes. Agencies need to view their missions from a multiple-use perspective and move away from reliance on a primary reporting and collection vehicle. Although the categories used to report data must be flexible, practical considerations dictate that data collection proceed within a fixed classification system: it is simply too expensive for both respondents and statistical agencies to process survey responses in the absence of standardized forms, data entry programs, and the like. I argue for a basic classification centered on commodities -- products, services, raw materials and labor inputs -- as the focus of data collection. The idea is to make the principal variables of interest -- the commodities -- the vehicle for the collection and processing of the data.
For completeness, the basic classification should include labor usage through some form of occupational classification. In most economic surveys at the Census Bureau, the reporting unit and the classified unit have been the establishment. But there is no need for this to be so. The basic principle to be followed in data collection is that the data should be collected in the most efficient way--efficiency being defined jointly in terms of statistical agency collection costs and respondent burdens.
-
Grouped Variation in Factor Shares: An Application to Misallocation
August 2022
Working Paper Number:
CES-22-33
A striking feature of micro-level plant data is the presence of significant variation in factor cost shares across plants within an industry. We develop a methodology to decompose cost shares into idiosyncratic and group-specific components. In particular, we carry out a cluster analysis to recover the number and membership of groups using breaks in the dispersion of factor cost shares across plants. We apply our methodology to Chilean plant-level data and find that group-specific variation accounts for approximately one-third of the variation in factor shares across firms. We also study the implications of these groups in cost shares for the gains from eliminating misallocation. We place bounds on their importance and find that ignoring them can overstate the gains from eliminating misallocation by up to one-third.
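The between/within split underlying such a decomposition can be sketched directly: the group-specific component is the between-group share of total variance and the idiosyncratic component is the within-group residual. The groups and share values below are hypothetical, and the paper recovers group membership from the data rather than assuming it:

```python
import numpy as np

def decompose_shares(shares, groups):
    """Split the variance of factor cost shares into a group-specific
    (between-group) part and an idiosyncratic (within-group) residual."""
    shares = np.asarray(shares, dtype=float)
    groups = np.asarray(groups)
    overall = shares.mean()
    between = within = 0.0
    for g in np.unique(groups):
        s = shares[groups == g]
        between += len(s) * (s.mean() - overall) ** 2
        within += ((s - s.mean()) ** 2).sum()
    total = between + within
    return between / total, within / total

# Hypothetical labor cost shares for plants in two technology groups.
shares = [0.30, 0.32, 0.28, 0.60, 0.62, 0.58]
groups = ["A", "A", "A", "B", "B", "B"]
group_share, idio_share = decompose_shares(shares, groups)
```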
-
The Extent and Nature of Establishment Level Diversification in Sixteen U.S. Manufacturing Industries
August 1990
Working Paper Number:
CES-90-08
This paper examines the heterogeneity of establishments in sixteen manufacturing industries. Basic statistical measures are used to decompose product diversification at the establishment level into industry, firm, and establishment effects. The industry effect is the weakest; nearly all the observed heterogeneity is establishment specific. Product diversification at the establishment level is idiosyncratic to the firm. Establishments within a firm exhibit a significant degree of homogeneity, although the grouping of products differs across firms. With few exceptions, economies of scope and scale in production appear to play a minor role in the establishment's mix of outputs.
-
Measuring The Trade Balance In Advanced Technology Products
January 1989
Working Paper Number:
CES-89-01
Because of the dramatic decline in the United States Trade Balance since the early 1970s, many economists and policy makers have become increasingly concerned about the ability of U.S. manufacturers to compete with foreign producers. Initially, concern was limited to a few basic industries such as shoes, clothing, and steel; but more recently foreign producers have been effectively competing with U.S. manufacturers in automobiles, electronics, and other consumer products. It now seems that foreign producers are even challenging America's dominance in high technology industries. The most recent publication from the International Trade Administration shows that the U.S. Trade Balance in high technology industries fell from a $24 billion surplus in 1982 to a $2.6 billion deficit in 1986, before rebounding to a $591 million surplus in 1987. As part of the efforts of the U.S. Census Bureau to provide policy makers and other interested parties with the most complete and accurate information possible, we recently completed a review of the methodology and data used to construct trade statistics in the area of high technology trade. Our findings suggest that the statistics presented by the International Trade Administration (ITA), although technically correct, do not provide an accurate picture of international trade in high or advanced technology products because of the level of aggregation used in their construction. The ITA statistics are based on the Department of Commerce's DOC3 definition of high technology industries. The DOC3 definition requires that each product classified in a high tech industry be designated high tech. As a result, many products which would not individually be considered high tech are included in the statistics.
After developing a disaggregate, product-based measure of international trade in Advanced Technology Products (ATP), we find that although the trade balance in these products did decline over the 1982-1987 period, the decline is much smaller (about $5 billion) than reported by ITA (approximately $24 billion). This paper discusses the methodology used to define the ATP measure, contrasts it to the DOC3 measure, and provides a comparison of the resulting statistics. After discussing alternative approaches to identifying advanced technology products, Section 2 describes the advanced technologies in the classification. (Appendix A provides definitions and examples of the products which embody these technologies. In addition, Appendix B, available on request, provides a comprehensive list of Advanced Technology Products by technology grouping.) Having described the ATPs, Section 3 examines annual trade statistics for ATP products in 1982, 1986, and 1987, and compares these statistics with equivalent ones based on the DOC3 measure. The differences between the two measures over the 1982-87 period stem from changes in the balance of trade of items included in the DOC3 measure but excluded by the Census ATP measure; i.e., the differences are due to changes in the trade balance of "low tech" products which are produced in "high tech" industries. This finding corroborates a principal argument for construction of the ATP measure: that the weakness of the DOC3 measure of high technology trade is the level of aggregation used in its construction. It also suggests that at the level of individual products the high technology sectors of the economy continue to enjoy a strong comparative advantage and are surprisingly healthy. Nonetheless, some areas of weakness are identified, such as low tech products in high tech industries.
(Appendix C supplements this material by providing a detailed listing of traded products included in and excluded from the Advanced Technology definition for each DOC3 high tech commodity grouping. These tables enable the reader to assess the Census classification directly.)
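The aggregation issue at the heart of the ATP/DOC3 comparison can be illustrated with toy numbers: an industry-level measure counts every product made in a high-tech industry, while a product-level measure keeps only the high-tech products themselves. The records below are invented and do not reflect the actual trade figures:

```python
# Invented product-level trade records:
# (industry, product_is_high_tech, exports, imports), in $ millions.
records = [
    ("computers", True, 100, 60),
    ("computers", False, 20, 80),   # low-tech product in a high-tech industry
    ("aircraft",  True,  90, 40),
]

# Industry-level (DOC3-style): every product in a high-tech industry counts.
doc3_balance = sum(x - m for _, _, x, m in records)

# Product-level (ATP-style): only the high-tech products themselves count.
atp_balance = sum(x - m for _, high_tech, x, m in records if high_tech)
```

Here the low-tech product's deficit drags the industry-level balance down, while the product-level balance isolates the high-tech surplus, mirroring the paper's finding in miniature.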
-
Matching Addresses between Household Surveys and Commercial Data
July 2015
Working Paper Number:
carra-2015-04
Matching third-party data sources to household surveys can benefit those surveys in a number of ways, but the utility of these new data sources depends critically on our ability to link units between data sets. To understand this better, this report discusses modifications to the existing match process that could improve our matches. While many changes to the matching procedure produce marginal improvements in match rates, substantial increases can only be achieved by relaxing the definition of a successful match. In the end, the results show that the most important factor determining the success of matching procedures is the quality and composition of the data sets being matched.
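A minimal sketch of the normalize-then-compare step such a matching procedure involves, using only the standard library; the abbreviation table and similarity score are illustrative assumptions, not the report's actual procedure:

```python
import re
from difflib import SequenceMatcher

# Illustrative abbreviation table; a production matcher would use a much
# larger set of USPS-style standardizations.
ABBREV = {"street": "st", "avenue": "ave", "road": "rd",
          "north": "n", "south": "s", "east": "e", "west": "w"}

def normalize(addr):
    """Lowercase, strip punctuation, and standardize common abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", addr.lower()).split()
    return " ".join(ABBREV.get(t, t) for t in tokens)

def match_score(a, b):
    """Similarity of two normalized address strings, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

score = match_score("123 North Main Street", "123 N. Main St")
```

Relaxing the definition of a successful match, as the report discusses, corresponds to lowering the score threshold at which two records are declared linked.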