CREAT: Census Research Exploration and Analysis Tool

Papers Containing Tag(s): 'United Nations'

The following papers contain search terms that you selected. From the papers listed below, you can navigate to the PDF, the profile page for that working paper, or see all the working papers written by an author. You can also explore tags, keywords, and authors that occur frequently within these papers.

Frequently Occurring Concepts within this Search

No authors occur more than twice in this search.

Viewing papers 1 through 10 of 12


  • Working Paper

    Exploring New Ways to Classify Industries for Energy Analysis and Modeling

    November 2022

    Working Paper Number:

    CES-22-49

    Combustion, other emitting processes, and fossil energy use outside the power sector have become urgent concerns given the United States' commitment to achieving net-zero greenhouse gas emissions by 2050. Industry is an important end user of energy and relies on fossil fuels used directly for process heating and as feedstocks for a diverse range of applications. Fuel and energy use by industry is heterogeneous, meaning even a single product group can vary broadly in its production routes and associated energy use. In the United States, the North American Industry Classification System (NAICS) serves as the standard for statistical data collection and reporting. In turn, data based on NAICS are the foundation of most United States energy modeling. Thus, how well NAICS represents energy use is a limiting condition for current large-scale planning to improve energy efficiency and adopt alternatives to fossil fuels in industry. Facility-level data could add detail for heterogeneous sectors and thus supplement data reported at NAICS code levels by the Bureau of the Census and the U.S. Energy Information Administration, but such data are scarce. This work explores alternative classification schemes for industry based on energy use characteristics and validates an approach to estimate facility-level energy use from publicly available greenhouse gas emissions data from the U.S. Environmental Protection Agency (EPA). The approaches in this study can facilitate understanding of current, as well as possible future, energy demand. First, current approaches to the construction of industrial taxonomies are summarized along with their usefulness for industrial energy modeling. Unsupervised machine learning techniques are then used to detect clusters in data reported from the U.S. Department of Energy's Industrial Assessment Center (IAC) program.
    Clusters of Industrial Assessment Center data show levels of correlation between energy use and explanatory variables similar to those of three-digit NAICS codes. Interestingly, each cluster includes a large cross section of NAICS codes, which lends additional support to the idea that NAICS may not be well suited to capturing the correlation between energy use and the variables studied. Fewer clusters than NAICS codes are needed to reach the same level of correlation. An initial assessment using support vector machines shows a reasonable level of separation, with accuracy above 80%, so machine learning approaches may be promising for further analysis. The IAC data focus on small and medium-sized facilities and are biased toward higher energy users for a given facility type. Cladistics, a classification approach developed in biology, is adapted to the energy and process characteristics of industries. Cladistics applied to industrial systems seeks to understand the progression of organizations and technology as a type of evolution, wherein traits are inherited from previous systems but evolve through the emergence of inventions and variations and a selection process driven by adaptation to pressures and favorable outcomes. A cladogram is presented for evolutionary directions in the iron and steel sector. Cladograms are a promising tool for constructing scenarios and summarizing directions of sectoral innovation. The cladogram of iron and steel is based on the drivers of energy use in the sector. Phylogenetic inference is similar to machine learning approaches in that it is based on a machine-led search of the solution space, and therefore avoids some of the subjectivity of other classification systems. Our prototype approach for constructing an industry cladogram is based on process characteristics, following an innovation framework derived from Schumpeter, to capture evolution in a given sector.
    The resulting cladogram represents a snapshot in time based on detailed study of process characteristics. This work could be an important tool for the design of scenarios for more detailed modeling. Cladograms reveal groupings of emerging or dominant processes and their implications in a way that may help policymakers and entrepreneurs see the larger picture, other good ideas, or competitors. Constructing a cladogram could be a good first step in analyzing many industries (e.g., nitrogenous fertilizer production, ethyl alcohol manufacturing) to understand their heterogeneity, emerging trends, and coherent groupings of related innovations. Finally, facility-level energy estimates derived from the EPA Greenhouse Gas Reporting Program are validated. Facility-level data availability continues to be a major challenge for industrial modeling. The method outlined by McMillan et al. (2016) and McMillan and Ruth (2019) allows facility-level energy use to be estimated from mandatory greenhouse gas reporting. The validation provided here is an important step toward further use of these data in industrial energy modeling.
  • Working Paper

    Are Customs Records Consistent Across Countries? Evidence from the U.S. and Colombia

    March 2020

    Working Paper Number:

    CES-20-11

    In many countries, official customs records include identifying information on the exporting and importing firms involved in each shipment. This information allows researchers to study international business networks, offshoring patterns, and the micro-foundations of aggregate trade flows. It also provides the government with a basis for tariff assessments at the border. However, there are no mechanisms in place to ensure that the shipment-level information recorded by the exporting country is consistent with the shipment-level information recorded by the importing country. And to the extent that there are discrepancies, it is not clear how prevalent they are or what form they take. In this paper we explore these issues, both to enhance our understanding of the limitations of customs records and to inform future discussions of possible revisions in the way they are collected. Specifically, we match U.S.-bound export shipments that appear in Colombian Customs records (DIAN) with their counterparts in the U.S. Customs records (LFTTD): U.S. import shipments from Colombia. Several patterns emerge. First, differences in the coverage of the two countries' customs records lead to significant discrepancies in the official bilateral trade flow statistics of these two countries: the DIAN database records 8 percent fewer transactions than the LFTTD database over the sample period, and the average export shipment size in the DIAN is roughly 4 percent smaller than the corresponding import shipment size in the LFTTD. These discrepancies are not due to differences in minimum shipment sizes and are not particular to a few sectors, though they are more common among small shipments and they evolve over time. Second, if we rely exclusively on firms' names and addresses, ignoring other shipment characteristics (value, product code, etc.), we are able to match 85 percent of the value of U.S. imports from Colombia in our LFTTD sample with particular Colombian suppliers in the DIAN. Further, fully 97 percent of the value of Colombian exports to the U.S. can be mapped onto particular importers in the U.S. LFTTD. Third, however, match rates at the shipment level within buyer-seller pairs are low. That is, while buyers and sellers can be paired up fairly accurately, only 25-30 percent of the individual transactions in the customs records of the two countries can be matched using fuzzy algorithms at reasonable tolerance levels. Fourth, the manufacturer ID (MANUF_ID) that appears in the LFTTD implies there are roughly twice as many Colombian exporters as actually appear in the DIAN. Similar comments apply to an analogous MANUF_ID variable constructed from importer name and address information in the DIAN. Hence studies that treat each MANUF_ID value as a distinct firm are almost surely overstating, by a substantial amount, the number of foreign firms that trade with the U.S. Finally, we conclude that if countries were to require exporters to report standardized shipment identifiers (either invoice numbers or bill of lading/air waybill numbers), it would be far easier to track individual transactions and to identify international discrepancies in reporting.
  • Working Paper

    Disclosure Limitation and Confidentiality Protection in Linked Data

    January 2018

    Working Paper Number:

    CES-18-07

    Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion, and formal privacy. We discuss how these methods are integrated with access modalities through three detailed examples. The first example is the linkage of the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
  • Working Paper

    An 'Algorithmic Links with Probabilities' Crosswalk for USPC and CPC Patent Classifications with an Application Towards Industrial Technology Composition

    March 2016

    Working Paper Number:

    CES-16-15

    Patents are a useful proxy for innovation, technological change, and diffusion. However, fully exploiting patent data for economic analyses requires that patents be tied to measures of economic activity, which has proven to be difficult. Recently, Lybbert and Zolas (2014) constructed an International Patent Classification (IPC) to industry classification crosswalk using an 'Algorithmic Links with Probabilities' approach. In this paper, we apply a similar approach to other patent classification schemes, the U.S. Patent Classification (USPC) system and the Cooperative Patent Classification (CPC) system. The resulting USPC-Industry and CPC-Industry concordances link both U.S. and global patents to multiple vintages of the North American Industry Classification System (NAICS), International Standard Industrial Classification (ISIC), Harmonized System (HS), and Standard International Trade Classification (SITC). We then use the crosswalk to highlight changes in industrial technology composition over time. We find suggestive evidence of strong persistence in the association between technologies and industries over time.
  • Working Paper

    Business Dynamics Statistics of High Tech Industries

    January 2016

    Working Paper Number:

    CES-16-55

    Modern market economies are characterized by the reallocation of resources from less productive, less valuable activities to more productive, more valuable ones. Businesses in the High Technology sector play a particularly important role in this reallocation by introducing new products and services that impact the entire economy. Tracking the performance of this sector is therefore of primary importance, especially in light of recent evidence that suggests a slowdown in business dynamism in High Tech industries. The Census Bureau produces the Business Dynamics Statistics (BDS), a suite of data products that track job creation, job destruction, startups, and exits by firm and establishment characteristics including sector, firm age, and firm size. In this paper we describe the methodologies used to produce a new extension to the BDS focused on businesses in High Technology industries.
  • Working Paper

    Cheap Imports and the Loss of U.S. Manufacturing Jobs

    January 2016

    Working Paper Number:

    CES-16-05

    This paper examines the role of international trade, and specifically imports from low-wage countries, in determining patterns of job loss in U.S. manufacturing industries between 1992 and 2007. Motivated by intuitions from factor-proportions-inspired work on offshoring and heterogeneous firms in trade, we build industry-level measures of import competition. Combining worker data from the Longitudinal Employer-Household Dynamics dataset, detailed establishment information from the Census of Manufactures, and transaction-level trade data, we find that rising import competition from China and other developing economies increases the likelihood of job loss among manufacturing workers with less than a high school degree; it is not significantly related to job losses for workers with at least a college degree.
  • Working Paper

    Task Trade and the Wage Effects of Import Competition

    January 2016

    Working Paper Number:

    CES-16-03

    Do job characteristics modulate the relationship between import competition and the wages of workers who perform those jobs? This paper tests the claim that workers in occupations featuring highly routine tasks will be more vulnerable to low-wage country import competition. Using data from the U.S. Census Bureau, we construct a pooled cross-section (1990, 2000, and 2007) of more than 1.6 million individuals linked to the establishments in which they work. Occupational measures of vulnerability to trade competition (routineness, analytic complexity, and interpersonal interaction on the job) are constructed using O*NET data. The linked employer-employee data allow us to model the effect of low-wage import competition on the wages of workers with different occupational characteristics. Our results show that low-wage country import competition is associated with lower wages for U.S. workers holding jobs that are highly routine and less complex. For workers holding nonroutine and highly complex jobs, increased import competition is associated with higher wages. Finally, workers in occupations with the highest and lowest levels of interpersonal interaction see higher wages, while workers with medium-low levels of interpersonal interaction suffer lower wages with increased low-wage import competition. These findings demonstrate the importance of accounting for occupational characteristics to more fully understand the relationship between trade and wages, and suggest ways in which occupations vulnerable to task trade can disadvantage workers even when their jobs remain onshore.
  • Working Paper

    Exploring Administrative Records Use for Race and Hispanic Origin Item Non-Response

    December 2014

    Working Paper Number:

    carra-2014-16

    Race and Hispanic origin data are required to produce official statistics in the United States. The American Community Survey and the decennial census handle missing data through traditional imputation methods, often relying on information from neighbors. These methods work well if neighbors share similar characteristics; however, the shape and patterns of neighborhoods in the United States are changing. Administrative records may provide more accurate data than traditional imputation methods for missing race and Hispanic origin responses. This paper first describes the characteristics of persons with missing demographic data, then assesses the coverage of administrative records data for respondents who do not answer the race and Hispanic origin questions in Census data. The paper also discusses the distributional impact of using administrative records race and Hispanic origin data to complete missing responses in a decennial census or survey context.
  • Working Paper

    Measuring 'Factoryless' Manufacturing: Evidence from U.S. Surveys

    August 2013

    Working Paper Number:

    CES-13-44

    'Factoryless' manufacturers, as defined by the U.S. Office of Management and Budget (OMB), perform the underlying entrepreneurial work of arranging the factors of production but outsource all of the actual transformation activities to other specialized units. This paper describes efforts to measure 'factoryless' manufacturing by analyzing data on contract manufacturing services (CMS). We explore two U.S. firm surveys that report data on CMS activities and discuss the challenges of identifying and collecting data on entities that are part of global value chains.
  • Working Paper

    Getting Patents and Economic Data to Speak to Each Other: An 'Algorithmic Links with Probabilities' Approach for Joint Analyses of Patenting and Economic Activity

    September 2012

    Working Paper Number:

    CES-12-16

    International technological diffusion is a key determinant of cross-country differences in economic performance. While patents can be a useful proxy for innovation, technological change, and diffusion, fully exploiting patent data for such economic analyses requires patents to be tied to measures of economic activity. In this paper, we describe and explore a new algorithmic approach to constructing concordances between the International Patent Classification (IPC) system, which organizes patents by technical features, and industry classification systems that organize economic data, such as the Standard International Trade Classification (SITC), the International Standard Industrial Classification (ISIC), and the Harmonized System (HS). This 'Algorithmic Links with Probabilities' (ALP) approach incorporates text analysis software and keyword extraction programs and applies them to a comprehensive patent dataset. We compare the results of several ALP concordances to existing technology concordances. Based on these comparisons, we select a preferred ALP approach and discuss its advantages relative to conventional approaches. We conclude with a discussion of some possible applications of the concordance and provide a sample analysis that uses our preferred ALP concordance to analyze international patent flows based on trade patterns.