Characterizing the work that people do on their jobs is a longstanding and core issue in labor economics. Traditionally, classification has been done manually. If it were possible to combine new computational tools and administrative wage records to generate an automated crosswalk between job titles and occupations, millions of dollars could be saved in labor costs, data processing could be sped up, data could become more consistent, and it might be possible to generate, without a lag, current information about the changing occupational composition of the labor market. This paper examines the potential to assign occupations to job titles contained in administrative data using automated, machine-learning approaches. We use a new extraordinarily rich and detailed set of data on transactional HR records of large firms (universities) in a relatively narrowly defined industry (public institutions of higher education) to identify the potential for machine-learning approaches to classify occupations.
-
A Task-based Approach to Constructing Occupational Categories
with Implications for Empirical Research in Labor Economics
September 2019
Working Paper Number:
CES-19-27
Most applied research in labor economics that examines returns to worker skills or differences in earnings across subgroups of workers typically accounts for the role of occupations by controlling for occupational categories. Researchers often aggregate detailed occupations into categories based on the Standard Occupational Classification (SOC) coding scheme, which is based largely on narratives or qualitative measures of workers' tasks. Alternatively, we propose two quantitative task-based approaches to constructing occupational categories by using factor analysis with O*NET job descriptors that provide a rich set of continuous measures of job tasks across all occupations. We find that our task-based approach outperforms the SOC-based approach in terms of lower occupation distance measures. We show that our task-based approach provides an intuitive, nuanced interpretation for grouping occupations and permits quantitative assessments of similarities in task compositions across occupations. We also replicate a recent analysis and find that our task-based occupational categories explain more of the gender wage gap than the SOC-based approaches explain. Our study enhances the Federal Statistical System's understanding of the SOC codes, investigates ways to use third-party data to construct useful research variables that can potentially be added to Census Bureau data products to improve their quality and versatility, and sheds light on how the use of alternative occupational categories in economics research may lead to different empirical results and deeper understanding in the analysis of labor market outcomes.
-
A Tale of Two Fields? STEM Career Outcomes
October 2023
Working Paper Number:
CES-23-53
Is the labor market for US researchers experiencing the best or worst of times? This paper analyzes the market for recently minted Ph.D. recipients using supply-and-demand logic and data linking graduate students to their dissertations and W2 tax records. We also construct a new dissertation-industry 'relevance' measure, comparing dissertation and patent text and linking patents to assignee firms and industries. We find large disparities across research fields in placement (faculty, postdoc, and industry positions), earnings, and the use of specialized human capital. Thus, it appears to simultaneously be a good time for some fields and a bad time for others.
-
Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics
September 2014
Working Paper Number:
CES-14-30
We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau's Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.
-
Scientific Talent Leaks Out of Funding Gaps
February 2024
Working Paper Number:
CES-24-08
We study how delays in NIH grant funding affect the career outcomes of research personnel. Using comprehensive earnings and tax records linked to university transaction data along with a difference-in-differences design, we find that a funding interruption of more than 30 days has a substantial effect on job placements for personnel who work in labs with a single NIH R01 research grant, including a 3 percentage point (40%) increase in the probability of not working in the US. Incorporating information from the full 2020 Decennial Census and data on publications, we find that about half of those induced into nonemployment appear to permanently leave the US and are 90% less likely to publish in a given year, with even larger impacts for trainees (postdocs and graduate students). Among personnel who continue to work in the US, we find that interrupted personnel earn 20% less than their continuously-funded peers, with the largest declines concentrated among trainees and other non-faculty personnel (such as staff and undergraduates). Overall, funding delays account for about 5% of US nonemployment in our data, indicating that they have a meaningful effect on the scientific labor force at the national level.
-
Further Evidence from Census 2000 About Earnings by Detailed Occupation for Men and Women: The Role of Race and Hispanic Origin
November 2011
Working Paper Number:
CES-11-37
A 2004 report by the author reviewed data from Census 2000 and concluded "There is a substantial gap in median earnings between men and women that is unexplained, even after controlling for work experience (to the extent it can be represented by age and presence of children), education, and occupation." This paper extends the analysis and concludes that once those characteristics are controlled for, no further explanatory power is attributable to race or Hispanic origin.
-
An 'Algorithmic Links with Probabilities' Crosswalk for USPC and CPC Patent Classifications with an Application Towards Industrial Technology Composition
March 2016
Working Paper Number:
CES-16-15
Patents are a useful proxy for innovation, technological change, and diffusion. However, fully exploiting patent data for economic analyses requires patents be tied to measures of economic activity, which has proven to be difficult. Recently, Lybbert and Zolas (2014) have constructed an International Patent Classification (IPC) to industry classification crosswalk using an 'Algorithmic Links with Probabilities' approach. In this paper, we utilize a similar approach and apply it to new patent classification schemes, the U.S. Patent Classification (USPC) system and Cooperative Patent Classification (CPC) system. The resulting USPC-Industry and CPC-Industry concordances link both U.S. and global patents to multiple vintages of the North American Industrial Classification System (NAICS), International Standard Industrial Classification (ISIC), Harmonized System (HS) and Standard International Trade Classification (SITC). We then use the crosswalk to highlight changes to industrial technology composition over time. We find suggestive evidence of strong persistence in the association between technologies and industries over time.
-
An Evaluation of the Gender Wage Gap Using Linked Survey and Administrative Data
November 2020
Working Paper Number:
CES-20-34
The narrowing of the gender wage gap has slowed in recent decades. However, current estimates show that, among full-time year-round workers, women earn approximately 18 to 20 percent less than men at the median. Women's human capital and labor force characteristics that drive wages increasingly resemble men's, so remaining differences in these characteristics explain less of the gender wage gap now than in the past. As these factors wane in importance, studies show that others like occupational and industrial segregation explain larger portions of the gender wage gap. However, a major limitation of these studies is that the large datasets required to analyze occupation and industry effectively lack measures of labor force experience. This study combines survey and administrative data to analyze and improve estimates of the gender wage gap within detailed occupations, while also accounting for gender differences in work experience. We find a gender wage gap of 18 percent among full-time, year-round workers across 316 detailed occupation categories. We show the wage gap varies significantly by occupation: while wages are at parity in some occupations, gaps are as large as 45 percent in others. More competitive and hazardous occupations, occupations that reward longer hours of work, and those that have a larger proportion of women workers have larger gender wage gaps. The models explain less of the wage gap in occupations with these attributes. Occupational characteristics shape the conditions under which men and women work and we show these characteristics can make for environments that are more or less conducive to gender parity in earnings.
-
Occupation Inflation in the Current Population Survey
September 2012
Working Paper Number:
CES-12-26
A common caveat accompanying results that rely on household surveys concerns respondent error. Existing research uses independent, presumably error-free administrative data to estimate the extent of error in the data, the correlates of error, and potential corrections for the error. We investigate measurement error in occupation in the Current Population Survey (CPS), using the panel component of the CPS to identify those who incorrectly report changing occupation. We find evidence that individuals are inflating their occupation to higher-skilled and higher-paying occupations than the ones they actually perform. Occupation inflation biases the education and race coefficients in standard Mincer equation results within occupations.
-
Business Dynamics of Innovating Firms: Linking U.S. Patents with Administrative Data on Workers and Firms
July 2015
Working Paper Number:
CES-15-19
This paper discusses the construction of a new longitudinal database tracking inventors and patent-owning firms over time. We match granted patents between 2000 and 2011 to administrative databases of firms and workers housed at the U.S. Census Bureau. We use inventor information in addition to the patent assignee firm name to improve on previous efforts linking patents to firms. The triangulated database allows us to maximize match rates and provide validation for a large fraction of matches. In this paper, we describe the construction of the database and explore basic features of the data. We find patenting firms, particularly young patenting firms, disproportionally contribute jobs to the U.S. economy. We find patenting is a relatively rare event among small firms but that most patenting firms are nevertheless small, and that patenting is not as rare an event for the youngest firms compared to the oldest firms. While manufacturing firms are more likely to patent than firms in other sectors, we find most patenting firms are in the services and wholesale sectors. These new data are a product of collaboration within the U.S. Department of Commerce, between the U.S. Census Bureau and the U.S. Patent and Trademark Office.
-
Using the Pareto Distribution to Improve Estimates of Topcoded Earnings
April 2014
Working Paper Number:
CES-14-21
Inconsistent censoring in the public-use March Current Population Survey (CPS) limits its usefulness in measuring labor earnings trends. Using Pareto estimation methods with less-censored internal CPS data, we create an enhanced cell-mean series to capture top earnings in the public-use CPS. We find that previous approaches for imputing topcoded earnings systematically understate top earnings. Annual earnings inequality trends since 1963 using our series closely approximate those found by Kopczuk, Saez, & Song (2010) using Social Security Administration data for commerce and industry workers. However, when we consider all workers, earnings inequality levels are higher but earnings growth is more modest.
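As a rough illustration of the general idea behind Pareto-based topcode imputation (not the authors' actual procedure, which the abstract does not spell out), the sketch below assumes the upper tail of earnings follows a Pareto distribution above a chosen threshold. It estimates the shape parameter by maximum likelihood from uncensored tail observations, then computes the implied mean of earnings above a topcode. The function names, threshold, and sample values are hypothetical.

```python
import math


def pareto_shape_mle(earnings, threshold):
    """Maximum-likelihood (Hill-type) estimate of the Pareto shape parameter.

    Assumes every value above `threshold` is an uncensored draw from a
    Pareto distribution with scale `threshold`.
    """
    tail = [x for x in earnings if x > threshold]
    if not tail:
        raise ValueError("no observations above threshold")
    log_sum = sum(math.log(x / threshold) for x in tail)
    return len(tail) / log_sum


def mean_above_topcode(alpha, topcode):
    """Mean of a Pareto variable conditional on exceeding `topcode`.

    For shape alpha > 1 this is alpha * topcode / (alpha - 1); the mean
    is not finite for alpha <= 1.
    """
    if alpha <= 1:
        raise ValueError("conditional mean is not finite for alpha <= 1")
    return alpha * topcode / (alpha - 1)


if __name__ == "__main__":
    # Hypothetical uncensored tail observations, for illustration only.
    sample = [120_000, 150_000, 210_000, 400_000, 95_000]
    alpha_hat = pareto_shape_mle(sample, threshold=100_000)
    if alpha_hat > 1:
        print(mean_above_topcode(alpha_hat, topcode=250_000))
```

A value computed this way could stand in for every topcoded record in a cell, which is the sense in which a "cell mean" replaces the censored values; how cells and thresholds are defined is left open here.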