CREAT - Census Bureau

Occupational Classifications: A Machine Learning Approach

August 2018

Written by: Julia I. Lane, Bruce Weinberg, Joseph Staudt, Akina Ikudo

Working Paper Number:

CES-18-37

Abstract

Characterizing the work that people do on their jobs is a longstanding and core issue in labor economics. Traditionally, classification has been done manually. If it were possible to combine new computational tools and administrative wage records to generate an automated crosswalk between job titles and occupations, millions of dollars could be saved in labor costs, data processing could be sped up, data could become more consistent, and it might be possible to generate, without a lag, current information about the changing occupational composition of the labor market. This paper examines the potential to assign occupations to job titles contained in administrative data using automated, machine-learning approaches. We use a new extraordinarily rich and detailed set of data on transactional HR records of large firms (universities) in a relatively narrowly defined industry (public institutions of higher education) to identify the potential for machine-learning approaches to classify occupations.

Document Tags and Keywords

Keywords:

payroll, industrial, employee, employed, job, classified, classification, classifying, department, hiring, workforce, employing, worker, occupation, clerical, associate, wage data, employee data

Tags:

Bureau of Labor Statistics, Social Security Administration, National Science Foundation, National Bureau of Economic Research, Office of Management and Budget, Current Population Survey, Longitudinal Business Database, Employer Identification Numbers, Federal Register, Department of Labor, Standard Occupational Classification, Longitudinal Employer Household Dynamics, Information and Communication Technology Survey, Business Register, Protected Identification Key, Ohio State University, University of Michigan, Integrated Longitudinal Business Database, Person Validation System, Center for Administrative Records Research, Person Identification Validation System, National Center for Science and Engineering Statistics

Similar Working Papers

The 10 most similar working papers to the working paper 'Occupational Classifications: A Machine Learning Approach' are listed below in order of similarity.

Working Paper

A Task-based Approach to Constructing Occupational Categories with Implications for Empirical Research in Labor Economics

September 2019

Authors: Gary Benedetto, Julia Manzella, Evan Totty

Working Paper Number:

CES-19-27

Most applied research in labor economics that examines returns to worker skills or differences in earnings across subgroups of workers typically accounts for the role of occupations by controlling for occupational categories. Researchers often aggregate detailed occupations into categories based on the Standard Occupational Classification (SOC) coding scheme, which is based largely on narratives or qualitative measures of workers' tasks. Alternatively, we propose two quantitative task-based approaches to constructing occupational categories by using factor analysis with O*NET job descriptors that provide a rich set of continuous measures of job tasks across all occupations. We find that our task-based approach outperforms the SOC-based approach in terms of lower occupation distance measures. We show that our task-based approach provides an intuitive, nuanced interpretation for grouping occupations and permits quantitative assessments of similarities in task compositions across occupations. We also replicate a recent analysis and find that our task-based occupational categories explain more of the gender wage gap than the SOC-based approaches explain. Our study enhances the Federal Statistical System's understanding of the SOC codes, investigates ways to use third-party data to construct useful research variables that can potentially be added to Census Bureau data products to improve their quality and versatility, and sheds light on how the use of alternative occupational categories in economics research may lead to different empirical results and deeper understanding in the analysis of labor market outcomes.
View Full Paper PDF
Working Paper

NOISE INFUSION AS A CONFIDENTIALITY PROTECTION MEASURE FOR GRAPH-BASED STATISTICS

September 2014

Authors: John M. Abowd, Kevin L. McKinney

Working Paper Number:

CES-14-30

We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau's Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.
View Full Paper PDF
Working Paper

Earnings Inequality and Coordination Costs: Evidence from U.S. Law Firms

September 2009

Authors: Thomas Hubbard, Luis Garicano

Working Paper Number:

CES-09-24

Earnings inequality has increased substantially since the 1970s. Using evidence from confidential Census data on U.S. law offices on lawyers' organization and earnings, we study the extent to which the mechanism suggested by Lucas (1978) and Rosen (1982), a scale of operations effect linking spans of control and earnings inequality, is responsible for increases in inequality. We first show that earnings inequality among lawyers increased substantially between 1977 and 1992, and that the distribution of partner-associate ratios across offices changed in ways consistent with the hypothesis that coordination costs fell during this period. We then propose a 'hierarchical production function' in which output is the product of skill and time and estimate its parameters, applying insights from the equilibrium assignment literature. We find that coordination costs fell broadly and steadily during this period, so that hiring one's first associate leveraged a partner's skill by about 30% more in 1992 than 1977. We find also that changes in lawyers' hierarchical organization account for about 2/3 of the increase in earnings inequality among lawyers in the upper tail, but a much smaller share of the increase in inequality between lawyers in the upper tail and other lawyers. These findings indicate that new organizational efficiencies potentially explain increases in inequality, especially among individuals toward the top of the earnings distribution.
View Full Paper PDF
Working Paper

An Evaluation of the Gender Wage Gap Using Linked Survey and Administrative Data

November 2020

Authors: Liana Christin Landivar, Thomas B. Foster, Marta Murray-Close, Mark deWolf

Working Paper Number:

CES-20-34

The narrowing of the gender wage gap has slowed in recent decades. However, current estimates show that, among full-time year-round workers, women earn approximately 18 to 20 percent less than men at the median. Women's human capital and labor force characteristics that drive wages increasingly resemble men's, so remaining differences in these characteristics explain less of the gender wage gap now than in the past. As these factors wane in importance, studies show that others like occupational and industrial segregation explain larger portions of the gender wage gap. However, a major limitation of these studies is that the large datasets required to analyze occupation and industry effectively lack measures of labor force experience. This study combines survey and administrative data to analyze and improve estimates of the gender wage gap within detailed occupations, while also accounting for gender differences in work experience. We find a gender wage gap of 18 percent among full-time, year-round workers across 316 detailed occupation categories. We show the wage gap varies significantly by occupation: while wages are at parity in some occupations, gaps are as large as 45 percent in others. More competitive and hazardous occupations, occupations that reward longer hours of work, and those that have a larger proportion of women workers have larger gender wage gaps. The models explain less of the wage gap in occupations with these attributes. Occupational characteristics shape the conditions under which men and women work and we show these characteristics can make for environments that are more or less conducive to gender parity in earnings.
View Full Paper PDF
Working Paper

The Gender Pay Gap and Its Determinants Across the Human Capital Distribution

June 2023

Authors: Andrew Foote, Ariel J. Binder, Kendall Houghton, Amanda Eng

Working Paper Number:

CES-23-31R

This paper links American Community Survey data and postsecondary transcript records to examine how the gender pay gap varies across the distribution of education credentials for a sample of 2003-2013 graduates. Although recent literature emphasizes gender inequality among the most-educated, we find a smaller gender pay gap at higher education levels. Field-of-degree and occupation effects explain most of the gap among top bachelor's graduates, while work hours and unobserved channels matter more for less-competitive bachelor's, associate, and certificate graduates. We develop a novel decomposition of the child penalty to examine the role of children in explaining these results.
View Full Paper PDF
Working Paper

Occupation Inflation in the Current Population Survey

September 2012

Authors: Jonathan Fisher, Christina Houseworth

Working Paper Number:

CES-12-26

A common caveat often accompanying results relying on household surveys regards respondent error. There is research using independent, presumably error-free administrative data, to estimate the extent of error in the data, the correlates of error, and potential corrections for the error. We investigate measurement error in occupation in the Current Population Survey (CPS) using the panel component of the CPS to identify those that incorrectly report changing occupation. We find evidence that individuals are inflating their occupation to higher skilled and higher paying occupations than the ones they actually perform. Occupation inflation biases the education and race coefficients in standard Mincer equation results within occupations.
View Full Paper PDF
Working Paper

Job Tasks, Worker Skills, and Productivity

September 2025

Authors: John Haltiwanger, Lucia Foster, Cheryl Grim, Zoltan Wolf, Cindy Cunningham, Sabrina Wulff Pabilonia, Jay Stewart, Cody Tuttle, G. Jacob Blackwood, Matthew Dey, Rachel Nesbit

Working Paper Number:

CES-25-63

We present new empirical evidence suggesting that we can better understand productivity dispersion across businesses by accounting for differences in how tasks, skills, and occupations are organized. This aligns with growing attention to the task content of production. We link establishment-level data from the Bureau of Labor Statistics Occupational Employment and Wage Statistics survey with productivity data from the Census Bureau's manufacturing surveys. Our analysis reveals strong relationships between establishment productivity and task, skill, and occupation inputs. These relationships are highly nonlinear and vary by industry. When we account for these patterns, we can explain a substantial share of productivity dispersion across establishments.
View Full Paper PDF
Working Paper

Further Evidence from Census 2000 About Earnings by Detailed Occupation for Men and Women: The Role of Race and Hispanic Origin

November 2011

Authors: Daniel Weinberg

Working Paper Number:

CES-11-37

A 2004 report by the author reviewed data from Census 2000 and concluded "There is a substantial gap in median earnings between men and women that is unexplained, even after controlling for work experience (to the extent it can be represented by age and presence of children), education, and occupation." This paper extends the analysis and concludes that once those characteristics are controlled for, no further explanatory power is attributable to race or Hispanic origin.
View Full Paper PDF
Working Paper

Wage Determination in Social Occupations: the Role of Individual Social Capital

January 2016

Authors: Julie L. Hotchkiss, Anil Rupasingha

Working Paper Number:

CES-16-46

We make use of predicted social and civic activities (social capital) to account for selection into "social" occupations. Individual selection accounts for more than the total difference in wages observed between social and non-social occupations. The role that individual social capital plays in selecting into these occupations and the importance of selection in explaining wage differences across occupations is similar for both men and women. We make use of restricted 2000 Decennial Census and 2000 Social Capital Community Benchmark Survey. Individual social capital is instrumented by distance weighted surrounding census tract characteristics.
View Full Paper PDF
Working Paper

Squeezing More Out of Your Data: Business Record Linkage with Python

November 2018

Authors: Nathan Goldschlag, John Cuffe

Working Paper Number:

CES-18-46

Integrating data from different sources has become a fundamental component of modern data analytics. Record linkage methods represent an important class of tools for accomplishing such integration. In the absence of common disambiguated identifiers, researchers often must resort to "fuzzy" matching, which allows imprecision in the characteristics used to identify common entities across different datasets. While the record linkage literature has identified numerous individually useful fuzzy matching techniques, there is little consensus on a way to integrate those techniques within a single framework. To this end, we introduce the Multiple Algorithm Matching for Better Analytics (MAMBA), an easy-to-use, flexible, scalable, and transparent software platform for business record linkage applications using Census microdata. MAMBA leverages multiple string comparators to assess the similarity of records using a machine learning algorithm to disambiguate matches. This software represents a transparent tool for researchers seeking to link external business data to the Census Business Register files.
View Full Paper PDF
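
The abstract for CES-18-46 describes scoring candidate record pairs with several string comparators and letting a model decide which pairs match. The sketch below illustrates that general idea only; it is not MAMBA's code or interface. The function names, the choice of comparators (a character-level ratio from Python's difflib and a token Jaccard overlap), the unweighted average that stands in for a trained classifier, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercased, whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty names are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def comparator_features(a: str, b: str) -> list[float]:
    """One score per comparator; a real system would likely use more."""
    return [char_similarity(a, b), token_jaccard(a, b)]


def is_match(a: str, b: str, threshold: float = 0.6) -> bool:
    # Hypothetical stand-in for a trained classifier: an unweighted
    # mean of the comparator scores compared against a fixed threshold.
    features = comparator_features(a, b)
    return sum(features) / len(features) >= threshold


if __name__ == "__main__":
    pairs = [
        ("Acme Widgets Inc", "ACME Widgets Incorporated"),
        ("Acme Widgets Inc", "Zenith Bakery LLC"),
    ]
    for left, right in pairs:
        print(left, "|", right, "|", is_match(left, right))
```

In practice the averaging step would presumably be replaced by a classifier fitted on labeled match and non-match pairs, with the comparator scores as its input features.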
