Characterizing the work that people do in their jobs is a longstanding, core issue in labor economics. Traditionally, occupational classification has been done manually. If new computational tools could be combined with administrative wage records to generate an automated crosswalk between job titles and occupations, millions of dollars in labor costs could be saved, data processing could be sped up, data could become more consistent, and it might be possible to produce current information, without a lag, on the changing occupational composition of the labor market. This paper examines the potential to assign occupations to job titles contained in administrative data using automated, machine-learning approaches. We use a new, extraordinarily rich and detailed set of transactional HR records from large firms (universities) in a relatively narrowly defined industry (public institutions of higher education) to assess how well machine-learning approaches can classify occupations.
-
A Task-based Approach to Constructing Occupational Categories
with Implications for Empirical Research in Labor Economics
September 2019
Working Paper Number:
CES-19-27
Most applied research in labor economics that examines returns to worker skills or earnings differences across subgroups of workers accounts for the role of occupations by controlling for occupational categories. Researchers often aggregate detailed occupations into categories based on the Standard Occupational Classification (SOC) coding scheme, which relies largely on narratives or qualitative descriptions of workers' tasks. As an alternative, we propose two quantitative, task-based approaches to constructing occupational categories, using factor analysis of O*NET job descriptors, which provide a rich set of continuous measures of job tasks across all occupations. We find that our task-based approach outperforms the SOC-based approach, yielding lower occupation distance measures. We show that our task-based approach provides an intuitive, nuanced interpretation for grouping occupations and permits quantitative assessments of similarities in task composition across occupations. We also replicate a recent analysis and find that our task-based occupational categories explain more of the gender wage gap than SOC-based categories do. Our study enhances the Federal Statistical System's understanding of the SOC codes, investigates ways to use third-party data to construct useful research variables that could be added to Census Bureau data products to improve their quality and versatility, and sheds light on how alternative occupational categories in economics research may lead to different empirical results and a deeper understanding of labor market outcomes.
-
Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics
September 2014
Working Paper Number:
CES-14-30
We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau's Quarterly Workforce Indicators can provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.
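The three guarantees in the abstract (a minimum distortion of every input, accuracy that improves with the number of contributors, and identical output across releases) can be illustrated with a minimal sketch of permanent multiplicative noise. It is not the paper's method or the QWI production code: the deviation bounds `MIN_DEV`/`MAX_DEV`, the uniform draw within each band, and the hash-based seeding are all assumptions chosen for illustration.

```python
import hashlib
import random

# Hypothetical parameters, not taken from the paper or from the QWI:
MIN_DEV = 0.05  # every input is distorted by at least 5%
MAX_DEV = 0.15  # ... and by at most 15%

def distortion_factor(entity_id: str) -> float:
    """Permanent multiplicative noise factor for one entity.

    Seeding the generator with a hash of the entity ID gives the same entity
    the same factor every time, so releases built from the same inputs
    reproduce the same protected values.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    deviation = rng.uniform(MIN_DEV, MAX_DEV)  # |factor - 1| lies in [MIN_DEV, MAX_DEV]
    # Distort upward or downward with equal probability, so factors average near 1.
    return 1 + deviation if rng.random() < 0.5 else 1 - deviation

def protected_total(values: dict[str, float]) -> float:
    """Sum of noise-infused inputs; no raw value enters the total unmodified."""
    return sum(v * distortion_factor(eid) for eid, v in values.items())
```

With a single contributor the protected total is off by at least 5%, while with many contributors the independent up-and-down distortions largely cancel, so the relative error of the total shrinks toward zero.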
-
Occupation Inflation in the Current Population Survey
September 2012
Working Paper Number:
CES-12-26
A common caveat accompanying results that rely on household surveys concerns respondent error. Existing research uses independent, presumably error-free administrative data to estimate the extent of error in survey data, the correlates of that error, and potential corrections for it. We investigate measurement error in occupation in the Current Population Survey (CPS), using the panel component of the CPS to identify respondents who incorrectly report changing occupation. We find evidence that individuals inflate their occupation, reporting higher-skilled and higher-paying occupations than the ones they actually perform. Occupation inflation biases the education and race coefficients in standard Mincer earnings equations estimated within occupations.
-
Business Dynamics of Innovating Firms: Linking U.S. Patents with Administrative Data on Workers and Firms
July 2015
Working Paper Number:
CES-15-19
This paper discusses the construction of a new longitudinal database tracking inventors and patent-owning firms over time. We match patents granted between 2000 and 2011 to administrative databases of firms and workers housed at the U.S. Census Bureau. We use inventor information in addition to the patent assignee firm name to link patents to firms, improving on previous efforts. The triangulated database allows us to maximize match rates and provide validation for a large fraction of matches. In this paper, we describe the construction of the database and explore basic features of the data. We find that patenting firms, particularly young patenting firms, disproportionately contribute jobs to the U.S. economy. We find that patenting is relatively rare among small firms, yet most patenting firms are small, and that patenting is less rare among the youngest firms than among the oldest firms. While manufacturing firms are more likely to patent than firms in other sectors, we find that most patenting firms are in the services and wholesale sectors. These new data are a product of collaboration within the U.S. Department of Commerce, between the U.S. Census Bureau and the U.S. Patent and Trademark Office.
-
An 'Algorithmic Links with Probabilities' Crosswalk for USPC and CPC Patent Classifications with an Application Towards Industrial Technology Composition
March 2016
Working Paper Number:
CES-16-15
Patents are a useful proxy for innovation, technological change, and diffusion. However, fully exploiting patent data for economic analyses requires that patents be tied to measures of economic activity, which has proven difficult. Lybbert and Zolas (2014) recently constructed an International Patent Classification (IPC) to industry classification crosswalk using an 'Algorithmic Links with Probabilities' approach. In this paper, we apply a similar approach to two additional patent classification schemes: the U.S. Patent Classification (USPC) system and the Cooperative Patent Classification (CPC) system. The resulting USPC-Industry and CPC-Industry concordances link both U.S. and global patents to multiple vintages of the North American Industry Classification System (NAICS), the International Standard Industrial Classification (ISIC), the Harmonized System (HS), and the Standard International Trade Classification (SITC). We then use the crosswalk to highlight changes in industrial technology composition over time. We find suggestive evidence of strong persistence in the association between technologies and industries over time.
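The "with probabilities" part of such a crosswalk can be illustrated with a minimal sketch of a final normalization step only: once some upstream step has produced (patent class, industry) hits, each class's hits are turned into a probability distribution over industries. The matching step itself is not shown, and the input format, class labels, and industry labels below are hypothetical placeholders, not codes or counts from the paper.

```python
from collections import Counter, defaultdict

def link_probabilities(matches):
    """Turn patent-class/industry hits into P(industry | patent class).

    `matches` is an iterable of (patent_class, industry_code) pairs, one per
    hit from an upstream matching step (hypothetical input format).
    """
    hits = defaultdict(Counter)
    for patent_class, industry in matches:
        hits[patent_class][industry] += 1

    probs = {}
    for patent_class, counter in hits.items():
        total = sum(counter.values())
        # Each class's link probabilities sum to 1 across industries.
        probs[patent_class] = {ind: n / total for ind, n in counter.items()}
    return probs

# Placeholder labels only.
example = [("CLASS_A", "IND_1")] * 3 + [("CLASS_A", "IND_2")] + [("CLASS_B", "IND_2")] * 2
table = link_probabilities(example)
```

Here `table["CLASS_A"]` assigns 0.75 to `IND_1` and 0.25 to `IND_2`, so a patent in that class can be apportioned fractionally across industries rather than forced into a single one.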
-
The impact of manufacturing credentials on earnings and the probability of employment
May 2022
Working Paper Number:
CES-22-15
This paper examines the labor market returns to earning industry-certified credentials in the manufacturing sector. Specifically, we estimate the impact of a manufacturing credential on wages, the probability of employment, and the probability of employment in the manufacturing sector specifically after credential attainment. We link students who earned manufacturing credentials to their enrollment and completion records, then link them to their IRS tax records for earnings and employment (Forms W-2 and 1040) and to the American Community Survey and decennial census for demographic information. We present earnings trajectories for workers with credentials by type of credential, industry of employment, age, race and ethnicity, gender, and state. To move closer to a causal estimate of the impact of a credential on earnings, we implement a coarsened exact matching strategy that compares outcomes between otherwise similar people with and without a manufacturing credential. We find that attaining a manufacturing industry credential is associated with higher earnings and a higher likelihood of labor market participation when we compare attainers to a group of otherwise similar non-attainers.
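The coarsened exact matching strategy mentioned above can be sketched in a few lines: coarsen each covariate into bins, group people by their bin signature, drop strata that lack either a credential holder or a comparison person, and reweight comparison people so they mirror the treated group. This is a generic illustration using the usual CEM weights, not the paper's specification; the field names (`treated`, `earnings`, `age`), the cut points, and the data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def cem_att(people, cutpoints, exact=()):
    """Coarsened exact matching estimate of the average effect on the treated.

    people:    list of dicts with covariates, a boolean "treated" flag and an
               "earnings" outcome (hypothetical field names).
    cutpoints: {covariate: sorted cut points} used to coarsen numeric covariates.
    exact:     covariates matched exactly without coarsening.
    """
    strata = defaultdict(lambda: {"t": [], "c": []})
    for p in people:
        # Stratum signature = bin index of each coarsened covariate + exact values.
        key = tuple(bisect_right(cutpoints[k], p[k]) for k in sorted(cutpoints))
        key += tuple(p[k] for k in exact)
        strata[key]["t" if p["treated"] else "c"].append(p["earnings"])

    # Keep only strata containing at least one treated and one control unit.
    matched = [s for s in strata.values() if s["t"] and s["c"]]
    if not matched:
        raise ValueError("no stratum contains both treated and control units")
    m_t = sum(len(s["t"]) for s in matched)
    m_c = sum(len(s["c"]) for s in matched)

    # Treated units get weight 1; a control in stratum s gets
    # (m_c / m_t) * (t_s / c_s), so controls mirror the treated distribution.
    treated_mean = sum(sum(s["t"]) for s in matched) / m_t
    control_mean = sum(
        (m_c / m_t) * (len(s["t"]) / len(s["c"])) * sum(s["c"]) for s in matched
    ) / m_c
    return treated_mean - control_mean
```

For example, with age cut points `[25, 35, 45]`, a control aged 20 falls in a bin with no credential holder and is discarded, while the remaining controls are weighted by how many treated people share their bin.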
-
Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
January 2017
Authors:
Lars Vilhuber,
John M. Abowd,
Daniel Weinberg,
Jerome P. Reiter,
Matthew D. Shapiro,
Robert F. Belli,
Noel Cressie,
David C. Folch,
Scott H. Holan,
Margaret C. Levenstein,
Kristen M. Olson,
Jolene Smyth,
Leen-Kiat Soh,
Bruce D. Spencer,
Seth E. Spielman,
Christopher K. Wikle
Working Paper Number:
CES-17-59R
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly the Census Bureau. The activities to date have covered both fundamental and applied statistical research and have focused at least in part on the training of current and future generations of researchers in skills of relevance to surveys and alternative measurement of economic units, households, and persons. This paper discusses some of the key research findings of the eight nodes, organized into six topics: (1) Improving census and survey data collection methods; (2) Using alternative sources of data; (3) Protecting privacy and confidentiality by improving disclosure avoidance; (4) Using spatial and spatio-temporal statistical modeling to improve estimates; (5) Assessing data cost and quality tradeoffs; and (6) Combining information from multiple sources. It also reports on collaborations across nodes and with federal agencies, new software developed, and educational activities and outcomes. The paper concludes with an evaluation of the ability of the FSS to apply the NCRN's research outcomes and suggests some next steps, as well as the implications of this research-network model for future federal government renewal initiatives.
-
Work Organization and Cumulative Advantage
March 2025
Working Paper Number:
CES-25-18
Over decades of wage stagnation, researchers have argued that reorganizing work can boost pay for disadvantaged workers. But upgrading jobs could inadvertently shift hiring away from those workers, exacerbating their disadvantage. We theorize how work organization affects cumulative advantage in the labor market, or the extent to which high-paying positions are increasingly allocated to already-advantaged workers. Specifically, raising technical skill demands exacerbates cumulative advantage by shifting hiring towards higher-skilled applicants. In contrast, when employers increase autonomy or skills learned on-the-job, they raise wages to buy worker consent or commitment, rather than pre-existing skill. To test this idea, we match administrative earnings to task descriptions from job posts. We compare earnings for workers hired into the same occupation and firm, but under different task allocations. When employers raise complexity and autonomy, new hires' starting earnings increase and grow faster. However, while the earnings boost from complex, technical tasks shifts employment toward workers with higher prior earnings, worker selection changes less for tasks learned on-the-job and very little for high autonomy tasks. These results demonstrate how reorganizing work can interrupt cumulative advantage.
-
Job Tasks, Worker Skills, and Productivity
September 2025
Authors:
John Haltiwanger,
Lucia Foster,
Cheryl Grim,
Zoltan Wolf,
Cindy Cunningham,
Sabrina Wulff Pabilonia,
Jay Stewart,
Cody Tuttle,
G. Jacob Blackwood,
Matthew Dey,
Rachel Nesbit
Working Paper Number:
CES-25-63
We present new empirical evidence suggesting that we can better understand productivity dispersion across businesses by accounting for differences in how tasks, skills, and occupations are organized. This aligns with growing attention to the task content of production. We link establishment-level data from the Bureau of Labor Statistics Occupational Employment and Wage Statistics survey with productivity data from the Census Bureau's manufacturing surveys. Our analysis reveals strong relationships between establishment productivity and task, skill, and occupation inputs. These relationships are highly nonlinear and vary by industry. When we account for these patterns, we can explain a substantial share of productivity dispersion across establishments.
-
Estimating Record Linkage False Match Rate for the Person Identification Validation System
July 2014
Working Paper Number:
carra-2014-02
The Census Bureau Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across files. PVS uses probabilistic matching to assign a unique Census Bureau identifier to each person. This paper presents a method to measure the false match rate in PVS following the approach of Belin and Rubin (1995). The Belin and Rubin methodology requires truth data to estimate a mixture model. The parameters from the mixture model are used to obtain point estimates of the false match rate for each of the PVS search modules. The truth data requirement is satisfied by the Census Bureau's unique access to high-quality name, date of birth, address, and Social Security number (SSN) data. Truth data for the Belin and Rubin model are created quickly and do not require a clerical review process. These truth data are used to estimate the Belin and Rubin parameters, making the approach more feasible. Both observed and modeled false match rates are computed for all search modules in federal administrative records data and commercial data.
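How a fitted mixture model yields a false match rate can be shown with a simplified sketch. Belin and Rubin's model works with transformed match weights; this sketch skips the transformation and assumes plain normal components for true-match and non-match scores, with parameters estimated from labeled truth pairs. The function names, the scores, and the parameter values are hypothetical, not PVS estimates.

```python
from statistics import NormalDist

def fit_from_truth(scores, is_match):
    """Estimate two-component mixture parameters from labeled (truth) pairs.

    Simplification: plain normal components, no transformation of the scores.
    """
    match = [s for s, m in zip(scores, is_match) if m]
    nonmatch = [s for s, m in zip(scores, is_match) if not m]
    p_match = len(match) / len(scores)
    return p_match, NormalDist.from_samples(match), NormalDist.from_samples(nonmatch)

def false_match_rate(cutoff, p_match, match_dist, nonmatch_dist):
    """Expected share of pairs scoring at or above `cutoff` that are false matches."""
    # Mass of each component above the cutoff, weighted by its mixing proportion.
    false_tail = (1 - p_match) * (1 - nonmatch_dist.cdf(cutoff))
    true_tail = p_match * (1 - match_dist.cdf(cutoff))
    return false_tail / (false_tail + true_tail)
```

With hypothetical components `NormalDist(10, 2)` for matches, `NormalDist(0, 2)` for non-matches, and a mixing proportion of 0.5, a cutoff of 5 gives a false match rate of about 0.6%, and raising the cutoff lowers it at the cost of accepting fewer true matches.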