CREAT: Census Research Exploration and Analysis Tool

Measuring the Characteristics and Employment Dynamics of U.S. Inventors

September 2022

Working Paper Number: CES-22-43

Abstract

Innovation is a key driver of long-run economic growth. Studying innovation requires a clear view of the characteristics and behavior of the individuals who create new ideas. A general lack of rich, large-scale data has constrained such analyses. We address this by introducing a new dataset linking patent inventors to survey, census, and administrative microdata at the U.S. Census Bureau. We use this data to provide a first look at the demographic characteristics, employer characteristics, earnings, and employment dynamics of inventors. These linkages, which will be available to researchers with approved access, dramatically increase the scope of what can be learned about inventors and innovative activity.

Document Tags and Keywords

Keywords

Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to identify contextually relevant keywords.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach improves searchability and can surface connections that go beyond author-defined keywords, which may be domain-specific.
Keywords for this paper:
researcher, disclosure, study, invention, research, entrepreneur, entrepreneurship, innovation, inventory, patent, innovate, patenting, developed, innovative
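The ranking idea behind KeyBERT can be illustrated with a minimal, dependency-free sketch: embed the whole document, embed each candidate phrase, and keep the phrases whose embeddings are most cosine-similar to the document's. To keep the example self-contained, it assumes a toy bag-of-words count vector (the hypothetical `toy_embed`) in place of BERT embeddings and a small hand-written stop-word list. It is not the KeyBERT library's implementation, and the settings used to produce the keywords above are not stated here.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]

# Hypothetical, deliberately tiny stop-word list for this sketch only.
STOP_WORDS = {"a", "an", "and", "of", "on", "the", "to", "we", "in", "is", "for"}


def toy_embed(text: str) -> Vector:
    """Stand-in for a BERT embedding (assumption): a sparse bag-of-words count vector."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_phrases(text: str, max_n: int = 2) -> List[str]:
    """Unique 1..max_n word n-grams containing no stop words, in first-seen order."""
    words = re.findall(r"[a-z]+", text.lower())
    seen: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            phrase = " ".join(gram)
            if not STOP_WORDS.intersection(gram) and phrase not in seen:
                seen.append(phrase)
    return seen


def extract_keywords(
    text: str,
    top_n: int = 5,
    embed: Callable[[str], Vector] = toy_embed,
) -> List[Tuple[str, float]]:
    """Rank candidate phrases by cosine similarity to the whole-document embedding."""
    doc_vec = embed(text)
    scored = [(p, cosine(embed(p), doc_vec)) for p in candidate_phrases(text)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


text = "Patent inventors drive innovation. We link patent inventors to census data on inventors."
print(extract_keywords(text, top_n=2))
```

Because `embed` is a parameter, a real sentence-embedding model could be swapped in without changing the ranking logic. With the toy count vectors, scores mostly track word frequency. Contextual embeddings are what allow KeyBERT to rank phrases by meaning rather than by counts alone.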

Tags

Tags are automatically generated using a pretrained language model from spaCy, which supports several natural language processing tasks, including named-entity recognition.

The model labels words and phrases by entity type, including "organizations." By filtering for frequently occurring words and phrases labeled as organizations, the tool identifies papers that contain references to specific institutions, datasets, and other organizations.
Tags for this paper:
National Bureau of Economic Research, Longitudinal Business Database, Survey of Industrial Research and Development, Decennial Census, Educational Services, American Community Survey, Patent and Trademark Office, Social Security Number, Longitudinal Employer Household Dynamics, Business Register, Protected Identification Key, Census Bureau Disclosure Review Board, Person Validation System, Health Care and Social Assistance, Business Research and Development and Innovation Survey, Personally Identifiable Information, Federal Statistical Research Data Center
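The filtering step can be sketched as follows. The sketch assumes the named entities have already been extracted, for example as `(ent.text, ent.label_)` pairs from a spaCy `Doc`'s `ents`, where spaCy's English models use the label "ORG" for organizations. The function name `frequent_organizations` and the `min_count` threshold are hypothetical; the actual frequency cutoff behind the tags above is not given.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequent_organizations(
    entities: Iterable[Tuple[str, str]],
    min_count: int = 2,  # hypothetical frequency cutoff
) -> List[str]:
    """Return organization names mentioned at least `min_count` times.

    `entities` holds (text, label) pairs; with spaCy these could be built as
    [(ent.text, ent.label_) for ent in doc.ents]. Only the "ORG" label is kept.
    """
    counts = Counter(
        " ".join(text.split())  # collapse line breaks and repeated spaces
        for text, label in entities
        if label == "ORG"
    )
    # Most frequent first; ties broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, n in ranked if n >= min_count]


ents = [
    ("Census Bureau", "ORG"),
    ("Patent and Trademark Office", "ORG"),
    ("Census  Bureau", "ORG"),
    ("2022", "DATE"),
    ("Patent and Trademark Office", "ORG"),
    ("Census Bureau", "ORG"),
    ("NBER", "ORG"),
]
print(frequent_organizations(ents))
```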

Similar Working Papers

Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.

Doc2Vec represents entire documents as fixed-length vectors that capture semantic meaning, taking into account the context in which words appear within each document. The model learns a unique vector for each document while simultaneously learning word vectors, which makes the resulting document vectors useful for tasks such as document classification, clustering, and similarity detection. The document vectors are compared using cosine similarity to determine the most similar working papers. Papers marked with 🔥 are in the top 20% of similarity.
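The comparison step can be sketched as below. It assumes each paper already has a document vector, for instance from a trained gensim `Doc2Vec` model, which exposes its document vectors via `model.dv`. It also assumes that "top 20% of similarity" means a paper's score ranks in the top 20% of all candidate papers' scores against the target paper. The source does not define that cutoff precisely, and `most_similar` and `hot_percent` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    target: str,
    vectors: Dict[str, Sequence[float]],
    top_k: int = 10,
    hot_percent: int = 20,  # assumed reading of "top 20% of similarity"
) -> List[Tuple[str, float, bool]]:
    """Rank papers by cosine similarity to `target` and flag the top `hot_percent`.

    Returns (paper_id, similarity, is_hot) for the `top_k` most similar papers.
    """
    target_vec = vectors[target]
    ranked = sorted(
        ((pid, cosine(target_vec, vec)) for pid, vec in vectors.items() if pid != target),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Number of papers in the top `hot_percent` percent, rounded up with integer math.
    n_hot = (len(ranked) * hot_percent + 99) // 100
    return [(pid, sim, rank < n_hot) for rank, (pid, sim) in enumerate(ranked[:top_k])]


vecs = {
    "target": [1.0, 0.0],
    "a": [1.0, 0.0],
    "b": [1.0, 1.0],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
    "e": [2.0, 0.1],
}
print(most_similar("target", vecs, top_k=3))
```

Ranking by cosine similarity ignores vector length, so two papers count as similar when their vectors point in the same direction, whatever their magnitudes.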

The 10 most similar working papers to the working paper 'Measuring the Characteristics and Employment Dynamics of U.S. Inventors' are listed below in order of similarity.