CREAT: Census Research Exploration and Analysis Tool

Early-Stage Business Formation: An Analysis of Applications for Employer Identification Numbers

December 2018

Abstract

This paper reports on the development and analysis of a newly constructed dataset on the early stages of business formation. The data are based on applications for Employer Identification Numbers (EINs) submitted in the United States, known as IRS Form SS-4 filings. The goal of the research is to develop high-frequency indicators of business formation at the national, state, and local levels. The analysis indicates that EIN applications provide forward-looking and very timely information on business formation. The signal of business formation provided by counts of applications is improved by using the characteristics of the applications to model the likelihood that applicants become employer businesses. The results also suggest that EIN applications are related to economic activity at the local level. For example, application activity is higher in counties that experienced higher employment growth since the end of the Great Recession, and application counts grew more rapidly in counties engaged in shale oil and gas extraction. Finally, the paper provides a description of new public-use dataset, the 'Business Formation Statistics (BFS),' that contains new data series on business applications and formation. The initial release of the BFS shows that the number of business applications in the 3rd quarter of 2017 that have relatively high likelihood of becoming job creators is still far below pre-Great Recession levels.

Document Tags and Keywords

Keywords Keywords are automatically generated using KeyBERT, a powerful and innovative keyword extraction tool that utilizes BERT embeddings to ensure high-quality and contextually relevant keywords.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach not only enhances searchability but provides connections that go beyond potentially domain-specific author-defined keywords.
:
proprietorship, entrepreneur, business startups, entrepreneurship, business data, irs, filing


Similar Working Papers Similarity between working papers are determined by an unsupervised neural network model know as Doc2Vec.

Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the capture of semantic meaning in a way that relates to the context of words within the document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection by preserving the order and structure of words. The document vectors are compared using cosine similarity/distance to determine the most similar working papers. Papers identified with 🔥 are in the top 20% of similarity.

The 10 most similar working papers to the working paper 'Early-Stage Business Formation: An Analysis of Applications for Employer Identification Numbers' are listed below in order of similarity.