CREAT: Census Research Exploration and Analysis Tool

Nonemployer Statistics by Demographics (NES-D): Using Administrative and Census Records Data in Business Statistics

January 2019

Working Paper Number:

CES-19-01

Abstract

The quinquennial Survey of Business Owners or SBO provided the only comprehensive source of information in the United States on employer and nonemployer businesses by the sex, race, ethnicity and veteran status of the business owners. The annual Nonemployer Statistics series (NES) provides establishment counts and receipts for nonemployers but contains no demographic information on the business owners. With the transition of the employer component of the SBO to the Annual Business Survey, the Nonemployer Statistics by Demographics series or NES-D represents the continuation of demographics estimates for nonemployer businesses. NES-D will leverage existing administrative and census records to assign demographic characteristics to the universe of approximately 24 million nonemployer businesses (as of 2015). Demographic characteristics include key demographics measured by the SBO (sex, race, Hispanic origin and veteran status) as well as other demographics (age, place of birth and citizenship status) collected but not imputed by the SBO if missing. A spectrum of administrative and census data sources will provide the nonemployer universe and demographics information. Specifically, the nonemployer universe originates in the Business Register; the Census Numident will provide sex, age, place of birth and citizenship status; race and Hispanic origin information will be obtained from multiple years of the decennial census and the American Community Survey; and the Department of Veterans Affairs will provide administrative records data on veteran status. The use of blended data in this manner will make possible the production of NES-D, an annual series that will become the only source of detailed and comprehensive statistics on the scope, nature and activities of U.S. businesses with no paid employment by the demographic characteristics of the business owner.
Using the 2015 vintage of nonemployers, initial results indicate that demographic information is available for the overwhelming majority of the universe of nonemployers. For instance, information on sex, age, place of birth and citizenship status is available for over 95 percent of the 24 million nonemployers while race and Hispanic origin are available for about 90 percent of them. These results exclude owners of C-corporations, which represent only 2 percent of nonemployer firms. Among other things, future work will entail imputation of missing demographics information (including that of C-corporations), testing the longitudinal consistency of the estimates, and expanding the set of characteristics beyond the demographics mentioned above. Without added respondent burden and at lower imputation rates and costs, NES-D will meet the needs of stakeholders as well as the economy as a whole by providing reliable estimates at a higher frequency (annual vs. every 5 years) and with a more timely dissemination schedule than the SBO.

Document Tags and Keywords

Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant terms.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the main topics of the text. This approach improves searchability and surfaces connections that go beyond potentially domain-specific, author-defined keywords.

Keywords:

enterprise, employ, employed, venture, proprietorship, entrepreneur, entrepreneurship, ethnicity, hispanic, proprietor, establishment, population, citizen, nonemployer businesses


Similar Working Papers

Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.

Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the capture of semantic meaning in a way that relates to the context of words within the document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection by preserving the order and structure of words. The document vectors are compared using cosine similarity/distance to determine the most similar working papers. Papers identified with 🔥 are in the top 20% of similarity.
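The comparison step described above can be illustrated with a minimal sketch. It assumes the document vectors have already been produced by a trained model; the three-dimensional vectors and the identifiers `paper-A`, `paper-B` and `paper-C` are made up for illustration, and the helper names `cosine_similarity` and `most_similar` are hypothetical, not part of this site's actual pipeline.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_id, vectors, top_n=10):
    """Rank every other document by cosine similarity to the query document."""
    query = vectors[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in vectors.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors (assumed values; real document vectors are much longer).
toy_vectors = {
    "CES-19-01": [0.9, 0.1, 0.3],
    "paper-A": [0.8, 0.2, 0.4],
    "paper-B": [-0.5, 0.9, 0.0],
    "paper-C": [0.1, 0.1, 0.9],
}

ranked = most_similar("CES-19-01", toy_vectors, top_n=3)
for paper_id, score in ranked:
    print(f"{paper_id}: {score:.3f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, so documents are compared by the direction of their vectors and differences in vector magnitude are ignored.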

The 10 most similar working papers to the working paper 'Nonemployer Statistics by Demographics (NES-D): Using Administrative and Census Records Data in Business Statistics' are listed below in order of similarity.