CREAT: Census Research Exploration and Analysis Tool

Nonemployer Statistics by Demographics (NES-D): Exploring Longitudinal Consistency and Sub-national Estimates

December 2019

Working Paper Number: CES-19-34

Abstract

Until recently, the quinquennial Survey of Business Owners (SBO) was the only source of information for U.S. employer and nonemployer businesses by owner demographic characteristics such as race, ethnicity, sex and veteran status. Now, however, the Nonemployer Statistics by Demographics series (NES-D) will replace the SBO's nonemployer component with reliable and more frequent (annual) business demographic estimates, with no additional respondent burden and at lower imputation rates and costs. NES-D is not a survey; rather, it exploits existing administrative and census records to assign demographic characteristics to the universe of approximately 25 million (as of 2016) nonemployer businesses. Although only in the second year of its research phase, NES-D is rapidly moving towards production, with a planned prototype or experimental version release of 2017 nonemployer data in 2020, followed by annual releases of the series. After the first year of research, we released a working paper (Luque et al., 2019) that assessed the viability of estimating nonemployer demographics exclusively with administrative records (AR) and census data. That paper used one year of data (2015) to produce preliminary tabulations of business counts at the national level. This year we expand that research in multiple ways by: i) examining the longitudinal consistency of administrative and census records coverage, and of our AR-based demographics estimates, ii) evaluating further coverage from additional data sources, iii) exploring estimates at the sub-national level, iv) exploring estimates by industrial sector, v) examining demographics estimates of business receipts as well as of counts, and vi) implementing imputation of missing demographic values. Our current results are consistent with the main findings in Luque et al. (2019), and show that high coverage and demographic assignment rates are not the exception, but the norm.
Specifically, we find that AR coverage rates are high and stable over time for each of the three years we examine, 2014-2016. We are able to identify owners for approximately 99 percent of nonemployer businesses (excluding C-corporations), 92 to 93 percent of identified nonemployer owners have no missing demographics, and only about 1 percent are missing three or more demographic characteristics in each of the three years. We also find that our demographics estimates are stable over time, with expected small annual changes that are consistent with underlying population trends in the U.S. Due to data limitations, these results do not include C-corporations, which represent only 2 percent of nonemployer businesses and 4 percent of receipts. Without added respondent burden and at lower imputation rates and costs, NES-D will provide high-quality business demographics estimates at a higher frequency (annual vs. every 5 years) than the SBO.

Document Tags and Keywords

Keywords

Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant keywords.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach not only enhances searchability but also provides connections that go beyond potentially domain-specific author-defined keywords.
estimating, survey, employed, employ, venture, proprietorship, entrepreneur, ethnicity, proprietor, warehousing, population, citizen, unemployed, assessed, employer businesses


Similar Working Papers

Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.

Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the capture of semantic meaning in a way that relates to the context of words within the document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection by preserving the order and structure of words. The document vectors are compared using cosine similarity/distance to determine the most similar working papers. Papers identified with 🔥 are in the top 20% of similarity.
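The comparison step described above can be illustrated with a minimal sketch. It assumes the document vectors have already been produced by some model, and it uses short toy vectors in their place. The function names, the paper identifiers, and the reading of "top 20%" as the top fifth of all compared papers by score are assumptions made for illustration, not details confirmed by this page.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(query_id, vectors, top_n=10, hot_fraction=0.2):
    """Rank every other paper by cosine similarity to the query paper.

    Returns up to top_n tuples (paper_id, score, is_hot). is_hot marks
    papers whose rank falls in the top hot_fraction of all compared
    papers (an assumed interpretation of the flame marker).
    """
    query = vectors[query_id]
    scored = [
        (paper_id, cosine_similarity(query, vec))
        for paper_id, vec in vectors.items()
        if paper_id != query_id
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    hot_count = math.ceil(hot_fraction * len(scored))
    return [
        (paper_id, score, rank < hot_count)
        for rank, (paper_id, score) in enumerate(scored[:top_n])
    ]


# Hypothetical toy vectors standing in for learned document embeddings.
vectors = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
    "paper_d": [0.5, 0.5, 0.0],
    "paper_e": [-1.0, 0.0, 0.0],
    "paper_f": [0.0, 0.0, 1.0],
}

ranked = most_similar("paper_a", vectors, top_n=3)
for paper_id, score, is_hot in ranked:
    print(f"{paper_id}: {score:.3f}{' (top 20%)' if is_hot else ''}")
```

Cosine similarity depends only on the angle between vectors, so two papers can score as similar even when their vectors differ in magnitude.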

The 10 most similar working papers to the working paper 'Nonemployer Statistics by Demographics (NES-D): Exploring Longitudinal Consistency and Sub-national Estimates' are listed below in order of similarity.