CREAT - Census Bureau

Matching State Business Registration Records to Census Business Data

January 2020

Written by: Kristin McCue, J. Daniel Kim

Working Paper Number:

CES-20-03

Abstract

We describe our methodology and results from matching state Business Registration Records (BRR) to Census business data. We use data from Massachusetts and California to develop methods and preliminary results that could guide matching for additional states. We obtain matches to Census business records for 45% of the Massachusetts BRR records and 40% of the California BRR records. We find higher match rates for incorporated businesses and for businesses with higher startup-quality scores as assigned in Guzman and Stern (2018). Clerical reviews show that relatively strict matching on address is important for match accuracy, while results are less sensitive to the strictness of name matching. Among matched BRR records, the modal timing of the first match to the Business Register (BR) is the year in which the BRR record was filed. We use two software tools to identify matches: SAS DQ Match and a machine-learning (ML) algorithm described in Cuffe and Goldschlag (2018). We find preliminary evidence that while the ML-based method yields more matches, SAS DQ tends to be more accurate. To conclude, we provide suggestions on how to proceed with matching other states' data in light of our findings from these two states.
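
As a rough illustration of the finding that strict address matching combined with looser name matching works well, the following minimal Python sketch requires identical normalized addresses and then accepts candidate pairs whose names are merely similar. It is not the paper's SAS DQ Match or ML procedure: the record fields (`id`, `name`, `address`), the `normalize` helper, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the name comparator are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    return " ".join(cleaned.split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized business names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(brr, census, name_threshold=0.8):
    """Pair records whose normalized addresses are identical (strict)
    and whose names score at or above name_threshold (lenient)."""
    # Index the Census-side records by normalized address.
    by_address = {}
    for rec in census:
        by_address.setdefault(normalize(rec["address"]), []).append(rec)

    matches = []
    for rec in brr:
        # Only candidates at exactly the same normalized address are compared.
        for cand in by_address.get(normalize(rec["address"]), []):
            score = name_similarity(rec["name"], cand["name"])
            if score >= name_threshold:
                matches.append((rec["id"], cand["id"], score))
    return matches


# Hypothetical records, invented for illustration only.
brr_records = [
    {"id": "B1", "name": "Acme Widgets, Inc.", "address": "12 Main St., Boston MA"},
    {"id": "B2", "name": "Zeta Labs LLC", "address": "99 Elm Rd, Salem MA"},
]
census_records = [
    {"id": "C1", "name": "ACME WIDGETS INC", "address": "12 MAIN ST BOSTON MA"},
    {"id": "C2", "name": "Zeta Labs", "address": "100 Elm Rd, Salem MA"},
]

print(match_records(brr_records, census_records))
```

In this sketch the second pair is rejected because the street numbers differ, even though the names are close. Lowering `name_threshold` loosens name matching without relaxing the address requirement.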

Document Tags and Keywords

Keywords:

data, proprietorship, state, record, matched, matching, identifier

Tags:

Standard Statistical Establishment List, Service Annual Survey, Center for Economic Studies, Longitudinal Business Database, Employer Identification Numbers, Census Bureau Business Register, Business Register, Protected Identification Key, Integrated Longitudinal Business Database, Guzman and Stern

Similar Working Papers

The 10 most similar working papers to the working paper 'Matching State Business Registration Records to Census Business Data' are listed below in order of similarity.

Working Paper

NBER Patent Data-BR Bridge: User Guide and Technical Documentation

October 2010

Authors: Natarajan Balasubramanian, Jagadeesh Sivadasan

Working Paper Number:

CES-10-36

This note provides details about the construction of the NBER Patent Data-BR concordance, and is intended for researchers planning to use this concordance. In addition to describing the matching process used to construct the concordance, this note provides a discussion of the benefits and limitations of this concordance.
Working Paper

Improving Patent Assignee-Firm Bridge with Web Search Results

August 2022

Authors: Yuheng Ding, Karam Jo, Seula Kim

Working Paper Number:

CES-22-31

This paper constructs a longitudinal patent assignee-firm bridge between U.S. patent assignees and firms using firm-level administrative data from the U.S. Census Bureau. We match granted patents applied for between 1976 and 2016 to the U.S. firms recorded in the Census Bureau's Longitudinal Business Database (LBD). Building on existing algorithms in the literature, we first use the assignee name, address (state and city), and year information to link the two datasets. We then introduce a novel search-aided algorithm that significantly improves the matching results by 7% and 2.9% at the patent and the assignee level, respectively. Overall, we are able to match 88.2% and 80.1% of all U.S. patents and assignees, respectively. We contribute to the existing literature by 1) improving the match rates and quality with the web search-aided algorithm, and 2) providing the longest and longitudinally consistent crosswalk between patent assignees and LBD firms.
Working Paper

Business Dynamics of Innovating Firms: Linking U.S. Patents with Administrative Data on Workers and Firms

July 2015

Authors: Javier Miranda, Cheryl Grim, Stuart Graham, Tariqul Islam, Alan Marco

Working Paper Number:

CES-15-19

This paper discusses the construction of a new longitudinal database tracking inventors and patent-owning firms over time. We match granted patents between 2000 and 2011 to administrative databases of firms and workers housed at the U.S. Census Bureau. We use inventor information in addition to the patent assignee firm name to improve on previous efforts linking patents to firms. The triangulated database allows us to maximize match rates and provide validation for a large fraction of matches. In this paper, we describe the construction of the database and explore basic features of the data. We find that patenting firms, particularly young patenting firms, disproportionately contribute jobs to the U.S. economy. We find that patenting is a relatively rare event among small firms but that most patenting firms are nevertheless small, and that patenting is not as rare an event for the youngest firms compared to the oldest firms. While manufacturing firms are more likely to patent than firms in other sectors, we find that most patenting firms are in the services and wholesale sectors. These new data are a product of collaboration within the U.S. Department of Commerce, between the U.S. Census Bureau and the U.S. Patent and Trademark Office.
Working Paper

Squeezing More Out of Your Data: Business Record Linkage with Python

November 2018

Authors: Nathan Goldschlag, John Cuffe

Working Paper Number:

CES-18-46

Integrating data from different sources has become a fundamental component of modern data analytics. Record linkage methods represent an important class of tools for accomplishing such integration. In the absence of common disambiguated identifiers, researchers often must resort to "fuzzy" matching, which allows imprecision in the characteristics used to identify common entities across different datasets. While the record linkage literature has identified numerous individually useful fuzzy matching techniques, there is little consensus on a way to integrate those techniques within a single framework. To this end, we introduce the Multiple Algorithm Matching for Better Analytics (MAMBA), an easy-to-use, flexible, scalable, and transparent software platform for business record linkage applications using Census microdata. MAMBA leverages multiple string comparators to assess the similarity of records using a machine learning algorithm to disambiguate matches. This software represents a transparent tool for researchers seeking to link external business data to the Census Business Register files.
Working Paper

Starting Up AI

March 2024

Authors: Emin Dinlersoz, Nikolas Zolas, Can Dogan

Working Paper Number:

CES-24-09R

Using comprehensive administrative data on business applications over the period 2004-2023, we study business applications (ideas) and the resulting startups that aim to develop AI technologies or produce goods or services that use, integrate, or rely on AI. The annual number of new AI-related business applications is stable between 2004 and 2011, but begins to rise in 2012 with further increases from 2016 onward into the Covid-19 pandemic and beyond, with a large, discrete jump in 2023. The distribution of these applications is highly uneven across states and sectors. AI business applications have a higher likelihood of becoming employer startups compared to other applications. Moreover, businesses originating from these applications exhibit higher revenue, average wage, and labor share, but similar labor productivity and lower survival rate, compared to other businesses. While it is still early in the diffusion of AI, the rapid rise in AI business applications, combined with the better performance of resulting businesses in several key outcomes, suggests a growing contribution from AI-related business formation to business dynamism.
Working Paper

Determination of the 2020 U.S. Citizen Voting Age Population (CVAP) Using Administrative Records and Statistical Methodology Technical Report

October 2020

Authors: John M. Abowd, J. David Brown, Lawrence Warren, Moises Yi, Misty L. Heggeness, William R. Bell, Michael B. Hawes, Andrew Keller, Vincent T. Mule Jr., Joseph L. Schafer, Matthew Spence

Working Paper Number:

CES-20-33

This report documents the efforts of the Census Bureau's Citizen Voting-Age Population (CVAP) Internal Expert Panel (IEP) and Technical Working Group (TWG) toward the use of multiple data sources to produce block-level statistics on the citizen voting-age population for use in enforcing the Voting Rights Act. It describes the administrative, survey, and census data sources used, and the four approaches developed for combining these data to produce CVAP estimates. It also discusses other aspects of the estimation process, including how records were linked across the multiple data sources, and the measures taken to protect the confidentiality of the data.
Working Paper

Age and High-Growth Entrepreneurship

April 2018

Authors: Javier Miranda, J. Daniel Kim, Pierre Azoulay, Benjamin F. Jones

Working Paper Number:

carra-2018-03

Many observers, and many investors, believe that young people are especially likely to produce the most successful new firms. We use administrative data at the U.S. Census Bureau to study the ages of founders of growth-oriented start-ups in the past decade. Our primary finding is that successful entrepreneurs are middle-aged, not young. The mean founder age for the 1 in 1,000 fastest growing new ventures is 45.0. The findings are broadly similar when considering high-technology sectors, entrepreneurial hubs, and successful firm exits. Prior experience in the specific industry predicts much greater rates of entrepreneurial success. These findings strongly reject common hypotheses that emphasize youth as a key trait of successful entrepreneurs.
Working Paper

Matching Addresses between Household Surveys and Commercial Data

July 2015

Authors: Quentin Brummet

Working Paper Number:

carra-2015-04

Matching third-party data sources to household surveys can benefit household surveys in a number of ways, but the utility of these new data sources depends critically on our ability to link units between data sets. To understand this better, this report discusses modifications to the existing match process that could improve our matches. While many changes to the matching procedure produce marginal improvements in match rates, substantial increases in match rates can only be achieved by relaxing the definition of a successful match. In the end, the results show that the most important factor determining the success of matching procedures is the quality and composition of the data sets being matched.
Working Paper

Measuring the Dynamics of Young and Small Businesses: Integrating the Employer and Nonemployer Universes

February 2006

Authors: Alfred R Nucci, Steven J. Davis, John Haltiwanger, Ron Jarmin, C.J. Krizan, Javier Miranda, Kristin Sandusky

Working Paper Number:

CES-06-04

We develop a preliminary version of an Integrated Longitudinal Business Database (ILBD) that combines administrative records and survey-based data for virtually all employer and nonemployer business units in the United States. In the process, we confront conceptual and practical issues that arise in measuring the importance and dynamic behavior of younger and smaller businesses. We also document some basic facts about younger and smaller businesses. In doing so, we exploit the ability of the ILBD to follow business transitions between employer and nonemployer status, and vice-versa. This aspect of the ILBD opens a new frontier for the study of business formation and the precursors to job creation in the U.S. economy.
Working Paper

The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software

July 2014

Authors: Deborah Wagner, Mary Layne

Working Paper Number:

carra-2014-01

The Census Bureau's Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. The PVS matches incoming files to reference files created with data from the Social Security Administration (SSA) Numerical Identification file, and SSA data with addresses obtained from federal files. This paper describes the PVS methodology from editing input data to creating the final file.