Matching Compustat Data to the Longitudinal Business Database, 1976-2020
September 2025
Working Paper Number:
CES-25-65
Abstract
Document Tags and Keywords
Keywords
Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select
contextually relevant keywords.
By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the
text, highlighting the most significant topics and trends. This approach not only enhances searchability but also
provides connections that go beyond potentially domain-specific author-defined keywords.
information census,
enterprise,
database,
census data,
company,
disclosure,
corporation,
executive,
employee,
corporate,
merger,
subsidiary,
proprietor,
consolidated,
incorporated,
department,
record,
census bureau,
identifier,
firm data
Tags
Tags are automatically generated using a pretrained language model from spaCy, which performs
several tasks, including entity tagging.
The model labels words and phrases by entity type,
including "organizations." Filtering for frequent words and phrases labeled as "organizations" identifies
the specific institutions, datasets, and other organizations that a paper references.
Internal Revenue Service,
Service Annual Survey,
Securities and Exchange Commission,
Office of Management and Budget,
Company Organization Survey,
Longitudinal Business Database,
Center for Research in Security Prices,
Michigan Institute for Teaching and Research in Economics,
Employer Identification Numbers,
North American Industry Classification System,
Longitudinal Employer Household Dynamics,
Business Register,
Protected Identification Key,
Census Bureau Disclosure Review Board,
Person Validation System,
Federal Statistical Research Data Center,
Annual Survey of Entrepreneurs,
Research and Development
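The filtering step described for tags can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the function name `frequent_organizations`, the `min_count` threshold, and the sample (phrase, label) pairs are all hypothetical, and the input stands in for the output of an entity tagger such as spaCy rather than calling one.

```python
from collections import Counter


def frequent_organizations(entities, min_count=2):
    """Return phrases labeled "ORG" that occur at least min_count times,
    ordered from most to least frequent."""
    counts = Counter(text for text, label in entities if label == "ORG")
    return [name for name, n in counts.most_common() if n >= min_count]


# Hypothetical tagger output: (phrase, entity label) pairs.
entities = [
    ("Internal Revenue Service", "ORG"),
    ("Census Bureau", "ORG"),
    ("1976", "DATE"),
    ("Internal Revenue Service", "ORG"),
    ("Compustat", "PRODUCT"),
    ("Census Bureau", "ORG"),
    ("Internal Revenue Service", "ORG"),
    ("Business Register", "ORG"),
]

tags = frequent_organizations(entities, min_count=2)
print(tags)
```

Phrases that carry a non-organization label or appear only once are dropped, so the frequency threshold controls how long the resulting tag list is.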
Similar Working Papers
Similarity between working papers is determined by an unsupervised neural
network model
known as Doc2Vec.
Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the
capture of semantic meaning in a way that relates to the context of words within the document. The model learns to
associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as
document classification, clustering, and similarity detection by preserving the order and structure of words. The
document vectors are compared using cosine similarity/distance to determine the most similar working papers.
Papers identified with 🔥 are in the top 20% of similarity.
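The comparison step can be sketched as follows, assuming each paper has already been mapped to a fixed-length vector. The sketch does not train Doc2Vec; the three-dimensional vectors, the paper names, and the helper `most_similar` are hypothetical and serve only to show ranking by cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(target, others, top_n=10):
    """Rank the vectors in `others` by similarity to `target`, best first."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in others.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical document vectors (real ones would have many more dimensions).
vectors = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 1.0, 0.0],
    "paper_c": [1.0, 0.0, 0.1],
    "paper_d": [-1.0, 0.0, 0.0],
}
target = [1.0, 0.0, 0.0]

ranking = most_similar(target, vectors, top_n=3)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, so a long and a short document with similar content can still score close to 1.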
The 10 most similar working papers to the working paper 'Matching Compustat Data to the Longitudinal Business Database, 1976-2020' are listed below in order of similarity.
-
Working Paper: Longitudinal Establishment And Enterprise Microdata (LEEM) Documentation
May 1998
Working Paper Number:
CES-98-09
This paper introduces and documents the new Longitudinal Enterprise and Establishment Microdata (LEEM) database, which has been constructed by Census' Economic Planning and Coordination Division under contract to the Office of Advocacy of the U.S. Small Business Administration. The LEEM links three years (1990, 1994, and 1995) of basic data for each private sector establishment with payroll in any of those years, along with data on the firm to which the establishment belongs each year. The LEEM data will facilitate both broader and more detailed analysis of patterns of job creation and destruction in the U.S., as well as research on the structure and dynamics of U.S. businesses. This paper provides documentation of the construction of LEEM data, summary data on most variables in the database, comparisons of the annual data with that of the nearly identical County Business Patterns, and distributions of establishments and their employment by the size of their firms. This is followed by a simple analysis of changes over time in the attributes of surviving establishments, and a brief discussion of turnover (business births and deaths) in the population and gross changes in employment associated with both establishment turnover and with surviving establishments. It concludes with a summary of the strengths and weaknesses of the LEEM.
-
Working Paper: Documenting the Business Register and Related Economic Business Data
March 2016
Working Paper Number:
CES-16-17
The Business Register (BR) is a comprehensive database of business establishments in the United States and provides resources for the U.S. Census Bureau's economic programs for sample selection, research, and survey operations. It is maintained using information from several federal agencies including the Census Bureau, Internal Revenue Service, Bureau of Labor Statistics, and the Social Security Administration. This paper provides a detailed description of the sources and functions of the BR. An overview of the BR as a linking tool and bridge to other Census Bureau data for additional business characteristics is also given.
-
Working Paper: Identifying U.S. Merchandise Traders: Integrating Customs Transactions with Business Administrative Data
September 2020
Working Paper Number:
CES-20-28
This paper describes the construction of the Longitudinal Firm Trade Transactions Database (LFTTD) enabling the identification of merchandise traders - exporters and importers - in the U.S. Census Bureau's Business Register (BR). The LFTTD links merchandise export and import transactions from customs declaration forms to the BR beginning in 1992 through the present. We employ a combination of deterministic and probabilistic matching algorithms to assign a unique firm identifier in the BR to a merchandise export or import transaction record. On average, we match 89 percent of export and import values to a firm identifier. In 1992, we match 79 (88) percent of export (import) value; in 2017, we match 92 (96) percent of export (import) value. Trade transactions in year t are matched to years between 1976 and t+1 of the BR. On average, 94 percent of the trade value matches to a firm in year t of the BR. The LFTTD provides the most comprehensive identification of and the foundation for the analysis of goods trading firms in the U.S. economy.
-
Working Paper: An Analysis of Key Differences in Micro Data: Results from the Business List Comparison Project
September 2008
Working Paper Number:
CES-08-28
The Bureau of Labor Statistics and the Bureau of the Census each maintain a business register, a universe of all U.S. business establishments and their characteristics, created from independent sources. Both registers serve critical functions such as supplying aggregate data inputs for certain national statistics generated by the Bureau of Economic Analysis. This paper examines key micro-level differences across these two business registers.
-
Working Paper: Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database
March 2023
Working Paper Number:
CES-23-10
Retail health clinics (RHCs) are a relatively new type of health care setting and understanding the role they play as a source of ambulatory care in the United States is important. To better understand these settings, a joint project by the Census Bureau and National Center for Health Statistics used data science techniques to link together data on RHCs from Convenient Care Association, County Business Patterns Business Register, and National Plan and Provider Enumeration System to create the Linked RHC (LiRHC, pronounced 'lyric') database of locations throughout the United States during the years 2018 to 2020. The matching methodology used to perform this linkage is described, as well as the benchmarking, match statistics, and manual review and quality checks used to assess the resulting matched data. The large majority (81%) of matches received quality scores at or above 75/100, and most matches were linked in the first two (of eight) matching passes, indicating high confidence in the final linked dataset. The LiRHC database contained 2,000 RHCs and found that 97% of these clinics were in metropolitan statistical areas and 950 were in the South region of the United States. Through this collaborative effort, the Census Bureau and National Center for Health Statistics strive to understand how RHCs can potentially impact population health as well as the access and provision of health care services across the nation.
-
Working Paper: Developing Content for the Management and Organizational Practices Survey-Hospitals (MOPS-HP)
September 2021
Working Paper Number:
CES-21-25
Nationally representative U.S. hospital data does not exist on management practices, which have been shown to be related to both clinical and financial performance using past data collected in the World Management Survey (WMS). This paper describes the U.S. Census Bureau's development of content for the Management and Organizational Practices Survey Hospitals (MOPS-HP) that is similar to data collected in the MOPS conducted for the manufacturing sector in 2010 and 2015 and the 2009 WMS. Findings from cognitive testing interviews with 18 chief nursing officers and 13 chief financial officers at 30 different hospitals across 7 states and the District of Columbia led to using industry-tested terminology, to confirming chief nursing officers as MOPS-HP respondents and their ability to provide recall data, and to eliminating questions that tested poorly. Hospital data collected in the MOPS-HP would be the first nationally representative data on management practices with queries on clinical key performance indicators, financial and hospital-wide patient care goals, addressing patient care problems, clinical team interactions and staffing, standardized clinical protocols, and incentives for medical record documentation. The MOPS-HP's purpose is not to collect COVID-19 pandemic information; however, data measuring hospital management practices prior to and during the COVID-19 pandemic are a byproduct of the survey's one-year recall period (2019 and 2020).
-
Working Paper: The Industry R&D Survey: Patent Database Link Project
November 2006
Working Paper Number:
CES-06-28
This paper details the construction of a firm-year panel dataset combining the NBER Patent Dataset with the Industry R&D Survey conducted by the Census Bureau and National Science Foundation. The developed platform offers an unprecedented view of the R&D-to-patenting innovation process and a close analysis of the strengths and limitations of the Industry R&D Survey. The files are linked through a name-matching algorithm customized for uniting the firm names to which patents are assigned with the firm names in Census Bureau's SSEL business registry. Through the Census Bureau's file structure, this R&D platform can be linked to the operating performances of each firm's establishments, further facilitating innovation-to-productivity studies.
-
Working Paper: The Management and Organizational Practices Survey (MOPS): Cognitive Testing*
January 2016
Working Paper Number:
CES-16-53
All Census Bureau surveys must meet quality standards before they can be sent to the public for data collection. This paper outlines the pretesting process that was used to ensure that the Management and Organizational Practices Survey (MOPS) met those standards. The MOPS is the first large survey of management practices at U.S. manufacturing establishments. The first wave of the MOPS, issued for reference year 2010, was subject to internal expert review and two rounds of cognitive interviews. The results of this pretesting were used to make significant changes to the MOPS instrument and ensure that quality data was collected. The second wave of the MOPS, featuring new questions on data in decision making (DDD) and uncertainty and issued for reference year 2015, was subject to two rounds of cognitive interviews and a round of usability testing. This paper illustrates the effort undertaken by the Census Bureau to ensure that all surveys released into the field are of high quality and provides insight into how respondents interpret the MOPS questionnaire for those looking to utilize the MOPS data.
-
Working Paper: Employment that is not covered by state unemployment
January 2002
Working Paper Number:
tp-2002-16
-
Working Paper: NEW DATA FOR DYNAMIC ANALYSIS: THE LONGITUDINAL ESTABLISHMENT AND ENTERPRISE MICRODATA (LEEM) FILE
December 1999
Working Paper Number:
CES-99-18
Until now, research on U.S. business activities over time has been hindered by the lack of accurate and comprehensive longitudinal data. The new Longitudinal Establishment and Enterprise Microdata (LEEM) are tremendously rich data that open up numerous possibilities for dynamic analyses of businesses in the U.S. economy. It is the first nationwide high-quality longitudinal database that covers the majority of employer businesses from all sectors of the economy. Due to the confidential nature of these data, the file is located at the Center for Economic Studies in the U.S. Bureau of the Census. To access the data, researchers must submit an acceptable proposal to CES and become sworn Census researchers. This paper describes the LEEM file, the variables contained on the file, and current uses of the data.