Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database
March 2023
Working Paper Number:
CES-23-10
Abstract
Document Tags and Keywords
Keywords
Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant keywords.
By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach not only enhances searchability but also provides connections that go beyond potentially domain-specific author-defined keywords.
report, department, discrepancy, record, matched, matching, retail, coverage, medicare, healthcare, medicaid, datasets, assessed
Tags
Tags are automatically generated using a pretrained language model from spaCy, which handles several tasks, including entity tagging.
The model labels words and phrases by entity type, including "organizations." By filtering for frequent words and phrases labeled as "organizations", the tags identify the specific institutions, datasets, and other organizations referenced in each paper.
Internal Revenue Service, Bureau of Labor Statistics, Social Security Administration, Service Annual Survey, Center for Economic Studies, County Business Patterns, Longitudinal Business Database, Employer Identification Numbers, Economic Census, North American Industry Classification System, National Center for Health Statistics, Disclosure Review Board, Centers for Disease Control and Prevention, Data Management System
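The frequency filter described under Tags can be illustrated with a minimal sketch. It assumes the entity tagger has already produced (text, label) pairs; the sample pairs, the `frequent_organizations` helper, and the `min_count` threshold are hypothetical and are not taken from the site's actual pipeline. "ORG" is the label spaCy's pretrained English pipelines use for organizations.

```python
from collections import Counter

# Hypothetical (text, label) pairs, standing in for the output of an entity tagger.
tagged_entities = [
    ("Economic Census", "ORG"),
    ("Economic Census", "ORG"),
    ("Longitudinal Business Database", "ORG"),
    ("Longitudinal Business Database", "ORG"),
    ("Economic Census", "ORG"),
    ("March 2023", "DATE"),
    ("Acme Clinic", "ORG"),
]


def frequent_organizations(entities, min_count=2):
    """Return organization names seen at least min_count times, most frequent first.

    min_count is an assumed threshold; the real cutoff is not documented here.
    """
    counts = Counter(text for text, label in entities if label == "ORG")
    return [name for name, n in counts.most_common() if n >= min_count]


tags = frequent_organizations(tagged_entities)
print(tags)
```

In this toy input, the date is dropped because it is not labeled as an organization, and "Acme Clinic" is dropped because it appears only once.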
Similar Working Papers
Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.
Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the
capture of semantic meaning in a way that relates to the context of words within the document. The model learns to
associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as
document classification, clustering, and similarity detection by preserving the order and structure of words. The
document vectors are compared using cosine similarity/distance to determine the most similar working papers.
Papers identified with 🔥 are in the top 20% of similarity.
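The comparison step can be sketched in plain Python. This is a toy illustration, not the site's implementation: the three-dimensional vectors stand in for learned Doc2Vec document vectors, the paper IDs and helper names are hypothetical, and flagging the top 20% of the compared set is only one possible reading of "top 20% of similarity."

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_similar(target, others, top_n=10):
    """Score every other document against the target; return the top_n, best first."""
    scored = [(paper_id, cosine_similarity(target, vec)) for paper_id, vec in others.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def flag_top_fraction(scored, fraction=0.2):
    """Return IDs whose scores fall in the top `fraction` of the scored set (assumed rule)."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    n_flagged = max(1, math.ceil(fraction * len(ordered)))
    return {paper_id for paper_id, _ in ordered[:n_flagged]}


# Toy stand-ins for document vectors; real Doc2Vec vectors have many more dimensions.
target_vector = [1.0, 0.0, 1.0]
other_vectors = {
    "paper-a": [1.0, 0.0, 1.0],
    "paper-b": [1.0, 1.0, 0.0],
    "paper-c": [0.0, 1.0, 0.0],
    "paper-d": [1.0, 0.0, 0.0],
    "paper-e": [-1.0, 0.0, -1.0],
}

ranked = rank_similar(target_vector, other_vectors)
flagged = flag_top_fraction(ranked)
for paper_id, score in ranked:
    marker = " (flagged)" if paper_id in flagged else ""
    print(f"{paper_id}: {score:.3f}{marker}")
```

Cosine similarity depends only on the angle between vectors, so documents of different lengths can be compared without normalizing the vectors first.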
The 10 most similar working papers to the working paper 'Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database' are listed below in order of similarity.
-
Working Paper: Addressing Data Gaps: Four New Lines of Inquiry in the 2017 Economic Census
September 2019
Working Paper Number:
CES-19-28
We describe four new lines of inquiry added to the 2017 Economic Census regarding (i) retail health clinics, (ii) management practices in health care services, (iii) self-service in retail and service industries, and (iv) water use in manufacturing and mining industries. These were proposed by econo...
-
Working Paper: Longitudinal Establishment And Enterprise Microdata (LEEM) Documentation
May 1998
Working Paper Number:
CES-98-09
This paper introduces and documents the new Longitudinal Enterprise and Establishment Microdata (LEEM) database, which has been constructed by Census' Economic Planning and Coordination Division under contract to the Office of Advocacy of the U.S. Small Business Administration. The LEEM links three ...
-
Working Paper: Automating Response Evaluation For Franchising Questions On The 2017 Economic Census
July 2019
Working Paper Number:
CES-19-20
Between the 2007 and 2012 Economic Censuses (EC), the count of franchise-affiliated establishments declined by 9.8%. One reason for this decline was a reduction in resources that the Census Bureau was able to dedicate to the manual evaluation of survey responses in the franchise section of the EC. E...
-
Working Paper: Squeezing More Out of Your Data: Business Record Linkage with Python
November 2018
Working Paper Number:
CES-18-46
Integrating data from different sources has become a fundamental component of modern data analytics. Record linkage methods represent an important class of tools for accomplishing such integration. In the absence of common disambiguated identifiers, researchers often must resort to "fuzzy" matching...
-
Working Paper: Business Failure In The 1992 Establishment Universe Sources Of Population Heterogeneity
December 1996
Working Paper Number:
CES-96-13
This study shows that establishment dissolution declines with age and that age at dissolution differs for broad industry and geography groups, establishment affiliation status, and establishment size. The paper uses Bureau of the Census Standard Statistical Establishment List datasets, a census of e...
-
Working Paper: Describing the Form 5500-Business Register Match
January 2003
Working Paper Number:
tp-2003-05
-
Working Paper: The Longitudinal Business Database
July 2002
Working Paper Number:
CES-02-17
As the largest federal statistical agency and primary collector of data on businesses, households and individuals, the Census Bureau each year conducts numerous surveys intended to provide statistics on a wide range of topics about the population and economy of the United States. The Census Bureau's...
-
Working Paper: Person Matching in Historical Files using the Census Bureau's Person Validation System
September 2014
Working Paper Number:
carra-2014-11
The recent release of the 1940 Census manuscripts enables the creation of longitudinal data spanning the whole of the twentieth century. Linked historical and contemporary data would allow unprecedented analyses of the causes and consequences of health, demographic, and economic change. The Census B...
-
Working Paper: Matching State Business Registration Records to Census Business Data
January 2020
Working Paper Number:
CES-20-03
We describe our methodology and results from matching state Business Registration Records (BRR) to Census business data. We use data from Massachusetts and California to develop methods and preliminary results that could be used to guide matching data for additional states. We obtain matches to Cens...
-
Working Paper: Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets
June 2024
Working Paper Number:
CES-24-27
This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and surve...