CREAT - Census Bureau

Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database

March 2023

Written by: Alice Zawacki, Joey Marshall, Donald Cherry, Xianghua Yin, Brian W. Ward

Working Paper Number:

CES-23-10

Abstract

Retail health clinics (RHCs) are a relatively new type of health care setting, and understanding the role they play as a source of ambulatory care in the United States is important. To better understand these settings, a joint project by the Census Bureau and the National Center for Health Statistics used data science techniques to link data on RHCs from the Convenient Care Association, the County Business Patterns Business Register, and the National Plan and Provider Enumeration System to create the Linked RHC (LiRHC, pronounced 'lyric') database of locations throughout the United States during the years 2018 to 2020. The matching methodology used to perform this linkage is described, as well as the benchmarking, match statistics, and manual review and quality checks used to assess the resulting matched data. The large majority (81%) of matches received quality scores at or above 75/100, and most matches were linked in the first two (of eight) matching passes, indicating high confidence in the final linked dataset. The LiRHC database contained 2,000 RHCs; 97% of these clinics were in metropolitan statistical areas, and 950 were in the South region of the United States. Through this collaborative effort, the Census Bureau and the National Center for Health Statistics strive to understand how RHCs can potentially impact population health as well as the access to and provision of health care services across the nation.
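The abstract refers to successive matching passes and a quality score out of 100. As a rough illustration only, and not the authors' actual procedure, the sketch below shows one way a multi-pass linkage with a quality score could be organized. Everything in it is an assumption made for the example: the record fields (`id`, `name`, `address`, `zip`), the three pass thresholds, blocking on ZIP code, and the use of Python's standard-library `difflib` as the string comparator.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two lightly normalized strings."""
    return 100.0 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


# Hypothetical passes, strictest first: (minimum name score, minimum address score).
PASSES = [(100.0, 100.0), (90.0, 90.0), (75.0, 75.0)]


def link_records(source, reference, passes=PASSES):
    """Link each source record to at most one reference record in the same ZIP code.

    Records are dicts with the hypothetical keys 'id', 'name', 'address', 'zip'.
    Returns a list of (source_id, reference_id, pass_number, quality_score).
    """
    matches = []
    unmatched = list(source)
    used_refs = set()
    for pass_number, (min_name, min_addr) in enumerate(passes, start=1):
        still_unmatched = []
        for rec in unmatched:
            best = None  # (reference_id, quality) of the best candidate so far
            for ref in reference:
                # Skip reference records already linked or in another ZIP code.
                if ref["id"] in used_refs or ref["zip"] != rec["zip"]:
                    continue
                name_score = similarity(rec["name"], ref["name"])
                addr_score = similarity(rec["address"], ref["address"])
                if name_score >= min_name and addr_score >= min_addr:
                    quality = (name_score + addr_score) / 2
                    if best is None or quality > best[1]:
                        best = (ref["id"], quality)
            if best is None:
                still_unmatched.append(rec)  # retry in the next, looser pass
            else:
                used_refs.add(best[0])
                matches.append((rec["id"], best[0], pass_number, round(best[1], 1)))
        unmatched = still_unmatched
    return matches


# Made-up example records; the second clinic name contains a typo.
clinics = [
    {"id": "c1", "name": "Main Street Clinic", "address": "100 Main St", "zip": "20001"},
    {"id": "c2", "name": "Main Street Clinc", "address": "200 Oak Ave", "zip": "20002"},
]
register = [
    {"id": "r1", "name": "Main Street Clinic", "address": "100 Main St", "zip": "20001"},
    {"id": "r2", "name": "Main Street Clinic", "address": "200 Oak Ave", "zip": "20002"},
]

matches = link_records(clinics, register)
for source_id, reference_id, pass_number, quality in matches:
    print(source_id, reference_id, pass_number, quality)
```

In this toy run the exact duplicate links in the first pass, while the misspelled name links only once the thresholds loosen in the second pass. Recording the pass number next to the score is what makes a summary such as "most matches were linked in the first two passes" possible.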

Document Tags and Keywords

Keywords:

report, department, discrepancy, record, matched, matching, retail, coverage, medicare, healthcare, medicaid, datasets, assessed

Tags:

Internal Revenue Service, Bureau of Labor Statistics, Social Security Administration, Service Annual Survey, Center for Economic Studies, County Business Patterns, Longitudinal Business Database, Employer Identification Numbers, Economic Census, North American Industry Classification System, National Center for Health Statistics, Disclosure Review Board, Centers for Disease Control and Prevention, Data Management System

Similar Working Papers

The 10 most similar working papers to the working paper 'Methodology on Creating the U.S. Linked Retail Health Clinic (LiRHC) Database' are listed below in order of similarity.

Working Paper

Addressing Data Gaps: Four New Lines of Inquiry in the 2017 Economic Census

September 2019

Authors: Lucia Foster, Randy Becker, Alice Zawacki, T. Kirk White, Emek Basker

Working Paper Number:

CES-19-28

We describe four new lines of inquiry added to the 2017 Economic Census regarding (i) retail health clinics, (ii) management practices in health care services, (iii) self-service in retail and service industries, and (iv) water use in manufacturing and mining industries. These were proposed by economists from the U.S. Census Bureau's Center for Economic Studies in order to fill data gaps in current Census Bureau products concerning the U.S. economy. The new content addresses such issues as the rise in importance of health care and its complexity, the adoption of automation technologies, and the importance of measuring water, a critical input to many manufacturing and mining industries.
Working Paper

Longitudinal Establishment And Enterprise Microdata (LEEM) Documentation

May 1998

Authors: Zoltan J Acs, Catherine Armington

Working Paper Number:

CES-98-09

This paper introduces and documents the new Longitudinal Enterprise and Establishment Microdata (LEEM) database, which has been constructed by Census' Economic Planning and Coordination Division under contract to the Office of Advocacy of the U.S. Small Business Administration. The LEEM links three years (1990, 1994, and 1995) of basic data for each private sector establishment with payroll in any of those years, along with data on the firm to which the establishment belongs each year. The LEEM data will facilitate both broader and more detailed analysis of patterns of job creation and destruction in the U.S., as well as research on the structure and dynamics of U.S. businesses. This paper provides documentation of the construction of LEEM data, summary data on most variables in the database, comparisons of the annual data with that of the nearly identical County Business Patterns, and distributions of establishments and their employment by the size of their firms. This is followed by a simple analysis of changes over time in the attributes of surviving establishments, and a brief discussion of turnover (business births and deaths) in the population and gross changes in employment associated with both establishment turnover and with surviving establishments. It concludes with a summary of the strengths and weaknesses of the LEEM.
Working Paper

Matching Compustat Data to the Longitudinal Business Database, 1976-2020

September 2025

Authors: Cristina Tello-Trillo, Lawrence Schmidt, Sean Streiff

Working Paper Number:

CES-25-65

This paper details the methodology for creating an updated Compustat-Longitudinal Business Database (LBD) bridge, facilitating linkage between company identifiers in Compustat and firm identifiers in the LBD. In addition to data from Compustat, we incorporate historical data on public companies from various public and private sources, including information on executive names. Our methodology involves a series of stages using fuzzy name and address matching, including EIN, telephone number, and industry code matching. Qualified researchers with approved proposals can access this bridge through the Federal Statistical Research Data Centers. The Compustat-LBD bridge serves as a crucial resource for longitudinal studies on U.S. businesses, corporate governance, and executive compensation.
Working Paper

Automating Response Evaluation For Franchising Questions On The 2017 Economic Census

July 2019

Authors: J. Bradford Jensen, Shawn Klimek, Joseph Staudt, Yifang Wei, Lisa Singh, Andrew L. Baer

Working Paper Number:

CES-19-20

Between the 2007 and 2012 Economic Censuses (EC), the count of franchise-affiliated establishments declined by 9.8%. One reason for this decline was a reduction in resources that the Census Bureau was able to dedicate to the manual evaluation of survey responses in the franchise section of the EC. Extensive manual evaluation in 2007 resulted in many establishments, whose survey forms indicated they were not franchise-affiliated, being recoded as franchise-affiliated. No such evaluation could be undertaken in 2012. In this paper, we examine the potential of using external data harvested from the web in combination with machine learning methods to automate the process of evaluating responses to the franchise section of the 2017 EC. Our method allows us to quickly and accurately identify and recode establishments that have been mistakenly classified as not being franchise-affiliated, increasing the unweighted number of franchise-affiliated establishments in the 2017 EC by 22%-42%.
Working Paper

NEW DATA FOR DYNAMIC ANALYSIS: THE LONGITUDINAL ESTABLISHMENT AND ENTERPRISE MICRODATA (LEEM) FILE

December 1999

Authors: Alicia Robb

Working Paper Number:

CES-99-18

Until now, research on U.S. business activities over time has been hindered by the lack of accurate and comprehensive longitudinal data. The new Longitudinal Establishment and Enterprise Microdata (LEEM) are tremendously rich data that open up numerous possibilities for dynamic analyses of businesses in the U.S. economy. It is the first nationwide high-quality longitudinal database that covers the majority of employer businesses from all sectors of the economy. Due to the confidential nature of these data, the file is located at the Center for Economic Studies in the U.S. Bureau of the Census. To access the data, researchers must submit an acceptable proposal to CES and become sworn Census researchers. This paper describes the LEEM file, the variables contained on the file, and current uses of the data.
Working Paper

The Longitudinal Business Database

July 2002

Authors: Ron Jarmin, Javier Miranda

Working Paper Number:

CES-02-17

As the largest federal statistical agency and primary collector of data on businesses, households and individuals, the Census Bureau each year conducts numerous surveys intended to provide statistics on a wide range of topics about the population and economy of the United States. The Census Bureau's decennial population and quinquennial economic censuses are unique, providing information on all U.S. households and business establishments, respectively.
Working Paper

Person Matching in Historical Files using the Census Bureau's Person Validation System

September 2014

Authors: Amy B. O'Hara, Catherine G. Massey

Working Paper Number:

carra-2014-11

The recent release of the 1940 Census manuscripts enables the creation of longitudinal data spanning the whole of the twentieth century. Linked historical and contemporary data would allow unprecedented analyses of the causes and consequences of health, demographic, and economic change. The Census Bureau is uniquely equipped to provide high quality linkages of person records across datasets. This paper summarizes the linkage techniques employed by the Census Bureau and discusses utilization of these techniques to append protected identification keys to the 1940 Census.
Working Paper

Documenting the Business Register and Related Economic Business Data

March 2016

Authors: Shawn Klimek, Frank Limehouse, Bethany DeSalvo

Working Paper Number:

CES-16-17

The Business Register (BR) is a comprehensive database of business establishments in the United States and provides resources for the U.S. Census Bureau's economic programs for sample selection, research, and survey operations. It is maintained using information from several federal agencies including the Census Bureau, Internal Revenue Service, Bureau of Labor Statistics, and the Social Security Administration. This paper provides a detailed description of the sources and functions of the BR. An overview of the BR as a linking tool and bridge to other Census Bureau data for additional business characteristics is also given.
Working Paper

Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

June 2024

Authors: Narayan Sastry, Todd Gardner, Matthew Cefalu, John Sullivan, Elizabeth Fussell

Working Paper Number:

CES-24-27

This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third-party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, and as such not all records will be assigned an identifier. This article is a tutorial for using twangRDC to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
Working Paper

Squeezing More Out of Your Data: Business Record Linkage with Python

November 2018

Authors: Nathan Goldschlag, John Cuffe

Working Paper Number:

CES-18-46

Integrating data from different sources has become a fundamental component of modern data analytics. Record linkage methods represent an important class of tools for accomplishing such integration. In the absence of common disambiguated identifiers, researchers often must resort to "fuzzy" matching, which allows imprecision in the characteristics used to identify common entities across different datasets. While the record linkage literature has identified numerous individually useful fuzzy matching techniques, there is little consensus on a way to integrate those techniques within a single framework. To this end, we introduce the Multiple Algorithm Matching for Better Analytics (MAMBA), an easy-to-use, flexible, scalable, and transparent software platform for business record linkage applications using Census microdata. MAMBA leverages multiple string comparators to assess the similarity of records using a machine learning algorithm to disambiguate matches. This software represents a transparent tool for researchers seeking to link external business data to the Census Business Register files.
