Retail health clinics (RHCs) are a relatively new type of health care setting and understanding the role they play as a source of ambulatory care in the United States is important. To better understand these settings, a joint project by the Census Bureau and National Center for Health Statistics used data science techniques to link together data on RHCs from Convenient Care Association, County Business Patterns Business Register, and National Plan and Provider Enumeration System to create the Linked RHC (LiRHC, pronounced 'lyric') database of locations throughout the United States during the years 2018 to 2020. The matching methodology used to perform this linkage is described, as well as the benchmarking, match statistics, and manual review and quality checks used to assess the resulting matched data. The large majority (81%) of matches received quality scores at or above 75/100, and most matches were linked in the first two (of eight) matching passes, indicating high confidence in the final linked dataset. The LiRHC database contained 2,000 RHCs and found that 97% of these clinics were in metropolitan statistical areas and 950 were in the South region of the United States. Through this collaborative effort, the Census Bureau and National Center for Health Statistics strive to understand how RHCs can potentially impact population health as well as the access and provision of health care services across the nation.
-
Addressing Data Gaps:
Four New Lines of Inquiry in the 2017 Economic Census
September 2019
Working Paper Number:
CES-19-28
We describe four new lines of inquiry added to the 2017 Economic Census regarding (i) retail health clinics, (ii) management practices in health care services, (iii) self-service in retail and service industries, and (iv) water use in manufacturing and mining industries. These were proposed by economists from the U.S. Census Bureau's Center for Economic Studies in order to fill data gaps in current Census Bureau products concerning the U.S. economy. The new content addresses such issues as the rise in importance of health care and its complexity, the adoption of automation technologies, and the importance of measuring water, a critical input to many manufacturing and mining industries.
View Full
Paper PDF
-
Automating Response Evaluation For Franchising Questions On The 2017 Economic Census
July 2019
Working Paper Number:
CES-19-20
Between the 2007 and 2012 Economic Censuses (EC), the count of franchise-affiliated establishments declined by 9.8%. One reason for this decline was a reduction in resources that the Census Bureau was able to dedicate to the manual evaluation of survey responses in the franchise section of the EC. Extensive manual evaluation in 2007 resulted in many establishments, whose survey forms indicated they were not franchise-affiliated, being recoded as franchise-affiliated. No such evaluation could be undertaken in 2012. In this paper, we examine the potential of using external data harvested from the web in combination with machine learning methods to automate the process of evaluating responses to the franchise section of the 2017 EC. Our method allows us to quickly and accurately identify and recode establishments have been mistakenly classified as not being franchise-affiliated, increasing the unweighted number of franchise-affiliated establishments in the 2017 EC by 22%-42%.
View Full
Paper PDF
-
Estimating Record Linkage False Match Rate for the Person Identification Validation System
July 2014
Working Paper Number:
carra-2014-02
The Census Bureau Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. This paper presents a method to measure the false match rate in PVS following the approach of Belin and Rubin (1995). The Belin and Rubin methodology requires truth data to estimate a mixture model. The parameters from the mixture model are used to obtain point estimates of the false match rate for each of the PVS search modules. The truth data requirement is satisfied by the unique access the Census Bureau has to high quality name, date of birth, address and Social Security (SSN) data. Truth data are quickly created for the Belin and Rubin model and do not involve a clerical review process. These truth data are used to create estimates for the Belin and Rubin parameters, making the approach more feasible. Both observed and modeled false match rates are computed for all search modules in federal administrative records data and commercial data.
View Full
Paper PDF
-
Longitudinal Establishment And Enterprise Microdata (LEEM) Documentation
May 1998
Working Paper Number:
CES-98-09
This paper introduces and documents the new Longitudinal Enterprise and Establishment Microdata (LEEM) database, which has been constructed by Census' Economic Planning and Coordination Division under contract to the Office of Advocacy of the U.S. Small Business Administration. The LEEM links three years (1990, 1994, and 1995) of basic data for each private sector establishment with payroll in any of those years, along with data on the firm to which the establishment belongs each year. The LEEM data will facilitate both broader and more detailed analysis of patterns of job creation and destruction in the U.S., as well as research on the structure and dynamics of U.S. businesses. This paper provides documentation of the construction of LEEM data, summary data on most variables in the database, comparisons of the annual data with that of the nearly identical County Business Patterns, and distributions of establishments and their employment by the size of their firms. This is followed by a simple analysis of changes over time in the attributes of surviving establishments, and a brief discussion of turnover (business births and deaths) in the population and gross changes in employment associated with both establishment turnover and with surviving establishments. It concludes with a summary of the strengths and weaknesses of the LEEM.
View Full
Paper PDF
-
Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets
June 2024
Working Paper Number:
CES-24-27
This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file and as such not all records will be assigned an identifier. This article is a tutorial for using the twangRDC to generate nonresponse weights to account for non-linkage of person records across US Census Bureau datasets.
View Full
Paper PDF
-
Matching State Business Registration Records
to Census Business Data
January 2020
Working Paper Number:
CES-20-03
We describe our methodology and results from matching state Business Registration Records (BRR) to Census business data. We use data from Massachusetts and California to develop methods and preliminary results that could be used to guide matching data for additional states. We obtain matches to Census business records for 45% of the Massachusetts BRR records and 40% of the California BRR records. We find higher match rates for incorporated businesses and businesses with higher startup-quality scores as assigned in Guzman and Stern (2018). Clerical reviews show that using relatively strict matching on address is important for match accuracy, while results are less sensitive to name matching strictness. Among matched BRR records, the modal timing of the first match to the BR is in the year in which the BRR record was filed. We use two sets of software to identify matches: SAS DQ Match and a machine-learning algorithm described in Cuffe and Goldschlag (2018). We find preliminary evidence that while the ML-based method yields more match results, SAS DQ tends to result in higher accuracy rates. To conclude, we provide suggestions on how to proceed with matching other states' data in light of our findings using these two states.
View Full
Paper PDF
-
Full Report of the Comparisons of Administrative Record Rosters to Census Self-Responses and NRFU Household Member Responses
March 2023
Working Paper Number:
CES-23-08
One of the U.S. Census Bureau's innovations in the 2020 U.S. Census was the use of administrative records (AR) to create household rosters for enumerating some addresses when a self response was not available but high-quality ARs were. The goal was to reduce the cost of fieldwork during the Nonresponse Followup operation (NRFU). The original plan had NRFU beginning in mid-May and continuing through late July 2020. However, the COVID-19 pandemic forced the delay of NRFU and caused the Internal Revenue Service to postpone the income tax filing deadline, resulting in an interruption in the delivery of ARs to the U.S. Census Bureau. The delays were not anticipated when U.S. Census Bureau staff conducted the research on AR enumeration with the 2010 Census data in preparation for the 2020 Census or during the fine tuning of plans for using ARs during the 2018 End-to-End Census Test. These circumstances raised questions about whether the quality of the AR household rosters was high enough for use in enumeration. To aid in investigating the concern about the quality of the AR rosters, our analyses compared AR rosters to self-response rosters and NRFU household member responses at addresses where both ARs and a self-response were available.
View Full
Paper PDF
-
The Longitudinal Business Database
July 2002
Working Paper Number:
CES-02-17
As the largest federal statistical agency and primary collector of data on businesses, households and individuals, the Census Bureau each year conducts numerous surveys intended to provide statistics on a wide range of topics about the population and economy of the United States. The Census Bureau's decennial population and quinquennial economic censuses are unique, providing information on all U.S. households and business establishments, respectively.
View Full
Paper PDF
-
Squeezing More Out of Your Data: Business Record Linkage with Python
November 2018
Working Paper Number:
CES-18-46
Integrating data from different sources has become a fundamental component of modern data analytics. Record linkage methods represent an important class of tools for accomplishing such integration. In the absence of common disambiguated identifiers, researchers often must resort to ''fuzzy" matching, which allows imprecision in the characteristics used to identify common entities across dfferent datasets. While the record linkage literature has identified numerous individually useful fuzzy matching techniques, there is little consensus on a way to integrate those techniques within a
single framework. To this end, we introduce the Multiple Algorithm Matching for Better Analytics (MAMBA), an easy-to-use, flexible, scalable, and transparent software platform for business record linkage applications using Census microdata. MAMBA leverages multiple string comparators to assess the similarity of records using a machine learning algorithm to disambiguate matches. This software represents a transparent tool for researchers seeking to link external business data to the Census Business Register files.
View Full
Paper PDF
-
Documenting the Business Register and Related Economic Business Data
March 2016
Working Paper Number:
CES-16-17
The Business Register (BR) is a comprehensive database of business establishments in the United States and provides resources for the U.S. Census Bureau's economic programs for sample selection, research, and survey operations. It is maintained using information from several federal agencies including the Census Bureau, Internal Revenue Service, Bureau of Labor Statistics, and the Social Security Administration. This paper provides a detailed description of the sources and functions of the BR. An overview of the BR as a linking tool and bridge to other Census Bureau data for additional business characteristics is also given.
View Full
Paper PDF