CREAT: Census Research Exploration and Analysis Tool

Using Census Business Data to Augment the MEPS-IC

December 2005

Working Paper Number:

CES-05-26

Abstract

This paper has two aims: first, to describe methods, issues, and outcomes involved in matching data from the Insurance Component of the Medical Expenditure Panel Survey (MEPS-IC) to other business microdata collected by the U.S. Census Bureau, and second, to present some simple results that illustrate the usefulness of such combined data. We present the results of linking the MEPS-IC with data from the 1997 Economic Censuses (EC), but also discuss other possible sources of business data. An issue in any linkage is whether the linked sample remains representative and large enough to be useful. The EC data are attractive because, given the survey's broad coverage and large sample, most of the MEPS-IC sample can be matched to it. We use the combined EC/MEPS-IC data to construct productivity measures that are useful auxiliary data in examining employers' health insurance offering decisions.

Document Tags and Keywords

Keywords

Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant terms.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text and highlight its most significant topics. This approach not only improves searchability but also surfaces connections beyond author-defined keywords, which may be domain-specific.
payroll, data census, quarterly, census data, survey, aggregate, respondent, earnings, manufacturer, expenditure, revenue, economic census, insurance, businesses census, population, expense, census business, census bureau, coverage, use census, healthcare, health insurance, insurance coverage
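The ranking idea behind embedding-based keyword extraction can be sketched as follows. This is a minimal illustration, not KeyBERT itself: word-count vectors stand in for BERT embeddings, and the names `embed`, `cosine`, `rank_keywords`, and `STOP_WORDS` are hypothetical. Candidate terms are scored by cosine similarity to the vector for the whole document, and the highest-scoring terms are kept.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not KeyBERT's).
STOP_WORDS = {"the", "of", "to", "and", "a", "in", "with", "is", "that", "from"}


def embed(text: str) -> Counter:
    """Toy stand-in for a BERT embedding: a bag-of-words count vector."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_keywords(doc: str, top_n: int = 3) -> list[str]:
    """Rank candidate terms by similarity to the whole-document vector."""
    doc_vec = embed(doc)
    candidates = sorted(doc_vec)  # unique non-stop-word terms
    # Highest similarity first; ties are broken alphabetically.
    ranked = sorted(candidates, key=lambda w: (-cosine(embed(w), doc_vec), w))
    return ranked[:top_n]


text = ("We link insurance survey data to census data and use "
        "the census data to study insurance.")
print(rank_keywords(text))
```

With count vectors the score reduces to term frequency; a real sentence-embedding model would instead reward terms whose meaning is close to the document's overall content.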

Tags

Tags are automatically generated using a pretrained language model from spaCy, which handles several tasks, including named-entity tagging.

The model labels words and phrases by entity type, including "organizations." Filtering for frequently occurring words and phrases labeled as organizations identifies the specific institutions, datasets, and other organizations that a paper references.
Census of Manufactures, Annual Survey of Manufactures, Standard Statistical Establishment List, Internal Revenue Service, Standard Industrial Classification, Service Annual Survey, Retail Trade, Medical Expenditure Panel Survey, Census of Manufacturing Firms, Employer Identification Numbers, Social Security, Economic Census, Wholesale Trade, North American Industry Classification System, Agency for Healthcare Research and Quality, Business Register
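The frequency-filtering step can be sketched as below, assuming the entity tagger has already produced `(text, label)` pairs. The sample pairs, the `min_count` threshold, and the function name `frequent_orgs` are illustrative assumptions; `ORG` is the label spaCy's pretrained English models use for organizations.

```python
from collections import Counter


def frequent_orgs(entities, min_count=2):
    """Keep organization names that appear at least `min_count` times.

    `entities` is an iterable of (text, label) pairs, as an entity
    tagger might emit them.
    """
    counts = Counter(text for text, label in entities if label == "ORG")
    # most_common() orders by count, highest first.
    return [name for name, n in counts.most_common() if n >= min_count]


# Hypothetical tagger output for one paper.
sample = [
    ("Census Bureau", "ORG"),
    ("1997", "DATE"),
    ("Internal Revenue Service", "ORG"),
    ("Census Bureau", "ORG"),
    ("Economic Census", "ORG"),
    ("Economic Census", "ORG"),
    ("Census Bureau", "ORG"),
]
print(frequent_orgs(sample))
```

Because the tagger assigns `ORG` by context rather than by lookup, dataset and classification names such as those in the list above can be labeled as organizations.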

Similar Working Papers

Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.

Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the capture of semantic meaning in a way that relates to the context of words within the document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection by preserving the order and structure of words. The document vectors are compared using cosine similarity/distance to determine the most similar working papers. Papers identified with 🔥 are in the top 20% of similarity.
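The comparison step can be sketched as follows, assuming document vectors have already been learned. The vectors, paper identifiers, and the function `rank_similar` are hypothetical, and the page does not say how the top 20% cutoff is computed; here it is assumed to apply to the ranked list of candidate papers.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_similar(target, others, top_percent=20):
    """Return (paper_id, similarity, flagged) tuples, most similar first.

    `flagged` marks papers in the top `top_percent` of the ranking
    (an assumed reading of the site's flagging rule).
    """
    scored = sorted(
        ((pid, cosine_similarity(target, vec)) for pid, vec in others.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    n_flagged = math.ceil(len(scored) * top_percent / 100)
    return [(pid, sim, i < n_flagged) for i, (pid, sim) in enumerate(scored)]


# Hypothetical two-dimensional document vectors.
vectors = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.0, 1.0],
    "paper_c": [1.0, 1.0],
    "paper_d": [-1.0, 0.0],
    "paper_e": [2.0, 0.1],
}
ranked = rank_similar([1.0, 0.0], vectors)
for pid, sim, flagged in ranked:
    print(pid, round(sim, 3), "flagged" if flagged else "")
```

Cosine similarity depends only on the direction of the vectors, so `paper_e` ranks near the top despite its larger magnitude.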

The 10 most similar working papers to the working paper 'Using Census Business Data to Augment the MEPS-IC' are listed below in order of similarity.