CREAT: Census Research Exploration and Analysis Tool

Confidentiality Protection in the Census Bureau Quarterly Workforce Indicators

February 2006

Working Paper Number:

tp-2006-02

Abstract

The QuarterlyWorkforce Indicators are new estimates developed by the Census Bureau's Longitudinal Employer-Household Dynamics Program as a part of its Local Employment Dynamics partnership with 37 state Labor Market Information offices. These data provide detailed quarterly statistics on employment, accessions, layoffs, hires, separations, full-quarter employment (and related flows), job creations, job destructions, and earnings (for flow and stock categories of workers). The data are released for NAICS industries (and 4-digit SICs) at the county, workforce investment board, and metropolitan area levels of geography. The confidential microdata - unemployment insurance wage records, ES-202 establishment employment, and Title 13 demographic and economic information - are protected using a permanent multiplicative noise distortion factor. This factor distorts all input sums, counts, differences and ratios. The released statistics are analytically valid - measures are unbiased and time series properties are preserved. The confidentiality protection is manifested in the release of some statistics that are flagged as "significantly distorted to preserve confidentiality." These statistics differ from the undistorted statistics by a significant proportion. Even for the significantly distorted statistics, the data remain analytically valid for time series properties. The released data can be aggregated; however, published aggregates are less distorted than custom postrelease aggregates. In addition to the multiplicative noise distortion, confidentiality protection is provided by the estimation process for the QWIs, which multiply imputes all missing data (including missing establishment, given UI account, in the UI wage record data) and dynamically re-weights the establishment data to provide state-level comparability with the BLS's Quarterly Census of Employment and Wages.

Document Tags and Keywords

Keywords Keywords are automatically generated using KeyBERT, a powerful and innovative keyword extraction tool that utilizes BERT embeddings to ensure high-quality and contextually relevant keywords.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach not only enhances searchability but provides connections that go beyond potentially domain-specific author-defined keywords.
:
statistical, report, payroll, agency, respondent, survey, employed, employ, employee, labor, longitudinal, economic census, workplace, workforce, worker, household, census bureau, research census, employer household, aging, censuses surveys, census employment, 2010 census

Tags Tags are automatically generated using a pretrained language model from spaCy, which excels at several tasks, including entity tagging.

The model is able to label words and phrases by part-of-speech, including "organizations." By filtering for frequent words and phrases labeled as "organizations", papers are identified to contain references to specific institutions, datasets, and other organizations.
:
Bureau of Labor Statistics, National Science Foundation, Employer Identification Number, American Economic Review, Cornell University, Unemployment Insurance, Research Data Center, North American Industry Classification System, Social Security Number, Longitudinal Employer Household Dynamics, Cornell Institute for Social and Economic Research, LEHD Program, Business Register, Employer-Household Dynamics, Protected Identification Key, Sloan Foundation, Quarterly Workforce Indicators, Quarterly Census of Employment and Wages, Census Bureau Disclosure Review Board

Similar Working Papers Similarity between working papers are determined by an unsupervised neural network model know as Doc2Vec.

Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the capture of semantic meaning in a way that relates to the context of words within the document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection by preserving the order and structure of words. The document vectors are compared using cosine similarity/distance to determine the most similar working papers. Papers identified with 🔥 are in the top 20% of similarity.

The 10 most similar working papers to the working paper 'Confidentiality Protection in the Census Bureau Quarterly Workforce Indicators' are listed below in order of similarity.