2.1. Employment History Files

2.1.1. Overview

The Employment History Files are a set of job-level tables in the LEHD data. The source of the jobs data is earnings records collected by state unemployment insurance programs, called UI wage record data. These consist of quarterly reports of earnings for all UI-covered employees, provided by employers to states and shared with Census via a cooperative partnership. More information on UI-covered employment and wages is available in previous sections of this document.

The EHF file family consists of four tables:

  • Employment History File (EHF_ZZ)

    This table contains annual records with quarterly earnings for each covered job in a state. This is a simple file that is relatively easy to use for research analysis.

    • Scope: State

    • Key: PIK SEIN SEINUNIT YEAR

  • Job History File (JHF_ZZ)

    This is a wide file containing the earnings arrayed in quarterly variables for each job. In addition to the earnings available in the EHF, the JHF contains imputed establishments for multi-unit employers and successor-predecessor information. The earnings arrays in the JHF are also useful for calculating worker flows. Job spells may be recorded on multiple records in this table as a result of identifier changes or imputation.

    • Scope: State

    • Key: PIK SEIN SPELL_U2W

  • Earnings Indicators (EHF_US_INDICATORS)

    This file contains flags indicating whether or not a PIK had earnings reported in any state in the LEHD system. This is useful for identifying bias in research samples using a subset of state UI data.

    • Scope: National

    • Key: PIK YEAR

  • Earnings Availability (EHF_US_AVAILABILITY)

    This file provides quarter ranges for which state data is available in the EHF and JHF tables. This is useful for identifying bias in research samples.

    • Scope: National

    • Key: STATE

Researchers are encouraged to request all four of the job-level files described above. The JHF is the most useful file for most researchers. However, the EHF can be useful for analyses where person-level earnings matter more than establishment-level characteristics or the longitudinal nature of the job. The indicators and availability tables are useful where sample bias or left-censoring of job histories might be an issue.

2.1.2. User Guidance

Identifiers: Employers vs. Firms vs. Establishments

UI wage records are reported at the state tax identifier level (SEIN) in all states except Minnesota, where they are reported at the establishment level (SEINUNIT). We typically refer to SEINs as employers in this document, but users should be aware that an SEIN is fundamentally a tax entity and is distinct from an establishment or a firm. While identifiers that correspond to the definition of a firm as used by the Longitudinal Business Database (LBD) are available in the ECF T26 file, job-level and employer-level files are generally linked at either the employer or establishment level.

For further details on the definition and usage of each identifier, see the ECF documentation.

For further details on our establishment imputation methodology, see sections below.

Counting Jobs and Constructing Longitudinal Job Histories on the JHF

On the JHF, job spells are arrayed wide by quarter using qtime indexing. There can be one or many records per PIK-SEIN, depending on firm structure and information availability.

  • Single establishment employer (SEIN)

    • There is one record per PIK-SEIN for these jobs. The establishment will be 00000.

  • Multi establishment employer (SEIN) with impute

    • The PIK-SEIN record will have ten imputes of the establishment attached to it.

    • In cases where a worker leaves an employer for more than a year or the establishment structure changes significantly (births/deaths), the JHF record may be broken into multiple records by quarter ranges.

    • The establishment impute is performed independently for each record on the JHF.

  • Multi establishment employer (SEIN) with direct reporting (MN only)

    • There will be one record for each PIK-SEIN-SEINUNIT when there is regular reporting of the establishment on the UI record.

    • Each PIK-SEIN-SEINUNIT record is considered a separate longitudinal job for purposes of QWI calculation.

    • In cases of irregular establishment reporting patterns on the wage records, SEINs can have jobs with directly reported establishments in some quarters and imputes in other quarters, or imputes in all quarters.

The SPELL_U2W variable is created as a simple counter within each PIK-SEIN to form a key that can uniquely identify each record on the JHF. Hence, the key on the table is PIK-SEIN-SPELL_U2W. However, a single JHF record may not be a complete longitudinal history of a job spell. In order to properly link the JHF records to construct the full job spell, the longitudinal job identifier (FID) is added to the JHF. This variable is created as follows:

  • A sequential counter is created for all JHF records for a single PIK.

  • All jobs in a set that is considered part of the same longitudinal history are assigned the lowest value of the counter within the set.

  • The FID is created by converting the counter to a character field and prepending the state numeric FIPS code, which makes it nationally unique within a PIK.
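
The three steps above can be sketched as follows. This is a minimal illustration in Python, not the production code: the function name `assign_fids`, the input layout, and the zero-padded counter width are assumptions made for the example, and the record numbering simply follows the order of a PIK's JHF records.

```python
def assign_fids(n_records, linked_sets, state_fips, width=3):
    """Assign a FID to each of one PIK's JHF records, numbered 1..n_records.

    linked_sets lists the groups of record numbers that belong to the same
    longitudinal job history. The padding width is an assumption.
    """
    # Step 1: sequential counter over all JHF records for the PIK.
    counter = {i: i for i in range(1, n_records + 1)}
    # Step 2: every record in a linked set takes the lowest counter in the set.
    for group in linked_sets:
        lowest = min(group)
        for i in group:
            counter[i] = lowest
    # Step 3: convert to character and prepend the state numeric FIPS code.
    return {i: f"{state_fips}{c:0{width}d}" for i, c in counter.items()}


# Four records for one PIK in Minnesota (FIPS 27); records 2 and 4 are linked.
fids = assign_fids(4, [{2, 4}], "27")
```

In this example, records 2 and 4 share one FID while records 1 and 3 each keep their own.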

The PIK-FID should be considered the unit of analysis when constructing the longitudinal history of the job spell. The job can be considered active in any quarter that the PIK-FID has positive earnings on any JHF record. Note that earnings can be reported on multiple JHF records in the same quarter within a PIK-FID group, and these should be summed. For QWI calculation purposes, counts are weighted by the share of total earnings for the PIK-FID, and divided by 10 if multiple imputation is used. This method ensures that each active PIK-FID will be counted as one job in QWI measures.
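
As a rough illustration of the weighting described above, the sketch below sums earnings within a PIK-FID for one quarter and spreads a total weight of one across the contributing records and their establishment implicates. The record layout (dictionaries with `earn` keyed by qtime and a `seinunits` list) and the function name `job_weights` are hypothetical; this is one reading of the description, not the QWI production algorithm.

```python
from collections import defaultdict


def job_weights(records, qtime):
    """Return {(pik, fid): [(sein, seinunit, weight), ...]} for one quarter."""
    # Sum earnings over all JHF records in the same PIK-FID group.
    totals = defaultdict(float)
    for r in records:
        totals[(r["pik"], r["fid"])] += r["earn"].get(qtime, 0)

    weights = defaultdict(list)
    for r in records:
        key = (r["pik"], r["fid"])
        earn = r["earn"].get(qtime, 0)
        if earn <= 0:
            continue  # record not active in this quarter
        share = earn / totals[key]
        units = r["seinunits"]  # 10 implicates, or 1 reported/single unit
        for unit in units:
            # Divide the earnings share evenly across the implicates.
            weights[key].append((r["sein"], unit, share / len(units)))
    return dict(weights)


records = [
    {"pik": "P1", "fid": "27001", "sein": "S1",
     "earn": {100: 3000}, "seinunits": ["00001"] * 6 + ["00002"] * 4},
    {"pik": "P1", "fid": "27001", "sein": "S2",
     "earn": {100: 1000}, "seinunits": ["00000"]},
]
w = job_weights(records, 100)[("P1", "27001")]
total_weight = sum(weight for _, _, weight in w)
```

The weights for the PIK-FID sum to one, so the two JHF records together count as a single job in that quarter.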

Accounting for Successor-Predecessor Events

A job in the EHF is a PIK-SEIN pair with positive earnings in a quarter. However, a worker’s job spell can be broken up over multiple SEINs (i.e., span multiple rows in the JHF) if the employer SEIN changes during the spell. Employer tax identifiers change quite frequently in the LEHD data, usually due to firm restructuring. There are also limited instances where virtually all SEINs change within a state in a single quarter. When using the job-level data to identify job starts, separations, or job tenure, successor-predecessor (S-P) information must be used to link jobs and avoid bias arising from firm S-P events. S-P information must also be used to avoid double counting of jobs when earnings are provided by multiple related SEINs in the same quarter.

The SPF reports all significant flows of workers between SEINs. A subset of these flows has been identified as representing S-P events according to business rules described in Identifying Transitions in UI Data. Workers who move between two SEINs engaged in an S-P relationship should be considered as having a continuous job history across SEINs. In addition, if the worker received earnings from both SEINs in the same quarter, a common feature of an S-P event, the earnings should be combined, not counted as two separate jobs.

In the JHF, records in S-P relationships are linked by assigning the same FID, as described above. The mechanics of linking records are as follows:

  • All separations and accessions are identified from all records for a PIK within the JHF.

    • A separation is a quarter of positive earnings followed by a quarter of no earnings, and an accession is the reverse.

    • Separations and accessions can occur in the beginning, middle, or end of the quarter range on the JHF record.

  • A separation is linked with every accession in the same or following quarter to find candidates for linkage.

  • If an S-P relationship identified on the SPF coincides with a link found for a PIK, the JHF records are joined via the FID.

    • Because significant flows have been observed between two SEINs in quarters proximate to the officially recognized transition quarter, links at the PIK level will be made if they are found within several quarters before or after the transition quarter.

    • Some links are provided via internal clerical tables, particularly to handle large-scale SEIN changes. These may not be included in the SPF, but will be incorporated into the FID.

  • The linkage process works iteratively to link all related JHF records to the same FID.
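
The linkage mechanics above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the record layout, the function names `transitions` and `link_records`, and the `sp_pairs` set standing in for the SPF are hypothetical, and the sketch omits the multi-quarter window around the transition quarter and the clerical links. A union-find structure is used here as one convenient way to express the iterative linking; the source does not say how the production system implements it.

```python
def transitions(earn):
    """Return (accession quarters, separation quarters) for one JHF record.

    earn maps qtime -> earnings; missing quarters count as no earnings.
    """
    active = sorted(q for q, e in earn.items() if e > 0)
    accessions = [q for q in active if earn.get(q - 1, 0) <= 0]
    separations = [q for q in active if earn.get(q + 1, 0) <= 0]
    return accessions, separations


def link_records(records, sp_pairs):
    """Group one PIK's JHF records that are joined by S-P events.

    sp_pairs is a set of (predecessor SEIN, successor SEIN) tuples.
    Returns, for each record, the lowest record index in its linked set.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    trans = [transitions(r["earn"]) for r in records]
    for i, pred in enumerate(records):
        for j, succ in enumerate(records):
            if i == j or (pred["sein"], succ["sein"]) not in sp_pairs:
                continue
            separations = trans[i][1]
            accessions = trans[j][0]
            # Accession in the same quarter as the separation, or the next one.
            if any(a - s in (0, 1) for s in separations for a in accessions):
                ri, rj = find(i), find(j)
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(records))]


records = [
    {"sein": "A", "earn": {10: 100, 11: 100, 12: 50}},
    {"sein": "B", "earn": {12: 60, 13: 100}},
    {"sein": "C", "earn": {14: 100}},
    {"sein": "D", "earn": {14: 100}},
]
groups = link_records(records, {("A", "B"), ("B", "C")})
```

Here the records at SEINs A, B, and C end up in one linked set through two successive S-P events, while the record at SEIN D stays separate.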

Using the Worker-to-Establishment Imputes

The Census Bureau calculates establishment-level workforce statistics from the LEHD data by imputing establishments for jobs at multi-unit SEINs using multiple imputation. The 10 implicates from that imputation are available on the JHF for researchers. What follows is some general guidance on using the imputed values:

  • First, establish whether you need the imputed establishment for your analysis. Many researchers use employment-weighted SEIN-level employer characteristics from the ECF. In many cases this might be a preferred approach if you are not using all 10 implicates.

  • While we recommend using all 10 implicates, we are aware some researchers choose to use only one. If you choose to use only one of the imputes, we recommend you use the first implicate.

Worker-to-Establishment Methodology (U2W)

The full description of the worker-to-establishment imputation is available in Abowd et al. [2009], but we describe it briefly here. The model assigns workers to establishments using the relative size of the establishments and the distance between the worker’s residence and the establishment. In other words, the model favors imputing workers to the closest establishment, but must also allocate more workers to larger establishments. Imputed values are drawn 10 times to better capture the uncertainty in the imputation. Model parameters for the imputation are determined using the Minnesota data, which does have SEINUNIT-level jobs data.

To give a very simple example, suppose worker Z works at a SEIN with 3 establishments, A, B and C. If the model determines there is a 60 percent likelihood that Z works at A and a 20 percent chance each that Z works at B or at C, the 10 implicates would be expected to look like: A, A, A, A, A, A, B, B, C, C. Multiple imputation gives the user more information about the uncertainty of the imputation than a single impute.
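
One way to use all 10 implicates is to give each implicate a weight of one tenth, so that a job contributes fractionally to each candidate establishment. The short Python sketch below applies this to the example; the function name `implicate_weights` is hypothetical.

```python
from collections import Counter


def implicate_weights(implicates):
    """Share of a job assigned to each establishment across its implicates."""
    counts = Counter(implicates)
    return {unit: n / len(implicates) for unit, n in counts.items()}


# The implicates for worker Z from the example.
weights = implicate_weights(["A"] * 6 + ["B"] * 2 + ["C"] * 2)
```

Summing these fractional weights over jobs gives establishment-level counts that reflect the imputation uncertainty, whereas using a single implicate would assign worker Z entirely to one establishment.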

Adding Establishment Characteristics

Employer characteristics to attach to jobs can be retrieved from one of the following sources:

  • ECF_SEIN table - Employment-weighted modal characteristics from establishment data

  • ECF_SEIN_T26 table - Firm level characteristics using Title 26 data (age/size)

  • ECF_SEINUNIT table - Establishment level characteristics

  • QWI_SEINUNIT table - Establishment level characteristics, with some longitudinal editing after linking with jobs data

Establishment-level characteristics include ownership, industry, and geography. Ownership will almost always be the same for all establishments in an SEIN, and cases where that is not true may suggest a data irregularity. There is significant establishment heterogeneity in substate geography, and less but still significant heterogeneity in industry. While SEIN-level characteristics may provide a first-order approximation, there will likely be mass points and redistributions in counts relative to establishment-level statistics.

One complication of using establishment characteristics, however, is that the establishment will not always be present in the ECF_SEINUNIT or QWI_SEINUNIT tables in every quarter in which a job imputed to it is active on the JHF. This is usually a function of the operation of the worker-to-establishment methodology. While establishment candidates are intended to be available for the whole of a worker’s job spell at an SEIN, there are allowances for exceptions to this rule. There are several possible fallback strategies to handle these cases:

  1. Use the SEIN-level modal characteristics. This is typically the most straightforward approach, but it can result in mass points and sudden longitudinal changes to the assignment.

  2. Perform a longitudinal edit to get characteristics from the establishment in the nearest quarter it exists. This will maintain longitudinal consistency in assigned characteristics, but it is possible the establishment did not exist and using that geography (or industry) could be incorrect.

  3. Draw an establishment from the set of establishments that do exist within the SEIN in the time period. This approach would come closest to replicating the distribution of the characteristics of the firm in the quarters of interest, though there may be minimal information on the job level to condition the impute. It is also likely the most complicated to implement.

While no particular approach is necessarily preferred, it is recommended that researchers applying establishment-level characteristics be aware of this phenomenon and apply a consistent approach to handling it.
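
As an illustration of the second strategy (the longitudinal edit), the sketch below looks up an establishment's characteristics in the nearest quarter in which it exists. The function name, the input layout, and the tie-breaking rule (prefer the earlier quarter) are assumptions made for the example, and the characteristic values are placeholders.

```python
def nearest_quarter_characteristics(unit_history, qtime):
    """Characteristics of one SEIN-SEINUNIT from the quarter nearest to qtime.

    unit_history maps qtime -> dict of characteristics for the quarters in
    which the establishment is present. Returns None if it never appears.
    """
    if not unit_history:
        return None
    # Closest quarter wins; ties go to the earlier quarter (an assumption).
    nearest = min(unit_history, key=lambda q: (abs(q - qtime), q))
    return unit_history[nearest]


history = {100: {"county": "053"}, 104: {"county": "123"}}
before = nearest_quarter_characteristics(history, 101)
tie = nearest_quarter_characteristics(history, 102)
after = nearest_quarter_characteristics(history, 110)
```

Whichever rule is chosen, applying it the same way to every affected job keeps the handling consistent across the sample.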

Imputed Wage Records

About 1.5 percent of jobs are missing from the UI wage record data, either because the employer did not report them or because they were reported after the records were sent to the Census Bureau. In these cases, wage records are imputed. A simple explanation of this imputation follows:

  • A first stage identifies likely cases of nonreporting by identifying employers where the number of wage records reported in a quarter is less than the minimum expected given employment in the QCEW.

  • In the second stage, workers with jobs at the firm but no earnings in the underreported quarter form the candidate pool for imputation. Workers are imputed to active status in that quarter conditioning on the worker’s employment in the previous and next quarter using a posterior built from historical employment patterns. Generally speaking, workers with active jobs in the previous and following quarter have the highest probability of being imputed to active and receiving imputed earnings. Workers who are never observed working at the firm will never be imputed as working at the firm.

The Census Bureau currently fills only one-quarter reporting gaps, which is the most common type of reporting issue observed in the data. Employers with persistent reporting issues are a more difficult case for imputation.

Selecting a Random Subsample of Persons

The RANDOM_PIK_GROUP variable can be used to select a subsample of PIKs. It is extracted from the first two digits of the PIK, which are approximately uniformly distributed on [00, 99]. Note that “AA” is also a valid value, denoting individuals for whom no valid SSN was on file. The occurrence of such “pseudo-PIKs” varies by state.
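
For example, an approximately 10 percent sample of persons can be drawn by keeping groups 00 through 09. The Python sketch below shows one way to express the selection; the function name and the choice to exclude “AA” by default are assumptions, and whether pseudo-PIKs belong in a sample depends on the analysis.

```python
def in_subsample(random_pik_group, percent=10, include_pseudo=False):
    """True if a RANDOM_PIK_GROUP value falls in the selected subsample."""
    if random_pik_group == "AA":
        # Pseudo-PIKs (no valid SSN on file) are handled explicitly.
        return include_pseudo
    # Groups 00..(percent - 1) give roughly a percent% sample of PIKs.
    return int(random_pik_group) < percent


selected = [g for g in ["00", "07", "10", "99", "AA"] if in_subsample(g)]
```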

2.1.3. Codebook: The EHF Files

Table Metadata for Employment History File (EHF_ZZ)

Access Requirements for EHF_ZZ
State Approval Required IRS Approval Required SSA Approval Required
Access Requirements X
Description

Quarterly earnings for each job in a state as reported by the employer to the state’s UI system.

Scope

State

Key

PIK SEIN SEINUNIT YEAR

Sort Order

PIK SEIN SEINUNIT YEAR

File Format

SAS Data Table

Variable Information

Variable Information for EHF_ZZ
Variable Name Type Length Description
PIK char 9 Protected Identification Key
YEAR num 4 Calendar year
EARN_ANN num 8 Annual earnings
EARN1 num 5 Qtr 1 earnings
EARN2 num 5 Qtr 2 earnings
EARN3 num 5 Qtr 3 earnings
EARN4 num 5 Qtr 4 earnings
SEIN char 12 State Employer Identification Number
SEINUNIT char 5 State UI reporting unit
STATE char 2 Geographic state of the job record. For UI, it is the FIPS code of the state being processed; for OPM, it is the duty state as reported on the OPMUI files (See details in appendix)
FLAG_IMPUTE1 char 1 Qtr 1 imputation flag (See details below)
FLAG_IMPUTE2 char 1 Qtr 2 imputation flag (See details below)
FLAG_IMPUTE3 char 1 Qtr 3 imputation flag (See details below)
FLAG_IMPUTE4 char 1 Qtr 4 imputation flag (See details below)

Details for variable FLAG_IMPUTE1 on EHF_ZZ

Description

Qtr 1 imputation flag

Codebook

Value Label
0 Quarterly earnings as reported
1 Quarterly earnings imputed

Details for variable FLAG_IMPUTE2 on EHF_ZZ

Description

Qtr 2 imputation flag

Codebook

Value Label
0 Quarterly earnings as reported
1 Quarterly earnings imputed

Details for variable FLAG_IMPUTE3 on EHF_ZZ

Description

Qtr 3 imputation flag

Codebook

Value Label
0 Quarterly earnings as reported
1 Quarterly earnings imputed

Details for variable FLAG_IMPUTE4 on EHF_ZZ

Description

Qtr 4 imputation flag

Codebook

Value Label
0 Quarterly earnings as reported
1 Quarterly earnings imputed
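
Because each EHF_ZZ record holds a year of quarterly earnings, analyses at the quarterly level often start by reshaping the annual record into one row per quarter. The Python sketch below shows one way to do this using the variable names from the codebook above; the function name, the output field names QUARTER, EARN, and IMPUTED, and the example values are hypothetical.

```python
def ehf_to_long(row):
    """Turn one EHF_ZZ annual record into a list of quarterly records."""
    out = []
    for q in (1, 2, 3, 4):
        earn = row.get(f"EARN{q}")
        if not earn or earn <= 0:
            continue  # no earnings reported or imputed in this quarter
        out.append({
            "PIK": row["PIK"], "SEIN": row["SEIN"], "SEINUNIT": row["SEINUNIT"],
            "YEAR": row["YEAR"], "QUARTER": q, "EARN": earn,
            # FLAG_IMPUTEn is a character flag: "1" means imputed earnings.
            "IMPUTED": row.get(f"FLAG_IMPUTE{q}") == "1",
        })
    return out


example = {
    "PIK": "000000001", "SEIN": "000000000001", "SEINUNIT": "00000",
    "YEAR": 2010, "EARN1": 5000, "EARN2": 0, "EARN3": 5200, "EARN4": 5100,
    "FLAG_IMPUTE1": "0", "FLAG_IMPUTE2": "0",
    "FLAG_IMPUTE3": "1", "FLAG_IMPUTE4": "0",
}
quarters = ehf_to_long(example)
```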

2.1.4. Codebook: The JHF Files

Table Metadata for Job History File (JHF_ZZ)

Access Requirements for JHF_ZZ
State Approval Required IRS Approval Required SSA Approval Required
Access Requirements X
Description

Wide version of earnings history that contains imputed establishment (SEINUNITs) for multi-unit firms, and a job-level ID useful for tracking job spells across employer identifier changes.

Scope

State

Key

PIK SEIN SPELL_U2W

Sort Order

PIK FID SEIN SPELL_U2W

File Format

SAS Data Table

Variable Information

Variable Information for JHF_ZZ
Variable Name Type Length Description
PIK char 9 Protected Identification Key
SEIN char 12 State Employer Identification Number
SPELL_U2W num 3 Counter within PIK-SEIN that uniquely identifies each JHF record
E21-E153 num 8 Earnings (See details below)
FID num 5 Within-PIK linked job spell identifier
SEINUNIT1 char 5 State UI Reporting Unit Number (Impute 1)
SEINUNIT2 char 5 State UI Reporting Unit Number (Impute 2)
SEINUNIT3 char 5 State UI Reporting Unit Number (Impute 3)
SEINUNIT4 char 5 State UI Reporting Unit Number (Impute 4)
SEINUNIT5 char 5 State UI Reporting Unit Number (Impute 5)
SEINUNIT6 char 5 State UI Reporting Unit Number (Impute 6)
SEINUNIT7 char 5 State UI Reporting Unit Number (Impute 7)
SEINUNIT8 char 5 State UI Reporting Unit Number (Impute 8)
SEINUNIT9 char 5 State UI Reporting Unit Number (Impute 9)
SEINUNIT10 char 5 State UI Reporting Unit Number (Impute 10)
FIRST_ACC num 3 First accession (First quarter of employment at this job spell)
LAST_SEP num 3 Last separation (Last quarter of employment at this job spell)
FLAG_SEINUNIT_IMPUTED num 3 (See details below)
STATE char 2 (See details in appendix)
RANDOM_PIK_GROUP char 2 Selector based on random pik

Details for variables E21-E153 on JHF_ZZ

Description

Earnings

Source

UI Records

Valid Values

Positive integer values.

Notes

The set of quarterly variables available in each state corresponds to the set of quarters indexed by qtime for which earnings data are available for that state.
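
The qtime index can be converted to and from a calendar year and quarter. The Python sketch below uses the base given in the availability codebook in this document (1985Q1=1); reading the numeric suffix of a variable such as E21 as its qtime value is an assumption based on the note above, and the function names are hypothetical.

```python
def qtime_to_year_quarter(qtime):
    """Convert a qtime index (1985Q1 = 1) to a (year, quarter) pair."""
    years_since_base, quarter_index = divmod(qtime - 1, 4)
    return 1985 + years_since_base, quarter_index + 1


def year_quarter_to_qtime(year, quarter):
    """Convert a (year, quarter) pair to a qtime index (1985Q1 = 1)."""
    return (year - 1985) * 4 + quarter


first = qtime_to_year_quarter(21)
last = qtime_to_year_quarter(153)
```

Under that reading, E21 would hold earnings for 1990Q1 and E153 earnings for 2023Q1.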

Details for variable FLAG_SEINUNIT_IMPUTED on JHF_ZZ

Codebook

Value Label
0 Establishment assignment not imputed
1 Establishment assignment imputed

2.1.5. Codebook: The EHF_US_INDICATORS File

Table Metadata for Earnings Indicators (EHF_US_INDICATORS)

Access Requirements for EHF_US_INDICATORS
State Approval Required IRS Approval Required SSA Approval Required
Access Requirements
Description

Indicates whether or not a PIK had any earnings reported in any state in the LEHD system.

Scope

National

Key

PIK YEAR

File Format

SAS Data Table

Variable Information

Variable Information for EHF_US_INDICATORS
Variable Name Type Length Description
PIK char 9 Protected Identification Key
YEAR num 3 Calendar year
NUM_STATES_EARN1 num 3 Number of states reporting earnings for PIK in Q1
NUM_STATES_EARN2 num 3 Number of states reporting earnings for PIK in Q2
NUM_STATES_EARN3 num 3 Number of states reporting earnings for PIK in Q3
NUM_STATES_EARN4 num 3 Number of states reporting earnings for PIK in Q4
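
A typical use of these counts is to separate quarters with no earnings anywhere in the LEHD system from quarters with earnings only in states outside the research sample. The Python sketch below is a hypothetical illustration: the function name and status labels are made up, and it assumes the states in the sample are among those counted in NUM_STATES_EARNn.

```python
def quarter_status(in_sample_earnings, num_states_earn):
    """Classify one PIK-quarter using in-sample earnings and the indicator.

    in_sample_earnings: earnings observed in the states available to the
    researcher. num_states_earn: the NUM_STATES_EARNn value for the quarter.
    """
    if in_sample_earnings > 0:
        return "earnings_in_sample"
    if num_states_earn > 0:
        return "earnings_in_other_state"
    return "no_lehd_earnings"


statuses = [quarter_status(e, n) for e, n in [(4000, 1), (0, 2), (0, 0)]]
```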

2.1.6. Codebook: The EHF_ALL_AVAILABILITY File

Table Metadata for Earnings Availability (EHF_ALL_AVAILABILITY)

Access Requirements for EHF_ALL_AVAILABILITY
State Approval Required IRS Approval Required SSA Approval Required
Access Requirements
Description

Indicates data availability ranges for all states.

Scope

National

Key

State

File Format

SAS Data Table

Variable Information

Variable Information for EHF_ALL_AVAILABILITY
Variable Name Type Length Description
STATE char 2 State postal code (See details in appendix)
START_YEAR num 8 First year of data on EHF
START_QUARTER num 8 First quarter of data on EHF
END_YEAR num 8 Last year of data on EHF
END_QUARTER num 8 Last quarter of data on EHF
START_YEAR_JHF num 8 First year of data on JHF
START_QUARTER_JHF num 8 First quarter of data on JHF
END_YEAR_JHF num 8 Last year of data on JHF
END_QUARTER_JHF num 8 Last quarter of data on JHF
START_QTIME_JHF num 3 First quarter of data on JHF (1985Q1=1) (See details in appendix)
END_QTIME_JHF num 3 Last quarter of data on JHF (1985Q1=1) (See details in appendix)
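
The availability ranges can be used to flag job spells whose observed start or end coincides with the edge of a state's data. The Python sketch below assumes that FIRST_ACC and LAST_SEP on the JHF are stored as qtime values comparable to START_QTIME_JHF and END_QTIME_JHF; that assumption and the function name are not confirmed by the codebook.

```python
def censoring_flags(first_acc, last_sep, start_qtime_jhf, end_qtime_jhf):
    """Return (left_censored, right_censored) for one job spell.

    A spell first observed in the state's first available quarter may have
    started earlier; a spell last observed in the final quarter may still
    be ongoing.
    """
    left_censored = first_acc <= start_qtime_jhf
    right_censored = last_sep >= end_qtime_jhf
    return left_censored, right_censored


flags_edge = censoring_flags(21, 60, 21, 153)
flags_inside = censoring_flags(30, 60, 21, 153)
flags_ongoing = censoring_flags(30, 153, 21, 153)
```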