2.1. Employment History Files

2.1.1. Overview

The Employment History Files are a set of job-level tables in the LEHD data. The source of the jobs data is earnings records collected by state unemployment insurance programs, called UI wage record data. These consist of quarterly reports of earnings for all UI-covered employees, provided by employers to states and shared with Census via a cooperative partnership. More information on UI-covered employment and wages is available in previous sections of this document.

The EHF file family consists of four tables:

Employment History File (EHF_ZZ)

This table contains annual records with quarterly earnings for each covered job in a state. This is a simple file that is relatively easy to use for research analysis.
- Scope: State
- Key: PIK SEIN SEINUNIT YEAR
Job History File (JHF_ZZ)

This is a wide file containing the earnings arrayed in quarterly variables for each job. In addition to the earnings available in the EHF, the JHF contains imputed establishments for multi-unit employers and successor-predecessor information. The earnings arrays in the JHF are also useful for calculating worker flows. Job spells may be recorded on multiple records in this table as a result of identifier changes or imputation.
- Scope: State
- Key: PIK SEIN SPELL_U2W
Earnings Indicators (EHF_US_INDICATORS)

This file contains flags indicating whether or not a PIK had earnings reported in any state in the LEHD system. This is useful for identifying bias in research samples using a subset of state UI data.
- Scope: National
- Key: PIK YEAR
Earnings Availability (EHF_US_AVAILABILITY)

This file provides quarter ranges for which state data is available in the EHF and JHF tables. This is useful for identifying bias in research samples.
- Scope: National
- Key: STATE

Researchers are encouraged to request all four of the job-level files described above. The JHF is the most useful file for most researchers. However, the EHF can be useful for analyses where establishment-level characteristics or the longitudinal nature of the job is less important than person-level earnings. The indicators and availability tables are useful where sample bias or left-censoring of job histories might be an issue.

2.1.2. User Guidance

Identifiers: Employers vs. Firms vs. Establishments

UI wage records are reported at the state tax identifier level (SEIN) in all states except Minnesota, where they are reported at the establishment level (SEINUNIT). We typically refer to SEINs as employers in this document, but users should be aware that a SEIN is fundamentally a tax entity and is distinct from an establishment or a firm. While identifiers that correspond to the definition of a firm as used by the Longitudinal Business Database (LBD) are available in the ECF T26 file, job-level and employer-level files are generally linked at either the employer or establishment level.

For further details on the definition and usage of each identifier, see the ECF documentation.

For further details on our establishment imputation methodology, see sections below.

Counting Jobs and Constructing Longitudinal Job Histories on the JHF

On the JHF, job spells are arrayed wide by quarter using qtime indexing. There can be one or many records per PIK-SEIN, depending on firm structure and information availability.

Single establishment employer (SEIN)
- There is one record per PIK-SEIN for these jobs. The establishment will be 00000.
Multi establishment employer (SEIN) with impute
- The PIK-SEIN record will have ten imputes of the establishment attached to it.
- In cases where a worker leaves an employer for more than a year or the establishment structure changes significantly (births/deaths), the JHF record may be broken into multiple records by quarter ranges.
- The establishment impute is performed independently for each record on the JHF.
Multi establishment employer (SEIN) with direct reporting (MN only)
- There will be one record for each PIK-SEIN-SEINUNIT when there is regular reporting of the establishment on the UI record.
- Each PIK-SEIN-SEINUNIT record is considered a separate longitudinal job for purposes of QWI calculation.
- In cases of irregular establishment reporting patterns on the wage records, SEINs can have jobs with directly reported establishments in some quarters and imputes in other quarters, or imputes in all quarters.

The SPELL_U2W variable is created as a simple counter within each PIK-SEIN to form a key that can uniquely identify each record on the JHF. Hence, the key on the table is PIK-SEIN-SPELL_U2W. However, a single JHF record may not be a complete longitudinal history of a job spell. In order to properly link the JHF records to construct the full job spell, the longitudinal job identifier (FID) is added to the JHF. This variable is created as follows:

A sequential counter is created for all JHF records for a single PIK.
All jobs in a set that is considered part of the same longitudinal history are assigned the lowest value of the counter within the set.
The FID is created by converting the counter to a character field and prepending the state's numeric FIPS code, which makes it nationally unique within a PIK.
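The three steps above can be sketched in Python as follows. This is a minimal illustration, not the production algorithm: the function name `assign_fid`, the `linked_sets` input, and the three-digit zero padding are all assumptions made for the example.

```python
def assign_fid(n_records, linked_sets, state_fips):
    """Assign an illustrative FID to each JHF record of one PIK.

    n_records   -- number of JHF records for the PIK (hypothetical input)
    linked_sets -- list of sets of record counters judged to belong to the
                   same longitudinal history (hypothetical input)
    state_fips  -- two-digit state FIPS code as a string
    """
    # Step 1: sequential counter over all JHF records for the PIK.
    counters = list(range(1, n_records + 1))

    # Step 2: every record in a linked set takes the lowest counter in the set.
    lowest = {c: c for c in counters}
    for group in linked_sets:
        low = min(group)
        for c in group:
            lowest[c] = low

    # Step 3: prepend the state FIPS code to form a character identifier.
    # The zero-padding width is an assumption, chosen only for readability.
    return {c: f"{state_fips}{lowest[c]:03d}" for c in counters}


# Four records for one PIK in state 27; records 1 and 3 are linked.
fids = assign_fid(4, [{1, 3}], "27")
```

In this example records 1 and 3 share one identifier, while records 2 and 4 each keep their own.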

The PIK-FID should be considered the unit of analysis when constructing the longitudinal history of the job spell. The job can be considered active in any quarter that the PIK-FID has positive earnings on any JHF record. Note that earnings can be reported on multiple JHF records in the same quarter within a PIK-FID group, and these should be summed. For QWI calculation purposes, counts are weighted by the share of total earnings for the PIK-FID, and divided by 10 if multiple imputation is used. This method ensures that each active PIK-FID will be counted as one job in QWI measures.
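A minimal Python sketch of this bookkeeping is shown below. The record layout, the identifier values, and the helper `job_weight` are hypothetical; the sketch only illustrates summing earnings within a PIK-FID and weighting each record by its earnings share.

```python
from collections import defaultdict

# Hypothetical JHF records for one PIK: (fid, record key, {qtime: earnings}).
records = [
    ("27001", ("SEIN_A", 1), {101: 3000, 102: 1000}),
    ("27001", ("SEIN_B", 1), {102: 3000, 103: 4000}),
    ("27002", ("SEIN_C", 1), {102: 500}),
]

# Sum earnings across all records of a PIK-FID in each quarter.
totals = defaultdict(int)
for fid, _key, earn in records:
    for q, e in earn.items():
        totals[(fid, q)] += e

# A PIK-FID is active in any quarter with positive summed earnings.
active = {key for key, total in totals.items() if total > 0}


def job_weight(fid, earn, q, n_implicates=1):
    """Weight of one record (per implicate) toward the job count in quarter q."""
    total = totals.get((fid, q), 0)
    if total <= 0:
        return 0.0  # the PIK-FID is not active in this quarter
    return earn.get(q, 0) / total / n_implicates


# The two records of FID 27001 overlap in quarter 102; their weights
# add up to a single job.
w = [job_weight(f, e, 102) for f, _k, e in records if f == "27001"]
```

With `n_implicates=10`, each of the ten implicates of a record would carry one tenth of that record's weight, so the PIK-FID still counts as one job in total.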

Accounting for Successor-Predecessor Events

A job in the EHF is a PIK-SEIN pair with positive earnings in a quarter. However, a worker’s job spell can be broken up over multiple SEINs (i.e., be multiple rows in the JHF) if the employer SEIN changes during the spell. Employer tax identifiers change quite frequently in the LEHD data, usually due to firm restructuring. There are also limited instances where virtually all SEINs change within a state in a single quarter. If using the job-level data to identify job starts, separations, or job tenure, successor-predecessor (S-P) information to link jobs must be used to avoid bias arising from firm S-P events. S-P information must also be used to avoid double counting of jobs when earnings are provided by multiple related SEINs in the same quarter.

The SPF reports all significant flows of workers between SEINs. A subset of these flows have been identified as representing S-P events according to business rules described in Identifying Transitions in UI Data. Workers who move between two SEINs in an S-P relationship should be considered as having a continuous job history across those SEINs. In addition, if the worker received earnings from both SEINs in the same quarter, a common feature of an S-P event, the earnings should be combined, not counted as two separate jobs.

In the JHF, records in S-P relationships are linked by assigning the same FID, as described above. The mechanics of linking records are as follows:

All separations and accessions are identified from all records for a PIK within the JHF.
- A separation is a quarter of positive earnings followed by a quarter of no earnings, and an accession is the reverse.
- Separations and accessions can occur in the beginning, middle, or end of the quarter range on the JHF record.
A separation is linked with every accession in the same or following quarter to find candidates for linkage.
If an S-P relationship identified on the SPF coincides with a link found for a PIK, the JHF records are joined via the FID.
- Because significant flows have been observed between two SEINs in quarters proximate to the officially recognized transition quarter, links at the PIK level will be made if they are found within several quarters before or after the transition quarter.
- Some links are provided via internal clerical tables, particularly to handle large-scale SEIN changes. These may not be included in the SPF, but will be incorporated into the FID.
The linkage process works iteratively to link all related JHF records to the same FID.
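The linkage steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names, the record layout, and the `sp_pairs` input (predecessor and successor SEINs taken from the SPF) are hypothetical, and the sketch omits both the window of several quarters around the transition quarter and the internal clerical tables.

```python
def transitions(earn, first_q, last_q):
    """Return (accession quarters, separation quarters) for one JHF record.

    earn maps qtime -> earnings; quarters absent from earn count as zero.
    An accession quarter has positive earnings and none the quarter before;
    a separation quarter has positive earnings and none the quarter after.
    """
    acc, sep = [], []
    for q in range(first_q, last_q + 1):
        if earn.get(q, 0) > 0:
            if earn.get(q - 1, 0) <= 0:
                acc.append(q)
            if earn.get(q + 1, 0) <= 0:
                sep.append(q)
    return acc, sep


def link_records(records, sp_pairs, first_q, last_q):
    """records: {counter: (sein, earn)}; sp_pairs: {(pred_sein, succ_sein)}."""
    parent = {r: r for r in records}

    def find(r):
        # Follow parent pointers to the representative of r's group.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    trans = {r: transitions(e, first_q, last_q) for r, (_s, e) in records.items()}
    for a, (sein_a, _) in records.items():
        for b, (sein_b, _) in records.items():
            if a == b or (sein_a, sein_b) not in sp_pairs:
                continue
            # Candidate link: a separation from record a matched with an
            # accession to record b in the same or the following quarter.
            if any((acc - sep) in (0, 1)
                   for sep in trans[a][1] for acc in trans[b][0]):
                ra, rb = find(a), find(b)
                parent[max(ra, rb)] = min(ra, rb)  # keep the lowest counter
    return {r: find(r) for r in records}


records = {
    1: ("A", {10: 100, 11: 100}),
    2: ("B", {11: 50, 12: 200}),
    3: ("C", {12: 300}),
}
groups = link_records(records, {("A", "B")}, 10, 12)
```

The union-find structure stands in for the iterative linking: chains of related records collapse to the lowest counter in the set, which is the value the FID construction described earlier would use.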

Using the Worker-to-Establishment Imputes

The Census Bureau calculates establishment-level workforce statistics from the LEHD data by imputing establishments for jobs at multi-unit SEINs using multiple imputation. The 10 implicates from that imputation are available on the JHF for researchers. What follows is some general guidance on using the imputed values:

First establish if you need the imputed establishment for your analysis. Many researchers use employment-weighted SEIN-level employer characteristics from the ECF. In many cases this might be a preferred approach if you are not using all 10 implicates.
While we recommend using all 10 implicates, we are aware some researchers choose to use only one. If you choose to use only one of the imputes, we recommend you use the first implicate.
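As a rough illustration of using all 10 implicates, the Python sketch below tabulates establishment-level job counts by giving each implicate a weight of one tenth. The job labels and SEINUNIT values are invented for the example, and the layout (a list of ten values per job, mirroring seinunit1 through seinunit10) is an assumption.

```python
from collections import Counter

# Hypothetical implicates (seinunit1..seinunit10) for three jobs at one SEIN.
jobs = {
    "job1": ["00001"] * 6 + ["00002"] * 4,
    "job2": ["00002"] * 10,
    "job3": ["00001"] * 3 + ["00003"] * 7,
}

# Each implicate carries weight 1/10, so every job contributes exactly one
# unit of employment in total, spread across its imputed establishments.
emp = Counter()
for implicates in jobs.values():
    for unit in implicates:
        emp[unit] += 1 / len(implicates)

# Using only the first implicate assigns each job wholly to one establishment.
emp_first = Counter(implicates[0] for implicates in jobs.values())
```

The fractional counts in `emp` reflect the imputation uncertainty, while `emp_first` discards it.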

Worker-to-Establishment Methodology (U2W)

The full description of the worker-to-establishment imputation is available in Abowd et al. [2009], but we describe it briefly here. The model assigns workers to establishments using the relative size of the establishments and the distance between the worker’s residence and the establishment. In other words, the model favors imputing workers to the closest establishment, but must also allocate more workers to larger establishments. Imputed values are drawn 10 times to better capture the uncertainty in the imputation. Model parameters for the imputation are determined using the Minnesota data, which does have SEINUNIT-level jobs data.

To give a very simple example, suppose worker Z works at a SEIN with 3 establishments: A, B, and C. If the model determines there is a 60 percent likelihood that Z works at A and a 20 percent chance each that Z works at B or at C, then implicates drawn in exact proportion to those probabilities would be: A, A, A, A, A, A, B, B, C, C. Multiple imputation gives the user more information about the uncertainty of the imputation than a single impute.
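The worker Z example can be mimicked with a few lines of Python. This sketch only draws from fixed probabilities; it does not implement the imputation model itself, and the function name `draw_implicates` and the seed are assumptions. Because implicates are random draws, an actual set of ten need not match a 6/2/2 split exactly.

```python
import random


def draw_implicates(probs, n=10, seed=None):
    """Draw n establishment implicates from assignment probabilities."""
    rng = random.Random(seed)
    units = list(probs)
    return rng.choices(units, weights=[probs[u] for u in units], k=n)


# Worker Z: 60 percent at A, 20 percent each at B and C.
implicates = draw_implicates({"A": 0.6, "B": 0.2, "C": 0.2}, seed=12345)
share_a = implicates.count("A") / len(implicates)
```

The spread of establishments across the ten draws is what conveys the uncertainty of the assignment.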

Adding Establishment Characteristics

Employer characteristics to attach to jobs can be retrieved from one of the following sources:

ECF_SEIN table - Employment-weighted modal characteristics from establishment data
ECF_SEIN_T26 table - Firm level characteristics using Title 26 data (age/size)
ECF_SEINUNIT table - Establishment level characteristics
QWI_SEINUNIT table - Establishment level characteristics, with some longitudinal editing after linking with jobs data

Establishment-level characteristics include ownership, industry, and geography. Ownership will almost always be the same for all establishments in an SEIN, and cases where that is not true may suggest a data irregularity. There is significant establishment heterogeneity in substate geography, and less but still significant heterogeneity in industry. While SEIN-level characteristics may provide a first-order approximation, there will likely be mass points and redistributions in counts relative to establishment-level statistics.

One complication of using establishment characteristics, however, is that the establishment will not always be present in the ECF SEINUNIT or QWI SEINUNIT tables in all quarters that a job to which it has been imputed is active on the JHF. This is usually a function of the operation of the worker-to-establishment methodology. While establishment candidates are intended to be available for the whole of a worker’s job spell at an SEIN, there are allowances for exceptions to this rule. There are several possible fallback strategies to handle these cases:

Use the SEIN-level modal characteristics. This is typically the most straightforward approach, but it can result in mass points and sudden longitudinal changes to the assignment.
Perform a longitudinal edit to get characteristics from the establishment in the nearest quarter it exists. This will maintain longitudinal consistency in assigned characteristics, but it is possible the establishment did not exist and using that geography (or industry) could be incorrect.
Draw an establishment from the set of establishments that do exist within the SEIN in the time period. This approach would come closest to replicating the distribution of the characteristics of the firm in the quarters of interest, though there may be minimal information on the job level to condition the impute. It is also likely the most complicated to implement.

While no particular approach is necessarily preferred, it is recommended that researchers applying establishment-level characteristics be aware of this phenomenon and apply a consistent approach to handling it.
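As an illustration of the second fallback strategy (the longitudinal edit), the Python sketch below looks up a characteristic from the nearest quarter in which the establishment exists. The function name, the input layout, and the tie-breaking rule (prefer the earlier quarter) are assumptions for the example, not documented behavior.

```python
def nearest_quarter_value(chars_by_q, q):
    """Return the characteristic from the quarter closest to q.

    chars_by_q -- {qtime: characteristic} for one establishment, covering
                  only the quarters in which it appears (hypothetical input)
    """
    if not chars_by_q:
        return None  # caller could fall back to SEIN-level modal values
    # Ties between an earlier and a later quarter go to the earlier one.
    nearest = min(chars_by_q, key=lambda k: (abs(k - q), k))
    return chars_by_q[nearest]


# Industry codes for one establishment, missing in quarters 102-104.
industry = {100: "4451", 101: "4451", 105: "7225"}
filled = {q: nearest_quarter_value(industry, q) for q in (102, 103, 104)}
```

Whichever rule is chosen, applying it identically across all jobs keeps the treatment consistent.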

Imputed Wage Records

About 1.5 percent of jobs are missing from the UI wage record data, either because the employer did not report them or because they were reported after the records were sent to the Census Bureau. In these cases, wage records are imputed. A simple explanation of this imputation follows:

A first stage identifies likely cases of nonreporting by identifying employers where the number of wage records reported in a quarter is less than the minimum expected given employment in the QCEW.
In the second stage, workers with jobs at the firm but no earnings in the underreported quarter form the candidate pool for imputation. Workers are imputed to active status in that quarter conditioning on the worker’s employment in the previous and next quarter using a posterior built from historical employment patterns. Generally speaking, workers with active jobs in the previous and following quarter have the highest probability of being imputed to active and receiving imputed earnings. Workers who are never observed working at the firm will never be imputed as working at the firm.

The Census Bureau currently fills only one-quarter reporting gaps, which is the most common type of reporting issue observed in the data. Employers with persistent reporting issues are a more difficult case for imputation.
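The candidate pool in the second stage can be illustrated with a short Python sketch. It only identifies candidates and the conditioning information named above; it does not reproduce the posterior draw or the first-stage comparison against the QCEW. The function name and input layout are hypothetical.

```python
def candidate_pool(earn_by_pik, q):
    """Workers observed at the employer who have no earnings in quarter q.

    earn_by_pik -- {pik: {qtime: earnings}} for one employer (hypothetical)
    Returns {pik: (worked_prev, worked_next)}, where the two flags record
    positive earnings in quarters q-1 and q+1.
    """
    pool = {}
    for pik, earn in earn_by_pik.items():
        if earn.get(q, 0) > 0:
            continue  # already has a wage record in the quarter
        if not any(e > 0 for e in earn.values()):
            continue  # never observed working at this employer
        pool[pik] = (earn.get(q - 1, 0) > 0, earn.get(q + 1, 0) > 0)
    return pool


employer = {
    "P1": {99: 5000, 101: 5200},  # one-quarter gap at qtime 100
    "P2": {99: 4000},             # worked before, not after
    "P3": {100: 3000},            # reported in the quarter itself
}
pool = candidate_pool(employer, 100)
```

Here P1, with earnings on both sides of the gap, is the kind of worker the text describes as most likely to be imputed to active status.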

Selecting a Random Subsample of Persons

The RANDOM_PIK_GROUP variable can be used to select a subsample of PIKs. It is extracted from the first two digits of the PIK, which are approximately uniformly distributed on [00, 99]. Note that “AA” is also a valid value, denoting individuals for whom no valid SSN was on file. The occurrence of such “pseudo-PIKs” varies by state.
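For example, an approximately 10 percent sample of persons could be selected along the lines of the Python sketch below. The function name, the cutoff rule, and the default treatment of pseudo-PIKs are assumptions; whether to keep the “AA” group depends on the analysis.

```python
def in_subsample(random_pik_group, percent=10, include_pseudo=False):
    """Return True if a PIK group falls in an approximate percent sample."""
    if not random_pik_group.isdigit():
        # Covers "AA", the group for persons with no valid SSN on file.
        return include_pseudo
    # Groups 00..(percent-1) give roughly percent/100 of all PIKs.
    return int(random_pik_group) < percent


groups = ["00", "07", "10", "99", "AA"]
selected = [g for g in groups if in_subsample(g)]
```

Because the selection depends only on the PIK, the same persons are chosen in every state and in every file that carries the variable.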

2.1.3. Codebook: The EHF Files

Table Metadata for Employment History File (EHF_ZZ)

Access Requirements for EHF_ZZ
	State Approval Required	IRS Approval Required	SSA Approval Required
Access Requirements	X

Description: Quarterly earnings for each job in a state as reported by the employer to the state’s UI system.
Scope: State
Key: pik sein seinunit year
Sort Order: pik sein seinunit year
File Formats: SAS Data Table, Parquet (partitioned by PIK group)

Variable Information

Variable Information for EHF_ZZ
Variable Name	SAS Variable Type	SAS Variable Length	Parquet Variable Type	Description
pik	char	9	string	Protected Identification Key
year	num	4	uint32	Calendar year
earn_ann	num	8	uint64	Annual earnings
earn1	num	5	uint64	Qtr 1 earnings
earn2	num	5	uint64	Qtr 2 earnings
earn3	num	5	uint64	Qtr 3 earnings
earn4	num	5	uint64	Qtr 4 earnings
sein	char	12	string	State Employer Identification Number
seinunit	char	5	string	State UI reporting unit
state	char	2	string	Geographic state of the job record. For UI, it is the FIPS code of the state being processed; for OPM, it is the duty state as reported on the OPMUI files (See details in appendix)
flag_impute1	char	1	string	Qtr 1 imputation flag (See details below)
flag_impute2	char	1	string	Qtr 2 imputation flag (See details below)
flag_impute3	char	1	string	Qtr 3 imputation flag (See details below)
flag_impute4	char	1	string	Qtr 4 imputation flag (See details below)

Details for variables: flag_impute1, flag_impute2, flag_impute3, flag_impute4 on EHF_ZZ

Description

flag_impute1: Qtr 1 imputation flag

flag_impute2: Qtr 2 imputation flag

flag_impute3: Qtr 3 imputation flag

flag_impute4: Qtr 4 imputation flag

Codebook

Code	Label
0	Quarterly earnings as reported
1	Quarterly earnings imputed
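To show how the quarterly earnings and imputation flags on an EHF record fit together, the Python sketch below reshapes one annual record into quarter-level rows. The function name, the dictionary layout, and the sample values are hypothetical; the variable names and the flag codes follow the tables above.

```python
def ehf_to_quarters(rec, keep_imputed=True):
    """Expand one annual EHF record into rows for quarters with earnings."""
    rows = []
    for q in (1, 2, 3, 4):
        earn = rec.get(f"earn{q}")
        imputed = rec.get(f"flag_impute{q}") == "1"  # flags are character
        if not earn or (imputed and not keep_imputed):
            continue
        rows.append({
            "pik": rec["pik"], "sein": rec["sein"],
            "seinunit": rec["seinunit"], "year": rec["year"],
            "quarter": q, "earn": earn, "imputed": imputed,
        })
    return rows


# Invented record: earnings in Q1-Q3, with Q2 imputed.
rec = {
    "pik": "000000001", "sein": "270000000001", "seinunit": "00000",
    "year": 2015, "earn1": 9000, "earn2": 9100, "earn3": 9200, "earn4": 0,
    "flag_impute1": "0", "flag_impute2": "1",
    "flag_impute3": "0", "flag_impute4": "0",
}
all_rows = ehf_to_quarters(rec)
reported_rows = ehf_to_quarters(rec, keep_imputed=False)
```

Dropping imputed quarters, as in `reported_rows`, restricts an analysis to earnings as reported by the employer.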

2.1.4. Codebook: The JHF Files

Table Metadata for Job History File (JHF_ZZ)

Access Requirements for JHF_ZZ
	State Approval Required	IRS Approval Required	SSA Approval Required
Access Requirements	X

Description: Wide version of the earnings history that contains imputed establishments (SEINUNITs) for multi-unit firms, and a job-level ID useful for tracking job spells across employer identifier changes.
Scope: State
Key: pik sein spell_u2w
Sort Order: pik fid sein spell_u2w
File Formats: SAS Data Table, Parquet (partitioned by PIK group)

Variable Information

Variable Information for JHF_ZZ
Variable Name	SAS Variable Type	SAS Variable Length	Parquet Variable Type	Description
pik	char	9	string	Protected Identification Key
sein	char	12	string	State Employer Identification Number
spell_u2w	num	3	uint64	Spell of employment at an SEIN associated with a reported or imputed establishment
e21-e161	num	8	uint64	Earnings (See details below)
fid	num	5	uint32	Within-PIK linked job spell identifier
seinunit1	char	5	string	State UI Reporting Unit Number (Impute 1)
seinunit2	char	5	string	State UI Reporting Unit Number (Impute 2)
seinunit3	char	5	string	State UI Reporting Unit Number (Impute 3)
seinunit4	char	5	string	State UI Reporting Unit Number (Impute 4)
seinunit5	char	5	string	State UI Reporting Unit Number (Impute 5)
seinunit6	char	5	string	State UI Reporting Unit Number (Impute 6)
seinunit7	char	5	string	State UI Reporting Unit Number (Impute 7)
seinunit8	char	5	string	State UI Reporting Unit Number (Impute 8)
seinunit9	char	5	string	State UI Reporting Unit Number (Impute 9)
seinunit10	char	5	string	State UI Reporting Unit Number (Impute 10)
first_acc	num	3	uint32	First accession (First quarter of employment at this job spell)
last_sep	num	3	uint32	Last separation (Last quarter of employment at this job spell)
flag_seinunit_imputed	num	3	bool	(See details below)
state	char	2	string	(See details in appendix)
random_pik_group	char	2	string	Selector based on random pik

Details for variables e21-e161 on JHF_ZZ

Description: Earnings
Source: UI Records
Valid Values: Positive integer values.
Notes: The set of quarterly variables available in each state corresponds to the set of quarters indexed by qtime for which earnings data are available for that state.
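Converting between qtime and calendar quarters is simple arithmetic, given the convention 1985Q1=1 stated in the availability codebook. The Python helpers below are a minimal sketch; the function names are hypothetical, and reading the numeric suffix of each earnings variable as a qtime value is an assumption based on the note above.

```python
def qtime_to_yq(qtime):
    """Convert a qtime index (1985Q1 = 1) to a (year, quarter) pair."""
    years, q0 = divmod(qtime - 1, 4)
    return 1985 + years, q0 + 1


def yq_to_qtime(year, quarter):
    """Convert a (year, quarter) pair to a qtime index (1985Q1 = 1)."""
    return (year - 1985) * 4 + quarter


def earnings_var(year, quarter):
    """Name of the JHF earnings variable for a calendar quarter."""
    return f"e{yq_to_qtime(year, quarter)}"


first = qtime_to_yq(21)        # start of the e21-e161 range
name = earnings_var(2010, 3)   # variable holding 2010Q3 earnings
```

Under this reading, e21 corresponds to 1990Q1 and e161 to 2025Q1.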

Details for variable flag_seinunit_imputed on JHF_ZZ

Codebook

Code	Label
0	Establishment assignment not imputed
1	Establishment assignment imputed

2.1.5. Codebook: The EHF_US_INDICATORS File

Table Metadata for Earnings Indicators (EHF_US_INDICATORS)

Access Requirements for EHF_US_INDICATORS
	State Approval Required	IRS Approval Required	SSA Approval Required
Access Requirements

Description: Indicates whether or not a PIK had any earnings reported in any state in the LEHD system.
Scope: National
Key: pik year
File Formats: SAS Data Table, Parquet (partitioned by PIK group)

Variable Information

Variable Information for EHF_US_INDICATORS
Variable Name	SAS Variable Type	SAS Variable Length	Parquet Variable Type	Description
pik	char	9	string	Protected Identification Key
year	num	3	uint32	Calendar year
num_states_earn1	num	3	uint32	Number of states reporting earnings for PIK in Q1
num_states_earn2	num	3	uint32	Number of states reporting earnings for PIK in Q2
num_states_earn3	num	3	uint32	Number of states reporting earnings for PIK in Q3
num_states_earn4	num	3	uint32	Number of states reporting earnings for PIK in Q4

2.1.6. Codebook: The EHF_ALL_AVAILABILITY File

Table Metadata for Earnings Availability (EHF_ALL_AVAILABILITY)

Access Requirements for EHF_ALL_AVAILABILITY
	State Approval Required	IRS Approval Required	SSA Approval Required
Access Requirements

Description: Indicates data availability ranges for all states.
Scope: National
Key: state
File Formats: SAS Data Table, CSV

Variable Information

Variable Information for EHF_ALL_AVAILABILITY
Variable Name	SAS Variable Type	SAS Variable Length	Description
state	char	2	State postal code (See details in appendix)
start_year	num	8	First year of data on EHF
start_quarter	num	8	First quarter of data on EHF
end_year	num	8	Last year of data on EHF
end_quarter	num	8	Last quarter of data on EHF
start_year_jhf	num	8	First year of data on JHF
start_quarter_jhf	num	8	First quarter of data on JHF
end_year_jhf	num	8	Last year of data on JHF
end_quarter_jhf	num	8	Last quarter of data on JHF
start_qtime_jhf	num	3	First quarter of data on JHF (1985Q1=1) (See details in appendix)
end_qtime_jhf	num	3	Last quarter of data on JHF (1985Q1=1) (See details in appendix)
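One possible use of the availability table is to flag job spells that may be censored by a state's data range. The Python sketch below assumes that first_acc and last_sep on the JHF are expressed in qtime units, which is not confirmed here; the function name is hypothetical.

```python
def censoring_flags(first_acc, last_sep, start_qtime_jhf, end_qtime_jhf):
    """Flag a job spell that touches the edge of the state's JHF data range.

    A spell that is already active in the first available quarter may have
    started earlier (left-censored); one still active in the last available
    quarter may continue afterward (right-censored).
    """
    return {
        "left_censored": first_acc <= start_qtime_jhf,
        "right_censored": last_sep >= end_qtime_jhf,
    }


# A spell observed from qtime 21 to 60 in a state with data for 21-161.
flags = censoring_flags(21, 60, 21, 161)
```

Tenure measured for a left-censored spell is a lower bound, so such spells are often handled separately in tenure analyses.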