2.1. Employment History Files
2.1.1. Overview
The Employment History Files are a set of job-level tables in the LEHD data. The source of the jobs data is earnings records collected by state unemployment insurance programs, called UI wage record data. These consist of quarterly reports of earnings for all UI-covered employees, provided by employers to states and shared with Census via a cooperative partnership. More information on UI-covered employment and wages is available in previous sections of this document.
The EHF file family consists of four tables:
Employment History File (EHF_ZZ)
This table contains annual records with quarterly earnings for each covered job in a state. This is a simple file that is relatively easy to use for research analysis.
Scope: State
Key: PIK SEIN SEINUNIT YEAR
Job History File (JHF_ZZ)
This is a wide file containing the earnings arrayed in quarterly variables for each job. In addition to the earnings available in the EHF, the JHF contains imputed establishments for multi-unit employers and successor-predecessor information. The earnings arrays in the JHF are also useful for calculating worker flows. Job spells may be recorded on multiple records in this table as a result of identifier changes or imputation.
Scope: State
Key: PIK SEIN SPELL_U2W
Earnings Indicators (EHF_US_INDICATORS)
This file contains flags indicating whether or not a PIK had earnings reported in any state in the LEHD system. This is useful for identifying bias in research samples using a subset of state UI data.
Scope: National
Key: PIK YEAR
Earnings Availability (EHF_US_AVAILABILITY)
This file provides quarter ranges for which state data is available in the EHF and JHF tables. This is useful for identifying bias in research samples.
Scope: National
Key: STATE
Researchers are encouraged to request all four of the job-level files described above. The most useful file for most researchers is the JHF. However, the EHF can be useful for analyses where establishment-level characteristics or the longitudinal nature of the job is less important than person-level earnings. The indicators and availability tables are useful in situations where sample bias or left-censoring of job histories might be an issue.
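As an illustration of the left-censoring check mentioned above, the following minimal sketch compares a job's first observed quarter with the first quarter of JHF data for its state. It assumes that FIRST_ACC on the JHF is stored as a qtime index (1985Q1 = 1) and uses the START_YEAR_JHF and START_QUARTER_JHF variables from the availability table; the function names and the example values are hypothetical.

```python
def to_qtime(year: int, quarter: int) -> int:
    """Quarter index with 1985Q1 = 1 (the convention noted in the availability codebook)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return (year - 1985) * 4 + quarter


def is_left_censored(first_acc: int, start_year_jhf: int, start_quarter_jhf: int) -> bool:
    """True when the job is already active in the first quarter of JHF data for the state.

    Assumes first_acc (FIRST_ACC on the JHF) is stored as a qtime index.
    """
    return first_acc <= to_qtime(start_year_jhf, start_quarter_jhf)


# Hypothetical availability row: JHF data for this state begin in 1995Q1 (qtime 41).
availability = {"START_YEAR_JHF": 1995, "START_QUARTER_JHF": 1}

censored = is_left_censored(
    41, availability["START_YEAR_JHF"], availability["START_QUARTER_JHF"]
)
observed_start = is_left_censored(
    45, availability["START_YEAR_JHF"], availability["START_QUARTER_JHF"]
)
print(censored, observed_start)
```

A job flagged this way may have started before the state's data begin, so its observed tenure is a lower bound.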
2.1.2. User Guidance
Identifiers: Employers vs. Firms vs. Establishments
UI wage records are reported at the state tax identifier level (SEIN) for all states except Minnesota, where they are reported at the establishment level (SEINUNIT). We typically refer to SEINs as employers in this document, but users should be aware that they are fundamentally tax entities and are distinct from establishments or firms. While identifiers that correspond to the definition of a firm as used by the Longitudinal Business Database (LBD) are available in the ECF T26 file, job-level and employer-level files are generally linked at either the employer or establishment level.
For further details on the definition and usage of each identifier, see the ECF documentation.
For further details on our establishment imputation methodology, see sections below.
Counting Jobs and Constructing Longitudinal Job Histories on the JHF
On the JHF, job spells are arrayed wide by quarter using qtime indexing. There can be one or many records per PIK-SEIN, depending on firm structure and information availability.
Single establishment employer (SEIN)
There is one record per PIK-SEIN for these jobs. The establishment will be 00000.
Multi establishment employer (SEIN) with impute
The PIK-SEIN record will have ten imputes of the establishment attached to it.
In cases where a worker leaves an employer for more than a year or the establishment structure changes significantly (births/deaths), the JHF record may be broken into multiple records by quarter ranges.
The establishment impute is performed independently for each record on the JHF.
Multi establishment employer (SEIN) with direct reporting (MN only)
There will be one record for each PIK-SEIN-SEINUNIT when there is regular reporting of the establishment on the UI record.
Each PIK-SEIN-SEINUNIT record is considered a separate longitudinal job for purposes of QWI calculation.
In cases of irregular establishment reporting patterns on the wage records, SEINs can have jobs with directly reported establishments in some quarters and imputes in other quarters, or imputes in all quarters.
The SPELL_U2W variable is created as a simple counter within each PIK-SEIN to form a key that can uniquely identify each record on the JHF. Hence, the key on the table is PIK-SEIN-SPELL_U2W. However, a single JHF record may not be a complete longitudinal history of a job spell. In order to properly link the JHF records to construct the full job spell, the longitudinal job identifier (FID) is added to the JHF. This variable is created as follows:
A sequential counter is created for all JHF records for a single PIK.
All jobs in a set that is considered part of the same longitudinal history are assigned the lowest value of the counter within the set.
The FID is created by converting the counter to a character field and prepending the state numeric FIPS code, which makes it nationally unique within a PIK.
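The three steps above can be sketched as follows. This is a minimal illustration, not the production code: it assumes the sets of related records are supplied as pairwise links between counter values, uses a simple union-find structure to group them, and concatenates the FIPS code and the counter without padding (the actual field layout is not specified here). The function name `assign_fids` is hypothetical.

```python
def assign_fids(n_records: int, links: list, state_fips: str) -> list:
    """Return one FID string per JHF record of a single PIK.

    Records are numbered 1..n_records (the sequential counter).
    links is a list of (counter_a, counter_b) pairs judged to belong
    to the same longitudinal job history.
    """
    parent = list(range(n_records + 1))  # index 0 is unused

    def find(i: int) -> int:
        # Follow parent pointers to the set representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller counter as the representative so every set
            # ends up labelled with its lowest counter value.
            low, high = min(root_a, root_b), max(root_a, root_b)
            parent[high] = low

    # Prepend the state FIPS code (format is an assumption for illustration).
    return [f"{state_fips}{find(i)}" for i in range(1, n_records + 1)]


# Four records for one PIK in state 27; records 2, 3 and 4 form one history.
fids = assign_fids(4, [(2, 3), (3, 4)], "27")
print(fids)
```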
The PIK-FID should be considered the unit of analysis when constructing the longitudinal history of the job spell. The job can be considered active in any quarter that the PIK-FID has positive earnings on any JHF record. Note that earnings can be reported on multiple JHF records in the same quarter within a PIK-FID group, and these should be summed. For QWI calculation purposes, counts are weighted by the share of total earnings for the PIK-FID, and divided by 10 if multiple imputation is used. This method ensures that each active PIK-FID will be counted as one job in QWI measures.
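A minimal sketch of the summing and weighting described above, under stated assumptions: each JHF record is represented as a dictionary with hypothetical keys (`pik`, `fid`, `sein`, `earnings` keyed by qtime, and `seinunits` holding either one establishment or the 10 implicates). Dividing by the number of implicates on the record is one reading of "divided by 10 if multiple imputation is used".

```python
from collections import defaultdict


def pik_fid_earnings(records: list) -> dict:
    """Sum positive earnings by quarter across all records in each PIK-FID."""
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        for qtime, earn in rec["earnings"].items():
            if earn > 0:
                totals[(rec["pik"], rec["fid"])][qtime] += earn
    return {key: dict(by_quarter) for key, by_quarter in totals.items()}


def job_weights(records: list, qtime: int) -> dict:
    """Weight for each (pik, fid, sein, seinunit) in one quarter.

    Each record receives its share of the PIK-FID's total earnings, split
    evenly across the establishments listed on the record, so the weights
    of every active PIK-FID sum to one.
    """
    totals = pik_fid_earnings(records)
    weights = defaultdict(float)
    for rec in records:
        earn = rec["earnings"].get(qtime, 0)
        total = totals.get((rec["pik"], rec["fid"]), {}).get(qtime, 0)
        if earn <= 0 or total <= 0:
            continue  # record is not active in this quarter
        share = earn / total
        for unit in rec["seinunits"]:
            key = (rec["pik"], rec["fid"], rec["sein"], unit)
            weights[key] += share / len(rec["seinunits"])
    return dict(weights)


# Hypothetical data: one PIK-FID paid by two SEINs in the same quarter.
records = [
    {"pik": "P1", "fid": "271", "sein": "A",
     "earnings": {100: 3000}, "seinunits": ["00000"]},
    {"pik": "P1", "fid": "271", "sein": "B",
     "earnings": {100: 1000}, "seinunits": ["00001"] * 6 + ["00002"] * 4},
]
totals = pik_fid_earnings(records)
weights = job_weights(records, 100)
print(totals[("P1", "271")][100], round(sum(weights.values()), 6))
```

In this example the PIK-FID contributes exactly one job in the quarter, spread across three SEIN-establishment cells in proportion to earnings and implicate counts.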
Accounting for Successor-Predecessor Events
A job in the EHF is a PIK-SEIN pair with positive earnings in a quarter. However, a worker’s job spell can be broken up over multiple SEINs (i.e., be multiple rows in the JHF) if the employer SEIN changes during the spell. Employer tax identifiers change quite frequently in the LEHD data, usually due to firm restructuring. There are also limited instances where virtually all SEINs change within a state in a single quarter. If using the job-level data to identify job starts, separations, or job tenure, successor-predecessor (S-P) information to link jobs must be used to avoid bias arising from firm S-P events. S-P information must also be used to avoid double counting of jobs when earnings are provided by multiple related SEINs in the same quarter.
The SPF reports all significant flows of workers between SEINs. A subset of these flows have been identified as representing S-P events according to business rules described in Identifying Transitions in UI Data. Workers that traverse between two SEINs engaged in an S-P relationship should be considered as having a continuous job history across SEINs. In addition, if the worker received earnings from both SEINs in the same quarter, a common feature of an S-P event, the earnings should be combined, not counted as two separate jobs.
In the JHF, records in S-P relationships are linked by assigning the same FID, as described above. The mechanics of linking records are as follows:
All separations and accessions are identified from all records for a PIK within the JHF.
A separation is a quarter of positive earnings followed by a quarter of no earnings, and an accession is the reverse.
Separations and accessions can occur in the beginning, middle, or end of the quarter range on the JHF record.
A separation is linked with every accession in the same or following quarter to find candidates for linkage.
If an S-P relationship identified on the SPF coincides with a link found for a PIK, the JHF records are joined via the FID.
Because significant flows have been observed between two SEINs in quarters proximate to the officially recognized transition quarter, links at the PIK level will be made if they are found within several quarters before or after the transition quarter.
Some links are provided via internal clerical tables, particularly to handle large-scale SEIN changes. These may not be included in the SPF, but will be incorporated into the FID.
The linkage process works iteratively to link all related JHF records to the same FID.
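The first steps of this linkage can be sketched as below. This is a simplified illustration under several assumptions: quarters outside a record's earnings array are treated as zero earnings, an accession is dated to the first quarter of positive earnings, S-P relationships are supplied as a hypothetical set of (predecessor SEIN, successor SEIN) pairs, and the wider window around the transition quarter and the clerical links are omitted.

```python
def transitions(earnings: dict) -> tuple:
    """Return (accession quarters, separation quarters) for one JHF record.

    earnings maps qtime -> earnings. A separation is a quarter of positive
    earnings followed by a quarter of none; an accession is the reverse.
    """
    active = {q for q, e in earnings.items() if e > 0}
    accessions = sorted(q for q in active if q - 1 not in active)
    separations = sorted(q for q in active if q + 1 not in active)
    return accessions, separations


def candidate_links(records: list, sp_pairs: set) -> list:
    """Pairs (i, j) of record positions for one PIK that should share an FID.

    records is a list of (sein, earnings) tuples. A pair qualifies when a
    separation on record i meets an accession on record j in the same or
    the following quarter and (sein_i, sein_j) is an S-P relationship.
    """
    links = []
    for i, (sein_i, earn_i) in enumerate(records):
        _, separations = transitions(earn_i)
        for j, (sein_j, earn_j) in enumerate(records):
            if i == j or (sein_i, sein_j) not in sp_pairs:
                continue
            accessions, _ = transitions(earn_j)
            if any(a - s in (0, 1) for s in separations for a in accessions):
                links.append((i, j))
    return links


# Hypothetical PIK: leaves SEIN "A" in quarter 52 and starts at "B" in quarter 52.
pik_records = [
    ("A", {50: 1000, 51: 1000, 52: 500}),
    ("B", {52: 600, 53: 1200}),
]
linked = candidate_links(pik_records, {("A", "B")})
unlinked = candidate_links(pik_records, set())
print(linked, unlinked)
```

Applying the resulting pairs repeatedly, until no record changes group, corresponds to the iterative step that places all related records under one FID.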
Using the Worker-to-Establishment Imputes
The Census Bureau calculates establishment-level workforce statistics from the LEHD data by imputing establishments for jobs at multi-unit SEINs using multiple imputation. The 10 implicates from that imputation are available on the JHF for researchers. What follows is some general guidance on using the imputed values:
First establish if you need the imputed establishment for your analysis. Many researchers use employment-weighted SEIN-level employer characteristics from the ECF. In many cases this might be a preferred approach if you are not using all 10 implicates.
While we recommend using all 10 implicates, we are aware some researchers choose to use only one. If you choose to use only one of the imputes, we recommend you use the first implicate.
Worker-to-Establishment Methodology (U2W)
The full description of the worker-to-establishment imputation is available in Abowd et al. [2009], but we describe it briefly here. The model assigns workers to establishments using the relative size of the establishments and the distance between the worker’s residence and the establishment. In other words, the model favors imputing workers to the closest establishment, but must also allocate more workers to larger establishments. Imputed values are drawn 10 times to better capture the uncertainty in the imputation. Model parameters for the imputation are determined using the Minnesota data, which does have SEINUNIT-level jobs data.
To give a very simple example, suppose worker Z works at a SEIN with 3 establishments, A, B and C. If the model determines there is a 60 percent likelihood that Z works at A and a 20 percent chance each that Z works at B or C, a representative set of 10 implicates would be: A, A, A, A, A, A, B, B, C, C. Multiple imputation gives the user more information about the uncertainty of the imputation than a single impute.
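The simple example above can be mimicked with a few lines of code. This sketch only draws 10 implicates from fixed assignment probabilities; it does not implement the actual model, which derives those probabilities from establishment size and distance. The function name and the seed are hypothetical.

```python
import random
from collections import Counter


def draw_implicates(probabilities: dict, n_implicates: int = 10, seed=None) -> list:
    """Draw establishment implicates independently from assignment probabilities."""
    rng = random.Random(seed)
    units = list(probabilities)
    weights = [probabilities[unit] for unit in units]
    return rng.choices(units, weights=weights, k=n_implicates)


# Probabilities from the worked example: 60 percent A, 20 percent each B and C.
implicates = draw_implicates({"A": 0.6, "B": 0.2, "C": 0.2}, seed=12345)
print(implicates)
print(Counter(implicates))
```

Because the implicates are random draws, any single set of 10 may differ from the 6/2/2 split; the split describes what to expect on average.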
Adding Establishment Characteristics
Employer characteristics to attach to jobs can be retrieved from one of the following sources:
ECF_SEIN table - Employment-weighted modal characteristics from establishment data
ECF_SEIN_T26 table - Firm level characteristics using Title 26 data (age/size)
ECF_SEINUNIT table - Establishment level characteristics
QWI_SEINUNIT table - Establishment level characteristics, with some longitudinal editing after linking with jobs data
Establishment-level characteristics include ownership, industry, and geography. Ownership will almost always be the same for all establishments in an SEIN, and cases where that is not true may suggest a data irregularity. There is significant establishment heterogeneity in substate geography, and less but still significant heterogeneity in industry. While SEIN-level characteristics may provide a first-order approximation, there will likely be mass points and redistributions in counts relative to establishment-level statistics.
One complication of using establishment characteristics, however, is that the establishment will not always be present in the ECF SEINUNIT or QWI SEINUNIT tables in all quarters that a job to which it has been imputed is active on the JHF. This is usually a function of the operation of the worker-to-establishment methodology. While establishment candidates are intended to be available for the whole of a worker’s job spell at an SEIN, there are allowances for exceptions to this rule. There are several possible fallback strategies to handle these cases:
Use the SEIN-level modal characteristics. This is typically the most straightforward approach, but it can result in mass points and sudden longitudinal changes to the assignment.
Perform a longitudinal edit to get characteristics from the establishment in the nearest quarter it exists. This will maintain longitudinal consistency in assigned characteristics, but it is possible the establishment did not exist and using that geography (or industry) could be incorrect.
Draw an establishment from the set of establishments that do exist within the SEIN in the time period. This approach would come closest to replicating the distribution of the characteristics of the firm in the quarters of interest, though there may be minimal information on the job level to condition the impute. It is also likely the most complicated to implement.
While no particular approach is necessarily preferred, it is recommended that researchers applying establishment-level characteristics be aware of this phenomenon and apply a consistent approach to handling it.
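As one way to apply a consistent rule, the sketch below combines the second fallback (nearest-quarter longitudinal edit) with the first (SEIN-level modal characteristics) as a last resort. The data structures, the function name, and the tie-breaking toward the earlier quarter are assumptions made for illustration; they are not part of the LEHD files.

```python
def lookup_characteristics(unit_history: dict, sein_modal: dict, qtime: int) -> tuple:
    """Return (characteristics, source) for an imputed establishment in a quarter.

    unit_history maps qtime -> characteristics for one SEINUNIT.
    sein_modal maps qtime -> SEIN-level modal characteristics.
    """
    if qtime in unit_history:
        return unit_history[qtime], "exact"
    if unit_history:
        # Longitudinal edit: borrow from the nearest quarter in which the
        # establishment exists (ties go to the earlier quarter).
        nearest = min(unit_history, key=lambda q: (abs(q - qtime), q))
        return unit_history[nearest], "nearest_quarter"
    # The establishment never appears: fall back to the SEIN-level mode.
    return sein_modal.get(qtime), "sein_modal"


# Hypothetical establishment observed only in quarters 100 and 101.
history = {100: {"county": "27053"}, 101: {"county": "27123"}}
modal = {103: {"county": "27053"}}

print(lookup_characteristics(history, modal, 101))
print(lookup_characteristics(history, modal, 103))
print(lookup_characteristics({}, modal, 103))
```

Returning the source alongside the characteristics makes it easy to report how many job-quarters relied on each fallback.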
Imputed Wage Records
About 1.5 percent of jobs are missing from the UI wage record data, either because the employer did not report them or because they were reported after the records were sent to the Census Bureau. In these cases, wage records are imputed. A simple explanation of this imputation follows:
A first stage identifies likely cases of nonreporting by identifying employers where the number of wage records reported in a quarter is less than the minimum expected given employment in the QCEW.
In the second stage, workers with jobs at the firm but no earnings in the underreported quarter form the candidate pool for imputation. Workers are imputed to active status in that quarter conditioning on the worker’s employment in the previous and next quarter using a posterior built from historical employment patterns. Generally speaking, workers with active jobs in the previous and following quarter have the highest probability of being imputed to active and receiving imputed earnings. Workers who are never observed working at the firm will never be imputed as working at the firm.
The Census Bureau currently fills only one-quarter reporting gaps, which is the most common type of reporting issue observed in the data. Employers with persistent reporting issues are a more difficult case for imputation.
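The two stages can be illustrated with a rough sketch. It only flags a possibly underreported employer-quarter and classifies the candidate pool by adjacent-quarter activity; it does not draw from the posterior used in production. The `min_share` tolerance, the function names, and the data layout are hypothetical.

```python
def is_underreported(n_wage_records: int, qcew_employment: int,
                     min_share: float = 0.5) -> bool:
    """Stage 1 sketch: flag an employer-quarter with too few wage records.

    min_share is a hypothetical tolerance, not the production threshold.
    """
    return n_wage_records < min_share * qcew_employment


def gap_candidates(worker_quarters: dict, qtime: int) -> dict:
    """Stage 2 sketch: classify workers at the employer with no earnings in qtime.

    worker_quarters maps PIK -> set of quarters with positive earnings at
    the employer. Workers never observed at the employer are excluded.
    """
    candidates = {}
    for pik, active in worker_quarters.items():
        if qtime in active or not active:
            continue
        before = (qtime - 1) in active
        after = (qtime + 1) in active
        if before and after:
            candidates[pik] = "both_adjacent"  # highest imputation probability
        elif before or after:
            candidates[pik] = "one_adjacent"
        else:
            candidates[pik] = "neither_adjacent"
    return candidates


# Hypothetical employer with 100 QCEW employees but only 2 wage records in quarter 60.
flag = is_underreported(2, 100)
pool = gap_candidates(
    {"P1": {59, 61}, "P2": {59}, "P3": {59, 60, 61}, "P4": set()}, 60
)
print(flag, pool)
```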
Selecting a Random Subsample of Persons
The RANDOM_PIK_GROUP variable can be used to select a subsample of PIKs. It is extracted from the first two digits of the PIK, which are approximately uniformly distributed on [00, 99]. Note that “AA” is also a valid value, denoting individuals for whom no valid SSN was on file. Occurrence of such “pseudo-PIKs” varies by state.
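For example, an approximate 10 percent sample of persons can be selected by keeping the groups "00" through "09". The sketch below assumes the values are exactly two digits or "AA" as described above; the function name and the choice to exclude pseudo-PIKs by default are illustrative.

```python
def in_subsample(random_pik_group: str, percent: int,
                 include_pseudo: bool = False) -> bool:
    """True when a PIK falls in an approximate `percent` percent sample.

    Groups "00" through "99" are kept when their numeric value is below
    `percent`; the "AA" group (pseudo-PIKs) is kept only on request.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if random_pik_group == "AA":
        return include_pseudo
    if len(random_pik_group) != 2 or not random_pik_group.isdigit():
        raise ValueError(f"unexpected RANDOM_PIK_GROUP: {random_pik_group!r}")
    return int(random_pik_group) < percent


groups = ["00", "05", "09", "10", "57", "AA"]
kept = [g for g in groups if in_subsample(g, 10)]
print(kept)
```

Selecting on the person-level group rather than on job records keeps every job of a sampled person in the extract.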
2.1.3. Codebook: The EHF Files
Table Metadata for Employment History File (EHF_ZZ)
| | State Approval Required | IRS Approval Required | SSA Approval Required |
---|---|---|---|
Access Requirements | X |
- Description
Quarterly earnings for each job in a state as reported by the employer to the state’s UI system.
- Scope
State
- Key
PIK SEIN SEINUNIT YEAR
- Sort Order
PIK SEIN SEINUNIT YEAR
- File Format
SAS Data Table
Variable Information
Variable Name | Type | Length | Description |
---|---|---|---|
PIK | char | 9 | Protected Identification Key |
YEAR | num | 4 | Calendar year |
EARN_ANN | num | 8 | Annual earnings |
EARN1 | num | 5 | Qtr 1 earnings |
EARN2 | num | 5 | Qtr 2 earnings |
EARN3 | num | 5 | Qtr 3 earnings |
EARN4 | num | 5 | Qtr 4 earnings |
SEIN | char | 12 | State Employer Identification Number |
SEINUNIT | char | 5 | State UI reporting unit |
| | char | 2 | Geographic state of the job record. For UI, it is the FIPS code of the state being processed; for OPM, it is the duty state as reported on the OPMUI files (See details in appendix) |
FLAG_IMPUTE1 | char | 1 | Qtr 1 imputation flag (See details below) |
FLAG_IMPUTE2 | char | 1 | Qtr 2 imputation flag (See details below) |
FLAG_IMPUTE3 | char | 1 | Qtr 3 imputation flag (See details below) |
FLAG_IMPUTE4 | char | 1 | Qtr 4 imputation flag (See details below) |
Details for variable FLAG_IMPUTE1 on EHF_ZZ
- Description
Qtr 1 imputation flag
- Codebook
Value | Label |
---|---|
0 | Quarterly earnings as reported |
1 | Quarterly earnings imputed |
Details for variable FLAG_IMPUTE2 on EHF_ZZ
- Description
Qtr 2 imputation flag
- Codebook
Value | Label |
---|---|
0 | Quarterly earnings as reported |
1 | Quarterly earnings imputed |
Details for variable FLAG_IMPUTE3 on EHF_ZZ
- Description
Qtr 3 imputation flag
- Codebook
Value | Label |
---|---|
0 | Quarterly earnings as reported |
1 | Quarterly earnings imputed |
Details for variable FLAG_IMPUTE4 on EHF_ZZ
- Description
Qtr 4 imputation flag
- Codebook
Value | Label |
---|---|
0 | Quarterly earnings as reported |
1 | Quarterly earnings imputed |
2.1.4. Codebook: The JHF Files
Table Metadata for Job History File (JHF_ZZ)
| | State Approval Required | IRS Approval Required | SSA Approval Required |
---|---|---|---|
Access Requirements | X |
- Description
Wide version of earnings history that contains imputed establishment (SEINUNITs) for multi-unit firms, and a job-level ID useful for tracking job spells across employer identifier changes.
- Scope
State
- Key
PIK SEIN SPELL_U2W
- Sort Order
PIK FID SEIN SPELL_U2W
- File Format
SAS Data Table
Variable Information
Variable Name | Type | Length | Description |
---|---|---|---|
PIK | char | 9 | Protected Identification Key |
SEIN | char | 12 | State Employer Identification Number |
SPELL_U2W | num | 3 | |
E21-E153 | num | 8 | Earnings (See details below) |
FID | num | 5 | Within-PIK linked job spell identifier |
SEINUNIT1 | char | 5 | State UI Reporting Unit Number (Impute 1) |
SEINUNIT2 | char | 5 | State UI Reporting Unit Number (Impute 2) |
SEINUNIT3 | char | 5 | State UI Reporting Unit Number (Impute 3) |
SEINUNIT4 | char | 5 | State UI Reporting Unit Number (Impute 4) |
SEINUNIT5 | char | 5 | State UI Reporting Unit Number (Impute 5) |
SEINUNIT6 | char | 5 | State UI Reporting Unit Number (Impute 6) |
SEINUNIT7 | char | 5 | State UI Reporting Unit Number (Impute 7) |
SEINUNIT8 | char | 5 | State UI Reporting Unit Number (Impute 8) |
SEINUNIT9 | char | 5 | State UI Reporting Unit Number (Impute 9) |
SEINUNIT10 | char | 5 | State UI Reporting Unit Number (Impute 10) |
FIRST_ACC | num | 3 | First accession (First quarter of employment at this job spell) |
LAST_SEP | num | 3 | Last separation (Last quarter of employment at this job spell) |
| | num | 3 | |
| | char | 2 | |
RANDOM_PIK_GROUP | char | 2 | Selector based on random pik |
Details for variables E21-E153 on JHF_ZZ
- Description
Earnings
- Source
UI Records
- Valid Values
Positive integer values.
- Notes
The set of quarterly variables available in each state corresponds to the set of quarters indexed by qtime for which earnings data are available for that state.
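A small helper can translate between calendar quarters and the quarterly earnings variables. This sketch assumes the variables are named `E` followed by the qtime index, as the E21-E153 range suggests, and that qtime counts quarters from 1985Q1 = 1; the function names are hypothetical.

```python
def qtime_to_year_quarter(qtime: int) -> tuple:
    """Convert a qtime index (1985Q1 = 1) to a (year, quarter) pair."""
    if qtime < 1:
        raise ValueError("qtime must be a positive integer")
    years_elapsed, quarter_index = divmod(qtime - 1, 4)
    return 1985 + years_elapsed, quarter_index + 1


def earnings_variable(year: int, quarter: int) -> str:
    """Name of the JHF earnings variable for a calendar quarter (assumed E<qtime>)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return f"E{(year - 1985) * 4 + quarter}"


first = qtime_to_year_quarter(21)
last = qtime_to_year_quarter(153)
name = earnings_variable(1990, 1)
print(first, last, name)
```

Under these assumptions, E21 holds earnings for 1990Q1 and E153 for 2023Q1.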
Details for variable FLAG_SEINUNIT_IMPUTED on JHF_ZZ
- Codebook
Value | Label |
---|---|
0 | Establishment assignment not imputed |
1 | Establishment assignment imputed |
2.1.5. Codebook: The EHF_US_INDICATORS File
Table Metadata for Earnings Indicators (EHF_US_INDICATORS)
| | State Approval Required | IRS Approval Required | SSA Approval Required |
---|---|---|---|
Access Requirements |
- Description
Indicates whether or not a PIK had any earnings reported in any state in the LEHD system.
- Scope
National
- Key
PIK YEAR
- File Format
SAS Data Table
Variable Information
Variable Name | Type | Length | Description |
---|---|---|---|
PIK | char | 9 | Protected Identification Key |
YEAR | num | 3 | Calendar year |
NUM_STATES_EARN1 | num | 3 | Number of states reporting earnings for PIK in Q1 |
NUM_STATES_EARN2 | num | 3 | Number of states reporting earnings for PIK in Q2 |
NUM_STATES_EARN3 | num | 3 | Number of states reporting earnings for PIK in Q3 |
NUM_STATES_EARN4 | num | 3 | Number of states reporting earnings for PIK in Q4 |
2.1.6. Codebook: The EHF_ALL_AVAILABILITY File
Table Metadata for Earnings Availability (EHF_ALL_AVAILABILITY)
| | State Approval Required | IRS Approval Required | SSA Approval Required |
---|---|---|---|
Access Requirements |
- Description
Indicates data availability ranges for all states.
- Scope
National
- Key
State
- File Format
SAS Data Table
Variable Information
Variable Name | Type | Length | Description |
---|---|---|---|
STATE | char | 2 | State postal code (See details in appendix) |
START_YEAR | num | 8 | First year of data on EHF |
START_QUARTER | num | 8 | First quarter of data on EHF |
END_YEAR | num | 8 | Last year of data on EHF |
END_QUARTER | num | 8 | Last quarter of data on EHF |
START_YEAR_JHF | num | 8 | First year of data on JHF |
START_QUARTER_JHF | num | 8 | First quarter of data on JHF |
END_YEAR_JHF | num | 8 | Last year of data on JHF |
END_QUARTER_JHF | num | 8 | Last quarter of data on JHF |
| | num | 3 | Last quarter of data on JHF (1985Q1=1) (See details in appendix) |
| | num | 3 | Last quarter of data on JHF (1985Q1=1) (See details in appendix) |