= LEHD Public Use Data Schema V4.1c-draft Lars Vilhuber 13 April 2016 // a2x: --dblatex-opts "-P latex.output.revhistory=0 --param toc.section.depth=3" ( link:lehd_public_use_schema.pdf[Printable version] ) [IMPORTANT] .Important ============================================== This specification is draft. Feedback is welcome. Please write us at link:mailto:ces.qwi.feedback@census.gov?subject=LEHD_Schema_draft[ces.qwi.feedback@census.gov]. ============================================== The public-use data from the Longitudinal Employer-Household Dynamics Program, including the Quarterly Workforce Indicators (QWI) and Job-to-Job Flows (J2J), are available for download with the following data schema. These data are available as Comma-Separated Value (CSV) files through the LEHD website’s Data page at http://lehd.ces.census.gov/data/ . This document describes the data schema for LEHD files. For each variable, a set of allowable values is defined. Definitions are provided as CSV files, with header variable definitions. The naming conventions of the data files is documented in link:lehd_csv_naming.html[]. Changes relative to the original v4.0 version are listed <>. Basic Schema ------------ Each file is structured as a CSV file. The first columns contain <>, subsequent columns contain <>, followed by <>. === Generic structure [width="30%",format="csv",cols="<2",options="header"] |=================================================== Column name [ Identifier1 ] [ Identifier2 ] [ Identifier3 ] [ ... ] [ Indicator 1 ] [ Indicator 2 ] [ Indicator 3 ] [ ... ] [ Status Flag 1 ] [ Status Flag 2 ] [ Status Flag 3 ] [ ... ] |=================================================== Note: A full list of indicators for each type of file are shown below in the <> section. While all indicators are included in the CSV files, only the requested indicators will be included in data outputs from the LED Extraction Tool. <<< === [[identifiers]]Identifiers Records, unless otherwise noted, are parts of time-series data. Unique record identifiers are noted below, by file type. Identifiers without the year and quarter component can be considered a series identifier. ==== Mapping for Identifiers ( link:lehd_mapping_identifiers.csv[] ) Each of the released files has a set of variables uniquely identifying records ('Identifiers'). The table below relates the set of identifier specifications to the released files. The actual CSV files containing the identifiers for each set are listed after this table. Each identifier can take on a specified list of values, documented in the section on <>. [width="80%",format="csv",cols="<3,6*^1",options="header"] |=================================================== include::lehd_mapping_identifiers.csv[] |=================================================== <<< ==== Identifiers for j2j ( link:lehd_identifiers_j2j.csv[] ) [width="100%",format="csv",cols="2*^1,<3",options="header"] |=================================================== include::lehd_identifiers_j2j.csv[] |=================================================== <<< ==== Identifiers for j2jod ( link:lehd_identifiers_j2jod.csv[] ) [width="100%",format="csv",cols="2*^1,<3",options="header"] |=================================================== include::lehd_identifiers_j2jod.csv[] |=================================================== <<< ==== Identifiers for qwi ( link:lehd_identifiers_qwi.csv[] ) [width="100%",format="csv",cols="2*^1,<3",options="header"] |=================================================== include::lehd_identifiers_qwi.csv[] |=================================================== <<< <<< === [[indicators]]Indicators The following tables and associated mapping files list the indicators available on each file. The ''Indicator Variable'' is the short name of the variable on the CSV files, suitable for machine processing in a wide variety of statistical applications. When given, the ''Alternate name'' may appear in related documentation and articles. The ''Status Flag'' is used to indicate publication or data quality status (see <>). The ''Indicator Name'' is a more verbose description of the indicator. ==== National QWI and state-level QWI (QWIPU) ==== ( link:variables_qwi.csv[variables_qwi.csv] ) [width="95%",format="csv",cols="3*^2,<5",options="header"] |=================================================== include::variables_qwi.csv[] |=================================================== <<< ==== National QWI and state-level QWI rates (QWIPUR) ==== ( link:variables_qwir.csv[variables_qwir.csv] ) [width="95%",format="csv",cols="3*^2,<5,<2",options="header"] |=================================================== include::variables_qwir.csv[] |=================================================== where the column *Base* indicates the denominator used to compute the rate, with *AvgEmp = (Emp + EmpEnd)/2* and *AvgEmpS = (EmpSpv + EmpS)/2*. <<< ==== Job-to-job flow counts (J2J) ( link:variables_j2j.csv[] ) [width="95%",format="csv",cols="3*^2,<5",options="header"] |=================================================== include::variables_j2j.csv[] |=================================================== <<< ==== Job-to-job flow rates (J2JR) ( link:variables_j2jr.csv[] ) [width="95%",format="csv",cols="3*^2,<5,<2",options="header"] |=================================================== include::variables_j2jr.csv[] |=================================================== where the column *Base* indicates the denominator used to compute the rate. <<< ==== Job-to-job flow Origin-Destination (J2JOD) ( link:variables_j2jod.csv[] ) [width="95%",format="csv",cols="^3,^2,^3,<5",options="header"] |=================================================== include::variables_j2jod.csv[] |=================================================== <<< <<< <<< === [[vmeasures]]Variability measures The following tables and associated mapping files list the variability measures available on each file. The ''Variability Measure'' is the short name of the variable on the CSV files, suitable for machine processing in a wide variety of statistical applications. When given, the ''Alternate Name'' may appear in related documentation and articles. The ''Variable Name'' is a more verbose description of the variability measure. Six variability measures are published: * Total variability, prefixed by vt_ * Standard error, prefixed by st_, and computed as the square root of Total Variability * Between-implicate variability, prefixed by vb_ * Average within-implicate variability, prefixed by vw_ * Degrees of freedom, prefixed by df_ * Missingness ratio, prefixed by mr_ A missing variability measure indicates a structural zero in the corresponding indicator. This is currently not associated with a flag. //Not all indicators have associated variability measures. For more details, see the following document TBD. ==== Generic structure [width="30%",format="csv",cols="<2",options="header"] |=================================================== Column name [ Identifier1 ] [ Identifier2 ] [ Identifier3 ] [ ... ] [ Standard error for Indicator 1 ] [ Standard error for Indicator 2 ] [ Standard error for Indicator 3 ] [ ... ] [ Total variation for Indicator 1 ] [ Total variation for Indicator 2 ] [ Total variation for Indicator 3 ] [ ... ] [ Between-implicate variability for Indicator 1 ] [ Between-implicate variability for Indicator 2 ] [ Between-implicate variability for Indicator 3 ] [ ... ] [ Average within-implicate variability for Indicator 1 ] [ Average within-implicate variability for Indicator 2 ] [ Average within-implicate variability for Indicator 3 ] [ ... ] [ Degrees of freedom for Indicator 1 ] [ Degrees of freedom for Indicator 2 ] [ Degrees of freedom for Indicator 3 ] [ ... ] [ Missingness ratio for Indicator 1 ] [ Missingness ratio for Indicator 2 ] [ Missingness ratio for Indicator 3 ] [ ... ] |=================================================== Note: A full list of indicators for each type of file are shown in the <> section. In the tables below, only a sample of variability measures are printed, but the complete list is available in the linked CSV schema files. <<< ==== National QWI and state-level QWI ==== ( link:variables_qwiv.csv[variables_qwiv.csv] ) [width="95%",format="csv",cols="2*^2,<5",options="header"] |=================================================== include::tmp_variables_qwiv.csv[] |=================================================== <<< ==== National QWI and state-level QWI rates ==== ( link:variables_qwirv.csv[variables_qwirv.csv] ) [width="95%",format="csv",cols="2*^2,<5",options="header"] |=================================================== include::tmp_variables_qwirv.csv[] |=================================================== <<< ==== Job-to-job flow counts (J2J) Soon. //( link:variables_j2j.csv[] ) //[width="95%",format="csv",cols="3*^2,<5",options="header"] //|=================================================== //include::tmp_variables_j2jv.csv[] //|=================================================== //<<< // ==== Job-to-job flow rates (J2JR) Soon. //( link:variables_j2jr.csv[] ) //[width="95%",format="csv",cols="3*^2,<5",options="header"] //|=================================================== //include::tmp_variables_j2jrv.csv[] //|=================================================== //<<< ==== Job-to-job flow Origin-Destination (J2JOD) Soon. //( link:variables_j2jod.csv[] ) //[width="95%",format="csv",cols="^3,^2,^3,<5",options="header"] //|=================================================== //include::tmp_variables_j2jodv.csv[] //|=================================================== <<< == [[catvars]]Categorical Variables Categorical variable descriptions are displayed above each table, with the variable name shown in parentheses. Unless otherwise stated, every possible value/label combination for each categorical variable is listed. Please note that not all values will be available in every table. === agegrp ( link:label_agegrp.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_agegrp.csv[] |=================================================== === education ( link:label_education.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_education.csv[] |=================================================== === ethnicity ( link:label_ethnicity.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_ethnicity.csv[] |=================================================== === firmage ( link:label_firmage.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_firmage.csv[] |=================================================== === firmsize ( link:label_firmsize.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_firmsize.csv[] |=================================================== === ownercode ( link:label_ownercode.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_ownercode.csv[] |=================================================== === periodicity ( link:label_periodicity.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_periodicity.csv[] |=================================================== === quarter ( link:label_quarter.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_quarter.csv[] |=================================================== === race ( link:label_race.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_race.csv[] |=================================================== === seasonadj ( link:label_seasonadj.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_seasonadj.csv[] |=================================================== === sex ( link:label_sex.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_sex.csv[] |=================================================== <<< === Industry === [[ind_level]] ==== Industry levels ( link:label_ind_level.csv[] ) [width="60%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_ind_level.csv[] |=================================================== ==== Industry ( link:label_industry.csv[] ) Only a small subset of available values shown. The 2007 NAICS (North American Industry Classification System) is used for all years. For a full listing of all valid NAICS codes, see http://www.census.gov/eos/www/naics/. [width="90%",format="csv",cols="^1,<4",options="header"] |=================================================== include::tmp2.csv[] |=================================================== <<< === Geography === [[geo_level]] ==== Geographic levels ( link:label_geo_level.csv[] ) [width="40%",format="csv",cols="^1,<3",options="header"] |=================================================== include::label_geo_level.csv[] |=================================================== Geography labels are provided in separate files by state. Note that cross-state CBSA will have state-specific parts, and thus will appear in multiple files. A separate link:label_fipsnum.csv[label_fipsnum.csv] contains values and labels for all entities of geo_level 'n' or 's', and is a summary of separately available files. ==== National and state-level values ==== ( link:label_fipsnum.csv[] ) [width="40%",format="csv",cols="^1,<3",options="header"] |=================================================== include::tmp.csv[] |=================================================== ==== Detailed state and substate level values For a full listing of all valid geography codes (except for WIA codes), see http://www.census.gov/geo/maps-data/data/tiger.html. Note about geography codes: Four types of geography codes are represented with this field. Each geography has its own code structure. - State is the 2-digit http://quickfacts.census.gov/qfd/meta/long_fips.htm[FIPS] code. 'us' stands for national scope. - County is the 5-digit FIPS code. - Metropolitan/Micropolitan codes are constructed from the 2-digit state FIPS code and the 5-digit http://www.census.gov/population/metro/[CBSA] code provided by the Census Bureau’s Geography Division. ** In the QWI, the metropolitan/micropolitan areas are the state parts of the full CBSA areas. ** In J2J, tabulations are based on the complete metropolitan/micropolitan area. - The WIA code is constructed from the 2-digit state FIPS code and the 6-digit WIA identifier provided by LED State Partners. The 2014 vintage of Census TIGER geography is used for all tabulations as of the R2014Q3 release. For convenience, a composite file containing all geocodes is available as link:label_geography_all.csv[]. [format="csv",width="50%",cols="^1,^3",options="header"] |=================================================== State,Format file US,link:label_geography_us.csv[] AK,link:label_geography_ak.csv[] AL,link:label_geography_al.csv[] AR,link:label_geography_ar.csv[] AZ,link:label_geography_az.csv[] CA,link:label_geography_ca.csv[] CO,link:label_geography_co.csv[] CT,link:label_geography_ct.csv[] DC,link:label_geography_dc.csv[] DE,link:label_geography_de.csv[] FL,link:label_geography_fl.csv[] GA,link:label_geography_ga.csv[] HI,link:label_geography_hi.csv[] IA,link:label_geography_ia.csv[] ID,link:label_geography_id.csv[] IL,link:label_geography_il.csv[] IN,link:label_geography_in.csv[] KS,link:label_geography_ks.csv[] KY,link:label_geography_ky.csv[] LA,link:label_geography_la.csv[] MA,link:label_geography_ma.csv[] MD,link:label_geography_md.csv[] ME,link:label_geography_me.csv[] MI,link:label_geography_mi.csv[] MN,link:label_geography_mn.csv[] MO,link:label_geography_mo.csv[] MS,link:label_geography_ms.csv[] MT,link:label_geography_mt.csv[] NC,link:label_geography_nc.csv[] ND,link:label_geography_nd.csv[] NE,link:label_geography_ne.csv[] NH,link:label_geography_nh.csv[] NJ,link:label_geography_nj.csv[] NM,link:label_geography_nm.csv[] NV,link:label_geography_nv.csv[] NY,link:label_geography_ny.csv[] OH,link:label_geography_oh.csv[] OK,link:label_geography_ok.csv[] OR,link:label_geography_or.csv[] PA,link:label_geography_pa.csv[] RI,link:label_geography_ri.csv[] SC,link:label_geography_sc.csv[] SD,link:label_geography_sd.csv[] TN,link:label_geography_tn.csv[] TX,link:label_geography_tx.csv[] UT,link:label_geography_ut.csv[] VA,link:label_geography_va.csv[] VT,link:label_geography_vt.csv[] WA,link:label_geography_wa.csv[] WI,link:label_geography_wi.csv[] WV,link:label_geography_wv.csv[] WY,link:label_geography_wy.csv[] |=================================================== <<< === Aggregation level ( link:label_agg_level.csv[] ) Measures within the J2J and QWI data products are tabulated on many different dimensions, including demographic characteristics, geography, industry, and other firm characteristics. For Origin-Destination (O-D) tables, characteristics of the origin and destination firm can be tabulated separately. Every tabulation level is assigned a unique aggregation index, represented by the agg_level variable. This index starts from 1, representing a national level grand total (all industries, workers, etc.), and progresses through different combinations of characteristics. There are gaps in the progression to leave space for aggregation levels that may be included in future data releases. *agg_level* is currently reported only for J2J data products. The following variables are included in the link:label_agg_level.csv[agg_level.csv] file: - agg_level - index representing level of aggregation reported on a given record. - worker_char - demographic (worker) characteristics reported on record. - firm_char - firm characteristics reported on record. In origin-destination tabulations, these will be the characteristics of the destination firm. - firm_orig_char - characteristics of origin firm reported on record (O-D tabulations, only) - j2j - Flag: Aggregation level available on J2J counts tables - j2jr - Flag: Aggregation level available on J2J rates tables - j2jod - Flag: Aggregation level available on J2J O-D tables. - qwi - Flag: Aggregation level available on QWI (placeholder for future integration) The characteristics available on an aggregation level are repeated using a series of flags following the standard schema: - <> - geographic level of table, as per 2.13.1. - <> - industry level of table, as per 2.12.1. - by_ variables - flags indicating other dimensions reported, including ownership, demographics, firm age and size. These flags will be expanded to include origin characteristics in a later release. A shortened representation of the file is provided below, the complete file is available in the link above. [width="90%",format="csv",cols=">1,3*<2,5*<1",options="header"] |=================================================== include::tmp_label_agg_level.csv[] |=================================================== <<< == [[statusflags]]Status flags ( link:label_flags.csv[] ) Each status flag in the tables above contains one of the following valid values. The values and their interpretation are listed in the table below. [IMPORTANT] .Important ============================================== Note: Currently, the J2J tables only contain status flags '-1' and '1.' Status flags with values 10 or above only appear in online applications, not in CSV files. ============================================== [width="80%",format="csv",cols="^1,<4",options="header"] |=================================================== include::label_flags.csv[] |=================================================== <<< == [[changes]] Changes === This version from previous releases of this document - 2015-02-25: corrected flag values - 2015-02-25: documents are now identified by date, not revision - 2015-03-10: Correction of the TIGER vintage that is used for geographic references - 2015-03-11: Added URL for J2J - 2015-03-11: Correction of typo in type naming convention, rename of naming_fipsalpha.csv to naming_geohi.csv to be consistent. - 2015-03-17: Changing of naming convention for to-be-released files based on federal government from fg -> of. At this time, no such files have been released. - 2015-04-24: Changes to alternate name of SepSnx and EmpSpv, tentative rate names - 2015-04-26: Changes to name of variable schema files (qwipu -> qwi), addition of variability variable schema files. - 2015-04-28: Fixed small typos in QWI variable short names - 2015-05-18: Updated agg_level description, replaced agg_level.csv file - 2015-05-22: Fixed minor rendering bug for QWI rate variability names. No change to actual metadata. - 2015-06-09: Fixed a minor coding error in label_fipsnum.csv, added a concatenation of geography files as label_geography_all.csv. - 2015-08-07: Minor text change for agg_level, modified agg_level file. - 2015-08-12: Removed the last 4 rows of variables_j2jod.csv, since they are not on the current beta J2JOD files. - 2015-08-25: Added a extension component [ext] to the file naming convention to reflect availability of Excel files (and PDF files) - 2016-03-16: Removed extraneous empty lines - 2016-04-12: Fixed typo in variables_qwi.csv (FrmJbLsS, EarnHirNS and status variables) - 2016-04-13: Fixed typo in variables_qwi.csv (HirAS, HirNS, and status variables) - 2016-04-13: Fixed typo in variables_qwi*v.csv (HirAS, HirNS) === Version 4.1c-draft from 4.0 - added J2J, National QWI spec - added geohi category of ALL, US - added definitions of variability measures - added definitions of rates on separate file - added naming convention for additional files - added agg_level variable <<< ******************* This revision: Wed Apr 13 08:35:56 EDT 2016 *******************