= LEHD Public Use Data Schema for J2J Explorer (beta) V4.2-rc1
Lars Vilhuber
21 December 2017
// a2x: --dblatex-opts "-P latex.output.revhistory=0 --param toc.section.depth=3"
:ext-relative: {outfilesuffix}

( link:lehd_j2jexplorer_schema.pdf[Printable version] )

[IMPORTANT]
.Important
==============================================
This document is not an official Census Bureau publication. It is compiled from publicly accessible information by Lars Vilhuber (http://www.ilr.cornell.edu/ldi/[Labor Dynamics Institute, Cornell University]).
Feedback is welcome. Please write us at link:mailto:lars.vilhuber@cornell.edu?subject=LEHD_Schema_v4[lars.vilhuber@cornell.edu].
==============================================

Purpose
-------
The public-use Job-to-Job Flows (J2J) data provided by the Longitudinal Employer-Household Dynamics Program are accessible through the https://j2jexplorer.ces.census.gov/[J2J Explorer (beta)]. This document provides information on the schema used to format files downloaded through that application.

Additional information
----------------------
The complete LEHD schema is documented in link:lehd_public_use_schema{ext-relative}[]. LEHD-provided SHP files are separately described in link:lehd_shapefiles{ext-relative}[]. The naming conventions of the data files are documented in link:lehd_csv_naming{ext-relative}[].

Extends
-------
This is the first version of the schema for the J2J Explorer (beta) application.

Supersedes
----------
No prior version.

Basic Schema
------------
Each data file is structured as a CSV file. The first columns contain <<identifiers,identifiers>>, subsequent columns contain <<indicators,indicators>>, followed by <<statusflags,status flags>>.

=== Generic structure

[width="30%",format="csv",cols="<2",options="header"]
|===================================================
Column name
[ Identifier1 ]
[ Identifier2 ]
[ Identifier3 ]
[ ... ]
[ Indicator 1 ]
[ Indicator 2 ]
[ Indicator 3 ]
[ ... ]
[ Status Flag 1 ]
[ Status Flag 2 ]
[ Status Flag 3 ]
[ ... ]
|===================================================

Note: The J2J Explorer (beta) provides the full set of J2J indicators in addition to two composite Origin-Destination indicators. Files downloadable through other means may be structured differently; please consult the complete LEHD schema in link:lehd_public_use_schema{ext-relative}[].

<<<

=== [[identifiers]]Identifiers
Records, unless otherwise noted, are parts of time-series data. Unique record identifiers are noted below, by file type. Identifiers without the year and quarter component can be considered a series identifier.

==== Identifiers for j2j
( link:lehd_identifiers_j2j.csv[] )

[width="100%",format="csv",cols="2*^1,<3",options="header"]
|===================================================
include::lehd_identifiers_j2j.csv[]
|===================================================

<<<

==== Identifiers for j2jod
( link:lehd_identifiers_j2jod.csv[] )

[width="100%",format="csv",cols="2*^1,<3",options="header"]
|===================================================
include::lehd_identifiers_j2jod.csv[]
|===================================================

<<<
<<<

=== [[indicators]]Indicators
The following tables and associated mapping files list the indicators available on each file. The ''Indicator Variable'' is the short name of the variable on the CSV files, suitable for machine processing in a wide variety of statistical applications. When given, the ''Alternate name'' may appear in related documentation and articles. The ''Status Flag'' is used to indicate publication or data quality status (see <<statusflags,Status flags>>). The ''Indicator Name'' is a more verbose name for the indicator. The ''Description'' provides a complete description of the indicator. ''Units'' identify the type of variable: counts, rates, monetary amounts. ''Concept'' classifies each indicator in a descriptive category: employment, hire, separation, earnings, or flow. The ''Base'' indicates the denominator used to compute the statistic, and may be '1'.

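
As a rough illustration of how an indicator mapping file might be consumed, the following Python sketch builds a lookup from each indicator variable to its status flag. It assumes, without confirmation from the files themselves, that the columns appear in the order listed in the paragraph above (Indicator Variable, Alternate name, Status Flag, and so on). The sample rows and the `load_indicator_flags` helper are invented for illustration and merely stand in for a file such as variables_j2j.csv.

```python
import csv
import io

# Hypothetical excerpt standing in for a mapping file such as variables_j2j.csv.
# ASSUMPTION: column order follows the prose (indicator variable, alternate name,
# status flag, indicator name, description, units, concept, base).
# The rows themselves are invented for illustration.
SAMPLE = """\
Indicator Variable,Alternate name,Status Flag,Indicator Name,Description,Units,Concept,Base
EE,,sEE,Illustrative name,Illustrative description,Count,Flow,1
AQHire,,sAQHire,Illustrative name,Illustrative description,Count,Hire,1
"""


def load_indicator_flags(handle):
    """Map each indicator variable to its status flag, by column position."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    # Column 0 holds the indicator variable, column 2 the status flag.
    return {row[0]: row[2] for row in reader if row}


flags = load_indicator_flags(io.StringIO(SAMPLE))
print(flags)
```

Reading by position rather than by header text avoids depending on the exact header spelling, which this document does not specify. If the real files use a different column order, the indices need to be adjusted.
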
==== Job-to-job flow counts (J2J)
( link:variables_j2j.csv[] )

[width="95%",format="csv",cols="3*^2,<5,<5,<2,<2,^1",options="header"]
|===================================================
include::variables_j2j.csv[]
|===================================================

<<<

==== Job-to-job flow rates (J2JR)
( link:variables_j2jr.csv[] )

Rates are computed from published data, and are provided as a convenience.

[width="95%",format="csv",cols="3*^2,<5,<5,<2,<2,^1",options="header"]
|===================================================
include::variables_j2jr.csv[]
|===================================================

<<<

==== Job-to-job flow Origin-Destination (J2JOD)
( link:variables_j2jod.csv[] )

[width="95%",format="csv",cols="3*^2,<5,<5,<2,<2,^1",options="header"]
|===================================================
include::variables_j2jod.csv[]
|===================================================

<<<

==== Job-to-job flow computed by the app (J2JAPP)
( link:variables_j2japp.csv[] )

[width="95%",format="csv",cols="3*^2,<5,<5,<2,<2,^1",options="header"]
|===================================================
include::variables_j2japp.csv[]
|===================================================

<<<

== [[catvars]]Categorical Variables
Categorical variable descriptions are displayed above each table, with the variable name shown in parentheses. Unless otherwise stated, every possible value/label combination for each categorical variable is listed. Please note that not all values will be available in every table.

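
To make the value/label pairing concrete, the sketch below loads one label file into a dictionary and decodes a coded field of a data record. It assumes that each label file has two columns (value, then label) preceded by a header row, as the two-column tables in this section suggest. The sample contents, the sample record, and the `load_labels` helper are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for a label file such as label_sex.csv.
# ASSUMPTION: two columns (value, label) preceded by a header row.
SAMPLE_LABELS = """\
sex,label
0,All Sexes
1,Male
2,Female
"""


def load_labels(handle):
    """Return a dict mapping each coded value (as a string) to its label."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {row[0]: row[1] for row in reader if row}


labels = load_labels(io.StringIO(SAMPLE_LABELS))

# A made-up data record; codes are kept as strings because they are categorical.
record = {"sex": "2", "year": "2016", "quarter": "3"}

# Not every value appears in every table, so fall back to the raw code if unknown.
decoded = labels.get(record["sex"], record["sex"])
print(decoded)
```

Keeping the codes as strings avoids accidental numeric conversion of values that are identifiers rather than quantities.
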
=== agegrp
( link:label_agegrp.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_agegrp.csv[]
|===================================================

=== concept_draft
( link:label_concept_draft.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_concept_draft.csv[]
|===================================================

=== education
( link:label_education.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_education.csv[]
|===================================================

=== ethnicity
( link:label_ethnicity.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_ethnicity.csv[]
|===================================================

=== firmage
( link:label_firmage.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_firmage.csv[]
|===================================================

=== firmsize
( link:label_firmsize.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_firmsize.csv[]
|===================================================

=== ownercode
( link:label_ownercode.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_ownercode.csv[]
|===================================================

=== periodicity
( link:label_periodicity.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_periodicity.csv[]
|===================================================

=== quarter
( link:label_quarter.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_quarter.csv[]
|===================================================

=== race
( link:label_race.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_race.csv[]
|===================================================

=== seasonadj
( link:label_seasonadj.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_seasonadj.csv[]
|===================================================

=== sex
( link:label_sex.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_sex.csv[]
|===================================================

=== stusps
( link:label_stusps.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_stusps.csv[]
|===================================================

<<<

=== Industry ===

[[ind_level]]
==== Industry levels
( link:label_ind_level.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_ind_level.csv[]
|===================================================

==== Industry
( link:label_industry.csv[] )

Only a small subset of available values is shown. The 2017 NAICS (North American Industry Classification System) is used for all years. QWI releases prior to R2018Q1 used the 2012 NAICS classification (see link:../V4.1.3[Schema v4.1.3]). For a full listing of all valid 2017 NAICS codes, see http://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2017.

[width="90%",format="csv",cols="^1,<5,^1",options="header"]
|===================================================
include::tmp2.csv[]
|===================================================

<<<

=== [[geography]]Geography ===

[[geo_level]]
==== [[geolevel]] Geographic levels
Geography labels for data files are provided in separate files, by scope. Each file 'label_geography_SCOPE.csv' may contain one or more types of records as flagged by <<geolevel,geo_level>>. For convenience, a composite file containing all geocodes is available as link:label_geography.csv[].

The 2017 vintage of https://www.census.gov/geo/maps-data/data/tiger-line.html[Census TIGER/Line geography] is used for all tabulations as of the R2018Q1 release. Shapefiles are described in a link:lehd_shapefiles{ext-relative}[separate document].

( link:label_geo_level.csv[] )

[width="80%",format="csv",cols="^1,<3,<8,<8",options="header"]
|===================================================
include::label_geo_level.csv[]
|===================================================

==== National and state-level values ====
( link:label_fipsnum.csv[] )

The file link:label_fipsnum.csv[label_fipsnum.csv] contains values and labels for all entities of <<geolevel,geo_level>> 'N' or 'S', and is a summary of separately available files.

[width="40%",format="csv",cols="^1,<3,^1",options="header"]
|===================================================
include::tmp.csv[]
|===================================================

==== [[stusps]]State postal codes
Some parts of the schema use (lower or upper-case) state postal codes.

( link:label_stusps.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_stusps.csv[]
|===================================================

==== Detailed state and substate level values
Note: cross-state CBSAs, in records of type <<geolevel,geo_level>> = M, are present on files of type 'label_geography_XX.csv'. A particular cross-state CBSA will appear on multiple files.

[format="csv",width="50%",cols="^1,^3",options="header"]
|===================================================
Scope,Format file
US,link:label_geography_us.csv[]
METRO,link:label_geography_metro.csv[]
*States*,
AK,link:label_geography_ak.csv[]
AL,link:label_geography_al.csv[]
AR,link:label_geography_ar.csv[]
AZ,link:label_geography_az.csv[]
CA,link:label_geography_ca.csv[]
CO,link:label_geography_co.csv[]
CT,link:label_geography_ct.csv[]
DC,link:label_geography_dc.csv[]
DE,link:label_geography_de.csv[]
FL,link:label_geography_fl.csv[]
GA,link:label_geography_ga.csv[]
HI,link:label_geography_hi.csv[]
IA,link:label_geography_ia.csv[]
ID,link:label_geography_id.csv[]
IL,link:label_geography_il.csv[]
IN,link:label_geography_in.csv[]
KS,link:label_geography_ks.csv[]
KY,link:label_geography_ky.csv[]
LA,link:label_geography_la.csv[]
MA,link:label_geography_ma.csv[]
MD,link:label_geography_md.csv[]
ME,link:label_geography_me.csv[]
MI,link:label_geography_mi.csv[]
MN,link:label_geography_mn.csv[]
MO,link:label_geography_mo.csv[]
MS,link:label_geography_ms.csv[]
MT,link:label_geography_mt.csv[]
NC,link:label_geography_nc.csv[]
ND,link:label_geography_nd.csv[]
NE,link:label_geography_ne.csv[]
NH,link:label_geography_nh.csv[]
NJ,link:label_geography_nj.csv[]
NM,link:label_geography_nm.csv[]
NV,link:label_geography_nv.csv[]
NY,link:label_geography_ny.csv[]
OH,link:label_geography_oh.csv[]
OK,link:label_geography_ok.csv[]
OR,link:label_geography_or.csv[]
PA,link:label_geography_pa.csv[]
RI,link:label_geography_ri.csv[]
SC,link:label_geography_sc.csv[]
SD,link:label_geography_sd.csv[]
TN,link:label_geography_tn.csv[]
TX,link:label_geography_tx.csv[]
UT,link:label_geography_ut.csv[]
VA,link:label_geography_va.csv[]
VT,link:label_geography_vt.csv[]
WA,link:label_geography_wa.csv[]
WI,link:label_geography_wi.csv[]
WV,link:label_geography_wv.csv[]
WY,link:label_geography_wy.csv[]
|===================================================

<<<

=== Aggregation level
( link:label_agg_level.csv[] )

Measures within the J2J and QWI data products are tabulated on many different dimensions, including demographic characteristics, geography, industry, and other firm characteristics. For Origin-Destination (O-D) tables, characteristics of the origin and destination firm can be tabulated separately. Every tabulation level is assigned a unique aggregation index, represented by the agg_level variable. This index starts from 1, representing a national level grand total (all industries, workers, etc.), and progresses through different combinations of characteristics. There are gaps in the progression to leave space for aggregation levels that may be included in future data releases.

*agg_level* is currently reported only for J2J data products.

The following variables are included in the link:label_agg_level.csv[label_agg_level.csv] file:

[width="60%",format="csv",cols="<2,<5",options="header"]
|===================================================
include::variables_agg_level.csv[]
|===================================================

The characteristics available on an aggregation level are repeated using a series of flags following the standard schema:

- <<geolevel,geo_level>> - geographic level of table
- <<ind_level,ind_level>> - industry level of table
- by_ variables - flags indicating other dimensions reported, including ownership, demographics, firm age and size.

A shortened representation of the file is provided below; the complete file is available through the link above.

[width="90%",format="csv",cols=">1,3*<2,5*<1",options="header"]
|===================================================
include::tmp_label_agg_level.csv[]
|===================================================

<<<

== [[statusflags]]Status flags
( link:label_flags.csv[] )

Each status flag in the tables above contains one of the following valid values. The values and their interpretation are listed in the table below.

[IMPORTANT]
.Important
==============================================
Note: Currently, the J2J tables only contain status flags '-1', '1', and '5'. Status flags with values 10 or above only appear in online applications, not in CSV files.
==============================================

[width="80%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::label_flags.csv[]
|===================================================

<<<
<<<

== [[metadata]]Metadata
( link:variables_version.csv[] )

=== [[metadataqwij2j]]Version Metadata for J2J Files (version.txt)

Each data release is accompanied by one or more files with metadata on geographic and temporal coverage, in a compact notation. These files use the following naming convention:

--------------------------------
version_[type].txt
--------------------------------

where each component is described in more detail in link:lehd_csv_naming{ext-relative}[]. Each file contains the following elements:

[width="90%",format="csv",cols="<1,<2,<5",options="header"]
|===================================================
include::tmp_variables_version.csv[]
|===================================================

For instance, the metadata for the R2017Q3 release of Missouri J2J tabulations (obtained from https://lehd.ces.census.gov/data/j2j/R2017Q3/j2j/mo//version_j2j.txt[here]) has the following content:

--------------------------------
J2J MO 29 2000:2-2016:3 V4.2b-draft R2017Q3 j2jpu_mo_20171023_1412
--------------------------------

Some J2J metadata may contain multiple lines, as necessary.

=== [[metadataj2jod]]Additional Metadata for J2JOD Files (avail.csv)
(link:variables_avail.csv[])

Because the origin-destination (J2JOD) data link two regions, we provide an auxiliary file with the time range over which cells containing data for each geographic pairing may appear in a data release.

[width="80%",format="csv",cols="<2,<2,<4",options="header"]
|===================================================
include::variables_avail.csv[]
|===================================================

The reference region will always be either the origin or the destination. National tabulations contain records where both origin and destination are <<geolevel,geo_level>>=N; state tabulations contain records where <<geolevel,geo_level>> in (N,S); metro tabulations contain records where <<geolevel,geo_level>> in (N,S,B). Data may be suppressed for certain combinations of regions and quarters because the estimates do not meet Census Bureau publication standards.

=== [[metadatalags]]Metadata on Indicator Availability
(link:variables_lags.csv[])

Each <<indicators,indicator>> potentially requires leads and/or lags of data to be computed, and thus may not be available for certain time periods. The date range for J2J and J2JR can be found in <<metadataqwij2j,version.txt>>; the date range for J2JOD can be found in <<metadataj2jod,avail.csv>>. For each indicator, the following files contain the quarters of data required to be available relative to the overall date range described in the metadata for the release:

* link:lags_j2j.csv[]
* link:lags_j2japp.csv[]

The files are structured as follows:

[width="80%",format="csv",cols="<2,<2,<4",options="header"]
|===================================================
include::variables_lags.csv[]
|===================================================

<<<

*******************
This revision: Thu Dec 21 14:47:59 EST 2017
*******************