= LEHD Public Use CSV Naming Schema V4.2.0 - CSV Naming Convention
Lars Vilhuber <lars.vilhuber@cornell.edu>
17 April 2018
// a2x: --dblatex-opts "-P latex.output.revhistory=0 --param toc.section.depth=3"
:ext-relative: {outfilesuffix}

( link:lehd_csv_naming.pdf[Printable version] )


[IMPORTANT]
.Important
==============================================
This document is not an official Census Bureau publication. It is compiled from publicly accessible information
by Lars Vilhuber (http://www.ilr.cornell.edu/ldi/[Labor Dynamics Institute, Cornell University]).
Feedback is welcome. Please write us at
link:mailto:lars.vilhuber@cornell.edu?subject=LEHD_Schema_v4[lars.vilhuber@cornell.edu].
==============================================


Purpose
-------
The public-use data from the Longitudinal Employer-Household Dynamics Program, including the Quarterly Workforce Indicators (QWI)
and Job-to-Job Flows (J2J), are available for download with the following data schema.
These data are available as Comma-Separated Value (CSV) files through the LEHD website’s Data page at
http://lehd.ces.census.gov/data/ and through the LED Extraction Tool at http://ledextract.ces.census.gov/.

This document describes the file naming schema for LEHD-provided CSV files.

Schema for Data File Contents
-----------------------------

The contents (schema) for LEHD data files are described in  link:lehd_public_use_schema{ext-relative}[].
The contents (schema) for shapefiles are described in link:lehd_shapefiles{ext-relative}[].

Extends
-------
This version reimplements some features from V4.0. Many files compliant with LEHD or QWI Schema V4.0 will also be compliant with this schema, but compatibility is not guaranteed.

Supersedes
----------
This version supersedes V4.1 for files released as of R2018Q1.


Basic Schema
------------

All file names begin with a file type identifier, followed by additional information on the aggregation level of the file, or
some other identifier.

-------------------
[TYPE]_[DETAILS].[EXT]
-------------------

( link:naming_convention.csv[] )


[width="90%",format="csv",delim=",",cols="^1,<3,<5",options="header"]
|===================================================
include::tmp_naming_convention.csv[]
|===================================================
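
As an illustration of the basic pattern, the following minimal Python sketch splits a file name into its +[TYPE]+, +[DETAILS]+, and +[EXT]+ parts. It is not part of the schema. The function name +split_lehd_name+ and the sample file name are hypothetical, and the sketch assumes that the type contains no underscore and that everything after the first period is the extension (so that a compound extension stays in one piece).

```python
def split_lehd_name(filename):
    """Split a file name into (type, details, ext) per [TYPE]_[DETAILS].[EXT].

    Assumptions (not stated by the schema): the type has no underscore,
    and the extension starts at the first period in the name.
    """
    stem, dot, ext = filename.partition(".")
    if not dot or not ext:
        raise ValueError(f"no extension found in {filename!r}")
    file_type, underscore, details = stem.partition("_")
    if not underscore or not file_type or not details:
        raise ValueError(f"expected [TYPE]_[DETAILS] before the extension in {filename!r}")
    return file_type, details, ext


# Illustrative file name only; see the component tables for valid values.
print(split_lehd_name("qwi_ak_sa_f_gs_n4_op_u.csv"))
```

The +[DETAILS]+ part is returned unsplit, because its internal structure depends on the file type.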

=== QWIPU from the LED Extraction Tool
Files downloaded through the LED Extraction Tool at http://ledextract.ces.census.gov/ use the following naming convention:
------------------------------------
[type]_[id].[EXT]
------------------------------------
where +[id]+ is the Request ID (a unique string of characters) generated every time ``Submit Request'' is clicked. The ID references each query submission made to the database.


=== [[versionqwi]]Metadata for QWI data files (version.txt)
Metadata accompanies the data files, identifying provenance and geographic and temporal coverage. These files use the following naming convention:
--------------------------------
version_[demo]_[fas].txt
--------------------------------
where each name component is described in more detail <<components,below>>.
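
A minimal Python sketch of how such a name could be assembled is shown below. The function name +qwi_version_filename+ is hypothetical, the sample values are placeholders, and the check that a component contains neither an underscore nor a period is an assumption of this sketch rather than a rule stated by the schema. Valid +demo+ and +fas+ values are those listed in the component tables.

```python
def qwi_version_filename(demo, fas):
    """Return the metadata file name version_[demo]_[fas].txt.

    Assumption of this sketch: a component is a non-empty string
    without underscores or periods.
    """
    for name, value in (("demo", demo), ("fas", fas)):
        if not value or "_" in value or "." in value:
            raise ValueError(f"invalid {name} component: {value!r}")
    return f"version_{demo}_{fas}.txt"


# Placeholder component values, for illustration only.
print(qwi_version_filename("sa", "f"))  # prints "version_sa_f.txt"
```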

=== [[versionj2j]]Metadata for J2J data files (version.txt)
Metadata accompanies the data files, identifying provenance and geographic and temporal coverage. These files use the following naming convention:
--------------------------------
version_[type].txt
--------------------------------
where each name component is described in more detail <<components,below>>.

=== [[version_j2jod]]Metadata for J2JOD files
Because the origin-destination (J2JOD) data link two regions, we provide an auxiliary file with the time range for which cells containing data for each geographic pairing may appear in a data release. The reference region will always be either the origin or the destination.
These files use the following naming convention:
--------------------------------
j2jod_[geography]_avail.csv
--------------------------------
where each name component is described in more detail <<components,below>>.
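
For readers who need to pick these auxiliary files out of a directory listing, the following Python sketch recognizes the pattern and extracts the +[geography]+ part. The function name +j2jod_avail_geography+ and the sample file name are hypothetical, and the sketch assumes that the geography code itself contains no underscore.

```python
import re

# Pattern for j2jod_[geography]_avail.csv; assumes no underscore in the code.
_J2JOD_AVAIL = re.compile(r"^j2jod_(?P<geography>[^_]+)_avail\.csv$")


def j2jod_avail_geography(filename):
    """Return the geography code of a J2JOD availability file, or None."""
    match = _J2JOD_AVAIL.match(filename)
    return match.group("geography") if match else None


# Illustrative file names only.
print(j2jod_avail_geography("j2jod_ak_avail.csv"))  # prints "ak"
print(j2jod_avail_geography("version_j2j.txt"))     # prints "None"
```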


== [[components]]Description of Filename Components

=== Types

( link:naming_type.csv[] )

[width="90%",format="csv",delim=";",cols="^1,<3,<5,<3",options="header"]
|===================================================
include::naming_type.csv[]
|===================================================

=== geohi
( link:naming_geohi.csv[] )

The geohi component is based on one of two possible code sets:

- the lower-case alphabetic FIPS or postal state code based on https://catalog.data.gov/dataset/fips-state-codes[FIPS PUB 5-2] (see also link:label_stusps.csv[] for upper-case variants).
- the numeric CBSA code corresponding to the metropolitan areas (see link:label_geography_metro.csv[])

It is expanded to encompass additional codes denoting national coverage, or a collection of states.

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
type,Description
all,"Collection of all available states"
us,"National data (50 states + DC)"
st,Any legal 2-character lower-case state postal code
NNNNN,Any valid CBSA-derived code listed in link:label_geography_metro.csv[]
|===================================================
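
The four kinds of geohi value in the table above can be told apart by their form alone, as in the following Python sketch. The function name +classify_geohi+ and its return labels are hypothetical. The sketch checks only the pattern of the code (two lower-case letters, or five digits) and does not confirm that a state or CBSA code is actually listed in link:label_stusps.csv[] or link:label_geography_metro.csv[].

```python
import re


def classify_geohi(code):
    """Classify a geohi value by its form; return None if it fits no pattern.

    The special codes are tested first, because "us" would otherwise
    also match the two-letter state pattern.
    """
    if code == "all":
        return "all"      # collection of all available states
    if code == "us":
        return "us"       # national data
    if re.fullmatch(r"[a-z]{2}", code):
        return "state"    # form of a lower-case state postal code
    if re.fullmatch(r"[0-9]{5}", code):
        return "cbsa"     # form of a five-digit CBSA-derived code
    return None


for sample in ("all", "us", "ak", "12345", "AK"):
    print(sample, classify_geohi(sample))
```

A full validation would additionally look the code up in the corresponding label file.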

=== demo
( link:naming_demo.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::naming_demo.csv[]
|===================================================

<<<


=== fas
( link:naming_fas.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::naming_fas.csv[]
|===================================================

<<<


=== geocat
( link:naming_geocat.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::naming_geocat.csv[]
|===================================================

<<<


=== indcat
( link:naming_indcat.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::naming_indcat.csv[]
|===================================================

<<<


=== owncat
( link:naming_owncat.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::naming_owncat.csv[]
|===================================================

<<<


=== sa
( link:naming_sa.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::naming_sa.csv[]
|===================================================

<<<


=== ext
( link:naming_ext.csv[] )

[width="60%",format="csv",cols="^1,<4",options="header"]
|===================================================
include::naming_ext.csv[]
|===================================================

<<<


== [[changes]] Changes
For a description of how schema files are versioned, see link:../VERSIONING{ext-relative}[main directory].

=== This version (revisions)
- 2017-12-15: Initial release
- 2018-03-23: Fix EOL issues

=== Version 4.2.0 from 4.1.3
- Updated industry classification from NAICS 2012 to NAICS 2017
- Added a column +ind_level+ to label_industry.csv, similar to the +geo_level+ column
- Added additional columns to the variable metadata schema for greater clarity
* Description,
* Concept,
* Base
- Added a (draft) taxonomy of concepts used in the LEHD data world (link:label_concept_draft.csv[label_concept_draft.csv])
- Fixed the labeling of ownership code +A00+ to correctly reflect scope
- Added files describing the number of quarters of data availability required relative to start and end quarters (link:lags_qwi.csv[] and link:lags_j2j.csv[]), and their metadata (link:variables_lags.csv[])

[IMPORTANT]
.Important
==============================================
Some of the data products noted above do not exist yet.
==============================================

<<<
*******************
This revision: Tue Apr 17 17:54:57 EDT 2018
*******************