1. LODES Code Building Blocks

1.1. Loading Data

There are four types of LODES data files, and all of them will be used in these samples. The types and their purposes/contents are briefly described here. For more information on the files, their organization, and naming convention, see the LODES Technical Documentation. Additionally, labels for labor market segments and worker/job characteristics can be found in the LODES Schema Files section in the Appendix.

  • Residence Area Characteristics (RAC): jobs are totaled by home census block.

  • Workplace Area Characteristics (WAC): jobs are totaled by work census block.

  • Origin-Destination (OD): job totals are associated with both a home census block and a work census block.

  • Geography Crosswalk: relationship file that establishes the hierarchical connection between each 2020 census tabulation block and all higher level geographic entities.

The naming conventions for the RAC, WAC, and OD files contain the following information:

  • Two letter state abbreviation (e.g., ‘ny’)

  • File type: RAC, WAC, or OD

  • Part of the file: “main” or “aux” (only for OD files)

  • Four character segment code (only for RAC and WAC files)

  • Four character job type

  • Four digit year
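Assembled in order and joined with underscores, these pieces produce the file names used throughout this section. A small sketch, with component values chosen for illustration:

```python
# Illustrative component values; see the LODES Technical Documentation for valid codes
state, segment, job_type, year = "ny", "S000", "JT00", 2021

# RAC/WAC names include a segment; OD names include a part ("main" or "aux") instead
rac_name = f"{state}_rac_{segment}_{job_type}_{year}.csv.gz"
od_name = f"{state}_od_main_{job_type}_{year}.csv.gz"

print(rac_name)  # ny_rac_S000_JT00_2021.csv.gz
print(od_name)   # ny_od_main_JT00_2021.csv.gz
```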

There are three easy ways to use LODES data in open source languages. You can (1) read directly from the online file structure, (2) download the raw data files directly, or (3) use output from the LED Extraction Tool.

1.1.1. Using the LODES Raw Data Files

Reading LODES directly from the online directory structure may be the simplest option for users, and the code examples in all other sections will read the data in this way. Because the file names are determined entirely by the attributes above, we can set variables equal to the desired values and use them to construct the correct URL/file name. The code below defines a function that can be used repeatedly and flexibly to read in OD, RAC, and WAC data (in fact, this function appears repeatedly in these samples). Beneath the function is an example of its use. This code loads the RAC, WAC, and OD data for job type “JT00” (all jobs) in 2004 for New Jersey. For the RAC and WAC files, the segment “SA01” (workers age 29 or younger) is selected. Note that for each file type, the geographic identifiers are read in specifically as a “string” datatype; this ensures consistency across file types, which will be important when merging files.

Note

These functions can also be used to read downloaded files, instead of reading them directly from the online file structure. Simply change the base argument to a location on your computer (i.e., the local filesystem) where the files are located.

import pandas as pd


def read_lodes_data(
    data_type,
    state,
    segment,
    job_type,
    year,
    main=True,
    base="https://lehd.ces.census.gov/data/lodes/LODES8/",
):
    """
    Reads LODES RAC, WAC, or OD data based on the specified type.

    Parameters:
    - data_type (str): One of 'rac', 'wac', or 'od' to specify the dataset.
    - state (str): Lowercase two-character state abbreviation (e.g., 'nj').
    - segment (str): Segment type (e.g., 'SA01'). Ignored if 'data_type' is 'od'.
    - job_type (str): Job type (e.g., 'JT00').
    - year (int): Year of data (e.g., 2004).
    - main (bool): Whether to read the 'main' file type. Only used if 'data_type' is 'od'.
    - base (str): Base file path to read data from. Can be URL or local file path.

    Returns:
    - pd.DataFrame: The requested LODES dataset as a pandas DataFrame.
    """
    if data_type not in ["rac", "wac", "od"]:
        raise ValueError("Invalid data type. Choose from 'rac', 'wac', or 'od'.")
    if data_type == "od":
        if main:
            file_url = f"{base}{state}/od/{state}_od_main_{job_type}_{year}.csv.gz"
        else:
            file_url = f"{base}{state}/od/{state}_od_aux_{job_type}_{year}.csv.gz"
        dtype_mapping = {"h_geocode": str, "w_geocode": str}
    else:
        file_url = f"{base}{state}/{data_type}/{state}_{data_type}_{segment}_{job_type}_{year}.csv.gz"
        dtype_mapping = {"h_geocode": str} if data_type == "rac" else {"w_geocode": str}
    return pd.read_csv(file_url, dtype=dtype_mapping)


# Example usage
rac_data = read_lodes_data("rac", "nj", "SA01", "JT00", 2004)
wac_data = read_lodes_data("wac", "nj", "SA01", "JT00", 2004)
od_data = read_lodes_data("od", "nj", None, "JT00", 2004)

print(od_data.head())
>>          w_geocode        h_geocode  S000  SA01  ...  SI01  SI02  SI03  createdate
>> 0  340010001001001  340010001001013     1     0  ...     0     0     1    20230321
>> 1  340010001001001  340010001002004     2     1  ...     0     0     2    20230321
>> 2  340010001001001  340010001002013     1     1  ...     0     0     1    20230321
>> 3  340010001001001  340010002002011     1     1  ...     0     0     1    20230321
>> 4  340010001001001  340010003001008     1     1  ...     0     0     1    20230321
>> 
>> [5 rows x 13 columns]
library(readr)

read_lodes_data <- function(data_type,
                            state,
                            segment,
                            job_type,
                            year,
                            main = TRUE,
                            base = "https://lehd.ces.census.gov/data/lodes/LODES8/") {
  #' Reads LODES RAC, WAC, or OD data based on the specified type.
  #'
  #' @param data_type One of 'rac', 'wac', or 'od' to specify the dataset.
  #' @param state Lowercase two-character state abbreviation (e.g., 'nj').
  #' @param segment Segment type (e.g., 'SA01'). Ignored if 'data_type' is 'od'.
  #' @param job_type Job type (e.g., 'JT00').
  #' @param year Year of data (e.g., 2004).
  #' @param main Whether to read the 'main' file type. Only used if 'data_type' is 'od'.
  #' @param base Base file path to read data from. Can be URL or local file path.
  #'
  #' @returns The requested LODES dataset.
  # Validate data type
  if (!(data_type %in% c("rac", "wac", "od"))) {
    stop("Invalid data type. Choose from 'rac', 'wac', or 'od'.")
  }

  # Construct file URL based on data type
  if (data_type == "od") {
    file_url <- if (main) {
      paste0(base, state, "/od/", state, "_od_main_", job_type, "_", year, ".csv.gz")
    } else {
      paste0(base, state, "/od/", state, "_od_aux_", job_type, "_", year, ".csv.gz")
    }
    col_types <- cols(h_geocode = col_character(), w_geocode = col_character())
  } else {
    file_url <- sprintf(
      "%s%s/%s/%s_%s_%s_%s_%d.csv.gz",
      base,
      state,
      data_type,
      state,
      data_type,
      segment,
      job_type,
      year
    )
    col_types <- if (data_type == "rac") {
      cols(h_geocode = col_character())
    } else {
      cols(w_geocode = col_character())
    }
  }

  # Read CSV with specified column types
  df <- read_csv(file_url, col_types = col_types)

  return(df)
}

# Example usage
rac_data <- read_lodes_data("rac", "nj", "SA01", "JT00", 2004)
wac_data <- read_lodes_data("wac", "nj", "SA01", "JT00", 2004)
od_data <- read_lodes_data("od", "nj", NULL, "JT00", 2004)

print(head(od_data))
>> # A tibble: 6 × 13
>>   w_geocode      h_geocode  S000  SA01  SA02  SA03  SE01  SE02  SE03  SI01  SI02
>>   <chr>          <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
>> 1 3400100010010… 34001000…     1     0     1     0     0     0     1     0     0
>> 2 3400100010010… 34001000…     2     1     0     1     1     0     1     0     0
>> 3 3400100010010… 34001000…     1     1     0     0     1     0     0     0     0
>> 4 3400100010010… 34001000…     1     1     0     0     1     0     0     0     0
>> 5 3400100010010… 34001000…     1     1     0     0     1     0     0     0     0
>> 6 3400100010010… 34001000…     1     1     0     0     1     0     0     0     0
>> # ℹ 2 more variables: SI03 <dbl>, createdate <dbl>
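Per the note above, the same function works against downloaded files. A Python sketch, where the local directory path is hypothetical and must mirror the online directory layout:

```python
# Hypothetical local mirror of the LODES8 directory tree (note the trailing slash)
local_base = "/data/lodes/LODES8/"

# read_lodes_data("rac", "nj", "SA01", "JT00", 2004, base=local_base) would read:
resolved = f"{local_base}nj/rac/nj_rac_SA01_JT00_2004.csv.gz"
print(resolved)  # /data/lodes/LODES8/nj/rac/nj_rac_SA01_JT00_2004.csv.gz
```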

The process for reading a Geography Crosswalk is also relatively simple. There is one crosswalk for each state, so we can write a function (which will also be reused throughout these examples) to read a crosswalk given the lowercase two-letter state abbreviation. The example code directly beneath the function reads the Geography Crosswalk for New Jersey.

Note

The Geography Crosswalks are very wide. They contain more than 15 different geography types, with names and codes for each geography. Therefore, it is recommended to read in only the columns that are important to the problem at hand. The function below provides this functionality.

import pandas as pd


def read_crosswalk(
    state, cols="all", base="https://lehd.ces.census.gov/data/lodes/LODES8/"
):
    """
    Reads LODES Geography Crosswalk given state.

    Parameters:
    - state (str): Lowercase two-character state abbreviation (e.g., 'nj').
    - cols (list): List of columns to read from the crosswalk in addition to tabblk2020 (Block ID). If "all", all columns will be read and returned.
    - base (str): Base file path to read data from. Can be URL or local file path.

    Returns:
    - pd.DataFrame: The requested LODES crosswalk as a pandas DataFrame.
    """
    file_url = f"{base}{state}/{state}_xwalk.csv.gz"
    dtype_mapping = str
    if cols == "all":
        return pd.read_csv(file_url, dtype=dtype_mapping)
    # Ensure tabblk2020 (Block ID) is included without mutating the caller's list
    cols = ["tabblk2020"] + [c for c in cols if c != "tabblk2020"]
    return pd.read_csv(file_url, dtype=dtype_mapping, usecols=cols)


crosswalk = read_crosswalk("nj", cols=["cty", "ctyname"])
print(crosswalk.head())
>>         tabblk2020    cty              ctyname
>> 0  340010001001000  34001  Atlantic County, NJ
>> 1  340010001001001  34001  Atlantic County, NJ
>> 2  340010001001002  34001  Atlantic County, NJ
>> 3  340010001001003  34001  Atlantic County, NJ
>> 4  340010001001004  34001  Atlantic County, NJ
library(readr)
library(dplyr)

read_crosswalk <- function(state, cols = "all", base = "https://lehd.ces.census.gov/data/lodes/LODES8/") {
  #' Reads LODES Geography Crosswalk given state.
  #'
  #' @param state Lowercase two-character state abbreviation (e.g., 'nj').
  #' @param cols List of columns to read from the crosswalk in addition to tabblk2020 (Block ID). If "all", all columns will be read and returned.
  #' @param base Base file path to read data from. Can be URL or local file path.
  #'
  #' @returns The requested LODES crosswalk.


  # Construct file URL
  file_url <- paste0(base, state, "/", state, "_xwalk.csv.gz")
  # Read the full dataset first if all columns are needed
  if (identical(cols, "all")) {
    return(read_csv(file_url, col_types = cols(tabblk2020 = col_character())))
  }
  # Ensure tabblk2020 (GEOID) is included in the selected columns
  cols <- unique(c("tabblk2020", cols))
  # Read only the specified columns
  return(read_csv(
    file_url,
    col_types = cols(.default = col_character()),
    col_select = all_of(cols)
  ))
}

# Example usage
crosswalk <- read_crosswalk("nj", cols = c("cty", "ctyname"))
print(head(crosswalk))
>> # A tibble: 6 × 3
>>   tabblk2020      cty   ctyname            
>>   <chr>           <chr> <chr>              
>> 1 340010001001000 34001 Atlantic County, NJ
>> 2 340010001001001 34001 Atlantic County, NJ
>> 3 340010001001002 34001 Atlantic County, NJ
>> 4 340010001001003 34001 Atlantic County, NJ
>> 5 340010001001004 34001 Atlantic County, NJ
>> 6 340010001001005 34001 Atlantic County, NJ

1.1.2. Using Output from the LED Extraction Tool

Reading output from the LED Extraction Tool is very similar. The extraction tool allows users to get both broader and narrower sets of data than the raw files. For example, users can combine multiple geographies and years instead of having to read and combine multiple files. The selection of specific geographies means that users do not have to receive full state data if they only need a specific geography. In the example below, a user has selected to receive RAC and WAC data for Camden County, NJ and ZIP Code 10019 (in Manhattan). The data was requested for four columns (total jobs and three age categories), across two years, and for primary jobs. There are a few differences to note from the raw files (as can be seen from the code and output below):

  • The naming of the files is based on a unique character identifier, rather than the human-readable format of the raw data files. This identifier appears in the name of the folder, the data files, and the crosswalk.

  • The extraction tool path for area characteristics data produces the data for RAC and WAC, as well as a geographic crosswalk, which is clipped to contain only the blocks in the selected geographies.

  • The resulting data files have a different selection of columns when compared to the raw data. Here, there are only four characteristic columns, because only four were selected. On the other hand, there are two columns not found in the raw data, segment and year, which add clarity to this output, especially given that you can select multiple years of data.
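Because the extraction-tool output carries segment and year columns, multi-year comparisons need no extra file handling. A toy sketch, using fabricated values shaped like the RAC output shown below:

```python
import pandas as pd

# Fabricated rows in the shape of the extraction-tool RAC output (values invented)
rac_data = pd.DataFrame(
    {
        "h_geocode": [340076008001012, 340076008001012],
        "segment": ["S000", "S000"],
        "year": [2018, 2019],
        "C000": [8, 10],
    }
)

# Year-over-year totals come straight from a groupby on the year column
totals = rac_data.groupby("year")["C000"].sum()
print(totals[2019] - totals[2018])  # 2
```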

Note

If you would like to recreate the data in this code sample, you can recreate the query in the LED Extraction Tool using this settings file. The extraction tool provides an easy format for saving and uploading settings files. These files were downloaded by clicking the “CSVs & Metadata (ZIP)” link.

import pandas as pd

# read in wac and rac data
rac_data = pd.read_csv(
    "data/ac_f5e832282bacfdea48d8690947888efb/rac_f5e832282bacfdea48d8690947888efb.csv"
)
wac_data = pd.read_csv(
    "data/ac_f5e832282bacfdea48d8690947888efb/wac_f5e832282bacfdea48d8690947888efb.csv"
)

# read in selected crosswalk
crosswalk = pd.read_csv(
    "data/ac_f5e832282bacfdea48d8690947888efb/xwalk_f5e832282bacfdea48d8690947888efb.csv"
)

# Sample 10 rows from rac_data with fixed seed for reproducibility
print(rac_data.sample(n=10, random_state=4334))
>>              h_geocode segment  year  C000  CA01  CA02  CA03
>> 1698   340076008001012    S000  2019    10     6     3     1
>> 493    340076012004006    S000  2018    11     3     6     2
>> 5824   340076039013007    S000  2019    10     0     6     4
>> 11415  340076103002026    S000  2018    10     5     4     1
>> 528    340076013002026    S000  2018    29    13    10     6
>> 12396  340076089011078    S000  2019    11     2     7     2
>> 7483   340076075053001    S000  2018     8     0     5     3
>> 12048  340076086002031    S000  2019    11     2     8     1
>> 6138   340076051001027    S000  2019    36     6    20    10
>> 13105  340076105002040    S000  2019    20     5     8     7
library(readr)
library(dplyr)

# Read in wac and rac data
rac_data <- read_csv(
  "data/ac_f5e832282bacfdea48d8690947888efb/rac_f5e832282bacfdea48d8690947888efb.csv"
)
wac_data <- read_csv(
  "data/ac_f5e832282bacfdea48d8690947888efb/wac_f5e832282bacfdea48d8690947888efb.csv"
)

# Read in selected crosswalk
crosswalk <- read_csv(
  "data/ac_f5e832282bacfdea48d8690947888efb/xwalk_f5e832282bacfdea48d8690947888efb.csv"
)

# Sample 10 rows from rac_data with fixed seed for reproducibility
set.seed(4334)
print(sample_n(rac_data, 10))
>> # A tibble: 10 × 7
>>    h_geocode segment  year  C000  CA01  CA02  CA03
>>        <dbl> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>
>>  1   3.40e14 S000     2019     5     2     1     2
>>  2   3.40e14 S000     2018    13     6     6     1
>>  3   3.40e14 S000     2018    12     1     4     7
>>  4   3.40e14 S000     2018    14     2     6     6
>>  5   3.40e14 S000     2019     5     0     0     5
>>  6   3.40e14 S000     2018    24     8    10     6
>>  7   3.40e14 S000     2018    18     5     9     4
>>  8   3.40e14 S000     2019    28     6    17     5
>>  9   3.40e14 S000     2018    19     3    10     6
>> 10   3.40e14 S000     2018    22     7     7     8

1.2. Combining and Aggregating Data

1.2.1. Combining Data Files

Given that LODES data files are heavily segmented, there are many instances where users may want to combine data files across different states, years, segments, and/or job types. Additionally, users may want to combine the “main” and “aux” portions of OD files. These kinds of operations are relatively simple, with two main steps: (1) adding identifiers to the portions of data and (2) stacking or binding the data together.

The code below combines RAC files for New Jersey between the years of 2017 and 2020, across two job types and two segments, into one data structure. The same approach applies to combining WAC and OD files. For convenience, we can use the function we wrote in Loading Data.

import pandas as pd

# Set variables
state = "nj"
segments = ["SA01", "SE03"]
job_types = ["JT00", "JT01"]
years = list(range(2017, 2021))

# Create empty list to store dataframes
all_data = []
for seg in segments:
    for job_type in job_types:
        for year in years:
            # Call the function to read LODES data
            rac = read_lodes_data("rac", state, seg, job_type, year)
            # Identify each portion of data
            rac["segment"] = seg
            rac["jobtype"] = job_type
            rac["year"] = year
            # Append to list
            all_data.append(rac)

# Combine all dataframes into one dataset
full_rac = pd.concat(all_data)
print(full_rac.head())
>>          h_geocode  C000  CA01  CA02  ...  createdate  segment  jobtype  year
>> 0  340010001001003     1     1     0  ...    20230321     SA01     JT00  2017
>> 1  340010001001005     5     5     0  ...    20230321     SA01     JT00  2017
>> 2  340010001001006     3     3     0  ...    20230321     SA01     JT00  2017
>> 3  340010001001009     4     4     0  ...    20230321     SA01     JT00  2017
>> 4  340010001001010     5     5     0  ...    20230321     SA01     JT00  2017
>> 
>> [5 rows x 46 columns]
library(dplyr)
library(purrr)

# Set variables
state <- "nj"
segments <- c("SA01", "SE03")
job_types <- c("JT00", "JT01")
years <- 2017:2020


# Use map functions to iterate over all combinations and bind rows
full_rac <- expand.grid(
  segment = segments,
  jobtype = job_types,
  year = years
) %>%
  pmap_dfr(~ {
    rac <- read_lodes_data("rac", state, ..1, ..2, ..3)
    rac <- mutate(rac,
      segment = ..1,
      jobtype = ..2,
      year = ..3
    )
    return(rac)
  })

print(head(full_rac))
>> # A tibble: 6 × 46
>>   h_geocode     C000  CA01  CA02  CA03  CE01  CE02  CE03 CNS01 CNS02 CNS03 CNS04
>>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
>> 1 34001000100…     1     1     0     0     0     1     0     0     0     0     0
>> 2 34001000100…     5     5     0     0     1     3     1     0     0     0     0
>> 3 34001000100…     3     3     0     0     1     2     0     0     0     0     0
>> 4 34001000100…     4     4     0     0     1     1     2     0     0     0     0
>> 5 34001000100…     5     5     0     0     2     2     1     0     0     0     0
>> 6 34001000100…     4     4     0     0     3     1     0     0     0     0     0
>> # ℹ 34 more variables: CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>,
>> #   CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>,
>> #   CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>,
>> #   CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>,
>> #   CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>,
>> #   CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, createdate <dbl>,
>> #   segment <fct>, jobtype <fct>, year <int>

1.2.2. Combining Geographic Crosswalks

For combining Geography Crosswalks, the process is even simpler. The files can simply be stacked on top of each other, since all the geographic identifiers are unique. The code below loops through all US states and combines the crosswalks into one national crosswalk. Again, we can use the function we wrote in Loading Data.

import os

# Make sure DC is included in the states
os.environ["DC_STATEHOOD"] = "1"
from us import states
import pandas as pd

# Create an empty list to store crosswalks
crosswalks = []

# Loop through all states (and DC)
for state in states.STATES:
    # The Crosswalk file contains many different geographies
    # For this example we've selected state, county, and tract.
    cols = ["st", "cty", "trct"]
    crosswalk = read_crosswalk(state.abbr.lower(), cols)
    # Append to list
    crosswalks.append(crosswalk)

# Combine all crosswalks into one.
all_crosswalks = pd.concat(crosswalks)

print(all_crosswalks.head())
>>         tabblk2020  st    cty         trct
>> 0  010010201001000  01  01001  01001020100
>> 1  010010201001001  01  01001  01001020100
>> 2  010010201001002  01  01001  01001020100
>> 3  010010201001003  01  01001  01001020100
>> 4  010010201001004  01  01001  01001020100
library(dplyr)
library(purrr)

# List of all states plus DC
states <- c(state.abb, "DC")

# Call function across all states
crosswalks <- map(states, ~ read_crosswalk(tolower(.x), c("st", "cty", "trct")))

# Combine all crosswalks into one data frame
all_crosswalks <- bind_rows(crosswalks)

print(head(all_crosswalks))
>> # A tibble: 6 × 4
>>   tabblk2020      st    cty   trct       
>>   <chr>           <chr> <chr> <chr>      
>> 1 010010201001000 01    01001 01001020100
>> 2 010010201001001 01    01001 01001020100
>> 3 010010201001002 01    01001 01001020100
>> 4 010010201001003 01    01001 01001020100
>> 5 010010201001004 01    01001 01001020100
>> 6 010010201001005 01    01001 01001020100

1.2.3. Aggregating to Higher-Level Geographies

All LODES data files are provided at the census block level. However, for many analysis purposes, users may want the data aggregated to other geographies. This is the purpose of the Geography Crosswalk, which provides geographic hierarchies for many geographies in the most current year of geographic data. The primary key/identifier in the Geography Crosswalk is the 2020 Census Tabulation Block Code (tabblk2020). This code does not change between Decennial Censuses and can be used to link to the geocodes in the OD/RAC/WAC files. All of the other columns in the crosswalk file refer to higher level geographic entities (both by code and name) that are supported by the OnTheMap application. The crosswalks contain all 2020 Census Tabulation Blocks for each state, and thus can be quite large. Users are encouraged to read in only the columns of the crosswalk that correspond to geographies they are interested in. For a full list of geographies in the crosswalk, see the LODES Technical Documentation.

Warning

Because the geographic crosswalk is in the most current year of geographic data, all geographic analyses will be in terms of current boundaries, regardless of the year of LODES data. For example, if a user wishes to find the number of jobs in Bergen County, New Jersey in 2018 and 2021, the user can use the crosswalk to accomplish the county level aggregation, but the data will be in terms of the most recent geographic boundary of Bergen County for both 2018 and 2021. The same is true of analyses conducted in OnTheMap.

Case Study: Counts of Worker Residence by County in New Jersey

To find the number of workers residing in each county in New Jersey, the block level RAC data must be aggregated to the county level. To aggregate RAC or WAC data, the Geography Crosswalk must be joined to the data; here, we seek the county level job totals on the residence side. For loading the data, we can use the functions we wrote in Loading Data. We utilize a “left join”, so that only rows that are in the LODES data are included in the merge. Note the code assumes a one-to-one relationship between the two datasets, which is true in this case, but would not be true if a user were combining the crosswalk with multiple years or segments of data. The process for merging data with the crosswalk and performing an aggregation is the same for WAC data, with the only modification being that the left dataset should be joined on w_geocode.

Note

In this case study a left join is used. There are likely blocks in the Geographic Crosswalk that are not in the LODES data, because in the LODES production process, blocks that do not contain data (for example if no workers live in a certain block) are excluded. There are certain situations in which an outer join/right join (which are equivalent in this case) might be of use to a user.
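The behavioral difference between these joins is easy to see on toy data. A minimal sketch with hypothetical block IDs, where the crosswalk lists one block with no resident workers:

```python
import pandas as pd

# Hypothetical block IDs: the crosswalk lists a block ("b002") with no jobs
rac = pd.DataFrame({"h_geocode": ["b001"], "C000": [5]})
xwalk = pd.DataFrame({"tabblk2020": ["b001", "b002"], "cty": ["34001", "34001"]})

left = rac.merge(xwalk, how="left", left_on="h_geocode", right_on="tabblk2020")
right = rac.merge(xwalk, how="right", left_on="h_geocode", right_on="tabblk2020")

print(len(left))   # 1: only blocks present in the LODES data survive
print(len(right))  # 2: the zero-worker block is kept, with NaN job counts
```

With a right join, the NaN counts for empty blocks can be filled with zero (e.g., right["C000"].fillna(0)) before aggregating.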


rac = read_lodes_data("rac", "nj", "S000", "JT00", 2019)
crosswalk = read_crosswalk("nj", cols=["cty", "ctyname"])

# Conduct left merge of RAC data with crosswalk on block IDs
rac_w_crosswalk = rac.merge(
    crosswalk,
    how="left",
    left_on="h_geocode",
    right_on="tabblk2020",
    validate="1:1",
)

# Drop non-numerical columns for aggregating
rac_w_crosswalk = rac_w_crosswalk.drop(
    ["h_geocode", "tabblk2020", "createdate"], axis=1
)

# Aggregate at county level
rac_county = rac_w_crosswalk.groupby(["cty", "ctyname"]).sum().reset_index()

print(rac_county.head())
>>      cty                ctyname    C000   CA01  ...    CD03    CD04    CS01    CS02
>> 0  34001    Atlantic County, NJ  129676  27741  ...   30981   28503   61999   67677
>> 1  34003      Bergen County, NJ  459959  84789  ...  101200  158152  229464  230495
>> 2  34005  Burlington County, NJ  234959  48625  ...   55876   66940  112377  122582
>> 3  34007      Camden County, NJ  252963  56857  ...   59068   63373  121146  131817
>> 4  34009    Cape May County, NJ   38129   7930  ...    9215    9390   18431   19698
>> 
>> [5 rows x 43 columns]
library(dplyr)
library(purrr)

rac <- read_lodes_data("rac", "nj", "S000", "JT00", 2019)
crosswalk <- read_crosswalk("nj", cols = c("cty", "ctyname"))

# Merge RAC data with Crosswalk on block IDs
rac_w_crosswalk <- rac %>%
  left_join(crosswalk, by = c("h_geocode" = "tabblk2020"), relationship = "one-to-one")

# Drop non-numerical columns for aggregation
rac_w_crosswalk <- rac_w_crosswalk %>%
  select(-h_geocode, -createdate)

# Aggregate at county level
rac_county <- rac_w_crosswalk %>%
  group_by(cty, ctyname) %>%
  summarise(across(where(is.numeric), \(x) sum(x, na.rm = TRUE)), .groups = "drop")

print(head(rac_county))
>> # A tibble: 6 × 43
>>   cty   ctyname   C000  CA01   CA02   CA03  CE01   CE02   CE03 CNS01 CNS02 CNS03
>>   <chr> <chr>    <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
>> 1 34001 Atlant… 129676 27741  65407  36528 31095  46383  52198   476    34   834
>> 2 34003 Bergen… 459959 84789 246613 128557 84519 102548 272892   292    87  2091
>> 3 34005 Burlin… 234959 48625 121643  64691 47992  58194 128773   460    73  1283
>> 4 34007 Camden… 252963 56857 133596  62510 55828  73012 124123   521    51  1150
>> 5 34009 Cape M…  38129  7930  18106  12093  9390  12088  16651   250    39   269
>> 6 34011 Cumber…  62080 14626  32737  14717 14021  21958  26101  1294   143   456
>> # ℹ 31 more variables: CNS04 <dbl>, CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>,
>> #   CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>,
>> #   CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>,
>> #   CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>,
>> #   CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>,
>> #   CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>

Why Can’t I Use the GEOID for Aggregation?

A user looking at the Census Block Code (h_geocode or w_geocode) may notice that the GEOID appears to be various identifier codes combined into one 15-digit unique geocode. While this is not entirely incorrect, the Block Code should not be used on its own to aggregate to higher geographies. At the time of the Decennial Census, the block code is generally aligned with the higher level geographies (e.g., state, county, tract). But over time, these higher level boundaries can change. The code below shows that this strategy will not work and could lead to significant inaccuracies.


# Read in RAC and crosswalk
rac = read_lodes_data("rac", "oh", "S000", "JT00", 2022)
crosswalk = read_crosswalk("oh", cols=["cty"])

# Conduct left merge of RAC data with crosswalk on block IDs
rac_w_crosswalk = rac.merge(
    crosswalk,
    how="left",
    left_on="h_geocode",
    right_on="tabblk2020",
    validate="1:1",
)

# Create 'fake' county ID from first 5 digits of Block ID
rac_w_crosswalk["fake_county"] = rac_w_crosswalk["h_geocode"].str[0:5]

# Find blocks where 'fake' doesn't match real
num_blocks = (rac_w_crosswalk["fake_county"] != rac_w_crosswalk["cty"]).sum()

print(
    f"There are {num_blocks} blocks for which the first 5 digits of the Block ID do not match the county"
)
>> There are 2 blocks for which the first 5 digits of the Block ID do not match the county
library(dplyr)

# Read in RAC and crosswalk data
rac <- read_lodes_data("rac", "oh", "S000", "JT00", 2022)
crosswalk <- read_crosswalk("oh", cols = c("cty"))

# Conduct left join on block IDs
rac_w_crosswalk <- rac %>%
  left_join(crosswalk, by = c("h_geocode" = "tabblk2020"))

# Create 'fake' county ID from first 5 digits of Block ID
rac_w_crosswalk$fake_county <- substr(rac_w_crosswalk$h_geocode, 1, 5)

# Find number of mismatched blocks
num_blocks <- sum(rac_w_crosswalk$fake_county != rac_w_crosswalk$cty, na.rm = TRUE)

cat("There are", num_blocks, "blocks for which the first 5 digits of the Block ID do not match the county\n")
>> There are 2 blocks for which the first 5 digits of the Block ID do not match the county

Case Study: County to County Workplace and Residence Pairs

The goal here is to find the number of jobs with a workplace in one county and a residence in another county, for each unique pair of counties. For this, we again need to aggregate files to the county level, but now for OD files. The process for merging Origin-Destination (OD) files with the Geography Crosswalk is similar to the process for RAC/WAC data, with the exception that the OD files have two unique geographic identifiers (residence and workplace), where the RAC and WAC files only have one. Therefore, for many purposes, aggregating OD files may require two merges with the Geography Crosswalk.

Note

One thing to always be aware of when using OD data is the distinction between “main” and “auxiliary” data. The “main” data files contain data on all jobs whose workplaces and residences are within the state selected, while the “auxiliary” data files contain jobs whose workplaces are in the state but whose residences are in another state. This means that if a user wanted data on jobs with residences within a specific state and work anywhere, they would need to collect the “main” file from the desired target state, and the “auxiliary” files from every other state.
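Following that logic, collecting all jobs held by residents of one state means reading that state's “main” file plus the “aux” file of every other state. A sketch that only enumerates the files needed (three other states shown for brevity; the reads themselves are left as comments):

```python
# Target state and a truncated list of other states (a full run would use all of them)
target = "nj"
other_states = ["ny", "pa", "de"]

# One "main" file for the target, plus one "aux" file per other state
files_needed = [(target, "main")] + [(st, "aux") for st in other_states]
print(files_needed)  # [('nj', 'main'), ('ny', 'aux'), ('pa', 'aux'), ('de', 'aux')]

# Each pair could then be read with read_lodes_data("od", state, None, "JT00", 2019,
# main=(part == "main")), and the aux rows filtered to h_geocode values beginning
# with the target state's FIPS code ("34" for New Jersey), since aux files contain
# residents of every other state.
```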


od = read_lodes_data("od", "nj", None, "JT00", 2019)
crosswalk = read_crosswalk("nj", cols=["cty", "ctyname"])

# Create crosswalk with home and workplace labels
h_crosswalk = crosswalk.rename(columns=lambda col: f"h_{col}")
w_crosswalk = crosswalk.rename(columns=lambda col: f"w_{col}")

# Merge on home labels
od_home = od.merge(
    h_crosswalk,
    how="left",
    left_on="h_geocode",
    right_on="h_tabblk2020",
    validate="m:1",
    indicator=True,
)

# Sanity check, no rows should be one-sided merge
assert (od_home["_merge"] == "both").all()

# Drop non-numerical columns for aggregating
od_home = od_home.drop(["h_geocode", "h_tabblk2020", "_merge", "createdate"], axis=1)

# Merge on work labels
od_home_and_work = od_home.merge(
    w_crosswalk,
    how="left",
    left_on="w_geocode",
    right_on="w_tabblk2020",
    validate="m:1",
    indicator=True,
)

# Sanity check
assert (od_home_and_work["_merge"] == "both").all()

# Drop non-numerical columns for aggregating
od_home_and_work = od_home_and_work.drop(
    ["w_geocode", "w_tabblk2020", "_merge"], axis=1
)

# Aggregate at county level
od_county = (
    od_home_and_work.groupby(["h_cty", "h_ctyname", "w_cty", "w_ctyname"])
    .sum()
    .reset_index()
)

print(od_county[["h_ctyname", "w_ctyname", "S000"]].head())
>>              h_ctyname              w_ctyname   S000
>> 0  Atlantic County, NJ    Atlantic County, NJ  83828
>> 1  Atlantic County, NJ      Bergen County, NJ   1170
>> 2  Atlantic County, NJ  Burlington County, NJ   3841
>> 3  Atlantic County, NJ      Camden County, NJ   5695
>> 4  Atlantic County, NJ    Cape May County, NJ   4634
library(dplyr)
library(purrr)

od <- read_lodes_data("od", "nj", NULL, "JT00", 2019)
crosswalk <- read_crosswalk("nj", cols = c("cty", "ctyname"))

# Create crosswalk with home and workplace labels
h_crosswalk <- crosswalk %>%
  rename_with(~ paste0("h_", .x))
w_crosswalk <- crosswalk %>%
  rename_with(~ paste0("w_", .x))

# Merge on home labels
od_home <- od %>%
  left_join(h_crosswalk,
    by = c("h_geocode" = "h_tabblk2020"),
    relationship = "many-to-one"
  ) %>%
  mutate(merge = ifelse(is.na(h_cty), "left_only", "both"))

# Sanity check
if (any(od_home$merge != "both")) {
  stop("Merge contains mismatches on home geocode!")
}

# Drop non-numerical columns for aggregation
od_home <- od_home %>%
  select(-h_geocode, -merge, -createdate)

# Merge on work labels
od_home_and_work <- od_home %>%
  left_join(w_crosswalk,
    by = c("w_geocode" = "w_tabblk2020"),
    relationship = "many-to-one"
  ) %>%
  mutate(merge = ifelse(is.na(w_cty), "left_only", "both"))

# Sanity check: Ensure all merged rows are "both"
if (any(od_home_and_work$merge != "both")) {
  stop("Merge contains mismatches on work geocode!")
}

# Drop non-numerical columns for aggregation
od_home_and_work <- od_home_and_work %>%
  select(-w_geocode, -merge)

# Aggregate at county level
od_county <- od_home_and_work %>%
  group_by(h_cty, h_ctyname, w_cty, w_ctyname) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(head(od_county[, c("h_ctyname", "w_ctyname", "S000")]))
>> # A tibble: 6 × 3
>>   h_ctyname           w_ctyname              S000
>>   <chr>               <chr>                 <dbl>
>> 1 Atlantic County, NJ Atlantic County, NJ   83828
>> 2 Atlantic County, NJ Bergen County, NJ      1170
>> 3 Atlantic County, NJ Burlington County, NJ  3841
>> 4 Atlantic County, NJ Camden County, NJ      5695
>> 5 Atlantic County, NJ Cape May County, NJ    4634
>> 6 Atlantic County, NJ Cumberland County, NJ  4465

1.3. Spatial and Geographic Analysis

LODES data can be treated as geographic in nature because its base geography, the census block, is so small. Users may wish to do spatial aggregations, merges, and/or other analyses using the geographic nature of LODES. The crosswalk files contain the latitude and longitude of an internal point for each block. This internal point is used by the OnTheMap application to assist in selecting blocks. While it is not a centroid, and the only guarantee is that the point lies inside the block, it is still useful for spatial analyses.

1.3.1. Case Study: Finding all the Jobs Within 5 KM of a Point

All Census Blocks within 5 KM of Point

This image shows the goal of the first part of the code below (finding all blocks within 5 kilometers of the specific point). For the Python code that generated this image, see Visualizing Blocks around a Point in New Jersey.

The code below finds all jobs in 2018 where workers are 29 years or younger, whose workplace is within 5 km of a specific point (Latitude: 40.736679° N, Longitude: 74.175975° W) and whose residence is in Camden County, NJ. The following steps are required:

  1. Read in the appropriate OD file and Geography Crosswalk. The OD file is needed because answering the question requires both residence and workplace. For this data question, the “main” OD file for New Jersey in 2018 for job type JT00 is used. The Geography Crosswalk used is the New Jersey crosswalk. Note that for the crosswalk, we include the longitude and latitude for each block.

  2. The crosswalk must be converted to a data format that can handle spatial calculations. This includes using a proper projection. In this example, the projection ESRI:102003 is used, which is the same projection used in the OnTheMap application.

  3. Limit the OD rows to those that have a workplace within 5 km of the specified point (find all blocks that have their internal points within this radius).

  4. Limit further the OD rows to those whose residence is in Camden county.

  5. Sum the column “SA01”, which is the segment of jobs whose worker is 29 years of age or younger.

Note

In dealing with spatial data and conducting spatial analyses, projections are incredibly important. In this example, projection ESRI:102003 is used, which is a projection that is optimized for the contiguous United States. However, if an analysis is localized to a specific smaller geographic area, a different projection may be more accurate. For more on projections and how to choose a projection, take a look at this resource.
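To see why distances should not be computed directly in latitude/longitude degrees, compare the ground distance covered by the same longitude offset at two different latitudes. The sketch below is pure Python using the standard spherical-Earth haversine formula; the first point matches the case study's reference coordinates, while the 60° N point is an arbitrary comparison.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius ~6371 km)."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# 0.05 degrees of longitude at the case study's latitude (~40.74 N)
d_nj = haversine_km(40.736679, -74.175975, 40.736679, -74.125975)
# the same 0.05 degrees of longitude at 60 N
d_north = haversine_km(60.0, -74.175975, 60.0, -74.125975)

print(f"~{d_nj:.2f} km at 40.74 N vs ~{d_north:.2f} km at 60 N")
```

The identical offset in degrees spans roughly 4.2 km near the reference point but only about 2.8 km at 60° N, which is why buffering and distance calculations must be done in a projected coordinate system with meaningful units (meters), not in raw degrees.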

import pandas as pd
import geopandas as gpd
from shapely import Point

# Read the OD file
od = read_lodes_data("od", "nj", None, "JT00", 2018)

# Read the Crosswalk file
crosswalk = read_crosswalk("nj", cols=["cty", "blklatdd", "blklondd"])

# Convert Crosswalk to GeoDataFrame and convert to appropriate projection
crosswalk = gpd.GeoDataFrame(
    crosswalk,
    geometry=gpd.points_from_xy(
        x=crosswalk["blklondd"], y=crosswalk["blklatdd"], crs="EPSG:4326"
    ),
).to_crs("ESRI:102003")

# Create point for buffering and change to appropriate projection
reference_point = Point(-74.175975, 40.736679)
reference_point_projected = gpd.GeoSeries([reference_point], crs="EPSG:4326").to_crs(
    "ESRI:102003"
)

# Create 5 km buffer polygon around point
buffer_5km = reference_point_projected.buffer(5000)

# Find list of blocks within buffer
blocks_within_5km = crosswalk.loc[
    crosswalk.geometry.within(buffer_5km.iloc[0]), "tabblk2020"
].to_list()

# Limit OD to just blocks whose workplace are within buffer
nj_blocks_in_buffer = od[od["w_geocode"].isin(blocks_within_5km)]

# Find list of blocks in Camden County, NJ
camden_blocks = crosswalk.loc[crosswalk["cty"] == "34007", "tabblk2020"].to_list()

# Limit OD to just blocks whose residence are in Camden County
nj_buffer_camden = nj_blocks_in_buffer[
    nj_blocks_in_buffer["h_geocode"].isin(camden_blocks)
]
num_jobs = nj_buffer_camden["SA01"].sum()

print(f"There are {num_jobs} jobs meeting the selected criteria.")
>> There are 242 jobs meeting the selected criteria.
library(sf)
library(dplyr)
library(readr)


# Read the OD file
od <- read_lodes_data("od", "nj", NULL, "JT00", 2018)

# Read the Crosswalk file
crosswalk <- read_crosswalk("nj", cols = c("cty", "blklatdd", "blklondd"))

# Convert Crosswalk to sf object and transform to ESRI:102003
crosswalk_sf <- crosswalk %>%
  filter(!is.na(blklatdd) &
    !is.na(blklondd)) %>% # Remove missing lat/lon
  st_as_sf(coords = c("blklondd", "blklatdd"), crs = 4326) %>%
  st_transform(crs = "ESRI:102003")

# Create reference point and transform to ESRI:102003
reference_point <- st_sfc(st_point(c(-74.175975, 40.736679)), crs = 4326) %>%
  st_transform(crs = "ESRI:102003")

# Create 5 km buffer polygon around point
buffer_5km <- st_buffer(reference_point, 5000)

# Find list of blocks within buffer
blocks_within_5km <- crosswalk_sf %>%
  filter(as.logical(st_within(geometry, buffer_5km, sparse = FALSE))) %>%
  pull(tabblk2020)

# Limit OD to just blocks whose workplace is within buffer
nj_blocks_in_buffer <- od %>% filter(w_geocode %in% blocks_within_5km)

# Find list of blocks in Camden County, NJ (FIPS: 34007)
camden_blocks <- crosswalk_sf %>%
  filter(cty == "34007") %>%
  pull(tabblk2020)

# Limit OD to just blocks whose residence is in Camden County
nj_buffer_camden <- nj_blocks_in_buffer %>% filter(h_geocode %in% camden_blocks)

# Sum number of workers
num_jobs <- sum(nj_buffer_camden$SA01, na.rm = TRUE)

cat(sprintf("There are %d jobs meeting the selected criteria.\n", num_jobs))
>> There are 242 jobs meeting the selected criteria.

1.4. Combining Other Data with LODES

Users may desire to combine other data products from the Census Bureau or non-Census data products with LODES. In many cases, merging on FIPS code or GEOID may be feasible, both on Census and non-Census data. In other cases, spatial merges may be required, if the external data is in a spatial format and does not have a GEOID that maps with Census data.

1.4.1. American Community Survey (ACS) Data

First, let’s consider the example of merging LODES data with data from the American Community Survey (ACS). ACS data uses GEOIDs as identifiers, so the process can be relatively simple. The tricky part is aligning geography: if the desired ACS year uses the same geography as the current LODES crosswalk, the process is straightforward. This is the case with ACS 2023 data and LODES 2022 data, which both rely on 2023 TIGER/Line Shapefiles.
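When the geographies align, the merge key can often be built by simple prefixing, since 2020 census GEOIDs nest positionally: a 15-character block GEOID begins with the 11-character tract GEOID, which in turn begins with the 5-character county GEOID. A minimal sketch, using a hypothetical New Jersey block code:

```python
block_geoid = "340010001001001"  # hypothetical block: state+county+tract+block

tract_geoid = block_geoid[:11]   # state (2) + county (3) + tract (6)
county_geoid = block_geoid[:5]   # state (2) + county (3)

print(tract_geoid)   # 34001000100
print(county_geoid)  # 34001
```

This positional nesting is why the crosswalk's "trct" column and an ACS GEOID built from state + county + tract codes can be compared directly, as in the case study below.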

Case Study: Combine the Number of Residences with Median Household Income by Tract

The code below aggregates 2022 RAC data to the tract level and then merges it with 2023 ACS median household income data. In order to run this code, you should request an API key. Note that limited queries can be made without an API key (simply leave off the &key= portion of the request), but that for repeated queries, an API key should be used.

import pandas as pd
import requests

rac = read_lodes_data("rac", "nj", "S000", "JT00", 2022)

# Read the Crosswalk file
crosswalk = read_crosswalk("nj", cols=["trct"])

# Define your API Key (Get one from https://api.census.gov/data/key_signup.html)
API_KEY = "YOUR_API_KEY_HERE"
if API_KEY == "YOUR_API_KEY_HERE":
    raise ValueError("Please create and enter your own API Key before continuing.")


# Define the base URL for ACS 5-Year Estimates
BASE_URL = "https://api.census.gov/data/2023/acs/acs5"

# Define the variables to retrieve (median household income)
VARIABLES = ["B19013_001E"]

# Define the geography (Tract-Level Data in New Jersey)
STATE_FIPS = "34"
GEOGRAPHY = f"for=tract:*&in=state:{STATE_FIPS}"

# Construct the API request URL
url = f"{BASE_URL}?get={','.join(VARIABLES)}&{GEOGRAPHY}&key={API_KEY}"

# Make the request
response = requests.get(url)
data = response.json()

# Convert to DataFrame
acs = pd.DataFrame(data[1:], columns=data[0])

# Add a GEOID column for merging (State FIPS + County FIPS + Tract Code)
acs["GEOID"] = acs["state"] + acs["county"] + acs["tract"]

# Merge and aggregate RAC data
rac_tract = rac.merge(
    crosswalk, how="left", left_on="h_geocode", right_on="tabblk2020", validate="1:1"
)
rac_tract = rac_tract.drop(["h_geocode", "tabblk2020", "createdate"], axis=1)
rac_agg = rac_tract.groupby("trct").sum().reset_index()

# Combine ACS and RAC data
rac_acs = rac_agg.merge(
    acs, how="left", left_on="trct", right_on="GEOID", validate="1:1"
)

# Drop duplicate GEOID column
rac_acs = rac_acs.drop(["GEOID"], axis=1)

print(rac_acs.head())
>>           trct  C000  CA01  CA02  CA03  ...  CS02  B19013_001E  state  county   tract
>> 0  34001000100  1065   235   546   284  ...   530        44870     34     001  000100
>> 1  34001000200  1123   246   520   357  ...   569        54858     34     001  000200
>> 2  34001000300  1514   347   775   392  ...   694        53463     34     001  000300
>> 3  34001000400  1147   218   563   366  ...   542        34895     34     001  000400
>> 4  34001000500  1080   254   576   250  ...   529        50704     34     001  000500
>> 
>> [5 rows x 46 columns]
library(httr)
library(jsonlite)
library(dplyr)
library(readr)

# Read the RAC file
rac <- read_lodes_data("rac", "nj", "S000", "JT00", 2022)

# Read the Crosswalk file
crosswalk <- read_crosswalk("nj", cols = c("trct"))

# Define your API Key (replace with your actual API Key)
api_key <- "YOUR_API_KEY_HERE"
if (api_key == "YOUR_API_KEY_HERE") {
  stop("Please create and enter your own API Key before continuing.")
}

# Define the base URL for ACS 5-Year Estimates
BASE_URL <- "https://api.census.gov/data/2023/acs/acs5"

# Define the variables to retrieve (median household income)
VARIABLES <- "B19013_001E"

# Define the geography (Tract-Level Data in New Jersey)
STATE_FIPS <- "34"
GEOGRAPHY <- paste0("for=tract:*&in=state:", STATE_FIPS)

# Construct the API request URL
url <- paste0(BASE_URL, "?get=", VARIABLES, "&", GEOGRAPHY, "&key=", api_key)

# Make the request to the Census API
response <- GET(url)

# Parse the response into JSON
data <- fromJSON(content(response, "text"))

# Convert the response to a data frame
acs <- as.data.frame(data[-1, ], stringsAsFactors = FALSE)
colnames(acs) <- data[1, ]

# Add a GEOID column for merging (State FIPS + County FIPS + Tract Code)
acs$GEOID <- paste0(acs$state, acs$county, acs$tract)

# Merge RAC data with Crosswalk
rac_tract <- rac %>%
  left_join(crosswalk, by = c("h_geocode" = "tabblk2020")) %>%
  select(-h_geocode, -createdate)

# Aggregate RAC data by tract
rac_agg <- rac_tract %>%
  group_by(trct) %>%
  summarise(across(everything(), \(x) sum(x, na.rm = TRUE)))

# Merge ACS data with RAC data
rac_acs <- rac_agg %>%
  left_join(acs, by = c("trct" = "GEOID"))

print(head(rac_acs))
>> # A tibble: 6 × 46
>>   trct    C000  CA01  CA02  CA03  CE01  CE02  CE03 CNS01 CNS02 CNS03 CNS04 CNS05
>>   <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
>> 1 34001…  1065   235   546   284   210   443   412     0     0     8    43    25
>> 2 34001…  1123   246   520   357   197   461   465     1     0     5    26    24
>> 3 34001…  1514   347   775   392   282   699   533     0     0     4    43    37
>> 4 34001…  1147   218   563   366   195   457   495     2     0    10    37    30
>> 5 34001…  1080   254   576   250   232   481   367     2     0     0    23    19
>> 6 34001…   721   161   380   180   180   327   214     1     0     6    12    10
>> # ℹ 33 more variables: CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>,
>> #   CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>,
>> #   CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>,
>> #   CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>, CR05 <dbl>,
>> #   CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>, CD03 <dbl>,
>> #   CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, B19013_001E <chr>, state <chr>,
>> #   county <chr>, tract <chr>

However, the process of matching LODES and ACS data becomes much more complicated if the desired ACS year does not have the same geography as the current LODES crosswalk. This process would require using the TIGER/Line shapefiles.

1.4.2. TIGER/Line Shapefiles

The TIGER/Line Shapefiles are extracts of selected geographic and cartographic information published by the Census Bureau. The shapefiles include polygon boundaries of geographic areas and features, linear features including roads and hydrography, and point features. The TIGER/Line Shapefiles contain a standard geographic identifier (GEOID) for each entity that links to the GEOID in the data from censuses and surveys, but they do not contain demographic or other data themselves. Users of LODES data may be interested in TIGER/Line Shapefiles if they desire aggregations or tabulations at geographic levels that are not in the current LODES crosswalk.

Case Study: Aggregating LODES Data by Former Connecticut Counties

In 2022, the Census Bureau began adopting Connecticut’s nine planning regions as county-equivalents. Thus, those are the tabulations present in the current crosswalk. If a user desired the previous counties as a tabulation area, those are not published in the LODES geographic crosswalk. However, they are available in historical TIGER/Line Shapefiles. Thus, a user can spatially merge LODES data with that shapefile (by finding the blocks whose internal points are inside each former Connecticut county) and aggregate at that level.

The code below finds the number of primary jobs whose workplace location is in each former county in Connecticut in 2020. This requires a spatial merge of the crosswalk with the TIGER/Line Shapefile and then a column-based merge to the WAC data.

Note

There are several situations in which a user may want to use TIGER/Line Shapefiles or other shapefiles to access historic or different geographies. The process is essentially the same for any geographic file (as long as the projections are appropriate and matching). However, it should be noted that census block boundaries can change over time, and this can make historical comparisons challenging.

import pandas as pd
import geopandas as gpd

# Load counties from TIGER shapefiles
counties = gpd.read_file(
    "https://www2.census.gov/geo/tiger/TIGER2020/COUNTY/tl_2020_us_county.zip"
)

# Filter counties to just those in CT
ct_counties = counties[counties["STATEFP"] == "09"]

# Read WAC data for CT
ct_wac = read_lodes_data("wac", "ct", "S000", "JT01", 2020)

# Read crosswalk data for CT
ct_crosswalk = read_crosswalk("ct", cols=["blklatdd", "blklondd"])

# Convert crosswalk to geographic format
ct_crosswalk = gpd.GeoDataFrame(
    ct_crosswalk,
    geometry=gpd.points_from_xy(ct_crosswalk["blklondd"], ct_crosswalk["blklatdd"]),
    crs="EPSG:4326",
)

# Convert TIGER/Line data to same projection
ct_counties = ct_counties.to_crs("EPSG:4326")

# Spatial join: assign counties based on block internal points (points within polygons)
blocks_with_counties = gpd.sjoin(
    ct_crosswalk, ct_counties, how="left", predicate="within"
)

# Merge block-county with WAC on workplace block ID
merged_wac = ct_wac.merge(
    blocks_with_counties,
    left_on="w_geocode",
    right_on="tabblk2020",
    how="left",
    validate="1:1",
)

jobs_by_ct = (
    merged_wac[["COUNTYFP", "NAME", "C000"]]
    .groupby(["COUNTYFP", "NAME"])
    .sum()
    .reset_index()
)

print(jobs_by_ct)
>>   COUNTYFP        NAME    C000
>> 0      001   Fairfield  351266
>> 1      003    Hartford  460016
>> 2      005  Litchfield   50648
>> 3      007   Middlesex   54315
>> 4      009   New Haven  330795
>> 5      011  New London   95092
>> 6      013     Tolland   34619
>> 7      015     Windham   30589
library(sf)
library(dplyr)
library(tigris)
options(tigris_use_cache = TRUE)

# Load CT counties from TIGER shapefiles
ct_counties <- counties(state = "09", year = 2020, cb = FALSE) %>%
  st_transform(crs = 4326)

# Read WAC data for CT
ct_wac <- read_lodes_data("wac", "ct", "S000", "JT01", 2020)

# Read crosswalk data for CT
ct_crosswalk <- read_crosswalk("ct", cols = c("blklondd", "blklatdd"))

# Convert to sf (point geometry)
ct_crosswalk <- st_as_sf(ct_crosswalk, coords = c("blklondd", "blklatdd"), crs = 4326)

# Spatial join: assign counties based on block internal points (points within polygons)
blocks_with_counties <- st_join(ct_crosswalk, ct_counties, join = st_within, left = FALSE)

# Merge block-county with WAC on workplace block ID
merged_wac <- ct_wac %>%
  left_join(blocks_with_counties, by = c("w_geocode" = "tabblk2020"))

jobs_by_ct <- merged_wac %>%
  group_by(COUNTYFP, NAME) %>%
  summarise(C000 = sum(C000, na.rm = TRUE), .groups = "drop")

print(jobs_by_ct)
>> # A tibble: 8 × 3
>>   COUNTYFP NAME         C000
>>   <chr>    <chr>       <dbl>
>> 1 001      Fairfield  351266
>> 2 003      Hartford   460016
>> 3 005      Litchfield  50648
>> 4 007      Middlesex   54315
>> 5 009      New Haven  330795
>> 6 011      New London  95092
>> 7 013      Tolland     34619
>> 8 015      Windham     30589