Post-Secondary Employment Outcomes (PSEO)

Post-Secondary Employment Outcomes (PSEO) are experimental tabulations developed by researchers at the U.S. Census Bureau. PSEO data provide earnings and employment outcomes for college and university graduates by degree level, degree major, post-secondary institution, and state of institution. These statistics are generated by matching university transcript data with a national database of jobs, using state-of-the-art confidentiality protection mechanisms to protect the underlying data. The PSEO are made possible through data sharing partnerships between universities, university systems, State Departments of Education, State Labor Market Information offices, and the U.S. Census Bureau. PSEO data are available for post-secondary institutions whose transcript data have been made available to the Census Bureau through a data-sharing agreement.

Since this documentation is focused on code samples, it will not detail all the specifics of the PSEO data. For more information, users may find the following links/resources useful:

PSEO Explorer: PSEO Explorer is an interactive tool that allows for comparisons of employment outcomes through dynamic grouped bar charts and employment flows through Sankey diagrams.
PSEO Information and Coverage: This webpage provides background and information about which states and universities/university systems contribute to the data.
PSEO Help and Documentation: This webpage provides background information on PSEO methodology and other useful topics.
PSEO Technical Documentation: This document provides information on the construction of the PSEO data as well as information on the disclosure status of the PSEO data product and Census Bureau Disclosure Review Board approval numbers.
PSEO Schema for Latest Release: This schema provides detailed information on file naming, structure, identifiers, and indicators.
PSEO Files in a Directory Structure: The code samples in this documentation utilize the files from this URL.

Reading Data Files

PSEO data can be accessed in two ways, either through the Census API or directly from the online directory structure. Either can be used to conduct analyses, as the data is the same, but users may have a preference for one method or the other depending on preference or project requirements. Below both methods are detailed, but the remaining PSEO code samples will use the data files directly.

Using the Online Directory Structure

Here, the files are read directly from the online directory structure, but the same code can be used for reading the files locally instead. When selecting PSEO files to read/download, there are two main considerations:

Earnings or Flows: PSEO Earnings data provides access to earnings at the 25th, 50th, and 75th percentile at 1, 5, and 10 years post graduation. Flows provide counts of employed graduates by industry and geography, 1, 5, and 10 years after graduation. These files have separate structures, but can be merged at several levels, if needed.
State or All: PSEO earnings and flows data can be downloaded for a specific state, or for “all”. If downloaded for a specific state, the file will contain information on graduates from institutions in that state only. The “all” files contain all data for earnings or flows (essentially, all the individual state files stacked on top of each other).

In the example below, the earnings file for Colorado is read directly from the online directory structure. This file contains earnings data for all institutions in Colorado. After reading in the data, this code limits the file to the highest level aggregation. In this case, that is earnings from all PSEO-participating institutions in Colorado, across all CIP codes (for more on CIP codes, see the National Center for Education Statistics and/or the LEHD Schema), and across all graduation cohorts. Degree level is also present as we do not aggregate across degree levels. Therefore, the earnings numbers below can be interpreted as the median earnings 1 year after graduation by degree level for graduates from institutions in Colorado across all instructional programs (CIP codes) and all cohorts. The results also include the number of graduates we observe with sufficient labor force attachment. (i.e. the number of graduates used to calculate the earnings figures) as well as a status flag. Information on the coding of degree_types can be found in the LEHD Schema, and for an example of joining schema labels to data, see Case Study: How Many Engineering Graduates Stay in Illinois?.

Note

The structure of the PSEO data may not be immediately intuitive to data users. For example, in the code below, the institution of 08 corresponds to “All Institutions in Colorado”. The most helpful resource for understanding the structure of PSEO data is the LEHD Schema. Some potentially helpful places to start are Identifiers for PSEO, Indicators for PSEO Earnings, and Indicators for PSEO Flows.

Python

import pandas as pd

STATE = "co"
BASE_URL = "https://lehd.ces.census.gov/data/pseo/latest_release/"


# Read Colorado earnings CSV (gzipped)
colorado_earnings = pd.read_csv(
    f"{BASE_URL}{STATE}/pseoe_{STATE}.csv.gz",
    dtype={"institution": str, "degree_level": str, "cipcode": str, "grad_cohort": str},
    low_memory=False,
)

# Display 50th percentile earnings for all Colorado Grads for all CIP Codes
# and all graduation cohorts by degree type
all_colorado_grads = colorado_earnings.loc[
    (colorado_earnings["institution"] == "08")
    & (colorado_earnings["cipcode"] == "00")
    & (colorado_earnings["grad_cohort"] == "0000"),
    ["y1_p50_earnings", "degree_level", "y1_grads_earn", "status_y1_earnings"],
]

print(all_colorado_grads)

>>    y1_p50_earnings degree_level  y1_grads_earn  status_y1_earnings
>> 0          34540.0           01        78464.0                   1
>> 1          40177.0           02        26145.0                   1
>> 2          37982.0           03        79020.0                   1
>> 3              NaN           04            NaN                   5
>> 4          40501.0           05       305134.0                   1
>> 5          47119.0           06           62.0                   1
>> 6          62915.0           07        88900.0                   1
>> 7              NaN           08            NaN                   5
>> 8          72856.0           17        10500.0                   1
>> 9          75701.0           18        10524.0                   1

R

library(readr)
library(dplyr)

state <- "co"
base_url <- "https://lehd.ces.census.gov/data/pseo/latest_release/"

# Read Colorado earnings CSV (gzipped)
colorado_earnings <- read_csv(
  paste0(base_url, state, "/pseoe_", state, ".csv.gz"),
  col_types = cols(
    institution = col_character(),
    degree_level = col_character(),
    cipcode = col_character(),
    grad_cohort = col_character()
  )
)

# Display 50th percentile earnings for all Colorado Grads for all CIP Codes
# and all graduation cohorts by degree type
all_colorado_grads <- colorado_earnings %>%
  filter(
    institution == "08",
    cipcode == "00",
    grad_cohort == "0000"
  ) %>%
  select(y1_p50_earnings, degree_level, y1_grads_earn, status_y1_earnings)

print(all_colorado_grads)

>> # A tibble: 10 × 4
>>    y1_p50_earnings degree_level y1_grads_earn status_y1_earnings
>>              <dbl> <chr>                <dbl>              <dbl>
>>  1           34540 01                   78464                  1
>>  2           40177 02                   26145                  1
>>  3           37982 03                   79020                  1
>>  4              NA 04                      NA                  5
>>  5           40501 05                  305134                  1
>>  6           47119 06                      62                  1
>>  7           62915 07                   88900                  1
>>  8              NA 08                      NA                  5
>>  9           72856 17                   10500                  1
>> 10           75701 18                   10524                  1

A very similar method can be used to read the flows data. Here, the Colorado flows data is read, and then is limited to a subset. In the flows data, there is essentially a cross between which industry a graduate is employed in and which geographic region they are employed in. The data displayed below is thus the number of graduates from Masters programs in Colorado who are employed in each Census division 1 year after graduation. Here, the code selects data for “All NAICS Sectors”, but we could also cross geography with industry (i.e. for a specific NAICS sector, how many graduates are employed in each Census division). It’s worth noting the two other columns in the data in addition to y1_grads_emp. First, y1_grads_emp_instate details the number of graduates who are employed in the same state as the degree is from (in this case Colorado). This column will only contain a non-zero number for the national geography flow and for the Census division that contains the institution state (Colorado is in Census division 8). Second, y1_grads_nme indicates the number of graduates who have no (or only marginal) earnings in the LEHD data. This number will only be reported for the national geography flow.

Python

import pandas as pd

STATE = "co"
BASE_URL = "https://lehd.ces.census.gov/data/pseo/latest_release/"


# Read Colorado earnings CSV (gzipped)
colorado_flows = pd.read_csv(
    f"{BASE_URL}{STATE}/pseof_{STATE}.csv.gz",
    dtype={
        "institution": str,
        "degree_level": str,
        "cipcode": str,
        "grad_cohort": str,
        "geography": str,
    },
    low_memory=False,
)

# Limit to Geographic flows for all masters students in Colorado
colorado_flows_limited = colorado_flows[
    (colorado_flows["institution"] == "08")
    & (colorado_flows["grad_cohort"] == "0000")
    & (colorado_flows["degree_level"] == "07")
    & (colorado_flows["cipcode"] == "00")
    & (colorado_flows["industry"] == "00")
]

# View Y1 grad counts by geography
print(
    colorado_flows_limited[
        ["geography", "y1_grads_emp", "y1_grads_emp_instate", "y1_grads_nme"]
    ]
)

>>        geography  y1_grads_emp  y1_grads_emp_instate  y1_grads_nme
>> 6             00       96065.0               67693.0       37376.0
>> 155769         1         735.0                   0.0           NaN
>> 155770         2        2332.0                   0.0           NaN
>> 155771         3        3442.0                   0.0           NaN
>> 155772         4        2408.0                   0.0           NaN
>> 155773         5        4590.0                   0.0           NaN
>> 155774         6         786.0                   0.0           NaN
>> 155775         7        3619.0                   0.0           NaN
>> 155776         8       71319.0               67693.0           NaN
>> 155777         9        6834.0                   0.0           NaN

R

library(readr)
library(dplyr)

state <- "co"
base_url <- "https://lehd.ces.census.gov/data/pseo/latest_release/"

# Read Colorado earnings CSV (gzipped)
colorado_flows <- read_csv(
  paste0(base_url, state, "/pseof_", state, ".csv.gz"),
  col_types = cols(
    institution = col_character(),
    degree_level = col_character(),
    cipcode = col_character(),
    grad_cohort = col_character(),
    geography = col_character()
  )
)

# Limit to Geographic flows for all masters students in Colorado
colorado_flows_limited <- colorado_flows %>%
  filter(
    institution == "08",
    grad_cohort == "0000",
    degree_level == "07",
    cipcode == "00",
    industry == "00"
  )

# View Y1 grad counts by geography
print(colorado_flows_limited %>%
  select(geography, y1_grads_emp, y1_grads_emp_instate, y1_grads_nme))

>> # A tibble: 10 × 4
>>    geography y1_grads_emp y1_grads_emp_instate y1_grads_nme
>>    <chr>            <dbl>                <dbl>        <dbl>
>>  1 00               96065                67693        37376
>>  2 1                  735                    0           NA
>>  3 2                 2332                    0           NA
>>  4 3                 3442                    0           NA
>>  5 4                 2408                    0           NA
>>  6 5                 4590                    0           NA
>>  7 6                  786                    0           NA
>>  8 7                 3619                    0           NA
>>  9 8                71319                67693           NA
>> 10 9                 6834                    0           NA

Using the Census API

To use the Census API, first the user should request an API key. Note that limited queries can be made without an API key (simply leave off the &key= portion of the request), but that for repeated queries, an API key should be used. Documentation on the PSEO API is available here. The API has two endpoints, one for flows and one for earnings. To use the API, a user should be very familiar with the PSEO data structure. This code produces the same data/output as that in Using the Online Directory Structure, but behaves slightly differently. Instead of reading all of the Colorado data from the API, the code tailors the request to the API to produce only the data needed. If a user has a very specific data need or has memory constraints, the API may therefore be a better choice.

Python

import pandas as pd
import requests

# Define your API Key (Get one from https://api.census.gov/data/key_signup.html)
API_KEY = "YOUR_API_KEY_HERE"
if API_KEY == "YOUR_API_KEY_HERE":
    raise ValueError("Please create and enter your own API Key before continuing.")

# Define the base URL for PSEO Earnings
BASE_URL = "https://api.census.gov/data/timeseries/pseo/earnings"

# Define the indicators desired (50 percentile earnings 1 year post graduation, number of grads, and status)
INDICATORS = [
    "Y1_P50_EARNINGS",
    "Y1_GRADS_EARN",
    "STATUS_Y1_EARNINGS",
]

# Want to see results by degree_level
IDENTIFIERS = ["DEGREE_LEVEL"]

# Pick variables
# State level data for all grad cohorts and all cip CODES
STATE = "08"
GRAD_COHORT = "0000"
CIP_CODE = "00"

# Construct the API request URL
url = f"{BASE_URL}?get={','.join(INDICATORS + IDENTIFIERS)}&for=us:1&INSTITUTION={STATE}&GRAD_COHORT={GRAD_COHORT}&CIPCODE={CIP_CODE}&key={API_KEY}"

# Make the request
response = requests.get(url)
data = response.json()

# Convert to DataFrame
pseo_earnings = pd.DataFrame(data[1:], columns=data[0])

# Display 50th percentile earnings for all Colorado Grads for all CIP Codes
# and all graduation cohorts by degree type
print(
    pseo_earnings[
        ["Y1_P50_EARNINGS", "DEGREE_LEVEL", "Y1_GRADS_EARN", "STATUS_Y1_EARNINGS"]
    ]
)

>>   Y1_P50_EARNINGS DEGREE_LEVEL Y1_GRADS_EARN STATUS_Y1_EARNINGS
>> 0           34540           01         78464                  1
>> 1           40177           02         26145                  1
>> 2           37982           03         79020                  1
>> 3            None           04          None                  5
>> 4           40501           05        305134                  1
>> 5           47119           06            62                  1
>> 6           62915           07         88900                  1
>> 7            None           08          None                  5
>> 8           72856           17         10500                  1
>> 9           75701           18         10524                  1

R

library(httr)
library(jsonlite)
library(dplyr)

# API key
api_key <- "YOUR_API_KEY_HERE"
if (api_key == "YOUR_API_KEY_HERE") {
  stop("Please create and enter your own API Key before continuing.")
}

# Base URL and query parameters
base_url <- "https://api.census.gov/data/timeseries/pseo/earnings"

# Define the indicators desired (50 percentile earnings 1 year post graduation, number of grads, and status)
indicators <- c("Y1_P50_EARNINGS", "Y1_GRADS_EARN", "STATUS_Y1_EARNINGS")

# Want to see results by degree_level
identifiers <- c("DEGREE_LEVEL")

# Pick variables
# State level data for all grad cohorts and all cip CODES
state <- "08"
grad_cohort <- "0000"
cip_code <- "00"

# Construct the API request URL
query_url <- paste0(
  base_url,
  "?get=", paste(c(indicators, identifiers), collapse = ","),
  "&for=us:1",
  "&INSTITUTION=", state,
  "&GRAD_COHORT=", grad_cohort,
  "&CIPCODE=", cip_code,
  "&key=", api_key
)

# Make the request
response <- GET(query_url)
json_data <- content(response, "text")
parsed_data <- fromJSON(json_data)

# Convert to data frame
colnames(parsed_data) <- parsed_data[1, ]
pseo_earnings <- as.data.frame(parsed_data[-1, ])



# Display 50th percentile earnings for all Colorado Grads for all CIP Codes
# and all graduation cohorts by degree type
pseo_earnings <- pseo_earnings %>% select(Y1_P50_EARNINGS, DEGREE_LEVEL, Y1_GRADS_EARN, STATUS_Y1_EARNINGS)
print(pseo_earnings)

>>    Y1_P50_EARNINGS DEGREE_LEVEL Y1_GRADS_EARN STATUS_Y1_EARNINGS
>> 1            34540           01         78464                  1
>> 2            40177           02         26145                  1
>> 3            37982           03         79020                  1
>> 4             <NA>           04          <NA>                  5
>> 5            40501           05        305134                  1
>> 6            47119           06            62                  1
>> 7            62915           07         88900                  1
>> 8             <NA>           08          <NA>                  5
>> 9            72856           17         10500                  1
>> 10           75701           18         10524                  1

The flows code from Using the Online Directory Structure is likewise replicated below with the API. Again, just the specific desired data is requested from the API. One difference to note here is that the call below does not produce a total (national) row. To get this row (and the number of graduates “not meaningfully employed”), the “for” statement (i.e. for=division:*) needs to be modified to be for=us:1 to receive national data.

Python

import pandas as pd
import requests

# Define your API Key (Get one from https://api.census.gov/data/key_signup.html)
API_KEY = "YOUR_API_KEY_HERE"
if API_KEY == "YOUR_API_KEY_HERE":
    raise ValueError("Please create and enter your own API Key before continuing.")

# Define the base URL for PSEO Flows
BASE_URL = "https://api.census.gov/data/timeseries/pseo/flows"

# Define the indicators desired (grads employed 1 year after graduation)
INDICATORS = ["Y1_GRADS_EMP", "Y1_GRADS_EMP_INSTATE", "Y1_GRADS_NME"]

# Want to see results by division
IDENTIFIERS = [
    "DIVISION",
]

# Pick variables
# State level data for all grad cohorts and all cip CODES
# All industries
STATE = "08"
GRAD_COHORT = "0000"
DEGREE_LEVEL = "07"
CIPCODE = "00"
NAICS = "00"

# Construct the API request URL
url = f"{BASE_URL}?get={','.join(INDICATORS + IDENTIFIERS)}&for=division:*&INSTITUTION={STATE}&GRAD_COHORT={GRAD_COHORT}&DEGREE_LEVEL={DEGREE_LEVEL}&CIPCODE={CIPCODE}&NAICS={NAICS}&key={API_KEY}"

# Make the request
response = requests.get(url)
data = response.json()

# Convert to DataFrame
pseo_flows = pd.DataFrame(data[1:], columns=data[0])

# View Y1 grad counts by geography
print(pseo_flows[["DIVISION", "Y1_GRADS_EMP", "Y1_GRADS_EMP_INSTATE", "Y1_GRADS_NME"]])

>>   DIVISION Y1_GRADS_EMP Y1_GRADS_EMP_INSTATE Y1_GRADS_NME
>> 0        1          735                    0         None
>> 1        2         2332                    0         None
>> 2        3         3442                    0         None
>> 3        4         2408                    0         None
>> 4        7         3619                    0         None
>> 5        8        71319                67693         None
>> 6        9         6834                    0         None
>> 7        5         4590                    0         None
>> 8        6          786                    0         None

R

library(httr)
library(jsonlite)
library(dplyr)

# Define your API Key (Get one from https://api.census.gov/data/key_signup.html)
api_key <- "YOUR_API_KEY_HERE"
if (api_key == "YOUR_API_KEY_HERE") {
  stop("Please create and enter your own API Key before continuing.")
}

# Define the base URL for PSEO Flows
base_url <- "https://api.census.gov/data/timeseries/pseo/flows"

# Define the indicators desired (grads employed 1 year after graduation)
indicators <- c("Y1_GRADS_EMP", "Y1_GRADS_EMP_INSTATE", "Y1_GRADS_NME")

# Want to see results by division
identifiers <- c("DIVISION")

# Pick variables
# State level data for all grad cohorts and all cip CODES
# All industries
state <- "08"
grad_cohort <- "0000"
degree_level <- "07"
cip_code <- "00"
naics <- "00"

# Construct the API request URL
query_url <- paste0(
  base_url,
  "?get=",
  paste(c(indicators, identifiers), collapse = ","),
  "&for=division:*",
  "&INSTITUTION=", state,
  "&GRAD_COHORT=", grad_cohort,
  "&DEGREE_LEVEL=", degree_level,
  "&CIPCODE=", cip_code,
  "&NAICS=", naics,
  "&key=", api_key
)

# Make the request
response <- GET(query_url)
json_data <- content(response, "text")
parsed_data <- fromJSON(json_data)

# Convert to data frame
colnames(parsed_data) <- parsed_data[1, ]
pseo_flows <- as.data.frame(parsed_data[-1, ])

# View Y1 grad counts by geography
print(pseo_flows %>%
  select(DIVISION, Y1_GRADS_EMP, Y1_GRADS_EMP_INSTATE, Y1_GRADS_NME))

>>   DIVISION Y1_GRADS_EMP Y1_GRADS_EMP_INSTATE Y1_GRADS_NME
>> 1        1          735                    0         <NA>
>> 2        2         2332                    0         <NA>
>> 3        3         3442                    0         <NA>
>> 4        4         2408                    0         <NA>
>> 5        7         3619                    0         <NA>
>> 6        8        71319                67693         <NA>
>> 7        9         6834                    0         <NA>
>> 8        5         4590                    0         <NA>
>> 9        6          786                    0         <NA>

State Data Comparisons

Another data analysis that users may wish to do is comparing flows and/or earnings data across states (which is not easily accessible in PSEO Explorer). This is relatively simple to do, but does require either reading the entire PSEO files, or combining files from different states

Case Study: Comparing Earnings in Agriculture Degrees in the Midwest

Here, we will be obtaining and plotting the earnings for agricultural degree at the state level for selected states in the Midwest. The first step here, is to decide which CIP code to use for our interest in “Agriculture”. There are 3 levels of CIP codes available in PSEO earnings data: the top level is the all CIPs (all together), the middle level is the 2-digit CIP Codes, and the most detailed is the 4-digit CIP Codes. The 4-digit CIP codes are only available on the earnings data, and are not available on the flows data. To see the full list of CIP codes associated with Agriculture, we filter the CIP labels on “Agr”. We can see there is the 2-digit CIP code “01”, that corresponds to “Agricultural/Animal/Plant/Veterinary Science and Related Fields.” This may be too broad for us here. We can also see there are several 4-digit CIP codes that fall under that CIP code, as well as one that falls under the general “Engineering” CIP code. For these purposes, let’s go with the code “01.01”, which is “Agricultural Business and Management”.

Python

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

STATES = ["il", "in", "oh", "ia"]
DATA_BASE_URL = "https://lehd.ces.census.gov/data/pseo/latest_release/"
SCHEMA_BASE_URL = "https://lehd.ces.census.gov/data/schema/latest/"


# Read in data for each state and combine
all_data = []
for state in STATES:
    state_data = pd.read_csv(
        f"{DATA_BASE_URL}{state}/pseoe_{state}.csv.gz",
        dtype={
            "institution": str,
            "degree_level": str,
            "cip_level": str,
            "cipcode": str,
            "grad_cohort": str,
        },
    )
    # limit to only state
    state_data = state_data[state_data["inst_level"] == "S"]
    all_data.append(state_data)

midwest_earnings = pd.concat(all_data)

# Read in labels for cipcodes
cip_label = pd.read_csv(f"{SCHEMA_BASE_URL}label_cipcode.csv")

# Find cip codes that are in the data
cips_in_data = cip_label[
    cip_label["cipcode"].isin(midwest_earnings["cipcode"].unique())
]

# View labels
print(cips_in_data.loc[cips_in_data["label"].str.contains("Agr"), ["cipcode", "label"]])

>>     cipcode                                              label
>> 1        01  Agricultural/Animal/Plant/Veterinary Science a...
>> 2     01.00                               Agriculture, General
>> 4     01.01               Agricultural Business and Management
>> 12    01.02                         Agricultural Mechanization
>> 18    01.03                 Agricultural Production Operations
>> 28    01.04          Agricultural and Food Products Processing
>> 31    01.05          Agricultural and Domestic Animal Services
>> 50    01.07                          International Agriculture
>> 52    01.08                       Agricultural Public Services
>> 86    01.13        Agriculture/Veterinary Preparatory Programs
>> 113   01.99  Agricultural/Animal/Plant/Veterinary Science a...
>> 515   14.03                           Agricultural Engineering

R

library(dplyr)
library(readr)
library(tidyr)
library(ggplot2)

STATES <- c("il", "in", "oh", "ia")
DATA_BASE_URL <- "https://lehd.ces.census.gov/data/pseo/latest_release/"
SCHEMA_BASE_URL <- "https://lehd.ces.census.gov/data/schema/latest/"

# Read in data for each state and combine
all_data <- list()

for (state in STATES) {
  state_url <- paste0(DATA_BASE_URL, state, "/pseoe_", state, ".csv.gz")

  state_data <- read_csv(
    state_url,
    col_types = cols(
      institution = col_character(),
      degree_level = col_character(),
      cip_level = col_character(),
      cipcode = col_character(),
      grad_cohort = col_character()
    )
  )

  # limit to only state
  state_data <- state_data %>% filter(inst_level == "S")
  all_data[[state]] <- state_data
}

midwest_earnings <- bind_rows(all_data)

# Read in labels for cipcodes
cip_label <- read_csv(paste0(SCHEMA_BASE_URL, "label_cipcode.csv"))

# Find cip codes that are in the data
cips_in_data <- cip_label %>%
  filter(cipcode %in% unique(midwest_earnings$cipcode))

# View labels
print(
  cips_in_data %>%
    filter(grepl("Agr", label)) %>%
    select(cipcode, label)
)

>> # A tibble: 12 × 2
>>    cipcode label                                                                
>>    <chr>   <chr>                                                                
>>  1 01      Agricultural/Animal/Plant/Veterinary Science and Related Fields      
>>  2 01.00   Agriculture, General                                                 
>>  3 01.01   Agricultural Business and Management                                 
>>  4 01.02   Agricultural Mechanization                                           
>>  5 01.03   Agricultural Production Operations                                   
>>  6 01.04   Agricultural and Food Products Processing                            
>>  7 01.05   Agricultural and Domestic Animal Services                            
>>  8 01.07   International Agriculture                                            
>>  9 01.08   Agricultural Public Services                                         
>> 10 01.13   Agriculture/Veterinary Preparatory Programs                          
>> 11 01.99   Agricultural/Animal/Plant/Veterinary Science and Related Fields, Oth…
>> 12 14.03   Agricultural Engineering                                             

Now we can limit the data to only the degree we’re interested in. We can also limit it to only Bachelor’s degrees, and we can select the data from all cohorts. We also join the data to the institution schema, which has the names of the states (otherwise we only have the FIPs code). Then we can create a grouped bar chart to show the 25th, 50th, and 75th percentile earnings at 5 years after graduation.

Python

# Filter data to right degree level, cipcode, and all cohorts
midwest_earnings = midwest_earnings[midwest_earnings["degree_level"] == "05"]
midwest_earnings = midwest_earnings[midwest_earnings["cipcode"] == "01.01"]
midwest_earnings = midwest_earnings[midwest_earnings["grad_cohort"] == "0000"]

# Join data to institution schema for state labels
inst_labels = pd.read_csv(
    f"{SCHEMA_BASE_URL}label_institution.csv", dtype={"institution": str}
)
midwest_earnings = midwest_earnings.merge(inst_labels, how="left", on="institution")

# Melt to long format
df_long = midwest_earnings.melt(
    id_vars="label",
    value_vars=["y5_p25_earnings", "y5_p50_earnings", "y5_p75_earnings"],
    var_name="percentile",
    value_name="earnings",
)

# Clean percentile names for display
df_long["percentile"] = df_long["percentile"].str.extract(r"y5_(p\d+)_earnings")[0]

# Pivot to make plotting easier
pivot = df_long.pivot(index="label", columns="percentile", values="earnings")
percentiles = ["p25", "p50", "p75"]  # Ensure order

# Map short codes to nicer legend labels
percentile_labels = {
    "p25": "25th Percentile",
    "p50": "50th Percentile",
    "p75": "75th Percentile",
}

# Plotting
x = np.arange(len(pivot))  # number of states
width = 0.25

fig, ax = plt.subplots(figsize=(10, 6))


for i, p in enumerate(percentiles):
    ax.bar(x + i * width, pivot[p], width=width, label=percentile_labels[p])

# Customize axes
ax.set_xticks(x + width)
ax.set_xticklabels(pivot.index)
ax.set_ylabel("Earnings ($)")
ax.set_title(
    "5-Year Post-Graduation Earnings by State and Percentile for Argiculture Degrees"
)
ax.legend(title="Percentile")
ax.grid(True, axis="y")
plt.tight_layout()

R

# Filter data to right degree level, cipcode, and all cohorts
midwest_earnings <- midwest_earnings %>%
  filter(degree_level == "05", cipcode == "01.01", grad_cohort == "0000")

# Join data to institution schema for state labels
inst_labels <- read_csv(
  paste0(SCHEMA_BASE_URL, "label_institution.csv"),
  col_types = cols(institution = col_character())
)

midwest_earnings <- midwest_earnings %>%
  left_join(inst_labels, by = "institution")

# Melt to long format
df_long <- midwest_earnings %>%
  pivot_longer(
    cols = c(y5_p25_earnings, y5_p50_earnings, y5_p75_earnings),
    names_to = "percentile",
    values_to = "earnings"
  )

# Clean percentile names for display
df_long <- df_long %>%
  mutate(percentile = gsub("y5_(p\\d+)_earnings", "\\1", percentile))

# Ensure ordering
df_long$percentile <- factor(df_long$percentile,
  levels = c("p25", "p50", "p75"),
  labels = c(
    "25th Percentile",
    "50th Percentile",
    "75th Percentile"
  )
)

# Plot
plot <- ggplot(df_long, aes(x = label, y = earnings, fill = percentile)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
  labs(
    title = "5-Year Post-Graduation Earnings by State and Percentile for Argiculture Degrees",
    x = "State",
    y = "Earnings ($)",
    fill = "Percentile"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Post-Secondary Employment Outcomes (PSEO)

Reading Data Files

Using the Online Directory Structure

Using the Census API

Visualizing Data Over Time

Case Study: Earnings over Time for Graduates in Engineering

Case Study: How Many Engineering Graduates Stay in Illinois?

State Data Comparisons

Case Study: Comparing Earnings in Agriculture Degrees in the Midwest