SupplyChainEnv

OpenEnv RL environment for global supply chain disruption management. Real-world data. Adaptive curriculum. $4.4T problem.

An RL agent manages a global trade network: 10 real ports, 10 factories, and 20 shipping routes across 5 continents. Disruptions modeled on actual events (Ever Given, COVID, Red Sea attacks) dynamically knock out nodes and edges. The agent uses MCP tools to inspect the network, plan, and reroute shipments before their deadlines.

What sets this environment apart: it adapts to the agent. When the agent masters routing around typhoons, the environment starts generating canal blockages and cascading failures. The design is inspired by PAIRED (Dennis et al., NeurIPS 2020) and OpenAI's Automatic Domain Randomization (ADR).


Key Features

| Feature | Details |
|---|---|
| Adaptive Curriculum | Environment learns from agent performance, targets weaknesses, and auto-scales difficulty from level 1 to 10 |
| Real-World Data | UNCTAD port throughput, Freightos Baltic Index rates, Lloyd's List disruption history |
| 8 MCP Tools | view_network, view_shipments, get_routes, find_path, route_shipment, advance_day, get_disruptions, end_simulation |
| 10 Disruption Types | Including typhoons, strikes, factory fires, canal blockages, pandemics, and cyber attacks, modeled on real events from 2017-2024 |
| Multi-Step MDP | 30 simulated days, up to 100 tool calls per episode |
| Shaped Reward | 0.50 x delivery_rate - 0.30 x loss_rate - 0.20 x cost_ratio |
| Gymnasium Compatible | 113-dim observations, Discrete(101) actions, ready for SB3/CleanRL/RLlib |
| Exploit-Proof | Do-nothing = 0.00, spam = 0.00; only real routing earns reward |
| 19 Tests Passing | Core interface, tool calls, multi-step episodes, world simulation, difficulty scaling |

Adaptive Curriculum (the differentiator)

Standard RL environments are static. This one learns from the agent:

Episode  1 | Level [=.........]  1/10 | Reward: 0.496 | Learning basics
Episode  3 | Level [======....]  6/10 | Reward: 0.396 | LEVEL UP! 1 -> 6
Episode  4 | Level [=======...]  7/10 | Reward: 0.477 | Targeting weaknesses (port_strike)
Episode  5 | Level [======....]  6/10 | Reward: 0.274 | Cascading crises - LEVEL DOWN
Episode  8 | Level [======....]  6/10 | Reward: 0.492 | LEVEL UP! 5 -> 6
Episode 10 | Level [======....]  6/10 | Reward: 0.426 | Mastered: factory_fire, monsoon

How it works:

  1. Tracks the agent's performance (delivery rate) per disruption type.
  2. When the agent masters a type (>80% delivery), raises the frequency of harder types.
  3. When the agent struggles with a type (<40% delivery), keeps that type's frequency but adjusts its deadlines.
  4. Level 1: 1 disruption, normal deadlines. Level 10: 10 disruptions, 40% tighter deadlines, cascading failures.
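The four rules above can be sketched as a small per-disruption-type tracker. This is a minimal sketch, not the code in curriculum.py: the class and method names are hypothetical, "harder types" is interpreted as types not yet mastered, the 2x frequency boost is an assumed factor, and the deadline adjustment for struggling types is assumed to be a 20% relaxation.

```python
from collections import defaultdict

MASTERED = 0.80    # delivery rate above which a type counts as mastered
STRUGGLING = 0.40  # delivery rate below which a type counts as a weakness


class CurriculumSketch:
    """Hypothetical per-disruption-type tracker (not the repo's curriculum.py)."""

    def __init__(self, disruption_types):
        self.types = list(disruption_types)
        self.delivered = defaultdict(int)
        self.affected = defaultdict(int)

    def record(self, dtype, delivered, affected):
        # Accumulate shipments delivered out of those hit by this disruption type.
        self.delivered[dtype] += delivered
        self.affected[dtype] += affected

    def rate(self, dtype):
        # Delivery rate for this type, or None if it has not been seen yet.
        n = self.affected[dtype]
        return self.delivered[dtype] / n if n else None

    def is_mastered(self, dtype):
        r = self.rate(dtype)
        return r is not None and r > MASTERED

    def sample_weights(self):
        # Once anything is mastered, every non-mastered ("harder") type gets
        # twice the base weight (assumed factor); weights are normalized to sum to 1.
        any_mastered = any(self.is_mastered(d) for d in self.types)
        raw = {
            d: 2.0 if any_mastered and not self.is_mastered(d) else 1.0
            for d in self.types
        }
        total = sum(raw.values())
        return {d: w / total for d, w in raw.items()}

    def deadline_slack(self, dtype):
        # Struggling types are not down-weighted; only their deadlines change.
        # Assumption: deadlines are relaxed by 20% for those types.
        r = self.rate(dtype)
        return 1.2 if r is not None and r < STRUGGLING else 1.0
```

For example, with the types typhoon, port_strike, and canal_blockage, after `record("typhoon", 9, 10)` and `record("port_strike", 3, 10)` the sketch gives typhoon a weight of 0.2, the other two 0.4 each, and port_strike a deadline slack of 1.2.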

Based on:

  • PAIRED (Protagonist Antagonist Induced Regret Environment Design; Dennis et al., NeurIPS 2020)
  • OpenAI's ADR (Automatic Domain Randomization, from the Rubik's Cube paper)
  • Jiang et al., Prioritized Level Replay (ICML 2021)

python3 curriculum.py  # Watch it adapt live

The Problem: $4.4 Trillion

Global supply chain disruptions cost $4.4 trillion in 2023 (WEF Global Risks Report).

| Real event modeled | Year | Duration | Economic impact |
|---|---|---|---|
| COVID-19 port closures | 2020 | 90 days | $4.0T |
| Ever Given Suez blockage | 2021 | 6 days | $9.6B |
| LA/Long Beach congestion | 2021 | 180 days | $24B |
| Global semiconductor shortage | 2021 | 365 days | $240B |
| Felixstowe port strike | 2022 | 8 days | $800M |
| Panama Canal drought | 2023 | 180 days | $6B |
| Red Sea/Houthi attacks | 2024 | 120 days | $80B |
| Maersk NotPetya cyber attack | 2017 | 14 days | $300M |

Baseline Comparison

| Agent | Easy | Medium | Hard | Overall |
|---|---|---|---|---|
| Random (no routing) | 0.000 | 0.000 | 0.000 | 0.000 |
| Greedy (shortest path) | 0.375 | 0.402 | 0.431 | 0.403 |
| Smart (disruption-aware) | 0.375 | 0.402 | 0.431 | 0.403 |
| Optimal (theoretical) | ~0.85 | ~0.80 | ~0.70 | ~0.78 |

Gap for RL: the best baselines score about 0.40 overall, versus about 0.78 for the theoretical optimum, leaving significant room for learned policies.

Score spread across 10 seeds: std = 0.071 (passes the std > 0.05 check).
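The seed-spread check amounts to a one-line computation. The source does not say whether population or sample standard deviation is used; this sketch assumes the population form via `statistics.pstdev`, and the function name is hypothetical.

```python
import statistics


def passes_variance_check(scores, min_std=0.05):
    """Return (std, passed) for a list of per-seed episode scores.

    passed is True when the scores vary by more than min_std, i.e. the
    environment is not producing near-identical outcomes across seeds.
    Assumption: population standard deviation.
    """
    std = statistics.pstdev(scores)
    return std, std > min_std
```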

Exploit checks: do-nothing=0.00, spam-tools=0.00, spam-advance=0.00.

python3 baseline_comparison.py  # Reproduce these numbers

Real-World Data Sources

| Data | Source | Year |
|---|---|---|
| Port throughput (TEU) | UNCTAD Review of Maritime Transport | 2023 |
| Container shipping rates | Freightos Baltic Index (FBX) | Q1 2024 |
| Port efficiency (dwell times, LPI) | World Bank Logistics Performance Index | 2023 |
| Disruption events | Lloyd's List Intelligence, WHO, USGS, CENTCOM | 2017-2024 |
| Commodity values per TEU | IPC, CONAB, BGMEA, company reports | 2023 |
| Factory output | TSMC, VW, BASF, Samsung annual reports | 2023 |

Formal MDP Definition

M = (S, A, T, R, gamma)

S: Network state = (port_status[10], factory_status[10], route_status[20],
                     disruption_state[10], shipment_state[N], inventory[10],
                     day, budget)
   |S| ~ 10^15 (intractable for tabular methods)

A: MCP tool calls with structured arguments
   Agent chooses WHICH shipment to route, WHICH path, WHEN to act

T: Stochastic transitions (disruptions start and end, shipments move,
   and missed deadlines create irreversible loss events)

R: Shaped = 0.50 * delivery_rate - 0.30 * loss_rate - 0.20 * cost_ratio
   Intermediate: +0.1 per delivery, -0.15 per missed deadline

gamma = 1.0 (finite horizon, 30 days)
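The reward definition above can be written out directly. This sketch assumes rates are fractions of all shipments in the episode, cost_ratio is spend divided by budget (capped at 1), and the final score is floored at 0; none of these details is stated in the source, though the floor would be consistent with the reported do-nothing score of 0.00. Function names are hypothetical.

```python
def final_reward(delivered, lost, total, cost, budget, floor_at_zero=True):
    """Shaped terminal reward: 0.50*delivery_rate - 0.30*loss_rate - 0.20*cost_ratio.

    Assumptions: rates are fractions of all `total` shipments,
    cost_ratio = cost / budget capped at 1.0, and the result is floored at 0.
    """
    delivery_rate = delivered / total
    loss_rate = lost / total
    cost_ratio = min(cost / budget, 1.0)
    r = 0.50 * delivery_rate - 0.30 * loss_rate - 0.20 * cost_ratio
    return max(r, 0.0) if floor_at_zero else r


def step_reward(deliveries, missed_deadlines):
    """Intermediate reward for one step: +0.1 per delivery, -0.15 per missed deadline."""
    return 0.1 * deliveries - 0.15 * missed_deadlines
```

Under these assumptions, delivering 6 of 8 shipments, losing 2, and spending half the budget scores 0.375 - 0.075 - 0.100 = 0.20. An agent that routes nothing, if every unrouted shipment misses its deadline, gets a raw score of -0.30, which the floor turns into 0.00.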

Why This MDP is Hard

  1. Combinatorial: N shipments x M routes gives O(N*M) candidate assignments per day, and up to M^N joint assignments
  2. Partial observability: The agent must discover disruptions via tool calls
  3. Irreversible: A missed deadline means the shipment is lost for good
  4. Cascading failures: Canal blockage + port strike can isolate regions
  5. Multi-objective: Speed vs cost vs risk
  6. Adaptive: Curriculum increases difficulty as agent improves
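The "combinatorial" point can be made concrete with the Hard tier's 25 shipments and the network's 20 routes. This is a back-of-the-envelope sketch under the simplifying assumption that every shipment could take any single route; in the simulator, closed or invalid routes shrink the count while multi-leg paths enlarge it. The function name is hypothetical.

```python
import math


def routing_space(n_shipments, n_routes):
    """Return (candidate shipment-route pairs, log10 of joint assignments).

    Simplifying assumption: each shipment independently picks one of
    n_routes, so there are n_routes ** n_shipments joint assignments.
    """
    pairs = n_shipments * n_routes
    # Work in log10 to avoid printing a 33-digit integer.
    log10_joint = n_shipments * math.log10(n_routes)
    return pairs, log10_joint
```

`routing_space(25, 20)` returns 500 candidate pairs and a log10 of about 32.5, i.e. roughly 3 x 10^32 joint assignments on a single day, far beyond exhaustive search.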

MCP Tools

| Tool | What it does |
|---|---|
| view_network | All ports (status, throughput, LPI), factories, warehouses, disruptions |
| view_shipments | All shipments: product, TEU value, deadline, route, location |
| get_routes | Open routes from a port, with FBX rates, transit days, carrier, and mode |
| find_path | BFS shortest open path to the port nearest the destination warehouse |
| route_shipment | Assign a route: validates it, computes cost, starts movement |
| advance_day | Simulate 1 day: disruptions activate/resolve, shipments move, deadlines are checked |
| get_disruptions | All disruptions: type, severity, affected nodes, start/end day |
| end_simulation | End early, fast-forward remaining days, compute final reward |
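The find_path tool is described as a breadth-first search over open routes. A minimal sketch of that idea on a toy directed graph follows; the port names, the treatment of routes as directed (origin, destination) pairs, and the `closed` parameter are illustrative assumptions rather than the simulator's data model, and the goal is assumed to already be the port nearest the destination warehouse.

```python
from collections import deque


def find_path(routes, start, goal, closed=frozenset()):
    """Breadth-first search for the fewest-hop path over open routes.

    routes: iterable of (origin, destination) pairs, treated as directed.
    closed: set of (origin, destination) pairs knocked out by disruptions.
    Returns the list of ports from start to goal, or None if unreachable.
    """
    # Build an adjacency list that skips disrupted routes.
    graph = {}
    for a, b in routes:
        if (a, b) not in closed:
            graph.setdefault(a, []).append(b)

    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

On a toy network with legs Shanghai to Singapore, Singapore to Port Said or Durban, and both of those to Rotterdam, closing ("Singapore", "Port Said") as a stand-in for a Suez blockage reroutes the path via Durban; closing both legs into Rotterdam returns None.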

Quick Start

pip install -r requirements.txt

# Run these in order:
python3 demo.py                    # Live agent routing shipments under disruption
python3 curriculum.py              # Watch adaptive difficulty in action
python3 baseline_comparison.py     # Random vs Greedy vs Smart agent comparison
python3 visualize.py               # Open world map in browser
pytest tests/ -v                   # 19 tests, all passing

Gymnasium Compatible

from gym_wrapper import SupplyChainGymEnv

env = SupplyChainGymEnv(difficulty="hard")
obs, info = env.reset(seed=42)
print(obs.shape)  # (113,)

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
# Works with Stable-Baselines3, CleanRL, RLlib
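To run a full episode rather than a single step, a generic loop like the following should work with any environment that follows the Gymnasium reset/step API, which the wrapper above appears to use. It is a sketch: the `run_episode` helper and its `policy` argument are hypothetical, not part of gym_wrapper.py.

```python
def run_episode(env, policy, seed=None, max_steps=1000):
    """Roll out one episode using the Gymnasium 5-tuple step API.

    policy: callable mapping an observation to an action, e.g. a wrapper
    around a trained model or lambda obs: env.action_space.sample().
    Returns (summed reward, number of steps taken).
    """
    obs, info = env.reset(seed=seed)
    total, steps = 0.0, 0
    for steps in range(1, max_steps + 1):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        # Stop on either a natural end (terminated) or a time limit (truncated).
        if terminated or truncated:
            break
    return total, steps
```

With the wrapper above, a random-policy rollout would look like `run_episode(env, lambda obs: env.action_space.sample(), seed=42)`.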

Difficulty Tiers

| Tier | Shipments | Disruptions | Deadlines | Real-world analog |
|---|---|---|---|---|
| Easy | 8 | 2 | Normal | Single port closure |
| Medium | 15 | 4 | 15% tighter | Regional disruption (Felixstowe strike) |
| Hard | 25 | 7 | 30% tighter | Cascading crisis (COVID + Suez + chip shortage) |
| Curriculum L10 | 25 | 10 | 40% tighter | Everything at once + agent's weakest types |
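The tier table maps naturally onto a small config object. This sketch transcribes the table's values; the curriculum endpoints come from the Adaptive Curriculum section (level 1: 1 disruption, normal deadlines; level 10: 10 disruptions, 40% tighter), while linear interpolation between them, the rounding rule, and all names here are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierConfig:
    shipments: int
    disruptions: int
    deadline_tightening: float  # fraction by which deadlines are shortened


# Values transcribed from the Difficulty Tiers table.
TIERS = {
    "easy": TierConfig(8, 2, 0.00),
    "medium": TierConfig(15, 4, 0.15),
    "hard": TierConfig(25, 7, 0.30),
}


def curriculum_level(level):
    """Map a curriculum level (1-10) to (disruptions, deadline_tightening).

    Endpoints follow the curriculum description; linear interpolation
    of the tightening between them is an assumption.
    """
    if not 1 <= level <= 10:
        raise ValueError("level must be in 1..10")
    return level, 0.40 * (level - 1) / 9


def tightened_deadline(days, tightening):
    """Shorten a deadline (in days) by the given fraction, keeping at least 1 day."""
    return max(1, round(days * (1 - tightening)))
```

Under these assumptions, a 20-day deadline at level 10 becomes 12 days.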

Project Structure

supply-chain-env/
  world.py                 # Core simulator: 10 ports, 20 routes, disruptions, shipments
  real_data.py             # UNCTAD, FBX, Lloyd's List, WHO sourced data
  curriculum.py            # Adaptive curriculum: targets agent weaknesses
  gym_wrapper.py           # Gymnasium wrapper (113-dim, SB3-compatible)
  visualize.py             # HTML world map dashboard
  demo.py                  # Live terminal demo
  baseline_comparison.py   # Agent comparison (Random vs Greedy vs Smart)
  models.py                # MCP Action/Observation/State
  client.py                # OpenEnv WebSocket client
  inference.py             # LLM agent evaluation (mandatory)
  server/
    app.py                 # FastAPI + OpenEnv create_app()
    supply_chain_environment.py  # MCP tool-calling environment
  tests/
    test_supply_chain.py   # 19 tests
  openenv.yaml             # spec_version: 1, MCP enabled
  Dockerfile               # Production container with HEALTHCHECK
  requirements.txt