# SupplyChainEnv
OpenEnv RL environment for global supply chain disruption management. Real-world data. Adaptive curriculum. $4.4T problem.
An RL agent manages a global trade network: 10 real ports, 10 factories, and 20 shipping routes across 5 continents. Disruptions modeled on actual events (Ever Given, COVID, Red Sea attacks) dynamically knock out nodes and edges. The agent uses MCP tools to inspect, plan, and reroute shipments before deadlines.
What makes this different from every other submission: the environment adapts to the agent. When the agent masters routing around typhoons, the environment starts generating canal blockages and cascading failures. Inspired by PAIRED and OpenAI's ADR.
## Key Features
| Feature | Details |
|---|---|
| Adaptive Curriculum | Environment learns from agent performance, targets weaknesses, auto-scales difficulty 1-10 |
| Real-World Data | UNCTAD port throughput, Freightos Baltic Index rates, Lloyd's List disruption history |
| 8 MCP Tools | view_network, view_shipments, get_routes, find_path, route_shipment, advance_day, get_disruptions, end_simulation |
| 10 Disruption Types | Typhoons, strikes, factory fires, canal blockages, pandemics, cyber attacks - modeled on real events from 2017-2024 |
| Multi-Step MDP | 30 simulated days, up to 100 tool calls per episode |
| Shaped Reward | 0.50 x delivery_rate - 0.30 x loss_rate - 0.20 x cost_ratio |
| Gymnasium Compatible | 113-dim obs, Discrete(101) actions, SB3/CleanRL/RLlib ready |
| Exploit-Proof | Do-nothing=0.00, spam=0.00, only real routing earns reward |
| 19 Tests Passing | Core interface, tool calls, multi-step, world sim, difficulty scaling |
## Adaptive Curriculum (the differentiator)
Standard RL environments are static. This one learns from the agent:
```
Episode  1 | Level [=.........]  1/10 | Reward: 0.496 | Learning basics
Episode  3 | Level [======....]  6/10 | Reward: 0.396 | LEVEL UP! 1 -> 6
Episode  4 | Level [=======...]  7/10 | Reward: 0.477 | Targeting weaknesses (port_strike)
Episode  5 | Level [======....]  6/10 | Reward: 0.274 | Cascading crises - LEVEL DOWN
Episode  8 | Level [======....]  6/10 | Reward: 0.492 | LEVEL UP! 5 -> 6
Episode 10 | Level [======....]  6/10 | Reward: 0.426 | Mastered: factory_fire, monsoon
```
How it works:
- Tracks the agent's delivery performance per disruption type
- When the agent masters a type (>80% delivery rate), increases the frequency of harder types
- When the agent struggles with a type (<40% delivery rate), keeps its frequency but adjusts deadlines
- Level 1: 1 disruption, normal deadlines | Level 10: 10 disruptions, 40% tighter deadlines, cascading failures
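The update rule above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the class name, method names, and the level-up/level-down conditions beyond the stated 80%/40% thresholds are assumptions.

```python
from collections import defaultdict

class CurriculumTracker:
    """Tracks per-disruption-type delivery rates and scales difficulty 1-10 (sketch)."""

    MASTERY, STRUGGLE = 0.80, 0.40  # thresholds from the rules above

    def __init__(self):
        self.level = 1
        self.stats = defaultdict(lambda: {"delivered": 0, "total": 0})

    def record(self, disruption_type, delivered, total):
        # Accumulate outcomes for one episode's encounters with this type.
        s = self.stats[disruption_type]
        s["delivered"] += delivered
        s["total"] += total

    def delivery_rate(self, disruption_type):
        s = self.stats[disruption_type]
        return s["delivered"] / s["total"] if s["total"] else 0.0

    def mastered(self):
        # Types the agent delivers through more than 80% of the time.
        return [t for t in self.stats if self.delivery_rate(t) > self.MASTERY]

    def update_level(self):
        # Assumed policy: level up when every seen type is mastered,
        # level down when any type drops below the struggle threshold.
        rates = [self.delivery_rate(t) for t in self.stats]
        if rates and min(rates) > self.MASTERY:
            self.level = min(10, self.level + 1)
        elif rates and min(rates) < self.STRUGGLE:
            self.level = max(1, self.level - 1)
        return self.level
```

A curriculum like this only needs the episode summaries the simulator already produces, so it adds no cost to the inner loop.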
Based on:
- PAIRED: Protagonist Antagonist Induced Regret Environment Design (Dennis et al., NeurIPS 2020)
- OpenAI's ADR: Automatic Domain Randomization (Rubik's Cube paper)
- Jiang et al., Prioritized Level Replay (ICML 2021)
```shell
python3 curriculum.py   # Watch it adapt live
```
## The Problem: $4.4 Trillion
Global supply chain disruptions cost $4.4 trillion in 2023 (WEF Global Risks Report).
| Real Event Modeled | Year | Duration | Economic Impact |
|---|---|---|---|
| COVID-19 port closures | 2020 | 90 days | $4.0T |
| Ever Given Suez blockage | 2021 | 6 days | $9.6B |
| LA/Long Beach congestion | 2021 | 180 days | $24B |
| Global semiconductor shortage | 2021 | 365 days | $240B |
| Felixstowe port strike | 2022 | 8 days | $800M |
| Panama Canal drought | 2023 | 180 days | $6B |
| Red Sea/Houthi attacks | 2024 | 120 days | $80B |
| Maersk NotPetya cyber attack | 2017 | 14 days | $300M |
## Baseline Comparison
| Agent | Easy | Medium | Hard | Overall |
|---|---|---|---|---|
| Random (no routing) | 0.000 | 0.000 | 0.000 | 0.000 |
| Greedy (shortest path) | 0.375 | 0.402 | 0.431 | 0.403 |
| Smart (disruption-aware) | 0.375 | 0.402 | 0.431 | 0.403 |
| Optimal (theoretical) | ~0.85 | ~0.80 | ~0.70 | ~0.78 |
Gap for RL: best baseline ~0.40 vs theoretical optimum ~0.78, leaving significant room for learned policies.
Score variance across 10 seeds: std=0.071 (passes >0.05 check).
Exploit checks: do-nothing=0.00, spam-tools=0.00, spam-advance=0.00.
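The seed-variance check can be reproduced in a few lines. Only the >0.05 threshold comes from the check above; the per-seed scores below are hypothetical placeholders, and whether the project uses population or sample std is an assumption.

```python
import statistics

def passes_variance_check(scores, min_std=0.05):
    """Non-degenerate env check (sketch): scores across seeds must actually vary."""
    return statistics.pstdev(scores) > min_std

# Hypothetical per-seed scores, for illustration only:
seed_scores = [0.31, 0.42, 0.50, 0.38, 0.47, 0.29, 0.44, 0.36, 0.52, 0.40]
print(passes_variance_check(seed_scores))  # True
```

A constant-score environment (e.g. one where every policy scores 0.40) would fail this check, which is the point: reward must depend on what the agent does.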
```shell
python3 baseline_comparison.py   # Reproduce these numbers
```
## Real-World Data Sources
| Data | Source | Year |
|---|---|---|
| Port throughput (TEU) | UNCTAD Review of Maritime Transport | 2023 |
| Container shipping rates | Freightos Baltic Index (FBX) | Q1 2024 |
| Port efficiency (dwell times, LPI) | World Bank Logistics Performance Index | 2023 |
| Disruption events | Lloyd's List Intelligence, WHO, USGS, CENTCOM | 2017-2024 |
| Commodity values per TEU | IPC, CONAB, BGMEA, company reports | 2023 |
| Factory output | TSMC, VW, BASF, Samsung annual reports | 2023 |
## Formal MDP Definition

```
M = (S, A, T, R, gamma)

S: Network state = (port_status[10], factory_status[10], route_status[20],
                    disruption_state[10], shipment_state[N], inventory[10],
                    day, budget)
   |S| ~ 10^15 (intractable for tabular methods)

A: MCP tool calls with structured arguments.
   The agent chooses WHICH shipment to route, WHICH path, and WHEN to act.

T: Stochastic transitions - disruptions start/end, shipments move,
   deadlines create irreversible loss events.

R: Shaped = 0.50 * delivery_rate - 0.30 * loss_rate - 0.20 * cost_ratio
   Intermediate: +0.1 per delivery, -0.15 per missed deadline

gamma = 1.0 (finite horizon, 30 days)
```
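The reward terms above translate directly into code. This is a sketch: the weights and the +0.1/-0.15 intermediate terms come from the definition, but the function signatures and the clamping of cost_ratio to 1.0 are assumptions.

```python
def episode_reward(delivered, lost, total, cost, budget):
    """Final shaped reward: 0.50*delivery_rate - 0.30*loss_rate - 0.20*cost_ratio."""
    delivery_rate = delivered / total if total else 0.0
    loss_rate = lost / total if total else 0.0
    # Assumption: cost_ratio is spend relative to budget, capped at 1.0.
    cost_ratio = min(cost / budget, 1.0) if budget else 1.0
    return 0.50 * delivery_rate - 0.30 * loss_rate - 0.20 * cost_ratio

def step_reward(deliveries, missed_deadlines):
    """Intermediate shaping: +0.1 per delivery, -0.15 per missed deadline."""
    return 0.1 * deliveries - 0.15 * missed_deadlines
```

Note how a do-nothing policy (zero deliveries, zero spend) scores 0.0 under both terms, which is consistent with the exploit checks reported above.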
## Why This MDP is Hard
- Combinatorial: N shipments x M routes = O(N*M) routing decisions per day
- Partial observability: Agent must discover disruptions via tool calls
- Irreversible: Missed deadline = shipment lost forever
- Cascading failures: Canal blockage + port strike can isolate regions
- Multi-objective: Speed vs cost vs risk
- Adaptive: Curriculum increases difficulty as agent improves
## MCP Tools

| Tool | What it does |
|---|---|
| view_network | All ports (status, throughput, LPI), factories, warehouses, disruptions |
| view_shipments | All shipments: product, TEU value, deadline, route, location |
| get_routes | Open routes from a port with FBX rates, transit days, carrier, mode |
| find_path | BFS shortest open path to warehouse's nearest port |
| route_shipment | Assign route: validates, computes cost, starts movement |
| advance_day | Simulate 1 day: disruptions activate/resolve, shipments move, deadlines checked |
| get_disruptions | All disruptions: type, severity, affected nodes, start/end day |
| end_simulation | End early, fast-forward remaining days, compute final reward |
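The BFS behind find_path can be illustrated on a toy network. Only the idea of breadth-first search over open routes comes from the table; the function signature, the `closed` set, and the port names below are illustrative.

```python
from collections import deque

def find_path(graph, start, goal, closed=frozenset()):
    """BFS for the shortest open path, skipping disrupted nodes in `closed` (sketch)."""
    if start in closed:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in closed:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no open route exists

# Toy network: a Suez blockage forces the Cape of Good Hope detour.
graph = {
    "Shanghai":  ["Singapore"],
    "Singapore": ["Suez", "CapeTown"],
    "Suez":      ["Rotterdam"],
    "CapeTown":  ["Rotterdam"],
    "Rotterdam": [],
}
```

With `closed={"Suez"}` the same call returns the longer CapeTown route, which is exactly the rerouting decision the agent is rewarded for making before a deadline passes.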
## Quick Start

```shell
pip install -r requirements.txt

# Run these in order:
python3 demo.py                  # Live agent routing shipments under disruption
python3 curriculum.py            # Watch adaptive difficulty in action
python3 baseline_comparison.py   # Random vs Greedy vs Smart agent comparison
python3 visualize.py             # Open world map in browser
pytest tests/ -v                 # 19 tests, all passing
```
## Gymnasium Compatible

```python
from gym_wrapper import SupplyChainGymEnv

env = SupplyChainGymEnv(difficulty="hard")
obs, info = env.reset(seed=42)
print(obs.shape)  # (113,)

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)

# Works with Stable-Baselines3, CleanRL, RLlib
```
## Difficulty Tiers
| Tier | Shipments | Disruptions | Deadlines | Real-world analog |
|---|---|---|---|---|
| Easy | 8 | 2 | Normal | Single port closure |
| Medium | 15 | 4 | 15% tighter | Regional disruption (Felixstowe strike) |
| Hard | 25 | 7 | 30% tighter | Cascading crisis (COVID + Suez + chip shortage) |
| Curriculum L10 | 25 | 10 | 40% tighter | Everything at once + agent's weakest types |
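The deadline tightening in the table is a simple scaling. The percentages come from the tiers above; the rounding choice and the 1-day floor are assumptions for the sketch.

```python
# Fractional tightening per tier, from the table above.
TIGHTENING = {"easy": 0.00, "medium": 0.15, "hard": 0.30, "curriculum_l10": 0.40}

def tightened_deadline(base_days, tier):
    """Scale a shipment's deadline down by the tier's tightening factor (sketch)."""
    # Assumption: round to whole days, never below 1 day.
    return max(1, round(base_days * (1 - TIGHTENING[tier])))
```

For a 20-day base deadline this gives 20 / 17 / 14 / 12 days across the four tiers, which is why the same shipment set gets progressively harder to deliver.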
## Project Structure

```
supply-chain-env/
  world.py                   # Core simulator: 10 ports, 20 routes, disruptions, shipments
  real_data.py               # UNCTAD, FBX, Lloyd's List, WHO sourced data
  curriculum.py              # Adaptive curriculum: targets agent weaknesses
  gym_wrapper.py             # Gymnasium wrapper (113-dim, SB3-compatible)
  visualize.py               # HTML world map dashboard
  demo.py                    # Live terminal demo
  baseline_comparison.py     # Agent comparison (Random vs Greedy vs Smart)
  models.py                  # MCP Action/Observation/State
  client.py                  # OpenEnv WebSocket client
  inference.py               # LLM agent evaluation (mandatory)
  server/
    app.py                       # FastAPI + OpenEnv create_app()
    supply_chain_environment.py  # MCP tool-calling environment
  tests/
    test_supply_chain.py     # 19 tests
  openenv.yaml               # spec_version: 1, MCP enabled
  Dockerfile                 # Production container with HEALTHCHECK
  requirements.txt
```