---
license: mit
tags:
  - reinforcement-learning
  - stable-baselines3
  - sb3-contrib
  - gymnasium
  - multi-agent
  - openenv
library_name: stable-baselines3
---

# SpindleFlow RL: Delegation Policy

A recurrent PPO agent with an LSTM policy, trained on the SpindleFlow-v0 environment (OpenEnv).

## Training summary

| Metric | Value |
|---|---|
| Algorithm | RecurrentPPO (SB3 + sb3-contrib) |
| Total timesteps | 30,000 |
| Episodes completed | 13,526 |
| First-5 mean reward | 1.2053 |
| Last-5 mean reward | 2.2038 |
| Improvement | +0.9984 |
| Device | `cuda` |
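The card does not define the reward rows. Assuming "First-5" and "Last-5" mean the average episode reward over the first and last five completed episodes, the three reward rows could be computed as in this minimal sketch (`summarize` and its `k` parameter are hypothetical names, not part of the training code):

```python
from statistics import mean


def summarize(episode_rewards, k=5):
    """Compare the mean reward of the first k and last k completed episodes.

    Assumption: the card's "First-5"/"Last-5" rows are means over the
    first/last k episodes in completion order.
    """
    if len(episode_rewards) < k:
        raise ValueError(f"need at least {k} episodes, got {len(episode_rewards)}")
    first = mean(episode_rewards[:k])   # early-training baseline
    last = mean(episode_rewards[-k:])   # end-of-training performance
    return {"first_k_mean": first, "last_k_mean": last, "improvement": last - first}
```

Subtracting the rounded table entries gives 2.2038 - 1.2053 = 0.9985 rather than the reported +0.9984, which is consistent with the improvement having been taken from unrounded means.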

## Reward curve

[Figure: reward curve]

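## Load

```python
from huggingface_hub import hf_hub_download
from sb3_contrib import RecurrentPPO

# Download the policy archive from the Hub (cached locally), then restore the LSTM policy
model_path = hf_hub_download(repo_id="garvitsachdeva/spindleflow-rl", filename="spindleflow_model.zip")
model = RecurrentPPO.load(model_path)
```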
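Because the policy is recurrent, its LSTM hidden state has to be carried from one step to the next and reset at episode boundaries. The sketch below shows one way to do that with RecurrentPPO's `predict(observation, state=..., episode_start=..., deterministic=...)` interface for a single, non-vectorized Gymnasium-style environment. The helper name `run_episode` and the `max_steps` cap are illustrative assumptions, not part of this repository.

```python
import numpy as np


def run_episode(model, env, deterministic=True, max_steps=1_000):
    """Roll out one episode with a recurrent policy, carrying the LSTM state.

    `env` is assumed to follow the Gymnasium API:
    reset() -> (obs, info), step(action) -> (obs, reward, terminated, truncated, info).
    """
    obs, _info = env.reset()
    lstm_state = None                           # None: the policy starts from a zero LSTM state
    episode_start = np.ones((1,), dtype=bool)   # True marks a new episode, so the state is masked out
    total_reward = 0.0
    for _ in range(max_steps):
        action, lstm_state = model.predict(
            obs, state=lstm_state, episode_start=episode_start, deterministic=deterministic
        )
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        episode_start = np.zeros((1,), dtype=bool)  # mid-episode: keep the carried state
        if terminated or truncated:
            break
    return total_reward
```

With the model loaded above, this would be called as `run_episode(model, env)`, where `env` is a SpindleFlow-v0 instance. This card does not say how that environment is constructed; `gymnasium.make("SpindleFlow-v0")` after importing the package that registers it is an assumption.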