SpindleFlow RL — Delegation Policy

An LSTM-based PPO agent (RecurrentPPO) trained on the SpindleFlow-v0 environment (OpenEnv).

Training summary

Metric	Value
Algorithm	RecurrentPPO (SB3 + sb3-contrib)
Total timesteps	30,000
Episodes completed	13,526
First-5 mean reward	1.2053
Last-5 mean reward	2.2038
Improvement	+0.9984
Device	cuda
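
The reward rows above can be reproduced from a list of per-episode returns. The following is a minimal sketch, assuming "Improvement" is the last-5 mean minus the first-5 mean (which agrees with the reported values up to rounding); the helper name `reward_improvement` and its `window` parameter are hypothetical and not part of the training code.

```python
from statistics import fmean


def reward_improvement(episode_rewards, window=5):
    """Return (first_mean, last_mean, improvement) for the first and last `window` episodes."""
    if len(episode_rewards) < window:
        raise ValueError("need at least `window` episodes")
    first_mean = fmean(episode_rewards[:window])   # mean return of the earliest episodes
    last_mean = fmean(episode_rewards[-window:])   # mean return of the latest episodes
    return first_mean, last_mean, last_mean - first_mean
```

With only five episodes on each side, both means are noisy, so the difference is best read as a rough indication of learning progress rather than a precise estimate.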

Load

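from huggingface_hub import hf_hub_download
from sb3_contrib import RecurrentPPO

# Download the checkpoint from the Hub (cached locally after the first call), then load it.
model_path = hf_hub_download("garvitsachdeva/spindleflow-rl", "spindleflow_model.zip")
model = RecurrentPPO.load(model_path)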
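
Because the policy is recurrent, inference has to carry the LSTM state from one step to the next and flag the first step of each episode. The sketch below shows one way to do that with the loaded `model`. It assumes the environment follows the Gymnasium `reset`/`step` API; how SpindleFlow-v0 is constructed is not covered here, so `env` is left as a parameter, and the helper name `run_episode` and its `max_steps` cap are hypothetical.

```python
import numpy as np


def run_episode(model, env, deterministic=True, max_steps=1000):
    """Roll out one episode with a recurrent policy and return the total reward."""
    obs, _info = env.reset()
    lstm_states = None                          # None lets the policy start from its initial state
    episode_start = np.ones((1,), dtype=bool)   # marks the first step so the LSTM state is reset
    total_reward = 0.0

    for _ in range(max_steps):
        # predict() returns the action and the updated recurrent state
        action, lstm_states = model.predict(
            obs,
            state=lstm_states,
            episode_start=episode_start,
            deterministic=deterministic,
        )
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        episode_start = np.zeros((1,), dtype=bool)  # later steps continue the same episode
        if terminated or truncated:
            break
    return total_reward
```

Calling `run_episode(model, env)` with a SpindleFlow-v0 instance would return the episode return for a single rollout. Passing `state=None` on every step would discard the LSTM memory, which is why the state is threaded through the loop.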
