---
license: other
license_name: polyform-noncommercial-1.0.0
license_link: https://polyformproject.org/licenses/noncommercial/1.0.0/
library_name: safetensors
tags:
- reinforcement-learning
- offline-rl
- mujoco
- gpt
- llama
- autoregressive
- causal-gpt-rl
---
Causal GPT-RL
First decoder-only transformer (GPT) to reach expert level on Humanoid offline RL from medium data — beyond what Behavior Cloning and Decision Transformer can achieve.
GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments.
action → next state → next action (RL rollouts)
token → next token → next token (LLM generation)
Stable under self-generated rollouts — long-horizon control without the drift that has historically kept transformers from being usable as RL agents.
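The analogy between rollouts and token generation can be made concrete with a small loop. This is a minimal sketch, not the repository's implementation: `rollout`, `toy_policy`, and `toy_step` are hypothetical names, and the bounded history of `context_length - 1` past pairs plus the current state is an assumption about how a fixed context window might be applied.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
Pair = Tuple[State, Action]


def rollout(
    policy: Callable[[List[Pair], State], Action],
    step: Callable[[State, Action], Tuple[State, float, bool]],
    initial_state: State,
    context_length: int,
    max_steps: int,
) -> float:
    """Run one episode in which the policy conditions on its own past actions."""
    # Bounded history (assumption): keep only the most recent (state, action)
    # pairs, mirroring a fixed transformer context window.
    history: Deque[Pair] = deque(maxlen=context_length - 1)
    state, total = initial_state, 0.0
    for _ in range(max_steps):
        # Condition on past pairs plus the current state, like next-token prediction.
        action = policy(list(history), state)
        next_state, reward, done = step(state, action)
        # The model's own output re-enters the context for the next step.
        history.append((state, action))
        total += reward
        state = next_state
        if done:
            break
    return total


# Toy stand-ins (hypothetical): a 1-D point that the policy pushes toward zero.
def toy_policy(history: List[Pair], state: State) -> Action:
    return [-0.5 * state[0]]


def toy_step(state: State, action: Action) -> Tuple[State, float, bool]:
    new_x = state[0] + action[0]
    return [new_x], -abs(new_x), abs(new_x) < 1e-3


episode_return = rollout(toy_policy, toy_step, [1.0], context_length=16, max_steps=50)
```

Because each action is fed back into the context, any error the policy makes becomes part of its own future input, which is where drift under self-generated rollouts comes from.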
Bundles in this repository
| Environment | Subfolder | Context length | Return (mean ± std) |
|---|---|---|---|
| Ant-v5 | ant-v5 | 16 | 2614 ± 1515 |
| HalfCheetah-v5 | halfcheetah-v5 | 32 | 3251 ± 1916 |
| Walker2d-v5 | walker2d-v5 | 24 | 2345 ± 879 |
| Humanoid-v5 | humanoid-v5 | 32 | 2371 ± 2850 |
Returns are the mean and standard deviation over 5 episodes with seed=0 (50 episodes for HalfCheetah-v5), evaluated on CPU with run_episodes.
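For readers who want to reproduce the "mean ± std" column from raw per-episode returns, the aggregation might look like the sketch below. `summarize_returns` is a hypothetical helper, and the use of the population standard deviation is an assumption: this card does not say whether run_episodes reports the population or the sample statistic. The input numbers are illustrative only, not measured returns.

```python
import statistics
from typing import Dict, Sequence


def summarize_returns(episode_returns: Sequence[float]) -> Dict[str, float]:
    """Aggregate per-episode returns into a mean and a standard deviation."""
    mean = statistics.fmean(episode_returns)
    # Assumption: population standard deviation (divide by N, not N - 1).
    std = statistics.pstdev(episode_returns)
    return {"return_mean": mean, "return_std": std}


# Illustrative numbers only; these are not results from any bundle.
example = summarize_returns([1000.0, 2000.0, 3000.0, 4000.0, 5000.0])
print(f'{example["return_mean"]:.0f} ± {example["return_std"]:.0f}')
```

With only 5 episodes the standard deviation is itself a noisy estimate, so a larger num_episodes gives a more reliable comparison between bundles.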
Quick Start
pip install "causal-gpt-rl[hub,mujoco]"
import gymnasium as gym
from causal_gpt_rl.inference import load_runner_from_hub, run_episodes
env = gym.make("Ant-v5")
runner = load_runner_from_hub(
    repo_id="ccnets/causal-gpt-rl",
    subfolder="ant-v5",
)
stats = run_episodes(env, runner, num_episodes=5, seed=0)
print(stats["return_mean"], stats["return_std"])
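To evaluate every bundle rather than only Ant-v5, the same calls can be wrapped in a loop. This is a sketch that uses only the functions and arguments shown in the Quick Start; `BUNDLES` and `evaluate_all` are hypothetical names, and it assumes each bundle loads the same way as ant-v5.

```python
from typing import Dict, Tuple

# Environment ID -> bundle subfolder, copied from the bundle table.
BUNDLES: Dict[str, str] = {
    "Ant-v5": "ant-v5",
    "HalfCheetah-v5": "halfcheetah-v5",
    "Walker2d-v5": "walker2d-v5",
    "Humanoid-v5": "humanoid-v5",
}


def evaluate_all(
    num_episodes: int = 5,
    seed: int = 0,
    repo_id: str = "ccnets/causal-gpt-rl",
) -> Dict[str, Tuple[float, float]]:
    """Return {env_id: (return_mean, return_std)} for every bundle."""
    # Imported lazily so the mapping above can be inspected
    # without the package or MuJoCo installed.
    import gymnasium as gym
    from causal_gpt_rl.inference import load_runner_from_hub, run_episodes

    results: Dict[str, Tuple[float, float]] = {}
    for env_id, subfolder in BUNDLES.items():
        env = gym.make(env_id)
        try:
            runner = load_runner_from_hub(repo_id=repo_id, subfolder=subfolder)
            stats = run_episodes(env, runner, num_episodes=num_episodes, seed=seed)
            results[env_id] = (stats["return_mean"], stats["return_std"])
        finally:
            env.close()  # release the simulator even if loading or evaluation fails
    return results
```

Calling `evaluate_all()` downloads all four bundles, so expect the first run to take noticeably longer than the single-environment example.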
Bundle contents
Each subfolder contains:
- model.safetensors: model state dict for inference
- config.json: model config, observation specs, action specs, context length
- state_normalizer.safetensors: state normalization statistics
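As an illustration of what state normalization statistics are typically used for, the sketch below standardizes an observation before it reaches the model. This is an assumption about the general technique, not a description of this package: `normalize_state` is a hypothetical helper, and the tensor names and exact formula stored in state_normalizer.safetensors (for example, standard deviation versus variance) are not documented here.

```python
from typing import List, Sequence


def normalize_state(
    state: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    eps: float = 1e-8,
) -> List[float]:
    """Standardize each observation dimension: (x - mean) / (std + eps)."""
    if not (len(state) == len(mean) == len(std)):
        raise ValueError("state, mean, and std must have the same length")
    # eps guards against division by zero for dimensions that never vary.
    return [(x - m) / (s + eps) for x, m, s in zip(state, mean, std)]


# Illustrative statistics only; real values would come from the bundle.
normalized = normalize_state([1.0, 4.0], mean=[0.0, 2.0], std=[1.0, 2.0])
```

The loader in the Quick Start takes only repo_id and subfolder, which suggests normalization is handled inside the runner and does not need to be applied by hand.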
Model details
Llama-style transformer decoder, 4 layers, 8 heads. Hidden size 192 for Ant/HalfCheetah/Walker2d, 256 for Humanoid.
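The sizes above imply a per-head dimension, which can be checked with a few lines. This sketch assumes the common convention that the hidden size is split evenly across attention heads; `DecoderSize` is a hypothetical container, not a class from the package, and the actual config.json field names may differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DecoderSize:
    num_layers: int
    num_heads: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # Assumption: hidden size is divided evenly across heads.
        if self.hidden_size % self.num_heads:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


# Values taken from the model details; keys are the bundle subfolders.
SIZES: Dict[str, DecoderSize] = {
    "ant-v5": DecoderSize(num_layers=4, num_heads=8, hidden_size=192),
    "halfcheetah-v5": DecoderSize(num_layers=4, num_heads=8, hidden_size=192),
    "walker2d-v5": DecoderSize(num_layers=4, num_heads=8, hidden_size=192),
    "humanoid-v5": DecoderSize(num_layers=4, num_heads=8, hidden_size=256),
}

head_dims = {name: size.head_dim for name, size in SIZES.items()}
```

Under that assumption each head works in 24 dimensions for the three smaller environments and 32 dimensions for Humanoid.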
Training data
Minari mujoco/{env}/simple-v0 + mujoco/{env}/medium-v0 per environment (expert split not used).
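The dataset IDs can be expanded from the pattern above and fetched with Minari. This is a sketch under two assumptions: that `{env}` is the lowercase environment name without the version suffix, and that the two splits are simply loaded side by side (how they are combined for training is not described here). `dataset_ids` and `load_training_datasets` are hypothetical helpers.

```python
from typing import List

# Assumption: {env} is the lowercase environment name without "-v5".
ENV_NAMES = ["ant", "halfcheetah", "walker2d", "humanoid"]
SPLITS = ["simple", "medium"]  # the expert split is deliberately left out


def dataset_ids(env_name: str) -> List[str]:
    """Expand the mujoco/{env}/{split}-v0 pattern for one environment."""
    return [f"mujoco/{env_name}/{split}-v0" for split in SPLITS]


def load_training_datasets(env_name: str) -> list:
    """Download (if needed) and load both splits for one environment."""
    # Imported lazily so the ID helper works without Minari installed.
    import minari

    return [minari.load_dataset(ds_id, download=True) for ds_id in dataset_ids(env_name)]


all_ids = [ds_id for name in ENV_NAMES for ds_id in dataset_ids(name)]
```

For example, `dataset_ids("ant")` yields the two IDs for the Ant environment, and `load_training_datasets("ant")` fetches them on first use.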
Links
- Code: github.com/ccnets-team/causal-gpt-rl
- Training logs (W&B): wandb.ai/junhopark/Causal GPT-RL
- Website: ccnets.org
License
PolyForm Noncommercial License 1.0.0. For commercial use, contact via ccnets.org.