Causal GPT-RL

GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments.

action → next state → next action      (RL rollouts)
token  → next token  → next token      (LLM generation)

Stable under self-generated rollouts: long-horizon control without the compounding drift that has historically kept transformers from being usable as RL agents.
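The analogy above can be sketched as an autoregressive rollout loop, where each action is generated from a sliding window of recent states and actions, just as next-token prediction conditions on the token prefix. This is a minimal illustration with a stub policy and stub dynamics; the real runner's interface and context handling are not shown here, and `CONTEXT_LENGTH` is an assumed value (the actual one lives in each bundle's `config.json`):

```python
from collections import deque

CONTEXT_LENGTH = 8  # assumed; the real context length is defined in config.json

def stub_policy(context):
    """Stand-in for the transformer policy: the real model would attend
    over the (state, action) context and emit a continuous action."""
    return 0.1

def rollout(initial_state, horizon):
    """Autoregressive rollout: action <- context, the way token <- prefix."""
    context = deque(maxlen=CONTEXT_LENGTH)  # sliding attention window
    state = initial_state
    trajectory = []
    for _ in range(horizon):
        context.append(state)
        action = stub_policy(list(context))  # generate next "token" (action)
        state = state + action               # stub dynamics: next state
        context.append(action)
        trajectory.append((state, action))
    return trajectory

traj = rollout(initial_state=0.0, horizon=10)
```

Because the policy only ever sees its own generated context, errors can compound over the horizon; the claim above is that these checkpoints remain stable under exactly this kind of self-conditioning.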

Bundles in this repository

| Environment    | Subfolder      | Return (mean ± std) |
|----------------|----------------|---------------------|
| Ant-v5         | ant-v5         | 3033 ± 895          |
| HalfCheetah-v5 | halfcheetah-v5 | 2066 ± 2776         |
| Walker2d-v5    | walker2d-v5    | 2961 ± 756          |
| Humanoid-v5    | humanoid-v5    | 3634 ± 2152         |

Returns are over 5 episodes with seed=0, run on CPU via run_episodes.
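The mean ± std figures are ordinary aggregates over per-episode returns. A sketch of that aggregation (the episode returns below are made up for illustration, not actual evaluation numbers, and `run_episodes` may well use a population rather than sample standard deviation; that detail is an assumption):

```python
import statistics

def aggregate_returns(episode_returns):
    """Collapse per-episode returns into the mean ± std reported above."""
    return {
        "return_mean": statistics.mean(episode_returns),
        "return_std": statistics.stdev(episode_returns),  # sample std
    }

stats = aggregate_returns([3100.0, 2900.0, 3300.0, 2700.0, 3165.0])
print(f"{stats['return_mean']:.0f} ± {stats['return_std']:.0f}")
```

The large std on HalfCheetah-v5 and Humanoid-v5 suggests a bimodal outcome across the 5 seeds-fixed episodes (some episodes succeeding and others failing early), which is worth keeping in mind when comparing against single-number baselines.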

Quick Start

pip install "causal-gpt-rl[hub,mujoco]"

import gymnasium as gym
from causal_gpt_rl.inference import load_runner_from_hub, run_episodes

env = gym.make("Ant-v5")
runner = load_runner_from_hub(
    repo_id="ccnets/causal-gpt-rl",
    subfolder="ant-v5",
)
stats = run_episodes(env, runner, num_episodes=5, seed=0)
print(stats["return_mean"], stats["return_std"])

Bundle contents

Each subfolder contains:

  • model.safetensors — model state dict for inference
  • config.json — model config, observation specs, action specs, context length
  • state_normalizer.safetensors — state normalization statistics
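For anyone inspecting a bundle by hand, the config is plain JSON, and the normalizer statistics are applied to observations before the model sees them. The sketch below shows the config read and the standard (s - mean) / std normalization; the exact tensor names inside state_normalizer.safetensors and the helper names here are assumptions, not the library's API:

```python
import json

def load_config(path):
    """Read a bundle's config.json (observation/action specs, context length)."""
    with open(path) as f:
        return json.load(f)

def normalize_state(state, mean, std, eps=1e-8):
    """Elementwise (s - mean) / std, the usual running-statistics
    normalization; eps guards against a zero std component."""
    return [(s - m) / (sd + eps) for s, m, sd in zip(state, mean, std)]

# Normalizing a 2-dim observation against assumed statistics:
normalized = normalize_state([1.0, 4.0], mean=[0.0, 2.0], std=[1.0, 2.0])
```

In practice `load_runner_from_hub` handles all three files for you; reading them directly is only useful for debugging or porting the checkpoint elsewhere.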

License

PolyForm Noncommercial License 1.0.0. For commercial use, contact via ccnets.org.
