---
title: Causal GPT-RL
emoji: 🤖
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
---

# Causal GPT-RL

GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments.

```text
action → next state → next action   (RL rollouts)
token  → next token → next token    (LLM generation)
```

Stable under self-generated rollouts: long-horizon control without the drift that has historically kept transformers from being usable as RL agents.

## Get started

```bash
pip install "causal-gpt-rl[hub,mujoco]"
```

```python
import gymnasium as gym
from causal_gpt_rl.inference import load_runner_from_hub, run_episodes

env = gym.make("Ant-v5")
runner = load_runner_from_hub(
    repo_id="ccnets/causal-gpt-rl",
    subfolder="ant-v5",
    device="cpu",
)
stats = run_episodes(env, runner, num_episodes=5, seed=0)
```

**Available bundles:** Ant-v5, HalfCheetah-v5, Walker2d-v5, Humanoid-v5

- **Code:** [github.com/ccnets-team/causal-gpt-rl](https://github.com/ccnets-team/causal-gpt-rl)
- **Training logs (W&B, public):** [wandb.ai/junhopark/Causal GPT-RL](https://wandb.ai/junhopark/Causal%20GPT-RL?nw)
- **Website:** [ccnets.org](https://ccnets.org)

Released under PolyForm Noncommercial 1.0.0.
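
## Appendix: the rollout loop in miniature

For readers new to the rollout analogy near the top of this page, here is a minimal sketch of the closed loop it describes: a policy picks an action, the environment returns the next state, and the growing history is fed back to the policy. Everything here is an assumption made for illustration. `toy_step`, `toy_policy`, and `rollout` are hypothetical names, the dynamics are a made-up 1-D system, and none of it is part of the `causal-gpt-rl` API.

```python
from typing import Callable, List, Tuple

State = float
Action = float
History = List[Tuple[State, Action]]


def toy_step(state: State, action: Action) -> State:
    """Hypothetical 1-D dynamics: the action nudges the state."""
    return state + 0.5 * action


def toy_policy(history: History, state: State) -> Action:
    """Hypothetical stand-in for a causal policy.

    A real sequence model would condition on `history`; this toy
    ignores it and simply pushes the state toward zero.
    """
    return -state


def rollout(
    policy: Callable[[History, State], Action],
    step: Callable[[State, Action], State],
    initial_state: State,
    horizon: int,
) -> Tuple[History, State]:
    """Run the policy in closed loop, feeding back its own past actions."""
    history: History = []
    state = initial_state
    for _ in range(horizon):
        action = policy(history, state)   # action chosen from the context so far
        history.append((state, action))   # context grows, like a token sequence
        state = step(state, action)       # environment produces the next state
    return history, state


trajectory, final_state = rollout(toy_policy, toy_step, initial_state=1.0, horizon=4)
print(len(trajectory), final_state)
```

The structure mirrors autoregressive text generation: each step's output becomes part of the input for the next step, so errors the policy makes can compound over the horizon.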