| --- |
| title: Causal GPT-RL |
| emoji: π€ |
| colorFrom: indigo |
| colorTo: green |
| sdk: static |
| pinned: false |
| --- |
| |
| # Causal GPT-RL |
|
|
| GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments. |
|
|
| ```text |
| action → next state → next action (RL rollouts) |
| token → next token → next token (LLM generation) |
| ``` |
|
|
| Stable under self-generated rollouts: long-horizon control without the drift that has historically kept transformers from being usable as RL agents. |
|
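The parallel with token generation can be made concrete with a closed-loop rollout, where each action the policy emits changes the state it sees next. The sketch below is only an illustration under stated assumptions: `toy_step`, `toy_policy`, and `rollout` are hypothetical names, the dynamics are a one-dimensional toy rather than a MuJoCo environment, and none of this reflects how the package itself is implemented.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

State = float
Action = float
History = Deque[Tuple[State, Action]]


def toy_step(state: State, action: Action) -> State:
    """Hypothetical 1-D dynamics: the action nudges the state."""
    return state + 0.1 * action


def toy_policy(history: History, state: State) -> Action:
    """Stand-in for a causal sequence model.

    A real policy would condition on the (state, action) history; this toy
    ignores it and simply pushes the current state toward zero.
    """
    return -state


def rollout(
    policy: Callable[[History, State], Action],
    step: Callable[[State, Action], State],
    initial_state: State,
    horizon: int,
    context_len: int = 8,
) -> Tuple[List[Tuple[State, Action]], State]:
    """Roll the policy forward on its own outputs, like autoregressive decoding."""
    history: History = deque(maxlen=context_len)  # sliding context window
    trajectory: List[Tuple[State, Action]] = []
    state = initial_state
    for _ in range(horizon):
        action = policy(history, state)   # "next token": the next action
        history.append((state, action))   # the policy's own output re-enters its context
        trajectory.append((state, action))
        state = step(state, action)       # the environment produces the next state
    return trajectory, state


trajectory, final_state = rollout(toy_policy, toy_step, initial_state=1.0, horizon=50)
print(len(trajectory), final_state)
```

Because every action changes the states the policy sees later, small errors can compound over the horizon. That compounding is the drift referred to above.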
|
| ## Get started |
|
|
| ```bash |
| pip install "causal-gpt-rl[hub,mujoco]" |
| ``` |
|
|
| ```python |
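| import gymnasium as gym |
| from causal_gpt_rl.inference import load_runner_from_hub, run_episodes |
| |
| # Create the MuJoCo Ant environment. |
| env = gym.make("Ant-v5") |
| |
| # Load the Ant-v5 bundle from the Hub and build a runner on CPU. |
| runner = load_runner_from_hub( |
|     repo_id="ccnets/causal-gpt-rl", |
|     subfolder="ant-v5", |
|     device="cpu", |
| ) |
| |
| # Roll out 5 episodes with a fixed seed. |
| stats = run_episodes(env, runner, num_episodes=5, seed=0) |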
| ``` |
|
|
| **Available bundles:** Ant-v5, HalfCheetah-v5, Walker2d-v5, Humanoid-v5 |
|
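To evaluate more than one bundle, the quick-start call can be wrapped in a loop. The sketch below assumes each bundle lives in a subfolder named after the lowercased environment ID. The README confirms this only for `Ant-v5` (`ant-v5`); the other subfolder names are guesses, and `bundle_subfolder` and `evaluate_bundles` are hypothetical helpers, not part of the package. Check the Hub repository for the actual layout before relying on it.

```python
from typing import Dict, Iterable

# Environment IDs listed under "Available bundles".
BUNDLES = ("Ant-v5", "HalfCheetah-v5", "Walker2d-v5", "Humanoid-v5")


def bundle_subfolder(env_id: str) -> str:
    """Guess the Hub subfolder for an environment ID.

    Assumption: the subfolder is the lowercased env ID, as in the documented
    Ant-v5 -> "ant-v5" case. The other names are unverified.
    """
    return env_id.lower()


def evaluate_bundles(
    env_ids: Iterable[str] = BUNDLES, num_episodes: int = 5
) -> Dict[str, object]:
    """Run the quick-start evaluation once per bundle and collect the results."""
    # Imported lazily so bundle_subfolder works without the package installed.
    import gymnasium as gym
    from causal_gpt_rl.inference import load_runner_from_hub, run_episodes

    results: Dict[str, object] = {}
    for env_id in env_ids:
        env = gym.make(env_id)
        try:
            runner = load_runner_from_hub(
                repo_id="ccnets/causal-gpt-rl",
                subfolder=bundle_subfolder(env_id),  # assumed naming scheme
                device="cpu",
            )
            results[env_id] = run_episodes(
                env, runner, num_episodes=num_episodes, seed=0
            )
        finally:
            env.close()  # release simulator resources even if loading fails
    return results
```

Calling `evaluate_bundles()` returns whatever `run_episodes` produces for each environment, keyed by environment ID.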
|
| - **Code:** [github.com/ccnets-team/causal-gpt-rl](https://github.com/ccnets-team/causal-gpt-rl) |
| - **Training logs (W&B, public):** [wandb.ai/junhopark/Causal GPT-RL](https://wandb.ai/junhopark/Causal%20GPT-RL?nw) |
| - **Website:** [ccnets.org](https://ccnets.org) |
|
|
| Released under PolyForm Noncommercial 1.0.0. |