---
license: other
license_name: polyform-noncommercial-1.0.0
license_link: https://polyformproject.org/licenses/noncommercial/1.0.0
library_name: pytorch
tags:
- reinforcement-learning
- gymnasium
- mujoco
- causal-gpt-rl
---
# Causal GPT-RL
GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments.
Both LLM generation and RL interaction are autoregressive:
```text
token → next token (LLM generation)
(state, action) → (next state from env, next action) (RL rollout)
```
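To make the second line concrete, here is a toy rollout loop in plain Python. `ToyEnv` and `toy_policy` are hypothetical stand-ins and are not part of the causal-gpt-rl API. At each step the policy reads the interleaved (state, action) history and the current state to pick an action, and the environment then supplies the next state. The history is kept in a fixed-length window, which loosely mirrors how the runtime caps the KV cache at the context length. In the real runtime, a transformer policy takes the place of `toy_policy`.

```python
from collections import deque
from typing import Deque, List, Tuple

Step = Tuple[float, float]  # one (state, action) pair

# ToyEnv and toy_policy are hypothetical stand-ins, not part of causal-gpt-rl.


class ToyEnv:
    """1-D environment whose state moves by the chosen action each step."""

    def reset(self) -> float:
        self.state = 0.0
        return self.state

    def step(self, action: float) -> float:
        self.state += action
        return self.state


def toy_policy(history: Deque[Step], state: float) -> float:
    """Choose the next action from the (state, action) history and the current state."""
    prev_action = history[-1][1] if history else 0.0
    # Steer the state toward 1.0; the previous action adds a small damping term.
    return 0.5 * (1.0 - state) - 0.1 * prev_action


def rollout(num_steps: int, context_len: int) -> Tuple[List[Step], List[Step]]:
    """Run the toy loop; return the full trajectory and the final context window."""
    env = ToyEnv()
    history: Deque[Step] = deque(maxlen=context_len)  # oldest pairs fall out of the window
    trajectory: List[Step] = []
    state = env.reset()
    for _ in range(num_steps):
        action = toy_policy(history, state)  # (context, state) -> action
        history.append((state, action))      # extend the autoregressive sequence
        trajectory.append((state, action))
        state = env.step(action)             # the env supplies the next state
    return trajectory, list(history)


trajectory, window = rollout(num_steps=40, context_len=32)
print(len(trajectory), len(window))  # 40 32
```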
Causal GPT-RL policies remain stable when acting on their own rollouts. This gives long-horizon control without the compounding drift that has historically kept transformers from being usable as RL agents.
A single autoregressive model drives full-episode rollouts through a KV cache, with no critic or auxiliary networks at inference time.
This repository is the public inference runtime. It loads policy bundles, runs Gymnasium/MuJoCo rollouts, and provides small evaluation helpers.
- **Code (GitHub):** [ccnets-team/causal-gpt-rl](https://github.com/ccnets-team/causal-gpt-rl)
- **Run logs (W&B, public):** [wandb.ai/junhopark/Causal GPT-RL](https://wandb.ai/junhopark/Causal%20GPT-RL)
- **Hugging Face org:** https://huggingface.co/ccnets
- **Website:** https://ccnets.org
- **LinkedIn:** https://www.linkedin.com/company/ccnets
## Install
For Hub loading and MuJoCo environments:
```bash
pip install "causal-gpt-rl[hub,mujoco]"
```
For local development:
```bash
git clone https://github.com/ccnets-team/causal-gpt-rl.git
cd causal-gpt-rl
python -m pip install -e ".[hub,mujoco]"
```
For private bundles, authenticate first:
```bash
hf auth login
```
## Quick Start
```python
import gymnasium as gym
from causal_gpt_rl.inference import load_runner_from_hub, run_episodes
env = gym.make("Ant-v5")
runner = load_runner_from_hub(
    repo_id="ccnets/causal-gpt-rl",
    subfolder="ant-v5",
)
stats = run_episodes(env, runner, num_episodes=5, seed=0)
env.close()
print(stats["return_mean"], stats["return_std"])
```
Notebook version: [examples/hub_quickstart.ipynb](https://github.com/ccnets-team/causal-gpt-rl/blob/main/examples/hub_quickstart.ipynb)
## Supported Environments
| Env | Bundle | Ctx | Return | Norm. | Medium Ref. |
|---|---|---:|---:|---:|---:|
| `Ant-v5` | `ant-v5` | 32 | 3339.51±1115.40 | 50.56±16.54 | 86.54 |
| `HalfCheetah-v5` | `halfcheetah-v5` | 32 | 5989.04±1902.22 | 37.86±11.53 | 74.83 |
| `Hopper-v5` | `hopper-v5` | 32 | 2836.28±987.67 | 73.40±25.72 | 72.91 |
| `Walker2d-v5` | `walker2d-v5` | 32 | 3883.30±684.09 | 56.69±9.99 | 83.26 |
| `Humanoid-v5` | `humanoid-v5` | 32 | 6089.64±2512.73 | 70.41±29.58 | 81.30 |

Training data is expert-free: bundles are trained only on the Minari `simple` and `medium` datasets, and no expert trajectories are used.
`Return` and `Norm.` are mean±std over 50 episodes with seeds `0..49`. `Ctx` is the context length. Episodes are capped at `max_steps=1000`, and the KV cache length is capped at `Ctx`.
Normalized scores map the random reference return to 0 and the expert reference return to 100:
```text
100 * (return - random_ref) / (expert_ref - random_ref)
```
Medium reference scores are shown for context and are not the normalization baseline.
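
The sketch below shows how the `Return` and `Norm.` columns relate. The per-environment `random_ref` and `expert_ref` values are not listed in this card, so the numbers used here are placeholders and not the values behind the table. Whether the reported std is the population (ddof=0) or sample form is also an assumption; this sketch uses the population form. Because the normalization is affine, it shifts and scales the mean but only scales the std, by `100 / (expert_ref - random_ref)`.

```python
import statistics
from typing import Sequence, Tuple


def normalized_score(ret: float, random_ref: float, expert_ref: float) -> float:
    """Map a raw return so that random_ref -> 0 and expert_ref -> 100."""
    return 100.0 * (ret - random_ref) / (expert_ref - random_ref)


def summarize(
    returns: Sequence[float], random_ref: float, expert_ref: float
) -> Tuple[float, float, float, float]:
    """Return (return_mean, return_std, norm_mean, norm_std) over episodes."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)  # population std: an assumption, see above
    norm_mean = normalized_score(mean, random_ref, expert_ref)
    # The affine map rescales the spread but does not shift it.
    norm_std = 100.0 * std / (expert_ref - random_ref)
    return mean, std, norm_mean, norm_std


# Placeholder returns and reference values, for illustration only.
mean, std, norm_mean, norm_std = summarize([900.0, 1100.0], random_ref=100.0, expert_ref=2100.0)
print(f"{mean:.2f}±{std:.2f} -> {norm_mean:.2f}±{norm_std:.2f}")  # 1000.00±100.00 -> 45.00±5.00
```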
Evaluation runtime:
```text
causal-gpt-rl 0.2.1
torch 2.12.0+cu132
gymnasium 1.2.2
mujoco 3.8.1
minari 0.5.3
```
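To compare a local setup against the evaluation runtime above, you can print the installed versions with the standard library's `importlib.metadata`. The names checked are the PyPI distribution names from the list. The `installed_versions` helper is hypothetical and exists only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable

EVAL_PACKAGES = ("causal-gpt-rl", "torch", "gymnasium", "mujoco", "minari")


def installed_versions(packages: Iterable[str] = EVAL_PACKAGES) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    found: Dict[str, str] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


for name, ver in installed_versions().items():
    print(f"{name} {ver}")
```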
## Bundle Format
All public bundles include:
```text
bundle/
  model.safetensors
  config.json
  state_normalizer.safetensors
```
- `model.safetensors`: model state dict for inference.
- `config.json`: model config, observation specs, action specs, context length, and optional `env_id`.
- `state_normalizer.safetensors`: state normalization statistics used by the policy.
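
Before loading a local bundle, a quick sanity check can confirm that the three files are present and that `config.json` parses. This sketch uses only the standard library, and `check_bundle` is a hypothetical helper. Apart from the optional `env_id`, this card does not name the config keys, so the sketch reads nothing else from the config.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

REQUIRED_FILES = ("model.safetensors", "config.json", "state_normalizer.safetensors")


def check_bundle(bundle_dir: str) -> Tuple[List[str], Optional[str]]:
    """Return (missing bundle files, env_id from config.json or None)."""
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    env_id = None
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        env_id = config.get("env_id")  # optional per the bundle format
    return missing, env_id


# Usage on a throwaway directory that lacks the normalizer file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text(json.dumps({"env_id": "Ant-v5"}), encoding="utf-8")
    (Path(tmp) / "model.safetensors").touch()
    print(check_bundle(tmp))  # (['state_normalizer.safetensors'], 'Ant-v5')
```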
## Hugging Face Layout
Recommended layout:
```text
ccnets/causal-gpt-rl/
  ant-v5/
    model.safetensors
    config.json
    state_normalizer.safetensors
  README.md
```
For local bundles, use `load_runner("path/to/bundle")`.
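
One way to fetch a single bundle once and then load it from disk is `huggingface_hub.snapshot_download`, which can restrict the download to one subfolder via `allow_patterns`. This is a sketch that assumes the recommended layout above and an installed `huggingface_hub`. The download step needs network access. The `bundle_patterns` and `download_bundle` helpers are hypothetical.

```python
import os
from typing import List


def bundle_patterns(subfolder: str) -> List[str]:
    """Glob patterns selecting one bundle subfolder, e.g. 'ant-v5/*'."""
    return [f"{subfolder.strip('/')}/*"]


def download_bundle(repo_id: str, subfolder: str) -> str:
    """Download one bundle subfolder from the Hub and return its local directory."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_dir = snapshot_download(repo_id=repo_id, allow_patterns=bundle_patterns(subfolder))
    return os.path.join(snapshot_dir, subfolder)
```

With network access, `load_runner(download_bundle("ccnets/causal-gpt-rl", "ant-v5"))` should then behave like loading any other local bundle directory.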
## API
```python
from causal_gpt_rl.inference import (
    PolicyRunner,  # step-wise rollout policy with KV cache
    load_runner,  # load runner from a local bundle directory
    load_runner_from_hub,  # load runner from a Hugging Face Hub repo
    run_episodes,  # evaluate over N episodes; returns stats dict
    export_bundle,  # write a bundle directory from a runner
    convert_legacy_bundle_to_safetensors,  # migrate legacy bundles to the safetensors format
)
```
## Development Checks
```bash
python -m compileall -q causal_gpt_rl
python -m unittest discover -s tests
python -m build
python -m twine check dist/*
```
## License
Released under the PolyForm Noncommercial License 1.0.0. See `LICENSE` for details. For commercial licensing, contact the maintainers via ccnets.org.