kissin42 committed on
Commit 12bc5e2 · verified · 1 Parent(s): fbde1d3

Sync README: clarify autoregressive structure, move license to footer

Files changed (1)
  1. README.md +165 -167
README.md CHANGED
@@ -1,167 +1,165 @@
- ---
- license: other
- license_name: polyform-noncommercial-1.0.0
- license_link: https://polyformproject.org/licenses/noncommercial/1.0.0
- library_name: pytorch
- tags:
- - reinforcement-learning
- - gymnasium
- - mujoco
- - causal-gpt-rl
- ---
-
- # Causal GPT-RL
-
- GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments.
-
- The autoregressive structure is the same on both sides:
-
- ```text
- action → next state next action (RL rollouts)
- token → next token → next token (LLM generation)
- ```
-
- Causal GPT-RL policies act stably under their own rollouts — long-horizon control without the drift that has historically kept transformers from being usable as RL agents.
-
- A single autoregressive model drives full-episode rollouts via KV cache — no critic, no auxiliary networks at inference.
-
- This repository is the public inference runtime. It loads policy bundles, runs Gymnasium/MuJoCo rollouts, and provides small evaluation helpers.
-
- - **Code (GitHub):** [ccnets-team/causal-gpt-rl](https://github.com/ccnets-team/causal-gpt-rl)
- - **Run logs (W&B, public):** [wandb.ai/junhopark/Causal GPT-RL](https://wandb.ai/junhopark/Causal%20GPT-RL)
- - **Hugging Face org:** https://huggingface.co/ccnets
- - Website: https://ccnets.org
- - LinkedIn: https://www.linkedin.com/company/ccnets
-
- Released under PolyForm Noncommercial 1.0.0. For commercial licensing, contact the maintainers via ccnets.org.
-
- ## Install
-
- For Hub loading and MuJoCo environments:
-
- ```bash
- pip install "causal-gpt-rl[hub,mujoco]"
- ```
-
- For local development:
-
- ```bash
- git clone https://github.com/ccnets-team/causal-gpt-rl.git
- cd causal-gpt-rl
- python -m pip install -e ".[hub,mujoco]"
- ```
-
- For private bundles, authenticate first:
-
- ```bash
- hf auth login
- ```
-
- ## Quick Start
-
- ```python
- import gymnasium as gym
-
- from causal_gpt_rl.inference import load_runner_from_hub, run_episodes
-
- env = gym.make("Ant-v5")
- runner = load_runner_from_hub(
-     repo_id="ccnets/causal-gpt-rl",
-     subfolder="ant-v5",
- )
-
- stats = run_episodes(env, runner, num_episodes=5, seed=0)
- env.close()
- print(stats["return_mean"], stats["return_std"])
- ```
-
- Notebook version: [examples/hub_quickstart.ipynb](https://github.com/ccnets-team/causal-gpt-rl/blob/main/examples/hub_quickstart.ipynb)
-
- ## Supported Environments
-
- | Env | Bundle | Ctx | Return | Norm. | Medium Ref. |
- |---|---|---:|---:|---:|---:|
- | `Ant-v5` | `ant-v5` | 32 | 3339.51±1115.40 | 50.56±16.54 | 86.54 |
- | `HalfCheetah-v5` | `halfcheetah-v5` | 32 | 4877.39±1899.50 | 31.12±11.51 | 74.83 |
- | `Hopper-v5` | `hopper-v5` | 32 | 2836.28±987.67 | 73.40±25.72 | 72.91 |
- | `Walker2d-v5` | `walker2d-v5` | 32 | 3883.30±684.09 | 56.69±9.99 | 83.26 |
- | `Humanoid-v5` | `humanoid-v5` | 32 | 6089.64±2512.73 | 70.41±29.58 | 81.30 |
-
- Training data is expert-free: bundles are trained using Minari simple and medium datasets only; expert trajectories are not used for training.
-
- `Return` and `Norm.` are mean±std over 50 episodes with seeds `0..49`. `Ctx` is context length. `max_steps=1000`, and KV cache max length is capped to `Ctx`.
-
- Normalized scores use random=0 and expert=100:
-
- ```text
- 100 * (return - random_ref) / (expert_ref - random_ref)
- ```
-
- Medium reference scores are shown for context and are not the normalization baseline.
-
- Evaluation runtime:
-
- ```text
- causal-gpt-rl 0.2.1
- torch 2.12.0+cu132
- gymnasium 1.2.2
- mujoco 3.8.1
- minari 0.5.3
- ```
-
- ## Bundle Format
-
- All public bundles include:
-
- ```text
- bundle/
-   model.safetensors
-   config.json
-   state_normalizer.safetensors
- ```
-
- - `model.safetensors` — model state dict for inference.
- - `config.json` — model config, observation specs, action specs, context length,
-   and optional `env_id`.
- - `state_normalizer.safetensors` state normalization statistics used by the policy.
-
- ## Hugging Face Layout
-
- Recommended layout:
-
- ```text
- ccnets/causal-gpt-rl/
-   ant-v5/
-     model.safetensors
-     config.json
-     state_normalizer.safetensors
-   README.md
- ```
-
- For local bundles, use `load_runner("path/to/bundle")`.
-
- ## API
-
- ```python
- from causal_gpt_rl.inference import (
-     PolicyRunner,  # step-wise rollout policy with KV cache
-     load_runner,  # load runner from a local bundle directory
-     load_runner_from_hub,  # load runner from a Hugging Face Hub repo
-     run_episodes,  # evaluate over N episodes; returns stats dict
-     export_bundle,  # write a bundle directory from a runner
-     convert_legacy_bundle_to_safetensors,  # migrate legacy bundles to the safetensors format
- )
- ```
-
- ## Development Checks
-
- ```bash
- python -m compileall -q causal_gpt_rl
- python -m unittest discover -s tests
- python -m build
- python -m twine check dist/*
- ```
-
- ## License
-
- PolyForm Noncommercial License 1.0.0. See `LICENSE` for details.
 
+ ---
+ license: other
+ license_name: polyform-noncommercial-1.0.0
+ license_link: https://polyformproject.org/licenses/noncommercial/1.0.0
+ library_name: pytorch
+ tags:
+ - reinforcement-learning
+ - gymnasium
+ - mujoco
+ - causal-gpt-rl
+ ---
+
+ # Causal GPT-RL
+
+ GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments.
+
+ The autoregressive structure is the same on both sides:
+
+ ```text
+ (state, action) → (next state from env, next action) (RL rollout)
+ token → next token (LLM generation)
+ ```
+
+ Causal GPT-RL policies act stably under their own rollouts — long-horizon control without the drift that has historically kept transformers from being usable as RL agents.
+
+ A single autoregressive model drives full-episode rollouts via KV cache — no critic, no auxiliary networks at inference.
+
+ This repository is the public inference runtime. It loads policy bundles, runs Gymnasium/MuJoCo rollouts, and provides small evaluation helpers.
+
+ - **Code (GitHub):** [ccnets-team/causal-gpt-rl](https://github.com/ccnets-team/causal-gpt-rl)
+ - **Run logs (W&B, public):** [wandb.ai/junhopark/Causal GPT-RL](https://wandb.ai/junhopark/Causal%20GPT-RL)
+ - **Hugging Face org:** https://huggingface.co/ccnets
+ - Website: https://ccnets.org
+ - LinkedIn: https://www.linkedin.com/company/ccnets
+
+ ## Install
+
+ For Hub loading and MuJoCo environments:
+
+ ```bash
+ pip install "causal-gpt-rl[hub,mujoco]"
+ ```
+
+ For local development:
+
+ ```bash
+ git clone https://github.com/ccnets-team/causal-gpt-rl.git
+ cd causal-gpt-rl
+ python -m pip install -e ".[hub,mujoco]"
+ ```
+
+ For private bundles, authenticate first:
+
+ ```bash
+ hf auth login
+ ```
+
+ ## Quick Start
+
+ ```python
+ import gymnasium as gym
+
+ from causal_gpt_rl.inference import load_runner_from_hub, run_episodes
+
+ env = gym.make("Ant-v5")
+ runner = load_runner_from_hub(
+     repo_id="ccnets/causal-gpt-rl",
+     subfolder="ant-v5",
+ )
+
+ stats = run_episodes(env, runner, num_episodes=5, seed=0)
+ env.close()
+ print(stats["return_mean"], stats["return_std"])
+ ```
+
+ Notebook version: [examples/hub_quickstart.ipynb](https://github.com/ccnets-team/causal-gpt-rl/blob/main/examples/hub_quickstart.ipynb)
+
+ ## Supported Environments
+
+ | Env | Bundle | Ctx | Return | Norm. | Medium Ref. |
+ |---|---|---:|---:|---:|---:|
+ | `Ant-v5` | `ant-v5` | 32 | 3339.51±1115.40 | 50.56±16.54 | 86.54 |
+ | `HalfCheetah-v5` | `halfcheetah-v5` | 32 | 4877.39±1899.50 | 31.12±11.51 | 74.83 |
+ | `Hopper-v5` | `hopper-v5` | 32 | 2836.28±987.67 | 73.40±25.72 | 72.91 |
+ | `Walker2d-v5` | `walker2d-v5` | 32 | 3883.30±684.09 | 56.69±9.99 | 83.26 |
+ | `Humanoid-v5` | `humanoid-v5` | 32 | 6089.64±2512.73 | 70.41±29.58 | 81.30 |
+
+ Training data is expert-free: bundles are trained using Minari simple and medium datasets only; expert trajectories are not used for training.
+
+ `Return` and `Norm.` are mean±std over 50 episodes with seeds `0..49`. `Ctx` is context length. `max_steps=1000`, and KV cache max length is capped to `Ctx`.
+
+ Normalized scores use random=0 and expert=100:
+
+ ```text
+ 100 * (return - random_ref) / (expert_ref - random_ref)
+ ```
+
+ Medium reference scores are shown for context and are not the normalization baseline.
+
+ Evaluation runtime:
+
+ ```text
+ causal-gpt-rl 0.2.1
+ torch 2.12.0+cu132
+ gymnasium 1.2.2
+ mujoco 3.8.1
+ minari 0.5.3
+ ```
+
+ ## Bundle Format
+
+ All public bundles include:
+
+ ```text
+ bundle/
+   model.safetensors
+   config.json
+   state_normalizer.safetensors
+ ```
+
+ - `model.safetensors` — model state dict for inference.
+ - `config.json` — model config, observation specs, action specs, context length,
+   and optional `env_id`.
+ - `state_normalizer.safetensors` — state normalization statistics used by the policy.
+
+ ## Hugging Face Layout
+
+ Recommended layout:
+
+ ```text
+ ccnets/causal-gpt-rl/
+   ant-v5/
+     model.safetensors
+     config.json
+     state_normalizer.safetensors
+   README.md
+ ```
+
+ For local bundles, use `load_runner("path/to/bundle")`.
+
+ ## API
+
+ ```python
+ from causal_gpt_rl.inference import (
+     PolicyRunner,  # step-wise rollout policy with KV cache
+     load_runner,  # load runner from a local bundle directory
+     load_runner_from_hub,  # load runner from a Hugging Face Hub repo
+     run_episodes,  # evaluate over N episodes; returns stats dict
+     export_bundle,  # write a bundle directory from a runner
+     convert_legacy_bundle_to_safetensors,  # migrate legacy bundles to the safetensors format
+ )
+ ```
+
+ ## Development Checks
+
+ ```bash
+ python -m compileall -q causal_gpt_rl
+ python -m unittest discover -s tests
+ python -m build
+ python -m twine check dist/*
+ ```
+
+ ## License
+
+ Released under PolyForm Noncommercial License 1.0.0. See `LICENSE` for details. For commercial licensing, contact the maintainers via ccnets.org.
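The README's new diagram, `(state, action) → (next state from env, next action)`, describes a loop in which the environment supplies each next state and the policy supplies each next action, conditioned on recent history. A minimal plain-Python sketch of that loop follows, under stated assumptions: `rollout`, the toy `reset`/`step` callables, and the `policy(history, state)` signature are hypothetical stand-ins, not the `causal_gpt_rl` API, and the bounded `deque` only mimics the README's rule that the KV cache length is capped to `Ctx`; it is not a transformer KV cache.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Step = Tuple[float, float]  # one (state, action) pair


def rollout(
    reset: Callable[[], float],
    step: Callable[[float, float], Tuple[float, float, bool]],
    policy: Callable[[List[Step], float], float],
    ctx: int = 32,
    max_steps: int = 1000,
) -> Tuple[float, int]:
    """Hypothetical autoregressive rollout; returns (episode return, steps taken).

    The environment supplies each next state; the policy supplies each next
    action from the current state plus up to `ctx` earlier (state, action)
    pairs. The bounded deque stands in for a KV cache capped at Ctx.
    """
    history: Deque[Step] = deque(maxlen=ctx)  # oldest pairs drop off automatically
    state = reset()
    total_return = 0.0
    for t in range(max_steps):
        action = policy(list(history), state)      # next action from context
        history.append((state, action))
        state, reward, done = step(state, action)  # next state from the env
        total_return += reward
        if done:
            return total_return, t + 1
    return total_return, max_steps


if __name__ == "__main__":
    # Toy 1-D task (hypothetical): start at 10.0, move by the action, stop at or below 0.
    result = rollout(
        reset=lambda: 10.0,
        step=lambda s, a: (s + a, 1.0, s + a <= 0.0),
        policy=lambda history, s: -1.0 if s > 0 else 0.0,
        ctx=3,
    )
    print(result)  # (10.0, 10)
```

In the actual runtime, per the README, `PolicyRunner` and `run_episodes` fill these roles; the fixed-size history here just makes the `Ctx` cap concrete.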
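The README's normalization formula, `100 * (return - random_ref) / (expert_ref - random_ref)`, maps random play to 0 and expert play to 100. A small worked sketch follows. The helper names are hypothetical, the reference returns in the usage lines are made-up numbers for illustration (not the references behind the table), and using the population standard deviation is an assumption, since the README does not say which estimator `run_episodes` uses.

```python
import statistics
from typing import Iterable, Tuple


def normalized_score(ret: float, random_ref: float, expert_ref: float) -> float:
    """100 * (return - random_ref) / (expert_ref - random_ref): random -> 0, expert -> 100."""
    if expert_ref == random_ref:
        raise ValueError("expert_ref and random_ref must differ")
    return 100.0 * (ret - random_ref) / (expert_ref - random_ref)


def summarize(values: Iterable[float]) -> Tuple[float, float]:
    """Mean and population std (ddof=0, an assumption) for a 'mean±std' table cell."""
    data = list(values)
    return statistics.mean(data), statistics.pstdev(data)


if __name__ == "__main__":
    # Hypothetical reference returns, for illustration only.
    random_ref, expert_ref = -100.0, 900.0
    returns = [400.0, 600.0]
    scores = [normalized_score(r, random_ref, expert_ref) for r in returns]
    mean, std = summarize(scores)
    print(f"{mean:.2f}±{std:.2f}")  # 60.00±10.00
```

Because the mapping is affine, the mean of per-episode normalized scores equals the normalized mean return, and (for `expert_ref > random_ref`) the normalized std is the return std scaled by `100 / (expert_ref - random_ref)`.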
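Per the README's Bundle Format section, a public bundle directory holds `model.safetensors`, `config.json`, and `state_normalizer.safetensors`, with `env_id` optional in the config. A stdlib-only pre-flight check is sketched below. `check_bundle` is a hypothetical helper, not part of `causal_gpt_rl.inference`, and it only verifies that the three files exist and that `config.json` parses to a JSON object, since the README does not specify the rest of the config schema.

```python
import json
from pathlib import Path
from typing import Optional, Union

# File names taken from the README's "Bundle Format" section.
REQUIRED_FILES = ("model.safetensors", "config.json", "state_normalizer.safetensors")


def check_bundle(bundle_dir: Union[str, Path]) -> Optional[str]:
    """Hypothetical pre-flight check for a local policy bundle.

    Raises FileNotFoundError if a required file is missing, and ValueError
    (including json.JSONDecodeError) if config.json is not a JSON object.
    Returns the optional env_id, or None when it is absent.
    """
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"bundle {root} is missing: {', '.join(missing)}")

    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError("config.json must contain a JSON object")
    # env_id is optional per the README, so its absence is not an error.
    return config.get("env_id")
```

Calling it before `load_runner("path/to/bundle")` can give a clearer error for an incomplete download; the real loader may already perform similar checks.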