---
title: Causal GPT-RL
emoji: 🤖
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
---

# Causal GPT-RL

GPT-style transformers (GPT-2, Llama) running as RL policies in continuous-control environments.

```text
action → next state → next action   (RL rollouts)
token  → next token → next token    (LLM generation)
```

Policies remain stable under self-generated rollouts, enabling long-horizon control without the drift that has historically kept transformers from being usable as RL agents.
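
To make the action/token analogy concrete, here is a minimal, framework-free sketch of an autoregressive rollout. The policy conditions on a sliding window of its own past (state, action) pairs, the same way an LLM conditions on the tokens it has already generated. The function `autoregressive_rollout`, the scalar states and actions, and the toy policy and dynamics in the usage lines are hypothetical illustrations. This is not the causal-gpt-rl implementation.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Hypothetical types for illustration: scalar state and scalar action.
Step = Tuple[float, float]  # one (state, action) pair, playing the role of one token


def autoregressive_rollout(
    policy: Callable[[List[Step], float], float],
    dynamics: Callable[[float, float], float],
    init_state: float,
    horizon: int,
    context_len: int,
) -> List[Step]:
    """Roll out a sequence policy on its own generated history (illustrative sketch)."""
    context: Deque[Step] = deque(maxlen=context_len)  # sliding window, like an LLM context
    trajectory: List[Step] = []
    state = init_state
    for _ in range(horizon):
        action = policy(list(context), state)  # next action from past steps + current state
        trajectory.append((state, action))
        context.append((state, action))        # feed the policy's own output back in
        state = dynamics(state, action)        # environment produces the next state
    return trajectory


# Toy usage: a proportional controller (ignores history) on dynamics s' = s + a.
traj = autoregressive_rollout(
    policy=lambda history, s: -0.5 * s,
    dynamics=lambda s, a: s + a,
    init_state=1.0,
    horizon=3,
    context_len=2,
)
print(traj)  # [(1.0, -0.5), (0.5, -0.25), (0.25, -0.125)]
```

In this loop an error in any action changes the next state and also enters the context for every later step, which is one way to see how drift can compound over long horizons.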

## Get started

```bash
pip install "causal-gpt-rl[hub,mujoco]"
```

```python
import gymnasium as gym
from causal_gpt_rl.inference import load_runner_from_hub, run_episodes

# Ant-v5 needs the MuJoCo extra installed above.
env = gym.make("Ant-v5")

# Load the Ant-v5 bundle from the Hub repo onto the CPU.
runner = load_runner_from_hub(
    repo_id="ccnets/causal-gpt-rl",
    subfolder="ant-v5",
    device="cpu",
)

# Run 5 episodes starting from seed 0.
stats = run_episodes(env, runner, num_episodes=5, seed=0)
print(stats)
env.close()
```
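
The internals of `run_episodes` and the format of `stats` are not documented here. As a hedged illustration only, the sketch below shows what a generic evaluation loop over the Gymnasium `reset`/`step` API typically looks like. `evaluate_policy`, its per-episode seeding scheme, the returned dictionary keys, and the toy `CountdownEnv` are all assumptions made for this example, not part of the causal-gpt-rl API.

```python
from typing import Any, Callable, Dict, List


def evaluate_policy(
    env: Any,
    act: Callable[[Any], Any],
    num_episodes: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Hypothetical evaluation loop over the Gymnasium reset/step API."""
    returns: List[float] = []
    for episode in range(num_episodes):
        # Assumption: seed each episode as seed + episode for reproducible resets.
        obs, _info = env.reset(seed=seed + episode)
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, _info = env.step(act(obs))
            total += float(reward)
            done = terminated or truncated  # episode ends on termination or time limit
        returns.append(total)
    return {
        "mean_return": sum(returns) / len(returns),
        "min_return": min(returns),
        "max_return": max(returns),
    }


class CountdownEnv:
    """Toy stand-in using the Gymnasium 5-tuple step API: reward 1.0 per step, truncated after 3 steps."""

    def reset(self, seed=None):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, self.t >= 3, {}


stats = evaluate_policy(CountdownEnv(), act=lambda obs: 0, num_episodes=5, seed=0)
print(stats)  # {'mean_return': 3.0, 'min_return': 3.0, 'max_return': 3.0}
```

With a real environment, `env` would be an instance such as the `gym.make("Ant-v5")` object above and `act` any callable mapping an observation to an action.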

**Available bundles:** Ant-v5, HalfCheetah-v5, Walker2d-v5, Humanoid-v5

- **Code:** [github.com/ccnets-team/causal-gpt-rl](https://github.com/ccnets-team/causal-gpt-rl)
- **Training logs (W&B, public):** [wandb.ai/junhopark/Causal GPT-RL](https://wandb.ai/junhopark/Causal%20GPT-RL?nw)
- **Website:** [ccnets.org](https://ccnets.org)

Released under the PolyForm Noncommercial License 1.0.0.