Commit 3a28045 (verified) by kuds
Parent(s): 8281b1b

Update README.md

Files changed (1)
  1. README.md +53 -125
README.md CHANGED
@@ -1,171 +1,99 @@
  ---
- license: mit
- language:
- - en
  library_name: stable-baselines3
  tags:
  - reinforcement-learning
- - mujoco
- - locomotion
- - robotics
- - curriculum-learning
- - dinosaurs
- - gymnasium
  model-index:
- - name: PPO-Velociraptor
    results:
    - task:
        type: reinforcement-learning
        name: reinforcement-learning
      dataset:
-       name: MesozoicLabs/Raptor-v0
-       type: MesozoicLabs/Raptor-v0
      metrics:
      - type: mean_reward
-       value: 1366.19 +/- 76.29
        name: mean_reward
        verified: false
-     - type: success_rate
-       value: 93.3%
-       name: strike_success_rate
-       verified: false
  ---
  
- # **PPO** Agents for Robotic Dinosaur Locomotion — **Mesozoic Labs**
- 
- ![Trained PPO Agent](/results/velociraptor/ppo/stage1_balance.gif)
- 
- This repository contains **PPO** (Proximal Policy Optimization) agents trained to control robotic dinosaurs in MuJoCo physics simulation. Each species is trained using a 3-stage curriculum learning approach.
- 
- - [GitHub Repository](https://github.com/kuds/mesozoic-labs)
- - [Documentation](https://mesozoiclabs.com)
- - [Blog: From Zero to Dino-Roar](https://www.findingtheta.com/blog/from-zero-to-dino-roar-teaching-a-t-rex-to-walk-with-mujoco-and-reinforcement-learning)
- 
- ## Species & Training Results
- 
- ### Velociraptor (PPO) — All 3 stages passed | 22M steps | 11:25:15 total
- 
- A bipedal predator with sickle claws, trained on 3 curriculum stages:
- 
- | Stage | Name | Best Reward | Avg Forward Vel | Success Rate | Time |
- |-------|------|-------------|-----------------|--------------|------|
- | 1 | Balance | 1964.43 +/- 27.39 | 0.11 m/s | — | 2:57:25 |
- | 2 | Locomotion | 2678.68 +/- 4.07 | 3.47 m/s | — | 4:35:55 |
- | 3 | Strike | 1366.19 +/- 76.29 | 2.02 m/s | 93.3% | 3:51:54 |
- 
- ## Training Details
- 
- - **Algorithm:** PPO (Proximal Policy Optimization) via [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3)
- - **Physics Engine:** [MuJoCo](https://mujoco.org/) (>= 3.0)
- - **Environment Framework:** [Gymnasium](https://gymnasium.farama.org/) (>= 0.29)
- - **Hardware:** Google Colab L4 GPU
- - **Seed:** 42
- - **Parallel Envs:** 4
- - **Curriculum:** 3-stage progressive training (Balance → Locomotion → Species-specific task)
  
- ## Environment Details
  
- | Species | Observation Dims | Action Dims | Gymnasium ID |
- |---------|-----------------|-------------|--------------|
- | Velociraptor | 67 | 22 | `MesozoicLabs/Raptor-v0` |
- 
- ## Usage
- 
- ### Installation
- 
- ```bash
- git clone https://github.com/kuds/mesozoic-labs.git
- cd mesozoic-labs
- 
- python -m venv venv
- source venv/bin/activate
- 
- # Install with training dependencies
- pip install -e ".[train]"
- ```
- 
- ### Loading a Trained Model
- 
- ```python
- from stable_baselines3 import PPO
  import gymnasium as gym
- 
- # Register Mesozoic Labs environments
- import environments
- 
- # Load the trained model (e.g., velociraptor stage 3)
- model = PPO.load("path/to/best_model.zip")
- 
- # Create the environment
- env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")
- 
- # Run the trained agent
  obs, info = env.reset()
  for _ in range(1000):
      action, _states = model.predict(obs, deterministic=True)
-     obs, reward, terminated, truncated, info = env.step(action)
      if terminated or truncated:
          obs, info = env.reset()
  env.close()
  ```
- 
- ### Training from Scratch
- 
- ```bash
- # Full 3-stage curriculum for velociraptor
- cd environments/velociraptor
- python scripts/train_sb3.py curriculum --algorithm ppo
- 
- # Single stage training
- python scripts/train_sb3.py train --stage 1 --timesteps 6000000 --n-envs 4
- ```
  
- ### Loading from Hugging Face Hub
  
- ```bash
  pip install huggingface_hub
  ```
  
- ```python
  from huggingface_hub import hf_hub_download
- from stable_baselines3 import PPO
  import gymnasium as gym
- import environments
- 
- # Download the model from the Hub
- model_path = hf_hub_download(
-     repo_id="kuds/mesozoic-labs",
-     filename="results/velociraptor/ppo/best_model.zip"
- )
- 
- # Load the model
  model = PPO.load(model_path)
- 
- # Create the environment
- env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")
- 
- # Run the trained agent
- obs, info = env.reset()
- for _ in range(1000):
      action, _states = model.predict(obs, deterministic=True)
-     obs, reward, terminated, truncated, info = env.step(action)
-     if terminated or truncated:
-         obs, info = env.reset()
- env.close()
- ```
- 
- ## Citation
  
- ```bibtex
- @misc{mesozoic-labs,
-   author = {Mesozoic Labs Contributors},
-   title = {Mesozoic Labs: Robotic Dinosaur Locomotion with Reinforcement Learning},
-   year = {2026},
-   publisher = {GitHub / Hugging Face},
-   url = {https://github.com/kuds/mesozoic-labs}
- }
  ```
  
- ## License
- 
- MIT License
 
  ---
  library_name: stable-baselines3
  tags:
  - reinforcement-learning
+ - BreakoutNoFrameskip-v4
  model-index:
+ - name: PPO
    results:
    - task:
        type: reinforcement-learning
        name: reinforcement-learning
      dataset:
+       name: BreakoutNoFrameskip-v4
+       type: BreakoutNoFrameskip-v4
      metrics:
      - type: mean_reward
+       value: 187.80 +/- 114.62
        name: mean_reward
        verified: false
  ---
  
+ # **PPO** Agent playing **BreakoutNoFrameskip-v4**
+ - [GitHub Repository](https://github.com/kuds/rl-atari-breakout)
+ - [Google Colab Notebook](https://colab.research.google.com/github/kuds/rl-atari-breakout/blob/main/%5BAtari%20Breakout%5D%20Single-Agent%20Reinforcement%20Learning%20PPO.ipynb)
+ - [Finding Theta - Blog Post](https://www.findingtheta.com/blog/beginners-guide-to-model-based-reinforcement-learning-mbrl-with-ataris-breakout)
+ 
+ You can load the model using the following Python code:
+ 
+ ```python
  import gymnasium as gym
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_atari_env
+ from stable_baselines3.common.vec_env import VecFrameStack, VecTransposeImage
+ 
+ # Load the trained model
+ model = PPO.load("best-model.zip")
+ 
+ # Create the environment (make_atari_env returns a vectorized env)
+ env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=1)
+ env = VecFrameStack(env, n_stack=4)
+ env = VecTransposeImage(env)
+ 
+ # Reset the environment (VecEnv.reset returns only the observations)
+ obs = env.reset()
+ 
+ # Enjoy the trained agent; finished sub-environments reset automatically
  for _ in range(1000):
      action, _states = model.predict(obs, deterministic=True)
+     obs, rewards, dones, infos = env.step(action)
+     env.render("human")
  env.close()
  ```
  
+ ### Hugging Face Hub
+ 
+ You can also load the model from the Hugging Face Hub. First, install the Hugging Face Hub library:
+ 
+ ```bash
  pip install huggingface_hub
  ```
  
+ Then load the model from the Hub using the following code:
+ 
+ ```python
  from huggingface_hub import hf_hub_download
  import gymnasium as gym
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_atari_env
+ from stable_baselines3.common.vec_env import VecFrameStack, VecTransposeImage
+ 
+ # Download the model from the Hub
+ model_path = hf_hub_download(repo_id="kuds/atari-breakout-v4-ppo", filename="best-model.zip")
+ 
+ # Load the model
  model = PPO.load(model_path)
+ 
+ # Create the environment
+ env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=1)
+ env = VecFrameStack(env, n_stack=4)
+ env = VecTransposeImage(env)
+ 
+ # Enjoy the trained agent
+ obs = env.reset()
+ for i in range(1000):
      action, _states = model.predict(obs, deterministic=True)
+     obs, rewards, dones, infos = env.step(action)
+     env.render("human")
+ env.close()
  ```
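
Both loading snippets wrap the environment in `VecFrameStack(env, n_stack=4)`, which keeps the last four frames and concatenates them so the policy can infer motion (e.g. the ball's direction and speed in Breakout) from a single observation. A minimal, illustrative sketch of the idea in plain NumPy (the `FrameStack` helper below is a simplified stand-in, not SB3's actual implementation):

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last n_stack frames and concatenate them along the channel
    axis, so one observation encodes recent motion."""

    def __init__(self, n_stack):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Fill the buffer by repeating the first frame (one common convention)
        for _ in range(self.n_stack):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Stack along the channel axis: four (H, W, 1) frames -> (H, W, 4)
        return np.concatenate(list(self.frames), axis=-1)


stack = FrameStack(n_stack=4)
first = np.zeros((84, 84, 1), dtype=np.uint8)  # one warped grayscale Atari frame
obs = stack.reset(first)
print(obs.shape)  # (84, 84, 4)
```

With `n_stack=4` each new frame pushes the oldest one out of the buffer, so the stacked observation always covers the four most recent timesteps.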
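
Both rollout loops call `model.predict(obs, deterministic=True)`. For a discrete-action game like Breakout this means always taking the most probable action rather than sampling from the policy distribution, which makes evaluation reproducible. A small sketch of the difference (the logits below are made-up numbers for illustration, not SB3 internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action logits for Breakout's four discrete actions
# (NOOP, FIRE, RIGHT, LEFT); the values are illustrative only.
logits = np.array([0.1, 2.0, 0.5, -1.0])

# Softmax turns logits into action probabilities
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# deterministic=True: always pick the most likely action
deterministic_action = int(np.argmax(probs))  # 1 (FIRE has the largest logit)

# deterministic=False: sample an action from the distribution,
# so different runs can choose different actions
sampled_action = int(rng.choice(len(probs), p=probs))
```

Sampling is useful during training for exploration; at evaluation time the deterministic argmax usually gives a more stable mean reward.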
99