kuds/atari-breakout-v4-ppo

@@ -1,171 +1,99 @@
 ---
-license: mit
-language:
-- en
 library_name: stable-baselines3
 tags:
 - reinforcement-learning
-- mujoco
-- locomotion
-- robotics
-- curriculum-learning
-- dinosaurs
-- gymnasium
 model-index:
-- name: PPO-Velociraptor
   results:
   - task:
       type: reinforcement-learning
       name: reinforcement-learning
     dataset:
-      name: MesozoicLabs/Raptor-v0
-      type: MesozoicLabs/Raptor-v0
     metrics:
     - type: mean_reward
-      value: 1366.19 +/- 76.29
       name: mean_reward
       verified: false
-    - type: success_rate
-      value: 93.3%
-      name: strike_success_rate
-      verified: false
 ---
-# **PPO** Agents for Robotic Dinosaur Locomotion — **Mesozoic Labs**
-![Trained PPO Agent](/results/velociraptor/ppo/stage1_balance.gif)
-This repository contains **PPO** (Proximal Policy Optimization) agents trained to control robotic dinosaurs in MuJoCo physics simulation. Each species is trained using a 3-stage curriculum learning approach.
-- [GitHub Repository](https://github.com/kuds/mesozoic-labs)
-- [Documentation](https://mesozoiclabs.com)
-- [Blog: From Zero to Dino-Roar](https://www.findingtheta.com/blog/from-zero-to-dino-roar-teaching-a-t-rex-to-walk-with-mujoco-and-reinforcement-learning)
-## Species & Training Results
-### Velociraptor (PPO) — All 3 stages passed | 22M steps | 11:25:15 total
-A bipedal predator with sickle claws, trained on 3 curriculum stages:
-| Stage | Name | Best Reward | Avg Forward Vel | Success Rate | Time |
-|-------|------|-------------|-----------------|--------------|------|
-| 1 | Balance | 1964.43 +/- 27.39 | 0.11 m/s | — | 2:57:25 |
-| 2 | Locomotion | 2678.68 +/- 4.07 | 3.47 m/s | — | 4:35:55 |
-| 3 | Strike | 1366.19 +/- 76.29 | 2.02 m/s | 93.3% | 3:51:54 |
-## Training Details
-- **Algorithm:** PPO (Proximal Policy Optimization) via [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3)
-- **Physics Engine:** [MuJoCo](https://mujoco.org/) (>= 3.0)
-- **Environment Framework:** [Gymnasium](https://gymnasium.farama.org/) (>= 0.29)
-- **Hardware:** Google Colab L4 GPU
-- **Seed:** 42
-- **Parallel Envs:** 4
-- **Curriculum:** 3-stage progressive training (Balance → Locomotion → Species-specific task)
-## Environment Details
-| Species | Observation Dims | Action Dims | Gymnasium ID |
-|---------|-----------------|-------------|--------------|
-| Velociraptor | 67 | 22 | `MesozoicLabs/Raptor-v0` |
-## Usage
-### Installation
-```bash
-git clone https://github.com/kuds/mesozoic-labs.git
-cd mesozoic-labs
-python -m venv venv
-source venv/bin/activate
-# Install with training dependencies
-pip install -e ".[train]"
-```
-### Loading a Trained Model
-```python
-from stable_baselines3 import PPO
 import gymnasium as gym
-# Register Mesozoic Labs environments
-import environments
-# Load the trained model (e.g., velociraptor stage 3)
-model = PPO.load("path/to/best_model.zip")
-# Create the environment
-env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")
-# Run the trained agent
 obs, info = env.reset()
 for _ in range(1000):
     action, _states = model.predict(obs, deterministic=True)
-    obs, reward, terminated, truncated, info = env.step(action)
     if terminated or truncated:
         obs, info = env.reset()
 env.close()
 ```
-### Training from Scratch
-```bash
-# Full 3-stage curriculum for velociraptor
-cd environments/velociraptor
-python scripts/train_sb3.py curriculum --algorithm ppo
-# Single stage training
-python scripts/train_sb3.py train --stage 1 --timesteps 6000000 --n-envs 4
-```
-### Loading from Hugging Face Hub
-```bash
 pip install huggingface_hub
 ```
-```python
 from huggingface_hub import hf_hub_download
-from stable_baselines3 import PPO
 import gymnasium as gym
-import environments
-# Download the model from the Hub
-model_path = hf_hub_download(
-    repo_id="kuds/mesozoic-labs",
-    filename="results/velociraptor/ppo/best_model.zip"
-)
-# Load the model
 model = PPO.load(model_path)
-# Create the environment
-env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")
-# Run the trained agent
-obs, info = env.reset()
-for _ in range(1000):
     action, _states = model.predict(obs, deterministic=True)
-    obs, reward, terminated, truncated, info = env.step(action)
-    if terminated or truncated:
-        obs, info = env.reset()
-env.close()
-```
-## Citation
-```bibtex
-@misc{mesozoic-labs,
-  author = {Mesozoic Labs Contributors},
-  title = {Mesozoic Labs: Robotic Dinosaur Locomotion with Reinforcement Learning},
-  year = {2026},
-  publisher = {GitHub / Hugging Face},
-  url = {https://github.com/kuds/mesozoic-labs}
-}
 ```
-## License
-MIT License

 ---
 library_name: stable-baselines3
 tags:
 - reinforcement-learning
+- BreakoutNoFrameskip-v4
 model-index:
+- name: PPO
   results:
   - task:
       type: reinforcement-learning
       name: reinforcement-learning
     dataset:
+      name: BreakoutNoFrameskip-v4
+      type: BreakoutNoFrameskip-v4
     metrics:
     - type: mean_reward
+      value: 187.80 +/- 114.62
       name: mean_reward
       verified: false
 ---
+# **PPO** Agent playing **BreakoutNoFrameskip-v4**
+- [GitHub Repository](https://github.com/kuds/rl-atari-breakout)
+- [Google Colab Notebook](https://colab.research.google.com/github/kuds/rl-atari-breakout/blob/main/%5BAtari%20Breakout%5D%20Single-Agent%20Reinforcement%20Learning%20PPO.ipynb)
+- [Finding Theta - Blog Post](https://www.findingtheta.com/blog/beginners-guide-to-model-based-reinforcement-learning-mbrl-with-ataris-breakout)
+### Usage
+You can load the trained model (`best-model.zip`) and watch it play with the following Python code:
+```python
 import gymnasium as gym
+from stable_baselines3 import PPO
+from stable_baselines3.common.env_util import make_atari_env
+from stable_baselines3.common.vec_env import VecFrameStack, VecTransposeImage
+# Depending on your Gymnasium/ale-py versions, the Atari environments may need
+# to be registered first, e.g. `import ale_py` and `gym.register_envs(ale_py)`.
+# Load the trained model
+model = PPO.load("best-model.zip")
+# Create the environment: SB3 Atari preprocessing (84x84 grayscale frames),
+# 4 stacked frames, then channel-first images for the CNN policy.
+# render_mode="human" is required for env.render() to open a window.
+env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=1, env_kwargs={"render_mode": "human"})
+env = VecFrameStack(env, n_stack=4)
+env = VecTransposeImage(env)
+# Reset the environment (a VecEnv returns only the batched observations)
+obs = env.reset()
+# Enjoy the trained agent
 for _ in range(1000):
     action, _states = model.predict(obs, deterministic=True)
+    # A VecEnv returns four values and resets finished episodes automatically
+    obs, rewards, dones, infos = env.step(action)
+    env.render()
 env.close()
 ```
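The wrappers in the snippet above determine what the policy sees: with the default Atari preprocessing each frame is an 84x84 grayscale image, `VecFrameStack(env, n_stack=4)` keeps the four most recent frames along the last axis, and `VecTransposeImage` moves that axis to the front, giving observations of shape `(n_envs, 4, 84, 84)`. The following is an illustrative sketch of that bookkeeping for a single environment, not Stable-Baselines3's implementation. It uses nested lists and hypothetical 2x2 constant-valued frames so it runs without NumPy or the Atari ROMs, and the helper names (`frame`, `new_stack`, `channels_last`, `channels_first`) are made up for this example. Zero-padding the older slots after a reset is modeled on SB3's frame stacking and should be treated as an assumption.

```python
from collections import deque

# Illustrative sketch only (not SB3 code): frame stacking and channel
# transposition for ONE environment, using nested lists instead of arrays.
# Frames are hypothetical 2x2 stand-ins for 84x84 grayscale Atari frames.
N_STACK = 4
HEIGHT, WIDTH = 2, 2


def frame(value):
    """Hypothetical constant-valued frame, so each time step is easy to spot."""
    return [[value] * WIDTH for _ in range(HEIGHT)]


def new_stack(first_frame):
    """On reset: zero frames fill the older slots, the newest frame goes last."""
    stack = deque([frame(0) for _ in range(N_STACK - 1)], maxlen=N_STACK)
    stack.append(first_frame)
    return stack


def channels_last(stack):
    """Combine the stacked frames into an H x W x N_STACK observation."""
    height, width = len(stack[0]), len(stack[0][0])
    return [[[f[r][c] for f in stack] for c in range(width)] for r in range(height)]


def channels_first(obs_hwc):
    """Move the stack axis to the front (H x W x C -> C x H x W)."""
    height, width, channels = len(obs_hwc), len(obs_hwc[0]), len(obs_hwc[0][0])
    return [[[obs_hwc[r][c][k] for c in range(width)] for r in range(height)]
            for k in range(channels)]


stack = new_stack(frame(1))
for t in (2, 3, 4, 5):
    stack.append(frame(t))  # maxlen=N_STACK silently drops the oldest frame

obs = channels_first(channels_last(stack))
print(len(obs), len(obs[0]), len(obs[0][0]))  # 4 2 2
print([plane[0][0] for plane in obs])         # [2, 3, 4, 5]
```

Stacking frames lets the policy infer motion, such as the direction the ball is travelling, which a single still frame cannot show.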
+### Hugging Face Hub
+You can also download the model from the Hugging Face Hub. First, install the `huggingface_hub` library:
+```bash
 pip install huggingface_hub
 ```
+Then, download the model from the Hub and load it with the following code:
+```python
 from huggingface_hub import hf_hub_download
 import gymnasium as gym
+from stable_baselines3 import PPO
+from stable_baselines3.common.env_util import make_atari_env
+from stable_baselines3.common.vec_env import VecFrameStack, VecTransposeImage
+# Depending on your Gymnasium/ale-py versions, the Atari environments may need
+# to be registered first, e.g. `import ale_py` and `gym.register_envs(ale_py)`.
+# Download the model from the Hub
+model_path = hf_hub_download(repo_id="kuds/atari-breakout-v4-ppo", filename="best-model.zip")
+# Load the model
 model = PPO.load(model_path)
+# Create the environment (render_mode="human" is required for env.render())
+env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=1, env_kwargs={"render_mode": "human"})
+env = VecFrameStack(env, n_stack=4)
+env = VecTransposeImage(env)
+# Enjoy the trained agent
+obs = env.reset()
+for _ in range(1000):
     action, _states = model.predict(obs, deterministic=True)
+    # A VecEnv returns four values and resets finished episodes automatically
+    obs, rewards, dones, infos = env.step(action)
+    env.render()
+env.close()
 ```
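The loop above does not call `env.reset()` inside the loop because a VecEnv resets finished episodes on its own. This also means `rewards` and `dones` do not describe the game score: the default Atari wrappers clip rewards to -1, 0 or +1 and end an episode whenever a life is lost. `make_atari_env` places SB3's `Monitor` wrapper underneath those wrappers, and `Monitor` adds an `"episode"` entry (`{"r": ..., "l": ..., "t": ...}`) to the info dict when a full game ends. The sketch below uses hypothetical helpers `collect_episode_scores` and `format_metric` and made-up info dicts to show how such entries could be gathered and formatted as `mean +/- std`, like the `mean_reward` field in the card's metadata. It is not the procedure behind the reported 187.80 +/- 114.62, which the card does not describe. For a real model, `evaluate_policy` from `stable_baselines3.common.evaluation` returns the mean and standard deviation of episode rewards directly.

```python
import statistics


def collect_episode_scores(infos_per_step):
    """Gather final scores of finished games from VecEnv `infos` lists.

    Assumes each finished game adds an info dict with an "episode" entry,
    as SB3's Monitor wrapper does ({"r": return, "l": length, "t": time}).
    """
    scores = []
    for infos in infos_per_step:  # one list of info dicts per env.step() call
        for info in infos:        # one info dict per sub-environment
            if "episode" in info:
                scores.append(info["episode"]["r"])
    return scores


def format_metric(scores):
    """Format like the card's metadata: mean +/- population std, 2 decimals."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # ddof=0, matching numpy.std's default
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical infos from three env.step() calls with n_envs=1: one step
# without a finished game, then two finished games scoring 100 and 300.
fake_infos = [
    [{}],
    [{"episode": {"r": 100.0, "l": 900, "t": 12.5}}],
    [{"episode": {"r": 300.0, "l": 2100, "t": 30.1}}],
]
scores = collect_episode_scores(fake_infos)
print(scores)                 # [100.0, 300.0]
print(format_metric(scores))  # 200.00 +/- 100.00
```

In the Hub snippet, appending each step's info list to a Python list inside the loop and passing that list to `collect_episode_scores` after the loop would yield the scores of the games that finished within the 1,000 steps.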