PPO Agents for Robotic Dinosaur Locomotion - Mesozoic Labs
This repository contains PPO (Proximal Policy Optimization) agents trained to control robotic dinosaurs in MuJoCo physics simulation. Each species is trained using a 3-stage curriculum learning approach.
Species & Training Results
Velociraptor (PPO): All 3 stages passed | 22M steps | 11:25:15 total
A bipedal predator with sickle claws, trained on 3 curriculum stages:
| Stage | Name | Best Reward | Avg Forward Vel | Success Rate | Time |
|---|---|---|---|---|---|
| 1 | Balance | 1964.43 +/- 27.39 | 0.11 m/s | n/a | 2:57:25 |
| 2 | Locomotion | 2678.68 +/- 4.07 | 3.47 m/s | n/a | 4:35:55 |
| 3 | Strike | 1366.19 +/- 76.29 | 2.02 m/s | 93.3% | 3:51:54 |
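As a quick sanity check on the table, the per-stage wall-clock times can be summed with the standard library. This is a minimal sketch: the helper names `parse_hms` and `format_hms` are illustrative and not part of the repository. The sum comes to 11:25:14, within one second of the reported 11:25:15 total, a difference that rounding of the per-stage times could plausibly explain.

```python
from datetime import timedelta

# Per-stage training times copied from the results table.
STAGE_TIMES = {
    "Balance": "2:57:25",
    "Locomotion": "4:35:55",
    "Strike": "3:51:54",
}


def parse_hms(text: str) -> timedelta:
    """Parse an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a timedelta back into 'H:MM:SS'."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total = sum((parse_hms(t) for t in STAGE_TIMES.values()), timedelta())
print(format_hms(total))  # prints 11:25:14
```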
Training Details
- Algorithm: PPO (Proximal Policy Optimization) via Stable-Baselines3
- Physics Engine: MuJoCo (>= 3.0)
- Environment Framework: Gymnasium (>= 0.29)
- Hardware: Google Colab L4 GPU
- Seed: 42
- Parallel Envs: 4
- Curriculum: 3-stage progressive training (Balance → Locomotion → Species-specific task)
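The curriculum ordering can be pictured as a loop that trains each stage in turn and hands its checkpoint to the next one. The sketch below is only an illustration under stated assumptions: `Stage`, `run_curriculum`, and `fake_train_stage` are hypothetical names, and warm-starting each stage from the previous checkpoint is assumed rather than confirmed by the repository. The stage names for the velociraptor come from the results table.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Stage:
    number: int
    name: str


# Velociraptor stages; stage 3 is the species-specific task.
CURRICULUM = [Stage(1, "Balance"), Stage(2, "Locomotion"), Stage(3, "Strike")]


def run_curriculum(
    stages: List[Stage],
    train_stage: Callable[[Stage, Optional[str]], str],
) -> List[str]:
    """Train stages in order, passing each checkpoint to the next stage."""
    checkpoint: Optional[str] = None
    checkpoints: List[str] = []
    for stage in sorted(stages, key=lambda s: s.number):
        # Assumption: each stage resumes from the previous stage's checkpoint.
        checkpoint = train_stage(stage, checkpoint)
        checkpoints.append(checkpoint)
    return checkpoints


call_log = []


def fake_train_stage(stage: Stage, previous: Optional[str]) -> str:
    """Stand-in for real training: records the call and returns a file name."""
    call_log.append((stage.number, previous))
    return f"stage{stage.number}.zip"


checkpoints = run_curriculum(CURRICULUM, fake_train_stage)
print(checkpoints)
```

In a real run, `fake_train_stage` would be replaced by a function that loads the previous checkpoint (if any), trains on the stage's environment configuration, and returns the path of the saved model.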
Environment Details
| Species | Observation Dims | Action Dims | Gymnasium ID |
|---|---|---|---|
| Velociraptor | 67 | 22 | MesozoicLabs/Raptor-v0 |
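A small guard can confirm that an environment matches the dimensions in the table before a saved policy is loaded against it. This is a hedged sketch: `check_dims` and `EXPECTED_DIMS` are hypothetical names, and it assumes flat (one-dimensional) observation and action spaces that expose a `shape` attribute, as Gymnasium `Box` spaces do. A stub object stands in for the real environment so the example runs without MuJoCo installed.

```python
from types import SimpleNamespace

# (observation dims, action dims) from the environment table.
EXPECTED_DIMS = {"MesozoicLabs/Raptor-v0": (67, 22)}


def check_dims(env_id: str, env) -> tuple:
    """Raise ValueError if env's flat space sizes differ from the table."""
    expected = EXPECTED_DIMS[env_id]
    # Assumption: both spaces are 1-D, so shape[0] is the dimension count.
    actual = (env.observation_space.shape[0], env.action_space.shape[0])
    if actual != expected:
        raise ValueError(f"{env_id}: expected {expected}, got {actual}")
    return actual


# Stub with the same attributes a Gymnasium env exposes.
stub_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(67,)),
    action_space=SimpleNamespace(shape=(22,)),
)
dims = check_dims("MesozoicLabs/Raptor-v0", stub_env)
print(dims)  # prints (67, 22)
```

With the real package installed, the stub would be replaced by the object returned from `gym.make("MesozoicLabs/Raptor-v0")` after `import environments`.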
Usage
Installation
git clone https://github.com/kuds/mesozoic-labs.git
cd mesozoic-labs
python -m venv venv
source venv/bin/activate
# Install with training dependencies
pip install -e ".[train]"
Loading a Trained Model
from stable_baselines3 import PPO
import gymnasium as gym
# Register Mesozoic Labs environments
import environments
# Load the trained model (e.g., velociraptor stage 3)
model = PPO.load("path/to/best_model.zip")
# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")
# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
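To turn the rollout loop above into a number comparable with the "mean +/- std" rewards in the results table, episode returns can be accumulated and summarized. The sketch below is an assumption-laden illustration, not the repository's evaluation script: `evaluate`, `StubEnv`, and `StubModel` are hypothetical names, and the stubs only mimic the `reset`/`step`/`predict` signatures used above so the example runs without MuJoCo. It also assumes every episode eventually ends through `terminated` or `truncated`.

```python
import statistics


def evaluate(model, env, n_episodes: int = 10):
    """Return (mean, population std) of undiscounted episode returns."""
    returns = []
    for _ in range(n_episodes):
        obs, info = env.reset()
        episode_return, done = 0.0, False
        while not done:
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += float(reward)
            done = terminated or truncated
        returns.append(episode_return)
    return statistics.fmean(returns), statistics.pstdev(returns)


class StubEnv:
    """Toy env: reward 1.0 per step, truncated after 5 steps."""

    def reset(self):
        self.steps = 0
        return 0, {}

    def step(self, action):
        self.steps += 1
        return 0, 1.0, False, self.steps >= 5, {}


class StubModel:
    """Toy policy that always returns action 0."""

    def predict(self, obs, deterministic=True):
        return 0, None


mean_return, std_return = evaluate(StubModel(), StubEnv(), n_episodes=3)
print(f"{mean_return:.2f} +/- {std_return:.2f}")  # prints 5.00 +/- 0.00
```

Stable-Baselines3 also ships `evaluate_policy` in `stable_baselines3.common.evaluation`, which returns a mean and standard deviation of episode rewards and is likely the more convenient choice when the library is already installed.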
Training from Scratch
# Full 3-stage curriculum for velociraptor
cd environments/velociraptor
python scripts/train_sb3.py curriculum --algorithm ppo
# Single stage training
python scripts/train_sb3.py train --stage 1 --timesteps 6000000 --n-envs 4
Loading from Hugging Face Hub
pip install huggingface_hub
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
import gymnasium as gym
import environments
# Download the model from the Hub
model_path = hf_hub_download(
    repo_id="kuds/mesozoic-labs",
    filename="results/velociraptor/ppo/best_model.zip",
)
# Load the model
model = PPO.load(model_path)
# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")
# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
Citation
@misc{mesozoic-labs,
author = {Mesozoic Labs Contributors},
title = {Mesozoic Labs: Robotic Dinosaur Locomotion with Reinforcement Learning},
year = {2026},
publisher = {GitHub / Hugging Face},
url = {https://github.com/kuds/mesozoic-labs}
}
License
MIT License
Evaluation results
- mean_reward on MesozoicLabs/Raptor-v0 (self-reported): 1366.19 +/- 76.29
- strike_success_rate on MesozoicLabs/Raptor-v0 (self-reported): 93.3%
