PPO Agents for Robotic Dinosaur Locomotion - Mesozoic Labs

Trained PPO Agent

This repository contains PPO (Proximal Policy Optimization) agents trained to control robotic dinosaurs in the MuJoCo physics simulator. Each species is trained with a 3-stage curriculum: balance first, then locomotion, then a species-specific task.

Species & Training Results

Velociraptor (PPO): all 3 stages passed | 22M steps | 11:25:15 total

A bipedal predator with sickle claws, trained on 3 curriculum stages:

Stage | Name       | Best Reward       | Avg Forward Vel | Success Rate | Time (h:mm:ss)
1     | Balance    | 1964.43 +/- 27.39 | 0.11 m/s        | n/a          | 2:57:25
2     | Locomotion | 2678.68 +/- 4.07  | 3.47 m/s        | n/a          | 4:35:55
3     | Strike     | 1366.19 +/- 76.29 | 2.02 m/s        | 93.3%        | 3:51:54
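As a quick consistency check on the figures above, the per-stage wall-clock times can be summed and compared with the reported total. The sketch below uses only numbers from this card. Treating "22M steps" as exactly 22,000,000 is an assumption, so the throughput is a rough estimate.

```python
# Per-stage wall-clock times copied from the results table (h:mm:ss).
STAGE_TIMES = {"Balance": "2:57:25", "Locomotion": "4:35:55", "Strike": "3:51:54"}
TOTAL_STEPS = 22_000_000  # assumption: "22M steps" taken as an exact count


def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Format a number of seconds as h:mm:ss."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_seconds = sum(to_seconds(t) for t in STAGE_TIMES.values())
steps_per_second = TOTAL_STEPS / total_seconds

print(to_hms(total_seconds))    # 11:25:14
print(round(steps_per_second))  # 535
```

The sum lands one second below the reported 11:25:15, which is consistent with the per-stage times being rounded. The average works out to roughly 535 environment steps per second over the whole run.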

Training Details

  • Algorithm: PPO (Proximal Policy Optimization) via Stable-Baselines3
  • Physics Engine: MuJoCo (>= 3.0)
  • Environment Framework: Gymnasium (>= 0.29)
  • Hardware: Google Colab L4 GPU
  • Seed: 42
  • Parallel Envs: 4
  • Curriculum: 3-stage progressive training (Balance → Locomotion → Species-specific task)
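The progressive curriculum listed above can be sketched as a loop that trains a single model through the stages in order and carries its weights forward. This is a minimal illustration, not the repository's `train_sb3.py`: the `Stage` class, `run_curriculum`, and `make_ppo` are hypothetical names, and the `"MlpPolicy"` choice is an assumption, since the policy architecture is not stated here. Only `set_env` and `learn(total_timesteps=..., reset_num_timesteps=...)` are taken from the Stable-Baselines3 API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (hypothetical helper type)."""
    name: str
    timesteps: int


def run_curriculum(
    stages: Sequence[Stage],
    make_env: Callable[[Stage], Any],
    make_model: Callable[[Any], Any],
):
    """Train one model through all stages in order, reusing its weights."""
    model = None
    for stage in stages:
        env = make_env(stage)
        if model is None:
            model = make_model(env)  # first stage: fresh policy
        else:
            model.set_env(env)       # later stages: same policy, new task
        # reset_num_timesteps=False keeps one running step counter across stages.
        model.learn(total_timesteps=stage.timesteps, reset_num_timesteps=False)
    return model


def make_ppo(env):
    """Build an SB3 PPO model; "MlpPolicy" is an assumption, not from this card."""
    from stable_baselines3 import PPO  # lazy import: the sketch loads without SB3
    return PPO("MlpPolicy", env, seed=42)
```

With Stable-Baselines3, `make_env` could wrap `make_vec_env("MesozoicLabs/Raptor-v0", n_envs=4, seed=42)` from `stable_baselines3.common.env_util`. How a stage is selected inside the environment is not described here, so that part is left to the caller.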

Environment Details

Species      | Observation Dims | Action Dims | Gymnasium ID
Velociraptor | 67               | 22          | MesozoicLabs/Raptor-v0
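Before loading a checkpoint, it can help to confirm that the environment exposes the dimensions listed above, since a mismatch usually means the wrong environment version is registered. The helper below is a small sketch. `check_spaces` is a hypothetical name, and it assumes both spaces are flat 1-D vectors, which the table suggests but does not state.

```python
from types import SimpleNamespace

EXPECTED_OBS_DIM = 67  # from the table above
EXPECTED_ACT_DIM = 22  # from the table above


def check_spaces(env, obs_dim=EXPECTED_OBS_DIM, act_dim=EXPECTED_ACT_DIM):
    """Raise ValueError if the env's space shapes differ from the expected ones."""
    obs_shape = tuple(env.observation_space.shape)
    act_shape = tuple(env.action_space.shape)
    if obs_shape != (obs_dim,):
        raise ValueError(f"observation shape {obs_shape}, expected ({obs_dim},)")
    if act_shape != (act_dim,):
        raise ValueError(f"action shape {act_shape}, expected ({act_dim},)")


# Stand-in object with the same attributes as a Gymnasium env, for illustration.
fake_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(67,)),
    action_space=SimpleNamespace(shape=(22,)),
)
check_spaces(fake_env)  # passes silently
```

In practice the same call would take the real environment, for example `check_spaces(gym.make("MesozoicLabs/Raptor-v0"))` after `import environments`.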

Usage

Installation

git clone https://github.com/kuds/mesozoic-labs.git
cd mesozoic-labs

python -m venv venv
source venv/bin/activate

# Install with training dependencies
pip install -e ".[train]"

Loading a Trained Model

from stable_baselines3 import PPO
import gymnasium as gym

# Register Mesozoic Labs environments
import environments

# Load the trained model (e.g., velociraptor stage 3)
model = PPO.load("path/to/best_model.zip")

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
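
The loop above runs a fixed number of steps. To get figures comparable to the results table, one option is to run whole episodes and summarize their returns. The sketch below assumes that "+/-" in the table denotes the standard deviation over evaluation episodes and that the environment reports success under `info["is_success"]`. Neither is confirmed by this card, and `rollout_returns` and `summarize` are hypothetical helper names.

```python
import statistics


def rollout_returns(env, predict, n_episodes=10):
    """Run full episodes on a Gymnasium-style env; return returns and success flags."""
    returns, successes = [], []
    for _ in range(n_episodes):
        obs, info = env.reset()
        total, done = 0.0, False
        while not done:
            action, _state = predict(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
        # Assumption: the final info dict carries an "is_success" flag.
        successes.append(bool(info.get("is_success", False)))
    return returns, successes


def summarize(returns, successes):
    """Format results in the same 'mean +/- std' style as the results table."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)  # population std, like NumPy's default
    rate = 100.0 * sum(successes) / len(successes)
    return f"{mean:.2f} +/- {std:.2f}, success {rate:.1f}%"
```

With the model loaded as above, this could be called as `rollout_returns(env, lambda obs: model.predict(obs, deterministic=True))`. Stable-Baselines3 also ships `evaluate_policy` in `stable_baselines3.common.evaluation`, which returns the mean and standard deviation of episode rewards directly.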

Training from Scratch

# Full 3-stage curriculum for velociraptor
cd environments/velociraptor
python scripts/train_sb3.py curriculum --algorithm ppo

# Single stage training
python scripts/train_sb3.py train --stage 1 --timesteps 6000000 --n-envs 4

Loading from Hugging Face Hub

pip install huggingface_hub

from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
import gymnasium as gym
import environments

# Download the model from the Hub
model_path = hf_hub_download(
    repo_id="kuds/mesozoic-labs",
    filename="results/velociraptor/ppo/best_model.zip"
)

# Load the model
model = PPO.load(model_path)

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()

Citation

@misc{mesozoic-labs,
  author = {Mesozoic Labs Contributors},
  title = {Mesozoic Labs: Robotic Dinosaur Locomotion with Reinforcement Learning},
  year = {2026},
  publisher = {GitHub / Hugging Face},
  url = {https://github.com/kuds/mesozoic-labs}
}

License

MIT License

Evaluation results

  • mean_reward on MesozoicLabs/Raptor-v0
    self-reported
    1366.19 +/- 76.29
  • strike_success_rate on MesozoicLabs/Raptor-v0
    self-reported
    93.3%