PPO Agents for Robotic Dinosaur Locomotion — Mesozoic Labs

This repository contains PPO (Proximal Policy Optimization) agents trained to control robotic dinosaurs in the MuJoCo physics simulator. Each species is trained with a 3-stage curriculum: balance, then locomotion, then a species-specific task.

Species & Training Results

Velociraptor (PPO) — All 3 stages passed | 22M steps | 11:25:15 total

A bipedal predator with sickle claws, trained on 3 curriculum stages:

Stage	Name	Best Reward	Avg Forward Velocity	Success Rate	Time (h:mm:ss)
1	Balance	1964.43 +/- 27.39	0.11 m/s	—	2:57:25
2	Locomotion	2678.68 +/- 4.07	3.47 m/s	—	4:35:55
3	Strike	1366.19 +/- 76.29	2.02 m/s	93.3%	3:51:54
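
As a quick consistency check on the table, the per-stage wall-clock times can be summed and compared with the total in the heading. The sketch below uses only the values listed above; the helper names `parse_hms` and `total_time` are illustrative and not part of the repository.

```python
from datetime import timedelta

# Per-stage wall-clock times copied from the table above (h:mm:ss).
STAGE_TIMES = {"Balance": "2:57:25", "Locomotion": "4:35:55", "Strike": "3:51:54"}


def parse_hms(text: str) -> timedelta:
    """Convert an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_time(times: dict[str, str]) -> timedelta:
    """Sum the per-stage durations."""
    return sum((parse_hms(t) for t in times.values()), timedelta())


if __name__ == "__main__":
    print(total_time(STAGE_TIMES))  # 11:25:14
```

The listed times sum to 11:25:14, one second short of the 11:25:15 total in the heading, which is consistent with the per-stage values having been rounded individually.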

Training Details

Algorithm: PPO (Proximal Policy Optimization) via Stable-Baselines3
Physics Engine: MuJoCo (>= 3.0)
Environment Framework: Gymnasium (>= 0.29)
Hardware: Google Colab L4 GPU
Seed: 42
Parallel Envs: 4
Curriculum: 3-stage progressive training (Balance → Locomotion → Species-specific task)
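
The settings above map onto a Stable-Baselines3 setup roughly as follows. This is a minimal sketch, not the repository's training script: `TrainConfig` and `build_model` are hypothetical names, and `MlpPolicy` is an assumption because the policy class is not stated here. Imports of the heavy dependencies are deferred so the configuration can be inspected without MuJoCo installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Settings copied from the Training Details list."""

    env_id: str = "MesozoicLabs/Raptor-v0"
    seed: int = 42
    n_envs: int = 4
    policy: str = "MlpPolicy"  # assumption: the policy class is not documented here


def build_model(cfg: TrainConfig):
    """Create a PPO model on a vectorized environment (needs the repo's dependencies)."""
    import environments  # noqa: F401  (registers the MesozoicLabs/* environment IDs)
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    # Four seeded copies of the environment stepped together.
    vec_env = make_vec_env(cfg.env_id, n_envs=cfg.n_envs, seed=cfg.seed)
    return PPO(cfg.policy, vec_env, seed=cfg.seed, verbose=1)
```

With the repository installed, `build_model(TrainConfig())` returns a model whose `learn(total_timesteps=...)` method starts training; the per-stage step budgets and any reward shaping live in the repository's own scripts and are not reproduced here.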

Environment Details

Species	Observation Dims	Action Dims	Gymnasium ID
Velociraptor	67	22	`MesozoicLabs/Raptor-v0`
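
The dimensions in the table can be checked against a live environment. A minimal sketch, assuming both spaces are flat 1-D `Box` spaces so that their `shape` is `(dim,)`; `EXPECTED_DIMS`, `dims_match`, and `check_env` are illustrative names.

```python
# Expected flat space sizes, copied from the table above.
EXPECTED_DIMS = {"MesozoicLabs/Raptor-v0": {"obs": 67, "act": 22}}


def dims_match(env_id: str, obs_shape: tuple, act_shape: tuple) -> bool:
    """Return True if the given space shapes match the documented dimensions."""
    expected = EXPECTED_DIMS[env_id]
    # Assumption: both spaces are 1-D, so shape == (dim,).
    return obs_shape == (expected["obs"],) and act_shape == (expected["act"],)


def check_env(env_id: str = "MesozoicLabs/Raptor-v0") -> bool:
    """Instantiate the environment and compare its spaces with the table."""
    import gymnasium as gym
    import environments  # noqa: F401  (registers the MesozoicLabs/* environment IDs)

    env = gym.make(env_id)
    try:
        return dims_match(env_id, env.observation_space.shape, env.action_space.shape)
    finally:
        env.close()
```

Running `check_env()` before loading a checkpoint is a cheap way to catch a mismatch between the installed environment version and the trained model.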

Usage

Installation

git clone https://github.com/kuds/mesozoic-labs.git
cd mesozoic-labs

python -m venv venv
source venv/bin/activate

# Install with training dependencies
pip install -e ".[train]"

Loading a Trained Model

from stable_baselines3 import PPO
import gymnasium as gym

# Register Mesozoic Labs environments
import environments

# Load the trained model (e.g., velociraptor stage 3)
model = PPO.load("path/to/best_model.zip")

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
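
To turn the rollout loop above into a number comparable with the reward column, episode returns can be averaged over several runs. A minimal sketch that works with any Gymnasium-style environment and any object exposing `predict`; the function names, the default of 10 episodes, and the 1000-step cap (borrowed from the loop above) are assumptions, and this README does not say how the reported figures were measured.

```python
from statistics import fmean, pstdev


def run_episode(env, model, max_steps: int = 1000) -> float:
    """Roll out one episode with a deterministic policy and return the summed reward."""
    obs, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        total += float(reward)
        if terminated or truncated:
            break
    return total


def evaluate(env, model, n_episodes: int = 10, max_steps: int = 1000):
    """Return (mean, std) of the episode returns over n_episodes rollouts."""
    returns = [run_episode(env, model, max_steps) for _ in range(n_episodes)]
    # Population standard deviation, the same convention as NumPy's default np.std.
    return fmean(returns), pstdev(returns)
```

Stable-Baselines3 also provides `evaluate_policy` in `stable_baselines3.common.evaluation`, which by default returns the same kind of mean and standard deviation pair and is the more usual choice when the library is already installed.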

Training from Scratch

# Full 3-stage curriculum for velociraptor
cd environments/velociraptor
python scripts/train_sb3.py curriculum --algorithm ppo

# Single stage training
python scripts/train_sb3.py train --stage 1 --timesteps 6000000 --n-envs 4
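
The single-stage command can also be driven from Python, for example to queue stages one after another. A minimal sketch that uses only the flags shown above; `stage_command` and `run_stage` are hypothetical helpers. Only the stage 1 budget (6000000) appears in this README, so the caller supplies `timesteps`. Whether a single-stage run resumes from the previous stage's checkpoint is not stated here, so the `curriculum` subcommand remains the documented way to run all three stages.

```python
import subprocess
import sys


def stage_command(stage: int, timesteps: int, n_envs: int = 4) -> list[str]:
    """Build the argv for one `train_sb3.py train` run, using the flags shown above."""
    if stage not in (1, 2, 3):
        raise ValueError(f"stage must be 1, 2, or 3, got {stage}")
    return [
        sys.executable, "scripts/train_sb3.py", "train",
        "--stage", str(stage),
        "--timesteps", str(timesteps),
        "--n-envs", str(n_envs),
    ]


def run_stage(stage: int, timesteps: int, n_envs: int = 4) -> None:
    """Run one stage from environments/velociraptor and raise if it exits non-zero."""
    subprocess.run(stage_command(stage, timesteps, n_envs), check=True)
```

For instance, `run_stage(1, 6_000_000)` launched from `environments/velociraptor` is equivalent to the single-stage command above.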

Loading from Hugging Face Hub

# Install the Hugging Face Hub client (shell command, run before the Python code below)
pip install huggingface_hub

from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
import gymnasium as gym
import environments

# Download the model from the Hub
model_path = hf_hub_download(
    repo_id="kuds/mesozoic-labs",
    filename="results/velociraptor/ppo/best_model.zip"
)

# Load the model
model = PPO.load(model_path)

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
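
If the checkpoint path differs from the one shown above, the repository's file list can be queried to find the available model archives. A minimal sketch using `list_repo_files` from `huggingface_hub`; the assumption that checkpoints are `.zip` files under `results/velociraptor/` is inferred from the single path in this README, and `checkpoint_files` and `list_checkpoints` are illustrative names.

```python
def checkpoint_files(files: list[str], prefix: str = "results/velociraptor/") -> list[str]:
    """Keep only .zip model archives under the given path prefix, sorted by name."""
    return sorted(f for f in files if f.startswith(prefix) and f.endswith(".zip"))


def list_checkpoints(repo_id: str = "kuds/mesozoic-labs") -> list[str]:
    """Query the Hub for the repository's file list and filter it (needs network access)."""
    from huggingface_hub import list_repo_files

    return checkpoint_files(list_repo_files(repo_id))
```

Any path returned by `list_checkpoints()` can be passed as `filename` to `hf_hub_download`.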

Citation

@misc{mesozoic-labs,
  author = {Mesozoic Labs Contributors},
  title = {Mesozoic Labs: Robotic Dinosaur Locomotion with Reinforcement Learning},
  year = {2026},
  publisher = {GitHub / Hugging Face},
  url = {https://github.com/kuds/mesozoic-labs}
}

License

MIT License

Evaluation results

mean_reward on MesozoicLabs/Raptor-v0 (self-reported): 1366.19 +/- 76.29
strike_success_rate on MesozoicLabs/Raptor-v0 (self-reported): 93.3%