# PPO on LunarLander-v2 (best checkpoint)
Trained with Stable-Baselines3 (PPO). VecNormalize stats per checkpoint were not preserved; evaluation may differ slightly across machines.
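For reference, VecNormalize standardizes each observation feature by running mean/variance statistics and then clips the result; this is why missing `vecnormalize.pkl` stats can shift evaluation scores. A minimal sketch of that transform, assuming SB3's defaults (`clip_obs=10.0`, `epsilon=1e-8`) and made-up running stats:

```python
import math

def normalize_obs(obs, mean, var, clip_obs=10.0, eps=1e-8):
    """Per-feature transform VecNormalize applies to observations:
    standardize by running statistics, then clip to [-clip_obs, clip_obs]."""
    normed = [(o - m) / math.sqrt(v + eps) for o, m, v in zip(obs, mean, var)]
    return [max(-clip_obs, min(clip_obs, n)) for n in normed]

# Hypothetical running stats, for illustration only:
out = normalize_obs([1.0, -2.0], mean=[0.0, 0.0], var=[1.0, 4.0])
# approximately [1.0, -1.0]
```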
## Files

- `best_model.zip`: SB3 model (policy + hyperparameters)
- `vecnormalize.pkl` (optional): observation-normalization stats, if available
## Quick usage

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

def make_env(seed=0):
    def _init():
        env = gym.make("LunarLander-v2")
        env.reset(seed=seed)
        return Monitor(env)
    return _init

venv = DummyVecEnv([make_env(1234)])
try:
    # Preferred: restore the saved observation-normalization stats.
    env = VecNormalize.load("vecnormalize.pkl", venv)
except Exception:
    # Fallback: fresh stats; evaluation may differ from the reported score.
    env = VecNormalize(venv, norm_obs=True, norm_reward=False, clip_obs=10.0)
env.training = False    # freeze running statistics at evaluation time
env.norm_reward = False

model = PPO.load("best_model.zip", env=env, device="cpu")
```
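SB3's `evaluate_policy` reports the mean and standard deviation of episode returns, which is the form the score below takes. The loop below is a minimal pure-Python sketch of that aggregation, exercised with a hypothetical stub environment (not LunarLander) so it is self-contained:

```python
import math

def evaluate(policy, env_reset, env_step, n_episodes=10):
    """Roll out n_episodes and report mean +/- std of episode returns,
    mirroring what stable_baselines3's evaluate_policy reports."""
    returns = []
    for _ in range(n_episodes):
        obs = env_reset()
        done, ep_return = False, 0.0
        while not done:
            action = policy(obs)
            obs, reward, done = env_step(action)
            ep_return += reward
        returns.append(ep_return)
    mean = sum(returns) / len(returns)
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / len(returns))
    return mean, std

# Stub environment: 5-step episodes with reward 1.0 per step,
# used only to exercise the loop (hypothetical, not LunarLander).
def make_stub():
    state = {"t": 0}
    def reset():
        state["t"] = 0
        return 0.0
    def step(action):
        state["t"] += 1
        return float(state["t"]), 1.0, state["t"] >= 5
    return reset, step

reset, step = make_stub()
mean, std = evaluate(lambda obs: 0, reset, step, n_episodes=3)
# Each stub episode returns 5.0, so mean == 5.0 and std == 0.0
```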
## Evaluation results

- mean_reward on LunarLander-v2 (self-reported): 281.60 +/- 18.90