# PPO on LunarLander-v2 (best checkpoint)
Trained with Stable-Baselines3 (PPO). VecNormalize stats per checkpoint were not preserved; evaluation may differ slightly across machines.
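For reference, VecNormalize standardizes each observation feature by running mean/variance statistics and then clips the result; this is why missing `vecnormalize.pkl` stats can shift evaluation scores. A minimal sketch of that transform, assuming SB3's defaults (`clip_obs=10.0`, `epsilon=1e-8`) and made-up running stats:

```python
import math

def normalize_obs(obs, mean, var, clip_obs=10.0, eps=1e-8):
    """Per-feature transform VecNormalize applies to observations:
    standardize by running statistics, then clip to [-clip_obs, clip_obs]."""
    normed = [(o - m) / math.sqrt(v + eps) for o, m, v in zip(obs, mean, var)]
    return [max(-clip_obs, min(clip_obs, n)) for n in normed]

# Hypothetical running stats, for illustration only:
out = normalize_obs([1.0, -2.0], mean=[0.0, 0.0], var=[1.0, 4.0])
# approximately [1.0, -1.0]
```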
## Files

- `best_model.zip`: SB3 model (policy + hyperparameters)
- `vecnormalize.pkl` (optional): observation-normalization stats, if available
## Quick usage

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

def make_env(seed=0):
    def _init():
        env = gym.make("LunarLander-v2")
        env.reset(seed=seed)
        return Monitor(env)
    return _init

venv = DummyVecEnv([make_env(1234)])
try:
    # Preferred: restore the saved observation-normalization stats.
    env = VecNormalize.load("vecnormalize.pkl", venv)
except Exception:
    # Fallback: fresh stats; evaluation may differ from the reported score.
    env = VecNormalize(venv, norm_obs=True, norm_reward=False, clip_obs=10.0)
env.training = False    # freeze running statistics at evaluation time
env.norm_reward = False

model = PPO.load("best_model.zip", env=env, device="cpu")
```
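SB3's `evaluate_policy` reports the mean and standard deviation of episode returns, which is the form the score below takes. The loop below is a minimal pure-Python sketch of that aggregation, exercised with a hypothetical stub environment (not LunarLander) so it is self-contained:

```python
import math

def evaluate(policy, env_reset, env_step, n_episodes=10):
    """Roll out n_episodes and report mean +/- std of episode returns,
    mirroring what stable_baselines3's evaluate_policy reports."""
    returns = []
    for _ in range(n_episodes):
        obs = env_reset()
        done, ep_return = False, 0.0
        while not done:
            action = policy(obs)
            obs, reward, done = env_step(action)
            ep_return += reward
        returns.append(ep_return)
    mean = sum(returns) / len(returns)
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / len(returns))
    return mean, std

# Stub environment: 5-step episodes with reward 1.0 per step,
# used only to exercise the loop (hypothetical, not LunarLander).
def make_stub():
    state = {"t": 0}
    def reset():
        state["t"] = 0
        return 0.0
    def step(action):
        state["t"] += 1
        return float(state["t"]), 1.0, state["t"] >= 5
    return reset, step

reset, step = make_stub()
mean, std = evaluate(lambda obs: 0, reset, step, n_episodes=3)
# Each stub episode returns 5.0, so mean == 5.0 and std == 0.0
```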
## Evaluation results

- mean_reward on LunarLander-v2 (self-reported): 281.60 +/- 18.90