---
license: mit
language:
- en
pipeline_tag: reinforcement-learning
tags:
- mario
- rl
---

# Mario PPO Model

This is a PPO agent trained using Stable Baselines3 and Gymnasium on a Mario-like environment.

## Environment Details

- Action Space: Simple discrete NES-style actions (7 total)
- Observation: Grayscale, 250×264
- Frame Stack: 4 frames

## Training Info

- Algorithm: PPO
- Framework: Stable Baselines3
- Timesteps: 20 million
- Environment: Gymnasium (`v0`)
- Device: MPS / CUDA / CPU

## Training Timesteps & Checkpoints

| Checkpoint | Timesteps | Notes |
| ---------------------------------------------------------------- | ---------- | -------------------- |
| [25M Steps](checkpoints/simple/25M_steps/mario_ppo_25000000.zip) | 25,000,000 | Early-stage learning |
| [50M Steps](checkpoints/simple/50M_steps/mario_ppo.zip) | 50,000,000 | Better stability |

## Usage

```python
from stable_baselines3 import PPO
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="akantox/mario-rl-model", filename="mario_ppo.zip")
model = PPO.load(model_path)
```
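
The 4-frame stack listed under Environment Details means each observation passed to the model carries the four most recent frames. The snippet below is a minimal sketch of that idea using only the standard library. The `FrameStack` class is a hypothetical helper written for illustration, not the preprocessing code used to train this model, and the string frames stand in for real grayscale images.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper: keep the most recent `num_frames` frames, oldest first."""

    def __init__(self, num_frames=4):
        self.num_frames = num_frames
        # A bounded deque drops the oldest item once it is full.
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        # At episode start there is no history, so repeat the first frame.
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        # Push the newest frame; the oldest one falls out automatically.
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(num_frames=4)
obs = stack.reset("f0")
obs = stack.step("f1")
print(obs)  # ['f0', 'f0', 'f0', 'f1']
```

In practice, Stable Baselines3 provides `VecFrameStack` in `stable_baselines3.common.vec_env` for stacking frames on vectorized environments. Whichever wrapper is used at inference time, the stacked observation needs to match the shape the checkpoint was trained on.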