---
license: mit
language:
- en
pipeline_tag: reinforcement-learning
tags:
- mario
- rl
---
# Mario PPO Model

This is a PPO agent trained using Stable Baselines3 and Gymnasium on a Mario-like environment.
## Environment Details

- Action Space: Simple discrete NES-style actions (7 total)
- Observation: Grayscale, 250×264
- Frame Stack: 4 frames
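
The observation format listed above (grayscale frames, stacked four deep) can be sketched with NumPy alone. This is a minimal illustration, not the preprocessing used in training: the luma weights, the reading of 250×264 as height by width, the channel-first stack layout, and the names `to_grayscale` and `FrameStacker` are all assumptions.

```python
from collections import deque

import numpy as np

# Assumption: 250x264 is read as (height, width); the card does not say which is which.
FRAME_SHAPE = (250, 264)
N_STACK = 4


def to_grayscale(rgb):
    """Convert an (H, W, 3) uint8 RGB frame to an (H, W) uint8 grayscale frame."""
    # ITU-R BT.601 luma weights; an assumption, other conversions exist.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (rgb.astype(np.float32) @ weights).astype(np.uint8)


class FrameStacker:
    """Hypothetical helper that keeps the last n_stack grayscale frames."""

    def __init__(self, n_stack=N_STACK):
        self.frames = deque(maxlen=n_stack)

    def reset(self, frame):
        # Fill the buffer with the first frame of an episode.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self.observation()

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Channel-first layout: (n_stack, H, W).
        return np.stack(self.frames, axis=0)
```

Under these assumptions the agent would see an array of shape `(4, 250, 264)` at every step, which lets the policy infer motion from consecutive frames.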
## Training Info

- Algorithm: PPO
- Framework: Stable Baselines3
- Timesteps: 20 million
- Environment: Gymnasium (`v0`)
- Device: MPS / CUDA / CPU
## Training Timesteps & Checkpoints

| Checkpoint | Timesteps | Notes |
| ---------------------------------------------------------------- | ---------- | -------------------- |
| [25M Steps](checkpoints/simple/25M_steps/mario_ppo_25000000.zip) | 25,000,000 | Early-stage learning |
| [50M Steps](checkpoints/simple/50M_steps/mario_ppo.zip) | 50,000,000 | Better stability |
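
A specific checkpoint from the table could be fetched along these lines. This is a sketch under assumptions: it presumes the linked paths are relative to the `akantox/mario-rl-model` repository on the Hugging Face Hub, and `CHECKPOINTS` and `load_checkpoint` are hypothetical names introduced only for illustration.

```python
# Repository-relative paths copied from the checkpoint table.
CHECKPOINTS = {
    "25M": "checkpoints/simple/25M_steps/mario_ppo_25000000.zip",
    "50M": "checkpoints/simple/50M_steps/mario_ppo.zip",
}


def load_checkpoint(name, repo_id="akantox/mario-rl-model"):
    """Hypothetical helper: download one checkpoint by label and load it."""
    if name not in CHECKPOINTS:
        raise KeyError(f"unknown checkpoint {name!r}; choose from {sorted(CHECKPOINTS)}")

    # Imported lazily so the mapping can be inspected without the heavy dependencies.
    from huggingface_hub import hf_hub_download
    from stable_baselines3 import PPO

    # Assumption: the table's paths resolve inside the same Hub repository.
    local_path = hf_hub_download(repo_id=repo_id, filename=CHECKPOINTS[name])
    return PPO.load(local_path)
```

For example, `load_checkpoint("50M")` would download and load the later checkpoint, provided the path exists in the repository.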
## Usage

```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call).
model_path = hf_hub_download(repo_id="akantox/mario-rl-model", filename="mario_ppo.zip")

# Load the trained PPO agent.
model = PPO.load(model_path)
```
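
Once a model is loaded, it could be rolled out roughly as follows. This is a minimal sketch, not the author's evaluation script: `run_episode` is a hypothetical helper, and it assumes `env` is a single Gymnasium-style environment that already applies the same preprocessing as training (grayscale, 4-frame stack) and follows the five-value `step` API.

```python
def run_episode(model, env, max_steps=10_000):
    """Play one episode with a loaded agent and return the total reward."""
    obs, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # deterministic=True picks the most likely action instead of sampling.
        action, _state = model.predict(obs, deterministic=True)
        # The action space is discrete, so the action is passed as a plain int.
        obs, reward, terminated, truncated, _info = env.step(int(action))
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward
```

The `max_steps` cap is only a safeguard against episodes that never terminate; the value here is arbitrary.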