Attila Kanto
---
license: mit
language:
- en
pipeline_tag: reinforcement-learning
tags:
- mario
- rl
---
# Mario PPO Model
This is a PPO agent trained using Stable Baselines3 and Gymnasium on a Mario-like environment.
## Environment Details
- Action Space: Discrete, 7 simple NES-style actions
- Observation: Grayscale frames, 250×264 pixels
- Frame Stack: 4 consecutive frames
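
The observation pipeline listed above (grayscale conversion followed by a 4-frame stack) can be sketched roughly as follows. This is a minimal NumPy illustration, not the preprocessing code used for training: the `FrameStacker` class and `to_grayscale` helper are hypothetical names, the `(height, width)` ordering of 250×264 is an assumption, and so is stacking along the first axis (the training setup may stack along the channel axis instead).

```python
from collections import deque

import numpy as np

FRAME_SHAPE = (250, 264)  # assumption: the README does not say which value is height
N_STACK = 4               # frame stack size from the README


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB frame to an (H, W) uint8 grayscale frame."""
    # ITU-R BT.601 luma weights; the weights used in training are not documented.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (rgb.astype(np.float32) @ weights).astype(np.uint8)


class FrameStacker:
    """Keeps the most recent N_STACK grayscale frames (hypothetical helper)."""

    def __init__(self, n_stack: int = N_STACK):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # Fill the whole stack with the first frame so the shape is fixed from step 0.
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self) -> np.ndarray:
        # Oldest frame first, newest frame last.
        return np.stack(list(self.frames), axis=0)


rgb = np.zeros((*FRAME_SHAPE, 3), dtype=np.uint8)
stacker = FrameStacker()
obs = stacker.reset(to_grayscale(rgb))
print(obs.shape)  # (4, 250, 264)
```

Stacking consecutive frames gives the policy access to short-term motion that a single still frame cannot convey.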
## Training Info
- Algorithm: PPO
- Framework: Stable Baselines3
- Timesteps: 20 million
- Environment: Gymnasium (`v0`)
- Device: MPS / CUDA / CPU
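
Since the model can run on MPS, CUDA, or CPU, a device can be picked at runtime along these lines. This is a minimal sketch assuming PyTorch 1.12 or newer; `pick_device` is a hypothetical helper, and the preference order (CUDA, then MPS, then CPU) is an assumption rather than something this repository specifies.

```python
import torch


def pick_device() -> str:
    """Return the name of the best available torch device (hypothetical helper)."""
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

The resulting string can be passed as the `device` argument of `PPO(...)` or `PPO.load(...)` in Stable Baselines3.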
## Training Timesteps & Checkpoints
| Checkpoint | Timesteps | Notes |
| ---------------------------------------------------------------- | ---------- | -------------------- |
| [25M Steps](checkpoints/simple/25M_steps/mario_ppo_25000000.zip) | 25,000,000 | Early-stage learning |
| [50M Steps](checkpoints/simple/50M_steps/mario_ppo.zip) | 50,000,000 | Better stability |
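
To load one of the checkpoints from the table rather than the default file, something like the following should work. It is a sketch under a few assumptions: the paths are copied from the table links and taken to be relative to the repository root, the repo id is the one used in the Usage section, and `CHECKPOINTS` and `load_checkpoint` are hypothetical names introduced here.

```python
REPO_ID = "akantox/mario-rl-model"

# Repo-relative paths copied from the checkpoint table.
CHECKPOINTS = {
    "25M": "checkpoints/simple/25M_steps/mario_ppo_25000000.zip",
    "50M": "checkpoints/simple/50M_steps/mario_ppo.zip",
}


def load_checkpoint(name: str):
    """Download the named checkpoint from the Hub and load it as a PPO model."""
    if name not in CHECKPOINTS:
        raise KeyError(f"unknown checkpoint {name!r}; choose from {sorted(CHECKPOINTS)}")

    # Imported lazily so the mapping can be inspected without the heavy dependencies.
    from huggingface_hub import hf_hub_download
    from stable_baselines3 import PPO

    model_path = hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS[name])
    return PPO.load(model_path)
```

Calling `load_checkpoint("25M")` downloads the archive on first use and reads it from the local Hub cache afterwards.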
## Usage
```python
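from stable_baselines3 import PPO
from huggingface_hub import hf_hub_download

# Download the model archive from the Hugging Face Hub (cached locally after the first call).
model_path = hf_hub_download(repo_id="akantox/mario-rl-model", filename="mario_ppo.zip")
# Load the trained PPO agent from the downloaded archive.
model = PPO.load(model_path)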
```
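
Once the model is loaded, it can be rolled out for one episode roughly as shown here. This is a generic sketch and not code from this repository: `run_episode` is a hypothetical helper, and it assumes an environment that follows the Gymnasium `reset()` / `step()` API and applies the same preprocessing as in training (grayscale, 4-frame stack). Constructing that environment is not covered here.

```python
def run_episode(model, env, max_steps: int = 10_000, deterministic: bool = True) -> float:
    """Play one episode with the given model and return the total reward."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # predict() returns (action, recurrent_state); the state is unused for PPO.
        action, _state = model.predict(obs, deterministic=deterministic)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward
```

With `deterministic=True` the agent always takes its most likely action, which is the usual choice for evaluation. Some environments expect a plain `int` action, in which case `int(action)` may be needed before calling `env.step`.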