---
license: mit
language:
- en
pipeline_tag: reinforcement-learning
tags:
- mario
- rl
---
# Mario PPO Model
This is a PPO agent trained using Stable Baselines3 and Gymnasium on a Mario-like environment.
## Environment Details
- Action Space: Simple discrete NES-style actions (7 total)
- Observation: grayscale frames, 250×264
- Frame Stack: 4 frames
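The exact preprocessing wrappers used during training aren't included in this repo, so here is a minimal NumPy sketch of what 4-frame stacking over the 250×264 grayscale observations described above might look like (the `FrameStack` class and its method names are illustrative, not the trained pipeline):

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the most recent `k` grayscale frames as one stacked observation."""

    def __init__(self, k, frame_shape):
        self.k = k
        self.frame_shape = frame_shape
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # On episode reset, fill the stack by repeating the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.push(first_frame)
        return self.observation()

    def push(self, frame):
        assert frame.shape == self.frame_shape, "unexpected frame size"
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Channel-first stack: (k, H, W), the layout SB3's CNN policies expect.
        return np.stack(self.frames, axis=0)


# Shapes taken from the card: 4 stacked 250×264 grayscale frames.
stack = FrameStack(k=4, frame_shape=(250, 264))
obs = stack.reset(np.zeros((250, 264), dtype=np.uint8))
print(obs.shape)  # (4, 250, 264)
```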
## Training Info
- Algorithm: PPO
- Framework: Stable Baselines3
- Timesteps: 20+ million (see the checkpoint table below)
- Environment: Gymnasium (`v0`)
- Device: MPS / CUDA / CPU
## Training Timesteps & Checkpoints
| Checkpoint | Timesteps | Notes |
| ---------------------------------------------------------------- | ---------- | -------------------- |
| [25M Steps](checkpoints/simple/25M_steps/mario_ppo_25000000.zip) | 25,000,000 | Early-stage learning |
| [50M Steps](checkpoints/simple/50M_steps/mario_ppo.zip) | 50,000,000 | Better stability |
## Usage
```python
from stable_baselines3 import PPO
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub, then load it into an SB3 PPO model.
model_path = hf_hub_download(repo_id="akantox/mario-rl-model", filename="mario_ppo.zip")
model = PPO.load(model_path)
```