---
license: mit
language:
  - en
pipeline_tag: reinforcement-learning
tags:
  - mario
  - rl
---

# Mario PPO Model

This is a PPO agent trained using Stable Baselines3 and Gymnasium on a Mario-like environment.

## Environment Details

- Action Space: discrete, 7 simple NES-style actions
- Observation: grayscale frames, 250×264
- Frame Stack: 4 frames
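The observation format above (4 stacked 250×264 grayscale frames) can be sketched with a minimal frame-stacking buffer. This is an illustrative NumPy implementation, not the wrapper actually used in training (the card does not name it):

```python
import numpy as np
from collections import deque


class FrameStack:
    """Minimal frame-stacking buffer: keeps the k most recent grayscale
    frames and exposes them as a single (k, H, W) observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # On reset, fill the buffer with copies of the first frame.
        for _ in range(self.k):
            self.frames.append(frame)
        return self.observation()

    def step(self, frame):
        # Append the newest frame; the oldest one is dropped automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=0)


# A single 250x264 grayscale frame stacked 4 deep gives shape (4, 250, 264).
fs = FrameStack(k=4)
obs = fs.reset(np.zeros((250, 264), dtype=np.uint8))
```

In practice, Gymnasium and Stable Baselines3 ship equivalent wrappers (e.g. `VecFrameStack`), so a hand-rolled buffer like this is only needed for custom pipelines.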

## Training Info

- Algorithm: PPO
- Framework: Stable Baselines3
- Timesteps: 20 million
- Environment: Gymnasium (`v0`)
- Device: MPS / CUDA / CPU
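The "MPS / CUDA / CPU" entry above refers to PyTorch device selection. A small sketch of how that choice is typically made (the preference order here is an assumption; the card does not state one):

```python
import torch


def pick_device() -> str:
    """Choose the best available accelerator: CUDA if present,
    otherwise Apple MPS, otherwise plain CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
```

Stable Baselines3 accepts this string directly via `PPO(..., device=device)`, and `device="auto"` performs a similar selection internally.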

## Training Timesteps & Checkpoints

| Checkpoint                                                       | Timesteps  | Notes                |
| ---------------------------------------------------------------- | ---------- | -------------------- |
| [25M Steps](checkpoints/simple/25M_steps/mario_ppo_25000000.zip) | 25,000,000 | Early-stage learning |
| [50M Steps](checkpoints/simple/50M_steps/mario_ppo.zip)          | 50,000,000 | Better stability     |

## Usage

```python
from stable_baselines3 import PPO
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub, then load it with SB3.
model_path = hf_hub_download(repo_id="akantox/mario-rl-model", filename="mario_ppo.zip")
model = PPO.load(model_path)
```