SuperMarioBros-NES Level 1-1
PPO policy checkpoint for completing SuperMarioBros-Nes-v0 Level1-1 with Stable Retro, trained with rlab.
Quick Start
Install rlab once, import the ROM, then play or evaluate this checkpoint directly from Hugging Face:
```bash
uv tool install --from git+https://github.com/tsilva/rlab rlab
rlab import-roms ~/roms --game SuperMarioBros-Nes-v0
rlab play hf://tsilva/SuperMarioBros-NES_Level1-1
rlab eval hf://tsilva/SuperMarioBros-NES_Level1-1
```
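For use outside the rlab CLI, the checkpoint can probably also be loaded directly in Python. The following is a minimal sketch, assuming the repository stores a standard Stable-Baselines3 `.zip` archive. The helper name `load_policy` is hypothetical, and the archive filename is not stated in this card, so the caller has to supply it.

```python
def load_policy(filename: str, repo_id: str = "tsilva/SuperMarioBros-NES_Level1-1"):
    """Download a checkpoint archive from the Hub and load it as an SB3 PPO model.

    `filename` is the name of the .zip archive inside the repository. It is not
    given in this card, so it has to be looked up in the repository file list.
    """
    # Local imports keep this module importable without the RL stack installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    # load_from_hub downloads the file and returns its local path.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)

    # Assumption: the archive is a plain SB3 PPO save with no custom objects.
    return PPO.load(checkpoint_path)
```

Running the returned model still requires an environment that reproduces the preprocessing listed under Environment Details, which is what `rlab play` and `rlab eval` set up automatically.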
Evaluation Results
| eval_profile | episodes | seed_start | completion_rate | max_x_max | reward_mean | death_count | checkpoint_step |
|---|---|---|---|---|---|---|---|
| mario_level1_v1, stochastic policy sampling | 100 | 10007 | 100/100 | 6,264 | 3398.55 | 0 | 4,500,000 |
Environment Details
| Setting | Value |
|---|---|
| env_provider | stable-retro-turbo |
| env_id | SuperMarioBros-Nes-v0 |
| game | SuperMarioBros-Nes-v0 |
| state | Level1-1 |
| preprocessing | crop top 32 px, grayscale, resize to 84 x 84 |
| frame_stack | 4 |
| frame_skip | 4 |
| max_pool_frames | enabled |
| policy_observation_layout | channel-first (4, 84, 84) |
| action_set | simple |
| actions | noop, right, right_b, right_a, right_a_b, a, left |
| reward_mode | score |
| reward_shaping | reward_mode=score; Stable Retro score reward; no explicit terminal reward in metadata |
| max_episode_steps | 2500 |
| done_on_events | life loss and completion |
| sticky_action_prob | disabled (sticky_action_prob=0.0) |
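The observation pipeline in the table (crop, grayscale, resize, max-pool, stack) can be illustrated with a small NumPy sketch. This is not rlab's implementation. The luma weights, the nearest-neighbour resize, the choice to max-pool over the last two raw frames, and the 224 x 240 input frame size are all assumptions made for illustration; only the crop height, the output size, and the stack depth come from the table.

```python
import numpy as np

CROP_TOP = 32    # rows removed from the top of each frame (from the table)
OUT_SIZE = 84    # output height and width (from the table)
FRAME_STACK = 4  # frames per observation (from the table)


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Crop, grayscale, and resize one RGB frame (H, W, 3) to (84, 84) uint8."""
    cropped = frame[CROP_TOP:]
    # Assumed BT.601 luma weights; the card does not name the conversion.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = cropped.astype(np.float32) @ weights
    # Assumed nearest-neighbour resize, chosen to avoid extra dependencies.
    rows = np.arange(OUT_SIZE) * gray.shape[0] // OUT_SIZE
    cols = np.arange(OUT_SIZE) * gray.shape[1] // OUT_SIZE
    resized = gray[rows][:, cols]
    return np.clip(resized, 0, 255).astype(np.uint8)


def max_pool(previous: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Element-wise max over two raw frames (assumed pooling window)."""
    return np.maximum(previous, current)


def stack(frames: list[np.ndarray]) -> np.ndarray:
    """Stack the most recent processed frames channel-first: (4, 84, 84)."""
    return np.stack(frames[-FRAME_STACK:], axis=0)


raw = np.zeros((224, 240, 3), dtype=np.uint8)  # assumed NES frame size
obs = stack([preprocess(max_pool(raw, raw))] * FRAME_STACK)
print(obs.shape)  # (4, 84, 84)
```

The resulting shape matches the `policy_observation_layout` row, which is the layout the policy network expects as input.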
Training Recipe
| Setting | Value |
|---|---|
| seed | 23 |
| n_envs | 16 |
| n_steps | 512 |
| batch_size | 512 |
| n_epochs | 10 |
| learning_rate | 0.00015 |
| ent_coef | 0.01 -> 0.0003 over 2,000,000 timesteps |
| gamma | 0.9 |
| gae_lambda | 1.0 |
| clip_range | 0.15 |
| target_kl | 0.12 |
| reward_mode | score |
| completion_x_threshold | 3160 |
| done_on_events | life loss and completion |
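A few quantities follow directly from these settings. The sketch below works them out, assuming the usual Stable-Baselines3 PPO semantics (one rollout collects `n_envs * n_steps` transitions, then each epoch sweeps it in minibatches of `batch_size`). The entropy schedule is written as a linear decay, which is an assumption: the card gives only the endpoints and the duration, not the shape. The function name `ent_coef_at` is hypothetical.

```python
N_ENVS = 16
N_STEPS = 512
BATCH_SIZE = 512
N_EPOCHS = 10
GAMMA = 0.9
FRAME_SKIP = 4               # from Environment Details
CHECKPOINT_STEP = 4_500_000  # from Evaluation Results

# Transitions collected per policy update.
rollout_size = N_ENVS * N_STEPS                          # 8192
# Minibatches per pass over the rollout.
minibatches_per_epoch = rollout_size // BATCH_SIZE       # 16
# Upper bound on gradient steps per update; target_kl may stop epochs early.
max_gradient_steps = minibatches_per_epoch * N_EPOCHS    # 160
# Approximate number of updates completed at the published checkpoint.
updates_at_checkpoint = CHECKPOINT_STEP / rollout_size   # about 549.3

# Rough effective horizon of the discount, in agent steps and in game frames.
horizon_agent_steps = 1.0 / (1.0 - GAMMA)                # about 10
horizon_frames = horizon_agent_steps * FRAME_SKIP        # about 40


def ent_coef_at(timestep: int, start: float = 0.01, end: float = 0.0003,
                decay_steps: int = 2_000_000) -> float:
    """Entropy coefficient at a timestep, assuming linear decay then a constant."""
    fraction = min(max(timestep / decay_steps, 0.0), 1.0)
    return start * (1.0 - fraction) + end * fraction


print(rollout_size, minibatches_per_epoch, max_gradient_steps)  # 8192 16 160
```

With `gamma` at 0.9 the agent weighs rewards over roughly 10 decisions (about 40 frames), and with `gae_lambda` at 1.0 the advantage estimate reduces to the discounted return minus the value baseline.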
Provenance
| Item | Value |
|---|---|
| source_project | rlab |
| wandb_run | b31_post12_loosekl_5m_stop100ep100_clip015_targetkl012_clippeddx_seed23_20260618_192135 |
| wandb_run_id | 9j4r2h3g |
| wandb_artifact | tsilva/SuperMarioBros-NES/b31_post12_loosekl_5m_stop100ep100_clip015_targetkl012_clippeddx_seed23_20260618_192135-checkpoint:v44 |
| artifact_alias | step-4500000 |
| model_sha256 | 75eb50015295f887c7faae7dbbb80b9a024052581c443fbc0ce5b72e0be47f11 |
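The `model_sha256` value can be used to confirm that a downloaded file matches the published checkpoint. A minimal sketch using only the standard library follows. It assumes the hash covers the model archive as distributed, which the card does not state explicitly; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Value of model_sha256 from the provenance table.
EXPECTED_SHA256 = "75eb50015295f887c7faae7dbbb80b9a024052581c443fbc0ce5b72e0be47f11"


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_model_card(path) -> bool:
    """True if the file's digest equals the model_sha256 listed in this card."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

Calling `matches_model_card` on the local path of the downloaded checkpoint returns `False` if the file was truncated, modified, or belongs to a different artifact version.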