SuperMarioBros-NES Level 1-1

PPO policy checkpoint, trained with rlab, for completing the `Level1-1` state of `SuperMarioBros-Nes-v0` in Stable Retro.

Quick Start

Install rlab once, import the ROM, then play or evaluate this checkpoint directly from Hugging Face:

```bash
uv tool install --from git+https://github.com/tsilva/rlab rlab
rlab import-roms ~/roms --game SuperMarioBros-Nes-v0
rlab play hf://tsilva/SuperMarioBros-NES_Level1-1
rlab eval hf://tsilva/SuperMarioBros-NES_Level1-1
```

Evaluation Results

| `eval_profile` | `episodes` | `seed_start` | `completion_rate` | `max_x_max` | `reward_mean` | `death_count` | `checkpoint_step` |
|---|---|---|---|---|---|---|---|
| `mario_level1_v1`, stochastic policy sampling | 100 | 10007 | `100/100` | 6,264 | 3398.55 | 0 | 4,500,000 |
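The aggregate columns above can be reproduced from per-episode records. The sketch below is a hypothetical reconstruction, not rlab's evaluation code: it assumes each episode reports whether the level was completed, the furthest x position reached, the total reward and the number of deaths, and that `max_x_max` is the maximum of the per-episode maximum x positions, as its name suggests. `EpisodeResult` and `summarize` are illustrative names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    """Hypothetical per-episode record; field names are illustrative, not rlab's."""
    completed: bool
    max_x: int
    total_reward: float
    deaths: int


def summarize(episodes: list[EpisodeResult]) -> dict:
    """Aggregate per-episode results into columns shaped like the table above."""
    if not episodes:
        raise ValueError("need at least one episode")
    completions = sum(ep.completed for ep in episodes)
    return {
        "episodes": len(episodes),
        "completion_rate": f"{completions}/{len(episodes)}",
        # Best progress over all episodes: the max of each episode's max x.
        "max_x_max": max(ep.max_x for ep in episodes),
        "reward_mean": round(mean(ep.total_reward for ep in episodes), 2),
        "death_count": sum(ep.deaths for ep in episodes),
    }


if __name__ == "__main__":
    # Synthetic example data, not results from this checkpoint.
    demo = [
        EpisodeResult(completed=True, max_x=3200, total_reward=3400.0, deaths=0),
        EpisodeResult(completed=False, max_x=1500, total_reward=900.5, deaths=1),
    ]
    print(summarize(demo))
```

Because episodes end on life loss (`done_on_events`), each episode would contribute at most one death under this reading, so `death_count` doubles as the number of failed episodes.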

Environment Details

| Setting | Value |
|---|---|
| `env_provider` | `stable-retro-turbo` |
| `env_id` | `SuperMarioBros-Nes-v0` |
| `game` | `SuperMarioBros-Nes-v0` |
| `state` | `Level1-1` |
| `preprocessing` | crop top `32` px, grayscale, resize to `84 x 84` |
| `frame_stack` | `4` |
| `frame_skip` | `4` |
| `max_pool_frames` | enabled |
| `policy_observation_layout` | channel-first `(4, 84, 84)` |
| `action_set` | `simple` |
| `actions` | `noop`, `right`, `right_b`, `right_a`, `right_a_b`, `a`, `left` |
| `reward_mode` | `score` |
| `reward_shaping` | Stable Retro score reward (`reward_mode=score`); no explicit terminal reward recorded in metadata |
| `max_episode_steps` | `2500` |
| `done_on_events` | life loss and completion |
| `sticky_action_prob` | disabled (`0.0`) |
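The observation settings above can be sketched in plain Python. This is an illustrative reconstruction, not rlab's or Stable Retro's code, and it rests on several assumptions: max-pooling is applied to the last two frames of each 4-frame skip window (the common Atari-style convention), grayscale uses BT.601 luma weights, resizing is nearest-neighbour (real pipelines often use area interpolation from an image library), and the frame stack is pre-filled with the first observation on reset. `make_fake_env` is a hypothetical stand-in for the emulator, and its frame size is arbitrary.

```python
import itertools
from collections import deque
from typing import Callable

Pixel = tuple[int, int, int]
Frame = list[list[Pixel]]  # H rows of W RGB pixels
Gray = list[list[float]]   # H rows of W luminance values

CROP_TOP = 32      # rows removed from the top of each frame
OUT_SIZE = 84      # output height and width
FRAME_SKIP = 4     # emulator frames per agent step
FRAME_STACK = 4    # processed frames given to the policy


def max_pool(a: Frame, b: Frame) -> Frame:
    """Per-pixel, per-channel max of two frames (smooths sprite flicker)."""
    return [
        [tuple(max(ca, cb) for ca, cb in zip(pa, pb)) for pa, pb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def preprocess(frame: Frame) -> Gray:
    """Crop the top rows, convert to grayscale, nearest-neighbour resize to 84x84."""
    cropped = frame[CROP_TOP:]
    h, w = len(cropped), len(cropped[0])
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in cropped]
    return [
        [gray[y * h // OUT_SIZE][x * w // OUT_SIZE] for x in range(OUT_SIZE)]
        for y in range(OUT_SIZE)
    ]


def skip_step(step: Callable[[int], tuple[Frame, float, bool]], action: int):
    """Repeat `action` FRAME_SKIP times, summing rewards and pooling the last two frames."""
    total, frames, done = 0.0, [], False
    for _ in range(FRAME_SKIP):
        frame, reward, done = step(action)
        total += reward
        frames.append(frame)
        if done:
            break
    pooled = max_pool(frames[-2], frames[-1]) if len(frames) > 1 else frames[-1]
    return pooled, total, done


class FrameStack:
    """Holds the last FRAME_STACK processed frames, oldest first (channel-first)."""

    def __init__(self, first: Gray):
        self.frames = deque([first] * FRAME_STACK, maxlen=FRAME_STACK)

    def push(self, frame: Gray) -> list[Gray]:
        self.frames.append(frame)
        return list(self.frames)


def make_fake_env(height: int = 224, width: int = 240):
    """Hypothetical emulator stand-in: solid grey frames that brighten by 10 per call."""
    counter = itertools.count(1)

    def step(action: int) -> tuple[Frame, float, bool]:
        v = next(counter) * 10
        return [[(v, v, v)] * width for _ in range(height)], 1.0, False

    return step


if __name__ == "__main__":
    step = make_fake_env()
    stack = FrameStack(preprocess(skip_step(step, 1)[0]))
    obs = stack.push(preprocess(skip_step(step, 1)[0]))
    print(len(obs), len(obs[0]), len(obs[0][0]))  # prints: 4 84 84
```

The resulting nested list has the `(4, 84, 84)` channel-first layout listed in the table; a real implementation would use array operations rather than Python loops.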

Training Recipe

| Setting | Value |
|---|---|
| `seed` | 23 |
| `n_envs` | 16 |
| `n_steps` | 512 |
| `batch_size` | 512 |
| `n_epochs` | 10 |
| `learning_rate` | `0.00015` |
| `ent_coef` | `0.01 -> 0.0003` over `2,000,000` timesteps |
| `gamma` | `0.9` |
| `gae_lambda` | `1.0` |
| `clip_range` | `0.15` |
| `target_kl` | `0.12` |
| `reward_mode` | `score` |
| `completion_x_threshold` | `3160` |
| `done_on_events` | life loss and completion |
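The recipe implies a few derived quantities. The sketch below assumes rlab follows the standard PPO update loop (as in Stable-Baselines3, which uses these hyperparameter names): each update collects `n_envs * n_steps` transitions, splits them into minibatches of `batch_size`, and makes `n_epochs` passes, with `target_kl` able to end an update early. The linear shape of the entropy schedule is an assumption, since the card only gives its endpoints. The `gae` helper is textbook Generalized Advantage Estimation, not rlab's code; with `gae_lambda=1.0` it reduces to discounted returns bootstrapped from the value estimate at the rollout boundary.

```python
# Values copied from the recipe table; TOTAL_TIMESTEPS is this checkpoint's step.
N_ENVS, N_STEPS, BATCH_SIZE, N_EPOCHS = 16, 512, 512, 10
TOTAL_TIMESTEPS = 4_500_000
ENT_START, ENT_END, ENT_DECAY_STEPS = 0.01, 0.0003, 2_000_000


def rollout_stats() -> tuple[int, int, int, float]:
    """Derived per-update quantities under the standard PPO loop (assumed)."""
    rollout = N_ENVS * N_STEPS               # transitions collected per update
    minibatches = rollout // BATCH_SIZE      # minibatches per epoch
    max_grad_steps = minibatches * N_EPOCHS  # upper bound; target_kl may stop early
    updates = TOTAL_TIMESTEPS / rollout      # updates needed to reach this checkpoint
    return rollout, minibatches, max_grad_steps, updates


def ent_coef(timestep: int) -> float:
    """Entropy coefficient at `timestep`; the linear shape is an assumption."""
    frac = min(timestep / ENT_DECAY_STEPS, 1.0)
    return ENT_START + frac * (ENT_END - ENT_START)


def gae(rewards, values, dones, last_value, gamma=0.9, lam=1.0):
    """Generalized Advantage Estimation over one environment's rollout.

    dones[t] is True when the episode ended at step t (e.g. life loss or level
    completion), which stops bootstrapping from the following state.
    Returns (advantages, returns), where returns = advantages + values.
    """
    advantages = [0.0] * len(rewards)
    next_adv, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    rollout, minibatches, grad_steps, updates = rollout_stats()
    print(f"{rollout} transitions/update, {minibatches} minibatches/epoch, "
          f"<= {grad_steps} gradient steps/update, ~{updates:.1f} updates")
    print([ent_coef(t) for t in (0, 1_000_000, 3_000_000)])
```

Under these assumptions each update sees 8,192 transitions and at most 160 gradient steps. With `gamma=0.9` the discount falls to about 0.35 after 10 agent steps (0.9^10 ≈ 0.349), roughly 40 emulator frames at `frame_skip=4`, so the return weights near-term score heavily.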

Provenance

| Item | Value |
|---|---|
| `source_project` | `rlab` |
| `wandb_run` | `b31_post12_loosekl_5m_stop100ep100_clip015_targetkl012_clippeddx_seed23_20260618_192135` |
| `wandb_run_id` | `9j4r2h3g` |
| `wandb_artifact` | `tsilva/SuperMarioBros-NES/b31_post12_loosekl_5m_stop100ep100_clip015_targetkl012_clippeddx_seed23_20260618_192135-checkpoint:v44` |
| `artifact_alias` | `step-4500000` |
| `model_sha256` | `75eb50015295f887c7faae7dbbb80b9a024052581c443fbc0ce5b72e0be47f11` |
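Before loading a downloaded checkpoint, its hash can be compared with `model_sha256`. A minimal standard-library sketch, assuming the hash covers the raw bytes of the model file; the file name is not stated in this card, so the `model.zip` path and `verify_sha256.py` script name below are hypothetical.

```python
import hashlib
import sys
from pathlib import Path

EXPECTED_SHA256 = "75eb50015295f887c7faae7dbbb80b9a024052581c443fbc0ce5b72e0be47f11"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's hash matches `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    ok = verify(Path(sys.argv[1]))
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

Usage (hypothetical path): `python verify_sha256.py path/to/model.zip`, which exits with status 1 on a mismatch.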
