Attila Kanto

Update README.md: add 25M steps checkpoint and adjust training timesteps

	---
	license: mit
	language:
	- en
	pipeline_tag: reinforcement-learning
	tags:
	- mario
	- rl
	---

	# Mario PPO Model

	This is a PPO agent trained using Stable Baselines3 and Gymnasium on a Mario-like environment.

	## Environment Details

	- Action Space: Simple discrete NES-style actions (7 total)
	- Observation: Grayscale, 250×264
	- Frame Stack: 4 frames
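
To illustrate what a 4-frame stack means for the observation, here is a minimal sketch using only the standard library. The `FrameStack` class and the string frame labels are hypothetical stand-ins; this README does not state how stacking was implemented during training (Stable Baselines3 ships its own `VecFrameStack` wrapper, which may behave differently at episode start).

```python
from collections import deque


class FrameStack:
    """Keep the most recent `size` frames, oldest first."""

    def __init__(self, size=4):
        self.size = size
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        # At episode start there is no history yet, so repeat the first frame.
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(size=4)
obs = stack.reset("f0")  # stand-in labels instead of real grayscale images
obs = stack.push("f1")
print(obs)  # ['f0', 'f0', 'f0', 'f1']
```

With real observations each entry would be one grayscale frame, so the agent sees the last four frames at every step and can infer motion from them.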

	## Training Info

	- Algorithm: PPO
	- Framework: Stable Baselines3
	- Timesteps: 20 million
	- Environment: Gymnasium (`v0`)
	- Device: MPS / CUDA / CPU
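
Since the model can run on MPS, CUDA, or CPU, one way to choose a device at runtime is sketched below. The `pick_device` helper and its preference order (CUDA, then MPS, then CPU) are assumptions for illustration, not something this README specifies.

```python
def pick_device(cuda_available, mps_available):
    """Return a device string, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


try:
    import torch

    device = pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
except (ImportError, AttributeError):
    # PyTorch missing or too old to expose the MPS backend: fall back to CPU.
    device = "cpu"

print(device)
```

The resulting string can be passed as the `device` argument when constructing or loading a Stable Baselines3 model, for example `PPO.load(model_path, device=device)`.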

	## Training Timesteps & Checkpoints

	| Checkpoint | Timesteps | Notes |
	| --- | --- | --- |
	| [25M Steps](checkpoints/simple/25M_steps/mario_ppo_25000000.zip) | 25,000,000 | Early-stage learning |
	| [50M Steps](checkpoints/simple/50M_steps/mario_ppo.zip) | 50,000,000 | Better stability |
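
The checkpoint links above can be turned into a small lookup so that code refers to checkpoints by label. This is a sketch: the `CHECKPOINTS` dictionary and `checkpoint_filename` helper are hypothetical names, and the paths are copied from the table on the assumption that the links mirror the repository layout.

```python
# Repo-relative paths copied from the checkpoint table.
CHECKPOINTS = {
    "25M": "checkpoints/simple/25M_steps/mario_ppo_25000000.zip",
    "50M": "checkpoints/simple/50M_steps/mario_ppo.zip",
}


def checkpoint_filename(label):
    """Return the repo-relative path for a checkpoint label such as "50M"."""
    try:
        return CHECKPOINTS[label]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"unknown checkpoint {label!r}; choose from: {known}") from None


print(checkpoint_filename("25M"))
```

The returned string is intended to be passed as the `filename` argument of `hf_hub_download`.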

	## Usage

	```python
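	from huggingface_hub import hf_hub_download
	from stable_baselines3 import PPO

	# Download the 50M-step checkpoint. `filename` is the path inside the repo,
	# taken from the checkpoint table above.
	model_path = hf_hub_download(
	    repo_id="akantox/mario-rl-model",
	    filename="checkpoints/simple/50M_steps/mario_ppo.zip",
	)
	model = PPO.load(model_path)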
	```
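
Once a model is loaded as in the snippet above, a rollout could look like the sketch below. The `run_episode` helper is hypothetical. It assumes `env` is a single Gymnasium environment (where `reset()` returns `(obs, info)` and `step()` returns a 5-tuple) that applies the same preprocessing as training: grayscale observations, a 4-frame stack, and the 7-action discrete space. This README does not name the environment ID, so constructing `env` is left to the reader; a Stable Baselines3 `VecEnv` uses a different `reset`/`step` signature and would need a different loop.

```python
def run_episode(model, env, max_steps=10_000):
    """Play one episode with a loaded policy and return the total reward."""
    obs, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # predict() returns (action, recurrent_state); the state is unused here.
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward
```

Passing `deterministic=True` makes the policy pick its most likely action at each step, which is usually what you want for evaluation; set it to `False` to sample from the action distribution instead.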