Attention! This is a proof-of-concept model hosted here for research demonstration only. Do not use it for any illegal purpose; you bear full legal responsibility for any misuse.

PPO Agent playing BreakoutNoFrameskip-v4

This is a trained model of a PPO agent playing BreakoutNoFrameskip-v4 using the stable-baselines3 library and the RL Zoo.

The RL Zoo is a training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
SB3: https://github.com/DLR-RM/stable-baselines3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env BreakoutNoFrameskip-v4 -orga sb3 -f logs/
# Run the downloaded agent (from a checkout of the RL Zoo repository)
python enjoy.py --algo ppo --env BreakoutNoFrameskip-v4 -f logs/

Training (with the RL Zoo)

# Train the agent and save it into the logs/ folder (from a checkout of the RL Zoo repository)
python train.py --algo ppo --env BreakoutNoFrameskip-v4 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env BreakoutNoFrameskip-v4 -f logs/ -orga sb3
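
For a sense of scale, the size of this training run can be worked out from the values in the Hyperparameters listing of this card (n_envs, n_steps, batch_size, n_epochs, n_timesteps). The sketch below is plain arithmetic, not RL Zoo code. It assumes that n_timesteps counts environment steps summed over all parallel environments, and that training proceeds in whole rollouts until that budget is reached.

```python
import math

# Values copied from the Hyperparameters listing in this card.
n_envs = 8
n_steps = 128
batch_size = 256
n_epochs = 4
n_timesteps = 10_000_000

# Transitions collected per rollout, across all parallel environments.
rollout_size = n_envs * n_steps

# Each epoch splits the rollout into minibatches of batch_size.
minibatches_per_epoch = rollout_size // batch_size

# Gradient updates performed after every rollout.
gradient_steps_per_rollout = n_epochs * minibatches_per_epoch

# Assumption: whole rollouts are collected until the step budget is reached.
n_rollouts = math.ceil(n_timesteps / rollout_size)
total_gradient_steps = n_rollouts * gradient_steps_per_rollout

print(rollout_size, minibatches_per_epoch, gradient_steps_per_rollout)
print(n_rollouts, total_gradient_steps)
```

Because batch_size divides the rollout size evenly here, no truncated minibatch is produced at the end of an epoch.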

Hyperparameters

OrderedDict([('batch_size', 256),
             ('clip_range', 'lin_0.1'),
             ('ent_coef', 0.01),
             ('env_wrapper',
              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
             ('frame_stack', 4),
             ('learning_rate', 'lin_2.5e-4'),
             ('n_envs', 8),
             ('n_epochs', 4),
             ('n_steps', 128),
             ('n_timesteps', 10000000.0),
             ('policy', 'CnnPolicy'),
             ('vf_coef', 0.5),
             ('normalize', False)])
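
The lin_ prefix on learning_rate and clip_range requests a linear schedule: the value starts at the given number and decays towards zero over the course of training. The following is a minimal sketch of that behaviour, assuming the Stable Baselines3 convention that a schedule is called with the remaining training progress, which goes from 1.0 at the start to 0.0 at the end. The function name is illustrative and is not claimed to match the RL Zoo's internal code.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is 1.0 at the start of training and 0.0 at the end.
        return progress_remaining * initial_value

    return schedule


# Schedules matching 'lin_2.5e-4' and 'lin_0.1' from the listing above.
learning_rate = linear_schedule(2.5e-4)
clip_range = linear_schedule(0.1)

for progress in (1.0, 0.5, 0.0):
    print(progress, learning_rate(progress), clip_range(progress))
```

Stable Baselines3 accepts either a constant or a callable of this form for learning_rate and clip_range, so schedules like these can be passed directly to the PPO constructor.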

Evaluation results

mean_reward on BreakoutNoFrameskip-v4 (self-reported): 398.00 +/- 16.30