SAC Agent playing BipedalWalker-v3

This is a trained model of a SAC agent playing BipedalWalker-v3 using the stable-baselines3 library.

Usage (with Stable-baselines3)


from stable_baselines3 import ...
from huggingface_sb3 import load_from_hub

...
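
One way the skeleton above could be completed is sketched below. This is not the author's code: the repo ID and checkpoint filename are not given in this card, so they are left as arguments, and the use of `gymnasium` (rather than the older `gym` package) and of 10 evaluation episodes are assumptions. Running it also requires the Box2D extra for BipedalWalker-v3.

```python
def load_and_evaluate(repo_id, filename, n_eval_episodes=10):
    """Download a SAC checkpoint from the Hub and evaluate it on BipedalWalker-v3.

    repo_id and filename are placeholders supplied by the caller; this card
    does not state them.
    """
    # Imported inside the function so the RL stack is only needed when it is called.
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import SAC
    from stable_baselines3.common.evaluation import evaluate_policy

    # load_from_hub downloads the file and returns its local path.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    model = SAC.load(checkpoint_path)

    env = gym.make("BipedalWalker-v3")
    mean_reward, std_reward = evaluate_policy(
        model, env, n_eval_episodes=n_eval_episodes, deterministic=True
    )
    env.close()
    return mean_reward, std_reward
```

Calling `load_and_evaluate("<user>/<repo-name>", "<checkpoint>.zip")` with this model's actual repo ID and filename should return a mean and standard deviation of episode reward comparable to the self-reported figures, though the exact numbers will vary with the evaluation episodes.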

Well, he does OK but still gets stuck on the rocks. Here are my hyperparameters, not that they did me much good 😂:

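from stable_baselines3 import SAC


def linear_schedule(initial_value, final_value=0.00001):
    """Return a schedule that decays linearly from initial_value to final_value."""
    def func(progress_remaining):
        # progress_remaining decreases from 1 (start of training) to 0 (end)
        return final_value + (initial_value - final_value) * progress_remaining
    return func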

initial_learning_rate = 7.3e-4

# env is assumed to be a BipedalWalker-v3 environment created beforehand
model = SAC(
    policy='MlpPolicy',
    env=env,
    learning_rate=linear_schedule(initial_learning_rate),
    buffer_size=1000000,
    batch_size=256,
    ent_coef=0.005,
    gamma=0.99,
    tau=0.01,
    train_freq=1,
    gradient_steps=1,
    learning_starts=10000,
    policy_kwargs=dict(net_arch=[400, 300]),
    verbose=1
)

These hyperparameters are pretty well tuned, but SAC still explores too much here, and the agent is unable to exploit the actions it needs to complete the course. I suspect TD3 will be more successful, so I plan to switch back to it.
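
To make the learning-rate schedule above concrete, the sketch below evaluates it at the start, midpoint, and end of training. It repeats the `linear_schedule` definition so the block runs on its own; the three sample points are chosen purely for illustration.

```python
def linear_schedule(initial_value, final_value=0.00001):
    """Return a schedule that decays linearly from initial_value to final_value."""
    def func(progress_remaining):
        # progress_remaining decreases from 1 (start of training) to 0 (end)
        return final_value + (initial_value - final_value) * progress_remaining
    return func


schedule = linear_schedule(7.3e-4)

# Sample the schedule at the start, midpoint, and end of training.
for progress_remaining in (1.0, 0.5, 0.0):
    lr = schedule(progress_remaining)
    print(f"progress_remaining={progress_remaining:.1f} -> learning_rate={lr:.2e}")
```

The learning rate starts at 7.3e-4, is about 3.7e-4 halfway through, and ends at the 1e-5 floor rather than at zero, so updates never stop entirely.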

Evaluation results

mean_reward on BipedalWalker-v3 (self-reported): -31.49 +/- 60.03