How to use MattStammers/SAC-Bipedal_Walker_v3-HardcoreTrained with stable-baselines3:
from huggingface_sb3 import load_from_hub

checkpoint = load_from_hub(
    repo_id="MattStammers/SAC-Bipedal_Walker_v3-HardcoreTrained",
    filename="{MODEL FILENAME}.zip",
)
SAC Agent playing BipedalWalker-v3
This is a trained model of a SAC agent playing BipedalWalker-v3 using the stable-baselines3 library.
Usage (with Stable-baselines3)
TODO: Add your code
from stable_baselines3 import ...
from huggingface_sb3 import load_from_hub
...
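Until the usage section is filled in, here is a minimal sketch of how loading and scoring the checkpoint might look. It is not the author's code: the function name `evaluate_checkpoint` and its parameters are hypothetical, the checkpoint filename is not stated in this card and has to be supplied by the caller, and `hardcore=True` is an assumption based on the repository name. It also assumes `gymnasium` (with Box2D support), `stable-baselines3`, and `huggingface_sb3` are installed in versions compatible with the saved model.

```python
REPO_ID = "MattStammers/SAC-Bipedal_Walker_v3-HardcoreTrained"


def evaluate_checkpoint(filename, repo_id=REPO_ID, n_eval_episodes=10, hardcore=True):
    """Download a checkpoint from the Hub and return (mean_reward, std_reward).

    `filename` is the .zip file inside the repo; it is not given in this
    card, so the caller has to look it up and pass it in.
    """
    # Imports live inside the function so the sketch can be read or imported
    # without the full RL stack installed.
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import SAC
    from stable_baselines3.common.evaluation import evaluate_policy

    # Download the checkpoint and get its local path.
    checkpoint = load_from_hub(repo_id=repo_id, filename=filename)
    model = SAC.load(checkpoint)

    # Assumption: the agent was trained on the hardcore variant.
    env = gym.make("BipedalWalker-v3", hardcore=hardcore)
    try:
        mean_reward, std_reward = evaluate_policy(
            model, env, n_eval_episodes=n_eval_episodes, deterministic=True
        )
    finally:
        env.close()
    return mean_reward, std_reward
```

Calling `evaluate_checkpoint("<your file>.zip")` should return a mean and standard deviation of episode reward, comparable in form to the self-reported figure on this card.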
Well, he does OK but still gets stuck on the rocks. Here are my hyperparameters, not that they did me much good 😂:
from stable_baselines3 import SAC

def linear_schedule(initial_value, final_value=0.00001):
    """Linearly anneal the learning rate from initial_value to final_value."""
    def func(progress_remaining):
        """Progress will decrease from 1 (beginning) to 0 (end)"""
        return final_value + (initial_value - final_value) * progress_remaining
    return func

initial_learning_rate = 7.3e-4

# env is assumed to be a BipedalWalker-v3 environment created beforehand
model = SAC(
    policy='MlpPolicy',
    env=env,
    learning_rate=linear_schedule(initial_learning_rate),
    buffer_size=1000000,
    batch_size=256,
    ent_coef=0.005,
    gamma=0.99,
    tau=0.01,
    train_freq=1,
    gradient_steps=1,
    learning_starts=10000,
    policy_kwargs=dict(net_arch=[400, 300]),
    verbose=1
)
These are pretty well tuned, but SAC explores too much here and the agent never settles on the actions it needs to complete the course. I suspect TD3 will be more successful, so I plan to turn back to that.
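As a quick sanity check on the learning-rate schedule used in these hyperparameters, the sketch below repeats `linear_schedule` on its own and prints the rate at a few points. It relies on `progress_remaining` running from 1 at the start of training to 0 at the end, which is how stable-baselines3 calls a callable learning rate; the sample points chosen are arbitrary.

```python
def linear_schedule(initial_value, final_value=0.00001):
    """Return a function mapping progress_remaining (1 -> 0) to a learning rate."""
    def func(progress_remaining):
        # Interpolate linearly between final_value (progress 0) and
        # initial_value (progress 1).
        return final_value + (initial_value - final_value) * progress_remaining
    return func


schedule = linear_schedule(7.3e-4)

# Sample the schedule at the start, midpoint, and end of training.
for progress in (1.0, 0.5, 0.0):
    print(f"progress_remaining={progress:.1f} -> lr={schedule(progress):.3e}")
```

The rate starts at the initial value of 7.3e-4, passes through roughly 3.7e-4 at the halfway point, and ends at the floor of 1e-5 rather than decaying all the way to zero.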
Evaluation results
- mean_reward on BipedalWalker-v3 (self-reported): -31.49 +/- 60.03