This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library.
The hyperparameters used for training were optimized with Optuna (an illustrative sketch of such a search follows the values below):
{
    "learning_rate": 0.00038779746460731866,
    "n_steps": 2048,
    "batch_size": 128,
    "n_epochs": 13,
    "gamma": 0.9927390555180292,
    "gae_lambda": 0.9353501463066322,
    "clip_range": clip_range,  # value not rendered in the original card
    "ent_coef": 0.007068533587811773,
    "policy_kwargs": {
        "net_arch": {"pi": [512, 512], "vf": [512, 512]},
        "activation_fn": nn.Tanh,
    },
}
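For reference, a search like the one that produced these values could be set up as below. This is a minimal sketch only: the objective function, search ranges, timestep budget, and trial count are illustrative assumptions, not the actual tuning script.

# Illustrative Optuna search over PPO hyperparameters.
# Ranges, trial budget, and objective are assumptions, not the
# exact setup used to produce the values above.
import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor


def objective(trial: optuna.Trial) -> float:
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "n_epochs": trial.suggest_int("n_epochs", 3, 20),
        "gamma": trial.suggest_float("gamma", 0.97, 0.9999),
        "gae_lambda": trial.suggest_float("gae_lambda", 0.9, 1.0),
        "ent_coef": trial.suggest_float("ent_coef", 1e-4, 1e-1, log=True),
    }
    model = PPO("MlpPolicy", "LunarLander-v2", verbose=0, **params)
    model.learn(total_timesteps=100_000)
    # Score each trial by mean evaluation reward.
    eval_env = Monitor(gym.make("LunarLander-v2"))
    mean_reward, _ = evaluate_policy(model, eval_env, n_eval_episodes=10)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)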
The learning rate was used as the initial value for a linear scheduler during training; see this GitHub issue for more information.
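Stable-baselines3 supports this by accepting a callable for learning_rate, which it calls with progress_remaining (decaying from 1.0 to 0.0 over training). The helper below follows the pattern from the SB3 docs; whether training used exactly this helper is not shown in the card.

# Linear learning-rate schedule, per the SB3 documentation pattern.
from typing import Callable

from stable_baselines3 import PPO


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    def schedule(progress_remaining: float) -> float:
        # progress_remaining decays from 1.0 to 0.0, so the learning
        # rate decays linearly from initial_value to 0.
        return progress_remaining * initial_value

    return schedule


model = PPO(
    "MlpPolicy",
    "LunarLander-v2",
    learning_rate=linear_schedule(0.00038779746460731866),
)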
import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

env_id = "LunarLander-v2"

# Download the trained checkpoint from the Hugging Face Hub.
model_fp = load_from_hub(
    repo_id="reeeemo/ppo-LunarLander-v2",
    filename="ppo-LunarLander-v2-optimized.zip",
)
model = PPO.load(model_fp, print_system_info=True)

# Wrap the env in a Monitor so episode rewards are recorded correctly.
eval_env = Monitor(gym.make(env_id))
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True
)

# The Deep RL course scores submissions as mean_reward - std_reward.
print(f"Results: {mean_reward - std_reward:.2f}")
print(f"mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}")
...
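Beyond evaluation, one way to watch the loaded agent play is sketched below. It reuses model from the snippet above; the render_mode and episode loop are assumptions, not part of the original card.

# Render one episode with the loaded agent (illustrative only).
import gymnasium as gym

env = gym.make("LunarLander-v2", render_mode="human")
obs, info = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()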