sb3/ppo-Pendulum-v1

Antonin Raffin committed on May 24, 2022

Commit 3bd1c71 · 1 Parent(s): 7e1fcbf

Update README.md

Files changed (1)

README.md +24 -0

@@ -48,6 +48,30 @@ python train.py --algo ppo --env Pendulum-v1 -f logs/
 python -m utils.push_to_hub --algo ppo --env Pendulum-v1 -f logs/ -orga sb3
 ```
+```python
+from stable_baselines3 import PPO
+from stable_baselines3.common.env_util import make_vec_env
+# Create the environment
+env_id = "Pendulum-v1"
+env = make_vec_env(env_id, n_envs=1)
+# Instantiate the agent
+model = PPO(
+    "MlpPolicy",
+    env,
+    gamma=0.98,
+    # Use generalized State-Dependent Exploration (gSDE),
+    # see https://proceedings.mlr.press/v164/raffin22a.html
+    use_sde=True,
+    sde_sample_freq=4,
+    learning_rate=1e-3,
+    verbose=1,
+)
+# Train the agent
+model.learn(total_timesteps=int(1e5))
+```
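
Reviewer note: the added snippet sets `gamma=0.98`, lower than the common 0.99 default. The sketch below is not part of the commit. It only illustrates what that discount factor implies for how far ahead the agent effectively plans. The helper names (`discount_weight`, `effective_horizon`, `half_life`) are hypothetical, and the 200-step episode length is an assumption about Pendulum-v1's default time limit.

```python
import math

GAMMA = 0.98          # discount factor used in the PPO snippet above
EPISODE_STEPS = 200   # assumption: default time limit of Pendulum-v1


def discount_weight(gamma: float, k: int) -> float:
    """Weight applied to a reward received k steps in the future."""
    return gamma ** k


def effective_horizon(gamma: float) -> float:
    """Sum of the geometric series 1 + gamma + gamma^2 + ..."""
    return 1.0 / (1.0 - gamma)


def half_life(gamma: float) -> float:
    """Number of steps after which a reward's weight has dropped to 0.5."""
    return math.log(0.5) / math.log(gamma)


if __name__ == "__main__":
    print(f"effective horizon: {effective_horizon(GAMMA):.1f} steps")
    print(f"half-life: {half_life(GAMMA):.1f} steps")
    print(
        f"weight of the last step of an episode: "
        f"{discount_weight(GAMMA, EPISODE_STEPS):.4f}"
    )
```

Under these assumptions, rewards near the end of an episode contribute very little to the return estimated at its start, which is one plausible reason a shorter horizon works for this task.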
 ## Hyperparameters
 ```python
 OrderedDict([('clip_range', 0.2),