JulioSnchezD committed
Commit 60eca6c · verified · 1 parent: 816b682

Update README.md

Files changed (1)
  1. README.md +86 -37
README.md CHANGED
@@ -1,37 +1,86 @@
- ---
- library_name: stable-baselines3
- tags:
- - LunarLander-v2
- - deep-reinforcement-learning
- - reinforcement-learning
- - stable-baselines3
- model-index:
- - name: PPO
-   results:
-   - task:
-       type: reinforcement-learning
-       name: reinforcement-learning
-     dataset:
-       name: LunarLander-v2
-       type: LunarLander-v2
-     metrics:
-     - type: mean_reward
-       value: 242.08 +/- 19.81
-       name: mean_reward
-       verified: false
- ---
-
- # **PPO** Agent playing **LunarLander-v2**
- This is a trained model of a **PPO** agent playing **LunarLander-v2**
- using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
-
- ## Usage (with Stable-baselines3)
- TODO: Add your code
-
-
- ```python
- from stable_baselines3 import ...
- from huggingface_sb3 import load_from_hub
-
- ...
- ```
+ ---
+ library_name: stable-baselines3
+ tags:
+ - LunarLander-v2
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - stable-baselines3
+ model-index:
+ - name: PPO
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: LunarLander-v2
+       type: LunarLander-v2
+     metrics:
+     - type: mean_reward
+       value: 242.08 +/- 19.81
+       name: mean_reward
+       verified: false
+ ---
+
+ # **PPO** Agent playing **LunarLander-v2**
+ This is a trained model of a **PPO** agent playing **LunarLander-v2**
+ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+
+ ## Usage (with Stable-baselines3)
+
+
+ ```python
+ import gymnasium as gym
+ from time import sleep
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_vec_env
+ from stable_baselines3.common.evaluation import evaluate_policy
+ from stable_baselines3.common.monitor import Monitor
+
+ # File name for the saved model (illustrative; any path works)
+ model_name = "ppo-LunarLander-v2"
+
+ # Create 16 parallel environments (LunarLander needs Box2D: pip install "gymnasium[box2d]").
+ # If your gymnasium version reports LunarLander-v2 as deprecated, use "LunarLander-v3".
+ env = make_vec_env("LunarLander-v2", n_envs=16)
+
+ # Non-default hyperparameters, chosen to speed up training
+ model = PPO(
+     policy="MlpPolicy",
+     env=env,
+     n_steps=1024,
+     batch_size=64,
+     n_epochs=4,
+     gamma=0.999,
+     gae_lambda=0.98,
+     ent_coef=0.01,
+     verbose=1,
+ )
+
+ # Train it for 1,000,000 timesteps
+ model.learn(total_timesteps=1_000_000)
+ # Save the model
+ model.save(model_name)
+
+ # Test the model
+ # model = PPO.load(model_name)  # uncomment to evaluate a previously saved model
+ eval_env = Monitor(gym.make("LunarLander-v2"))
+ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
+ print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
+
+ # Visualize the model
+ env = gym.make("LunarLander-v2", render_mode="human")
+
+ state, _ = env.reset()
+ stop = False
+
+ # Run a single episode, pausing briefly between frames
+ while not stop:
+     action, _ = model.predict(state)
+     state, reward, terminated, truncated, info = env.step(action)
+     stop = terminated or truncated
+     env.render()
+     sleep(0.05)
+
+ # Close the rendering window
+ env.close()
+ ```
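The hyperparameters in the usage snippet determine how much data PPO collects before each update. The sketch below works through that arithmetic in plain Python. The helper `ppo_schedule` is hypothetical (it is not part of Stable-Baselines3), and the sketch assumes that `learn()` keeps collecting full rollouts of `n_envs * n_steps` transitions until the timestep counter reaches `total_timesteps`.

```python
import math

# Values taken from the usage snippet
n_envs = 16          # make_vec_env(..., n_envs=16)
n_steps = 1024       # PPO(n_steps=1024): steps collected per environment per rollout
batch_size = 64      # PPO(batch_size=64): transitions per minibatch
n_epochs = 4         # PPO(n_epochs=4): passes over each rollout
total_timesteps = 1_000_000


def ppo_schedule(n_envs, n_steps, batch_size, n_epochs, total_timesteps):
    """Hypothetical helper: how PPO's rollout/update schedule follows from its settings."""
    rollout_size = n_envs * n_steps  # transitions gathered before each update
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    gradient_steps_per_update = minibatches_per_epoch * n_epochs
    # Assumption: training stops once the counter reaches total_timesteps,
    # so the final rollout may overshoot it.
    n_updates = math.ceil(total_timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps_per_update": gradient_steps_per_update,
        "n_updates": n_updates,
        "timesteps_actually_collected": n_updates * rollout_size,
    }


schedule = ppo_schedule(n_envs, n_steps, batch_size, n_epochs, total_timesteps)
for key, value in schedule.items():
    print(f"{key}: {value}")
```

With the snippet's settings, each rollout holds 16,384 transitions, split into 256 minibatches of 64 per epoch, so every update performs 1,024 gradient steps; under the assumption above, reaching 1,000,000 timesteps takes 62 rollouts (1,015,808 timesteps).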
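The `242.08 +/- 19.81` under `mean_reward` in the metadata reads as a mean and standard deviation of per-episode returns, the same pair that `evaluate_policy` returns in the usage snippet. The sketch below computes that summary for made-up returns. It assumes a population standard deviation (the default of `numpy.std`); the `summarize_returns` helper and the sample returns are hypothetical, and the 200-point threshold is the score the LunarLander documentation treats as solving an episode.

```python
import statistics

# LunarLander's documentation treats an episode score of at least 200 as a solution.
SOLVED_THRESHOLD = 200.0


def summarize_returns(episode_returns):
    """Hypothetical helper: (mean, population std) of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # population std (ddof=0)
    return mean, std


# Made-up returns from 4 evaluation episodes (illustrative only, not this model's data)
returns = [230.0, 250.0, 210.0, 270.0]
mean, std = summarize_returns(returns)
print(f"mean_reward={mean:.2f} +/- {std:.2f}")
print("mean above solved threshold:", mean >= SOLVED_THRESHOLD)
```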
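The `gamma` and `gae_lambda` arguments in the usage snippet parameterize Generalized Advantage Estimation (GAE), which PPO uses to measure how much better each action turned out than the value estimate predicted. The sketch below is a minimal single-environment GAE in plain Python, written for illustration only: it is not Stable-Baselines3's implementation (which works on batched arrays inside its rollout buffer), and the function name and argument layout are hypothetical.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.999, gae_lambda=0.98):
    """Hypothetical single-environment GAE.

    rewards[t], values[t]: reward received and value estimate V(s_t) at step t.
    dones[t]: True if the episode ended after step t (so V(s_{t+1}) is not used).
    last_value: V(s_T) for the state after the final step of the rollout.
    """
    advantages = [0.0] * len(rewards)
    next_advantage = 0.0
    next_value = last_value
    # Walk backwards: delta_t = r_t + gamma * V(s_{t+1}) * (1 - d_t) - V(s_t)
    #                 A_t     = delta_t + gamma * lambda * (1 - d_t) * A_{t+1}
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_advantage = delta + gamma * gae_lambda * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    # Value-function targets are the advantages plus the value baseline
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# Usage with made-up numbers (illustrative only)
adv, ret = compute_gae(
    rewards=[1.0, 0.0], values=[0.5, 0.25], dones=[False, False],
    last_value=1.0, gamma=0.5, gae_lambda=0.0,
)
print(adv, ret)  # → [0.625, 0.25] [1.125, 0.5]
```

Setting `gae_lambda=0` reduces each advantage to the one-step TD error, while `gae_lambda=1` gives the full discounted return minus the value baseline; the snippet's 0.98 sits close to the latter.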