Vishand03 committed on
Commit 311d5f4 · verified · 1 Parent(s): 594c8e2

Update README.md

Files changed (1)
  1. README.md +67 -35
README.md CHANGED
@@ -1,52 +1,84 @@
- # 🚀 LunarLander PPO Agent
-
- This repository contains a **PPO (Proximal Policy Optimization)** agent trained on the **LunarLander-v2** environment using **Stable-Baselines3**.
-
  ---
-
- ## 📌 Model Details
- - **Algorithm**: PPO (Proximal Policy Optimization)
- - **Environment**: LunarLander-v2 (Box2D)
- - **Framework**: Stable-Baselines3 + Gymnasium
- - **Reward Goal**: Successfully land the Lunar Module smoothly on the landing pad 🚀
-
  ---

- ## 🎥 Demo Video

- Watch the trained agent in action:

- <video controls autoplay loop>
- <source src="https://huggingface.co/Vishand03/lunarlander-ppo/resolve/main/replay.mp4" type="video/mp4">
- Your browser does not support the video tag.
- </video>

- ---

- ## 📂 Files in this Repo
- - `replay.mp4` → Video of the trained agent
- - `README.md` → This documentation
- - (Optional) Model weights and training logs can also be pushed

- ---

- ## 🛠️ How to Use

  ```python
  import gymnasium as gym
  from stable_baselines3 import PPO

- # Load environment
- env = gym.make("LunarLander-v2")

- # Load model
- model = PPO.load("path_to_model.zip", env=env)

- # Run agent
  obs, _ = env.reset()
- for _ in range(1000):
-     action, _ = model.predict(obs)
-     obs, reward, done, truncated, info = env.step(action)
-     env.render()
-     if done or truncated:
-         obs, _ = env.reset()
  ---
+ library_name: stable-baselines3
+ tags:
+ - LunarLander-v2
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - stable-baselines3
+ model-index:
+ - name: PPO
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: LunarLander-v2
+       type: LunarLander-v2
+     metrics:
+     - type: mean_reward
+       name: mean_reward
+       value: 288.92 +/- 21.79
+       verified: false
  ---

+ # 🚀 PPO Agent for LunarLander-v2

+ This is a trained **PPO agent** for the **LunarLander-v2** environment using Stable-Baselines3.

+ ## Developer
+ **Vishand S (@Vishand03)**

+ ## Frameworks
+ - Stable-Baselines3
+ - PyTorch

+ ## Training Details
+ - Algorithm: PPO
+ - Timesteps: 2.5M
+ - Mean Reward: ~288.9
+ - Discount factor (γ): 0.99
+ - Learning rate: 3e-4
+ - Optimizer: Adam

+ ## 🎥 Demo
+ ![LunarLander](lunarlander.gif)

+ ## 🛠 Usage

  ```python
  import gymnasium as gym
  from stable_baselines3 import PPO
+ from stable_baselines3.common.monitor import Monitor
+ from stable_baselines3.common.evaluation import evaluate_policy
+ from huggingface_hub import hf_hub_download

+ # -------------------------
+ # Environment Setup
+ # -------------------------
+ # Environment for human rendering
+ env = gym.make("LunarLander-v2", render_mode="human")

+ # Environment for evaluation (no render)
+ eval_env = Monitor(gym.make("LunarLander-v2"))

+ # -------------------------
+ # Load pretrained model from Hugging Face Hub
+ # -------------------------
+ model_path = hf_hub_download("Vishand03/lunarlander-ppo", "model.zip")
+ model = PPO.load(model_path)
+
+ # -------------------------
+ # Run a single episode
+ # -------------------------
  obs, _ = env.reset()
+ done = False
+ while not done:
+     action, _ = model.predict(obs, deterministic=True)
+     obs, reward, terminated, truncated, _ = env.step(action)
+     done = terminated or truncated
+
+ # -------------------------
+ # Evaluate policy
+ # -------------------------
+ print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
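The Training Details list sets the discount factor to γ = 0.99. γ scales a reward received t steps in the future by γ^t, so the quantity being maximized is the discounted return G_0 = Σ_t γ^t r_t. A minimal sketch of that sum in plain Python follows; the function name `discounted_return` is hypothetical and not part of Stable-Baselines3. It only illustrates what γ does: SB3's PPO builds its advantage and return targets with GAE (Generalized Advantage Estimation) rather than with this plain Monte Carlo sum.

```python
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Discounted return G_0 = sum over t of gamma**t * rewards[t].

    Hypothetical helper for illustration; not a Stable-Baselines3 API.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Walk the episode backwards using the recursion G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Toy reward sequence (not data from this agent): three rewards of 1.0.
# Mathematically 1 + 0.99 + 0.99**2 = 2.9701; rounding hides float noise.
print(round(discounted_return([1.0, 1.0, 1.0]), 4))  # 2.9701
```

With γ = 0.99, a reward 100 steps ahead is weighted by 0.99^100 ≈ 0.37, so distant rewards still count, but noticeably less than immediate ones.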
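The model-index `value` (`288.92 +/- 21.79`) has the same `mean +/- std` shape that the usage snippet prints from `evaluate_policy`. As far as I know, SB3's `evaluate_policy` reports the mean and the population standard deviation (`np.std` with its default ddof=0) of the per-episode returns. The standard-library sketch below reproduces that summary under that assumption; `summarize_returns`, `format_metric`, and the four episode returns are hypothetical and are not results of this model.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns: list[float]) -> tuple[float, float]:
    """Return (mean, population std) of per-episode returns.

    Assumed to mirror evaluate_policy's np.mean / np.std (ddof=0) summary.
    """
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return fmean(episode_returns), pstdev(episode_returns)


def format_metric(episode_returns: list[float]) -> str:
    """Format the summary the way the model-index `value` field is written."""
    mean, std = summarize_returns(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns from four evaluation episodes (illustrative only).
returns = [250.0, 270.0, 290.0, 310.0]
print(format_metric(returns))  # 280.00 +/- 22.36
```

`pstdev` is used instead of `statistics.stdev` because the sample standard deviation (ddof=1) would report a slightly larger spread than `np.std`'s default.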