sam522 committed on
Commit
74c66c0
·
verified ·
1 Parent(s): 8102936

Upload README.md with huggingface_hub

  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
---
tags:
- LunarLander-v3
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
library_name: stable-baselines3
---

# PPO Agent playing LunarLander-v3

This is a **PPO** agent trained on the **LunarLander-v3** environment.

## Usage

```python
import torch
import gymnasium as gym

# Load the checkpoint onto the CPU and restore the policy weights
checkpoint = torch.load("model.pth", map_location="cpu")
network = Network(config)  # You need to define the Network class and its config
network.load_state_dict(checkpoint["model_state_dict"])
network.eval()

# Test the agent for one episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0

while not done:
    # Convert the NumPy observation to a float tensor before the forward pass
    obs = torch.as_tensor(state, dtype=torch.float32)
    with torch.no_grad():
        action, _, _, _ = network.get_action_and_value(obs)
    # env.step expects a plain integer action, not a tensor
    state, reward, terminated, truncated, _ = env.step(int(action.item()))
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Total reward: {total_reward}")
```
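
The usage snippet relies on a `Network` class that this README does not include. As a rough illustration only, the sketch below shows one way an actor-critic with shared features and a four-value `get_action_and_value` method might look. The layer sizes, the `Tanh` activations, the constructor arguments (used here in place of the unspecified `config`), and the return order are all assumptions, so the parameter names will not necessarily match the keys stored in `model.pth`.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature trunk (sizes are assumptions)."""

    def __init__(self, obs_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        # Shared feature extractor used by both the policy and value heads
        self.features = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def get_action_and_value(self, obs, action=None):
        feats = self.features(obs)
        dist = Categorical(logits=self.actor(feats))
        if action is None:
            action = dist.sample()
        value = self.critic(feats).squeeze(-1)
        # Assumed return order: action, log-probability, entropy, value
        return action, dist.log_prob(action), dist.entropy(), value


net = Network()
obs = torch.zeros(1, 8)  # dummy batch holding one 8-dimensional observation
action, log_prob, entropy, value = net.get_action_and_value(obs)
```

The default sizes follow the discrete LunarLander environment, which has an 8-dimensional observation and 4 actions. To load the published checkpoint, the class definition must reproduce the original architecture exactly.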

## Training Results

- **Environment**: LunarLander-v3
- **Training Episodes**: 3000
- **Final Performance**: 212.4 ± 113.1
- **Best Episode**: 332.43
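
The Final Performance figure above is presumably a mean and standard deviation of episode rewards, although the README does not say how many episodes it covers. Under that assumption, a summary of this kind could be computed along these lines. The helper name `summarize_returns` and the sample values are made up for illustration.

```python
import statistics


def summarize_returns(returns):
    """Return (mean, sample standard deviation, best) for a list of episode returns."""
    mean = statistics.mean(returns)
    std = statistics.stdev(returns) if len(returns) > 1 else 0.0
    return mean, std, max(returns)


# Made-up episode returns, for illustration only
mean, std, best = summarize_returns([100.0, 200.0, 300.0])
print(f"{mean:.1f} ± {std:.1f} (best {best:.1f})")
```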

## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Network Architecture**: Actor-Critic with shared features
- **Learning Rate**: 0.0003
- **Clip Epsilon**: 0.2
- **Training Episodes**: 3000
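
To show what the Clip Epsilon setting controls, here is a minimal sketch of the standard PPO clipped surrogate policy loss. It is not the training code used for this model; the function name `ppo_clip_loss` and the toy inputs are assumptions for illustration.

```python
import torch


def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Clipped surrogate policy loss, returned as a value to minimize."""
    ratio = torch.exp(new_log_prob - old_log_prob)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Take the pessimistic (smaller) objective per sample, then negate the mean
    return -torch.min(unclipped, clipped).mean()


# Toy batch: probability ratios of 1.5 and 0.5 with advantages of +1 and -1
old_lp = torch.zeros(2)
new_lp = torch.log(torch.tensor([1.5, 0.5]))
adv = torch.tensor([1.0, -1.0])
loss = ppo_clip_loss(new_lp, old_lp, adv)
```

With a clip epsilon of 0.2, the first ratio is capped at 1.2 and the second is raised to 0.8, so neither sample can move the policy further than the clip range allows in a single update.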