CharithAnupama committed on
Commit
9e777ca
·
verified ·
1 Parent(s): c88ca4e

Unit 8 - PPO LunarLander

README.md CHANGED
@@ -1,10 +1,10 @@
  ---
- library_name: stable-baselines3
  tags:
  - LunarLander-v2
- - deep-reinforcement-learning
+ - ppo
  - reinforcement-learning
- - stable-baselines3
+ - deep-rl-course
+ - custom-implementation
  model-index:
  - name: PPO
  results:
@@ -16,22 +16,36 @@ model-index:
  type: LunarLander-v2
  metrics:
  - type: mean_reward
- value: 267.49 +/- 17.43
+ value: 8.59 +/- 73.34
  name: mean_reward
  verified: false
  ---
+ # PPO Agent Playing LunarLander-v2

- # **PPO** Agent playing **LunarLander-v2**
- This is a trained model of a **PPO** agent playing **LunarLander-v2**
- using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+ Trained with a PPO implementation from scratch (CleanRL-style) in PyTorch.

- ## Usage (with Stable-baselines3)
- TODO: Add your code
+ ## Results
+ - Mean reward: 8.59
+ - Std reward: 73.34

-
- ```python
- from stable_baselines3 import ...
- from huggingface_sb3 import load_from_hub
-
- ...
- ```
+ ## Hyperparameters
+ - **exp_name**: ppo_from_scratch
+ - **seed**: 1
+ - **cuda**: 1
+ - **env_id**: LunarLander-v2
+ - **total_timesteps**: 400000
+ - **learning_rate**: 0.00025
+ - **num_envs**: 8
+ - **num_steps**: 128
+ - **anneal_lr**: 1
+ - **gamma**: 0.99
+ - **gae_lambda**: 0.95
+ - **num_minibatches**: 4
+ - **update_epochs**: 4
+ - **clip_coef**: 0.2
+ - **ent_coef**: 0.01
+ - **vf_coef**: 0.5
+ - **max_grad_norm**: 0.5
+ - **repo_id**: CharithAnupama/ppo-LunarLander-v2
+ - **batch_size**: 1024
+ - **minibatch_size**: 256
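
Reviewer note (not part of the commit): the `batch_size` and `minibatch_size` entries in the hyperparameter list appear to be derived from `num_envs`, `num_steps`, and `num_minibatches` rather than set independently. The sketch below checks that arithmetic. The `annealed_lr` helper is hypothetical and assumes the linear decay schedule commonly used in CleanRL-style PPO scripts; the training script itself is not in this diff, so treat the schedule as an assumption.

```python
# Values copied from the Hyperparameters list in the README diff.
num_envs = 8
num_steps = 128
num_minibatches = 4
total_timesteps = 400_000
learning_rate = 2.5e-4

# One rollout collects num_steps transitions from each parallel environment.
batch_size = num_envs * num_steps
minibatch_size = batch_size // num_minibatches
# Number of policy updates that fit into the timestep budget.
num_updates = total_timesteps // batch_size


def annealed_lr(update: int) -> float:
    """Hypothetical helper: linear decay, assuming a CleanRL-style schedule.

    `update` is 1-indexed, so the first update uses the full learning rate.
    """
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * learning_rate


print(batch_size, minibatch_size, num_updates)
```

With these values the derived sizes match the README (1024 and 256), and the 400000-step budget allows 390 full updates.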
logs/events.out.tfevents.1766052476.1d358b537c13.11731.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3a7a5cc171622cd93ebd16816656d99bc7b36863888d172b60290e6dfd006ec
+ size 88
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c5d98b6437421e6ef0e36154eb388164095d74cd2df9055ba48f366f02ae2b7
+ size 43419
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:133bde4452d3d5503c0195cd6ecf54235fdf271816c2e98fe11d077d6486f3ff
- size 145104
+ oid sha256:288fca0eabde9af1426338462b92b1d7a86209b89f077d65e3c1d9916ca991bf
+ size 217586
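
Reviewer note (not part of the commit): the `logs/…`, `model.pt`, and `replay.mp4` entries are Git LFS pointer files, so the diff shows only a spec version, a content hash, and a byte size rather than the binary itself. A minimal sketch of reading such a pointer follows. `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# New pointer for replay.mp4, copied from the diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:288fca0eabde9af1426338462b92b1d7a86209b89f077d65e3c1d9916ca991bf
size 217586
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

Comparing `size` between the old and new pointers (145104 vs 217586 bytes) shows the replay video was regenerated, not just renamed.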
results.json CHANGED
@@ -1 +1 @@
1
- {"mean_reward": 267.4945156, "std_reward": 17.43157326554414, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2025-11-13T05:49:21.774743"}
 
1
+ {"env_id": "LunarLander-v2", "mean_reward": 8.593038541298025, "std_reward": 73.3423435822931, "n_evaluation_episodes": 10, "eval_datetime": "2025-12-18T10:10:52.975563"}
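
Reviewer note (not part of the commit): the `value` field in the README metadata looks like the `mean_reward` and `std_reward` from `results.json` rounded to two decimals. The sketch below reproduces that string from the new JSON payload. The formatting rule is an assumption inferred from the two files, since the script that writes them is not in this diff.

```python
import json

# New results.json payload, copied verbatim from the diff.
results_json = (
    '{"env_id": "LunarLander-v2", "mean_reward": 8.593038541298025, '
    '"std_reward": 73.3423435822931, "n_evaluation_episodes": 10, '
    '"eval_datetime": "2025-12-18T10:10:52.975563"}'
)
results = json.loads(results_json)

# Assumed formatting: both numbers rounded to two decimals, joined by "+/-".
summary = f"{results['mean_reward']:.2f} +/- {results['std_reward']:.2f}"
print(summary)  # prints "8.59 +/- 73.34"
```

The key names also changed between versions (`n_eval_episodes` became `n_evaluation_episodes`, and `is_deterministic` was dropped), so any consumer reading the old keys would need updating.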