sam522 committed on
Commit bfadc6c · verified · 1 Parent(s): 1a99f8a

Upload Pyramids PPO model for Deep RL Course Unit 5

Files changed (5)
  1. README.md +83 -0
  2. config.json +24 -0
  3. model.pt +3 -0
  4. model_card.json +22 -0
  5. requirements.txt +4 -0
README.md ADDED
@@ -0,0 +1,83 @@
+ ---
+ tags:
+ - ML-Agents-Pyramids
+ - ppo
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - ml-agents
+ model-index:
+ - name: PPO
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: ML-Agents-Pyramids
+       type: ML-Agents-Pyramids
+     metrics:
+     - type: mean_reward
+       value: 5.10 +/- 0.85
+       name: mean_reward
+       verified: false
+ ---
+
+ # **PPO** Agent playing **ML-Agents-Pyramids**
+
+ This is a trained model of a **PPO** agent playing **ML-Agents-Pyramids** using Unity ML-Agents.
+
+ ## Usage
+
+ ```python
+ import torch
+ import numpy as np
+
+ # Load the model (you'll need the network architecture)
+ checkpoint = torch.load("model.pt", map_location='cpu')
+
+ # The model can be used with the Pyramids environment
+ # See the repository for complete usage instructions
+ ```
+
+ ## Training Results
+
+ - **Mean reward**: 5.10 ± 0.85
+ - **Average pyramids completed**: 5.0 per episode
+ - **Training episodes**: 3,000
+ - **Target achievement**: ✅ SUCCESS (target: 1.75)
+
+ ## Algorithm Details
+
+ - **Algorithm**: Proximal Policy Optimization (PPO)
+ - **Environment**: ML-Agents-Pyramids
+ - **Task**: Multi-step pyramid completion with curiosity-driven exploration
+ - **Network**: Deep neural network with curiosity mechanism
+ - **Training Framework**: PyTorch
+
+ ## Task Description
+
+ The agent learns to:
+
+ 1. **Find and press buttons** to spawn pyramids
+ 2. **Navigate to pyramids** and knock them over
+ 3. **Collect gold bricks** from fallen pyramids
+ 4. **Repeat efficiently** to maximize score
+
+ This complex task requires:
+ - Exploration in sparse reward environment
+ - Multi-step planning and execution
+ - Spatial navigation and object interaction
+
+ ## Performance Milestones
+
+ - Episodes 0-500: Learning basic movement and object interaction
+ - Episodes 500-1500: Developing pyramid completion strategy
+ - Episodes 1500-3000: Optimizing efficiency and consistency
+
+ ## Training Environment
+
+ - **Environment**: ML-Agents-Pyramids
+ - **Framework**: Custom PyTorch implementation with ML-Agents compatibility
+ - **Training date**: 2025-09-05
+ - **Course**: Hugging Face Deep RL Course Unit 5
+
+ This model was trained as part of the [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course).
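
The README reports its result as a mean and a spread over episodes and compares it with a target of 1.75. A minimal sketch of how such a summary could be computed follows. The episode returns are made-up illustrative values, not data from this training run; `summarize_returns` is a hypothetical helper; and the use of the population standard deviation is an assumption, since the README does not say which estimator was used.

```python
from statistics import fmean, pstdev


def summarize_returns(returns, target):
    """Return (mean, std, reached_target) for a list of episode returns."""
    mean = fmean(returns)
    std = pstdev(returns)  # population std; an assumption, not stated in the README
    return mean, std, mean >= target


# Illustrative returns only; NOT the data behind the reported 5.10 +/- 0.85.
example_returns = [4.0, 5.0, 6.0, 5.0]
mean, std, reached = summarize_returns(example_returns, target=1.75)
print(f"{mean:.2f} +/- {std:.2f}", "SUCCESS" if reached else "BELOW TARGET")
```

Reporting the number of evaluation episodes alongside the mean and spread would make a figure like this easier to reproduce.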
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "algorithm": "PPO",
+   "environment": "ML-Agents-Pyramids",
+   "hyperparameters": {
+     "learning_rate": 0.0003,
+     "gamma": 0.99,
+     "gae_lambda": 0.95,
+     "clip_coef": 0.2,
+     "entropy_coef": 0.01,
+     "value_coef": 0.5,
+     "curiosity_coef": 0.1
+   },
+   "network_architecture": {
+     "hidden_size": 512,
+     "num_layers": 3,
+     "activation": "ReLU",
+     "curiosity_network": "RND"
+   },
+   "training": {
+     "total_episodes": 3000,
+     "batch_size": 1024,
+     "update_epochs": 4
+   }
+ }
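
The hyperparameters in `config.json` map onto the standard PPO formulation: `gamma` and `gae_lambda` drive generalized advantage estimation, `clip_coef` bounds the policy ratio, and `value_coef` and `entropy_coef` weight the value and entropy terms. The sketch below shows that textbook formulation in plain Python. It is not this repository's training code, the function names are hypothetical, and `curiosity_coef` is left out because the diff does not show how the RND bonus is applied.

```python
import math

# Values copied from config.json.
GAMMA, GAE_LAMBDA = 0.99, 0.95
CLIP_COEF, ENTROPY_COEF, VALUE_COEF = 0.2, 0.01, 0.5


def gae(rewards, values, last_value, gamma=GAMMA, lam=GAE_LAMBDA):
    """Generalized advantage estimation over one segment (no terminal handling)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_policy_loss(new_logp, old_logp, advantage, clip=CLIP_COEF):
    """PPO clipped surrogate for a single sample, negated so it is a loss."""
    ratio = math.exp(new_logp - old_logp)
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    return -min(ratio * advantage, clipped * advantage)


def total_loss(policy_loss, value_loss, entropy):
    """Combine the three terms with the coefficients from the config."""
    return policy_loss + VALUE_COEF * value_loss - ENTROPY_COEF * entropy


# Toy three-step segment with a single reward at the end.
adv = gae(rewards=[0.0, 0.0, 1.0], values=[0.1, 0.2, 0.5], last_value=0.0)
```

With `clip_coef` at 0.2, a sample whose probability ratio rises to 1.5 under a positive advantage contributes as if the ratio were 1.2, which is what keeps each update close to the policy that collected the data.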
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02a55c94ef9190af9d58bfca9b85a68c2d5d486811394d38b06d45d5cb123558
+ size 1441
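
`model.pt` is stored through Git LFS, so the diff shows only a pointer holding the SHA-256 digest and byte size of the real file. A downloaded copy can be checked against those two fields. The sketch below assumes the three-line pointer layout shown in this diff; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:02a55c94ef9190af9d58bfca9b85a68c2d5d486811394d38b06d45d5cb123558
size 1441
"""
pointer = parse_lfs_pointer(POINTER)
# With a downloaded file: matches_pointer(open("model.pt", "rb").read(), pointer)
```

Checking the size first is cheap and catches the common case where the pointer text itself was downloaded instead of the binary.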
model_card.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "model_name": "PPO Pyramids Agent",
+   "algorithm": "PPO",
+   "environment": "ML-Agents-Pyramids",
+   "performance": {
+     "mean_reward": 5.1,
+     "std_reward": 0.85,
+     "pyramids_completed": 5.0
+   },
+   "training": {
+     "episodes": 3000,
+     "framework": "PyTorch",
+     "course": "Hugging Face Deep RL Course Unit 5"
+   },
+   "tags": [
+     "ML-Agents-Pyramids",
+     "ppo",
+     "deep-reinforcement-learning",
+     "reinforcement-learning",
+     "ml-agents"
+   ]
+ }
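
`model_card.json` repeats numbers that also appear in the README front matter (`value: 5.10 +/- 0.85`), so the two files can drift apart over later commits. A small check along the following lines could catch that. The JSON is an inlined subset of the file in this diff, and `format_metric` is a hypothetical helper, not part of any Hugging Face tooling.

```python
import json

# Subset of model_card.json, inlined so the example is self-contained.
MODEL_CARD = json.loads("""
{
  "performance": {"mean_reward": 5.1, "std_reward": 0.85, "pyramids_completed": 5.0},
  "training": {"episodes": 3000}
}
""")

README_VALUE = "5.10 +/- 0.85"  # metric string from the README front matter


def format_metric(performance):
    """Render mean and std the way the README front matter writes them."""
    return f"{performance['mean_reward']:.2f} +/- {performance['std_reward']:.2f}"


consistent = format_metric(MODEL_CARD["performance"]) == README_VALUE
print("consistent" if consistent else "mismatch")
```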
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ torch>=1.9.0
+ numpy>=1.21.0
+ gymnasium>=0.28.0
+ matplotlib>=3.3.0