---
tags:
- ML-Agents-Pyramids
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- ml-agents
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: ML-Agents-Pyramids
      type: ML-Agents-Pyramids
    metrics:
    - type: mean_reward
      value: 5.10 +/- 0.85
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **ML-Agents-Pyramids**

This is a trained model of a **PPO** agent playing **ML-Agents-Pyramids** using Unity ML-Agents.

## Usage

```python
import torch

# Load the checkpoint on CPU. This returns whatever object was saved
# (typically a state dict); rebuilding the policy also requires the
# network architecture.
checkpoint = torch.load("model.pt", map_location="cpu")

# The model can be used with the Pyramids environment.
# See the repository for complete usage instructions.
```
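
Since rebuilding the policy requires knowing the network architecture, it can help to list what the checkpoint actually contains. The helper below is a minimal sketch and not part of this repository: `summarize_checkpoint` is a hypothetical name, and it assumes the loaded object is a dict (possibly nested) whose tensor-like values expose a `.shape` attribute, which is common for files written with `torch.save` but is not confirmed for this model.

```python
from typing import Any, Dict


def summarize_checkpoint(checkpoint: Any, prefix: str = "") -> Dict[str, str]:
    """Map each entry of a loaded checkpoint to a short description.

    Hypothetical helper: assumes a (possibly nested) dict layout, which is
    an assumption about this file rather than a documented fact.
    """
    if not isinstance(checkpoint, dict):
        # For example, a whole pickled module rather than a state dict.
        return {prefix or "<root>": type(checkpoint).__name__}

    summary: Dict[str, str] = {}
    for key, value in checkpoint.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested dicts such as {"policy": {...}}.
            summary.update(summarize_checkpoint(value, prefix=f"{name}."))
        elif hasattr(value, "shape"):
            # Tensor-like entry: record its shape.
            summary[name] = f"tensor{tuple(value.shape)}"
        else:
            summary[name] = type(value).__name__
    return summary
```

Passing the object returned by `torch.load` to `summarize_checkpoint` and printing the resulting items gives one line per stored entry; the layer names and shapes are usually enough to tell which layers a matching network definition needs.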

## Training Results

- **Mean reward**: 5.10 ± 0.85
- **Average pyramids completed**: 5.0 per episode
- **Training episodes**: 3,000
- **Target achievement**: ✅ SUCCESS (target: 1.75)
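
The comparison against the target can be made explicit with a little arithmetic. The sketch below assumes the target is checked against a conservative score, the mean reward minus one standard deviation; this card does not state which quantity the 1.75 target applies to, and `meets_target` is a hypothetical helper name.

```python
def meets_target(mean_reward: float, std_reward: float, target: float) -> bool:
    """Return True if mean minus one standard deviation reaches the target.

    Assumption: the target is compared with this conservative score rather
    than with the raw mean.
    """
    return (mean_reward - std_reward) >= target


# Values reported in this section.
conservative_score = 5.10 - 0.85  # roughly 4.25
passed = meets_target(5.10, 0.85, target=1.75)
```

Under this assumption the reported numbers clear the target either way, since both the mean (5.10) and the conservative score (about 4.25) are above 1.75.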

## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Environment**: ML-Agents-Pyramids
- **Task**: Multi-step pyramid completion with curiosity-driven exploration
- **Network**: Deep neural network with curiosity mechanism
- **Training Framework**: PyTorch
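
For readers unfamiliar with PPO, its core is the clipped surrogate objective, which limits how far a single update can move the policy. The following is a dependency-free sketch of that loss for illustration only; it is not the training code behind this model, and the function name and the `clip_epsilon` default of 0.2 are assumptions chosen for the example.

```python
import math
from typing import Sequence


def ppo_clipped_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_epsilon: float = 0.2,  # illustrative value, not taken from this model
) -> float:
    """Mean PPO clipped surrogate loss (the negated objective) over a batch."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("batch must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate so that minimizing the loss maximizes the objective.
    return -total / len(advantages)
```

When an action's probability has doubled and its advantage is positive, the clipped term caps the contribution at `1 + clip_epsilon` times the advantage, so the update gains nothing from pushing the ratio further.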

## Task Description

The agent learns to:

1. **Find and press buttons** to spawn pyramids
2. **Navigate to pyramids** and knock them over
3. **Collect gold bricks** from fallen pyramids
4. **Repeat efficiently** to maximize score

This complex task requires:
- Exploration in a sparse-reward environment
- Multi-step planning and execution
- Spatial navigation and object interaction
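
Curiosity-driven exploration addresses the sparse-reward problem by adding an intrinsic bonus when the agent encounters transitions it cannot yet predict. This card does not specify which curiosity mechanism was used, so the sketch below shows one common formulation as an assumption: the bonus is the scaled squared error of a forward model's prediction of the next state's features. The function names and the `strength` default are hypothetical.

```python
from typing import Sequence


def intrinsic_reward(
    predicted_next: Sequence[float],
    actual_next: Sequence[float],
    strength: float = 0.02,  # illustrative scale, not taken from this model
) -> float:
    """Curiosity bonus: scaled half squared error of a forward-model prediction."""
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return strength * 0.5 * squared_error


def total_reward(extrinsic: float, intrinsic: float) -> float:
    """Reward used for the policy update: environment reward plus curiosity bonus."""
    return extrinsic + intrinsic
```

Well-predicted transitions yield a bonus near zero, so the bonus fades for familiar states and the agent is nudged toward parts of the arena it has not yet learned to model.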

## Performance Milestones

- Episodes 0-500: Learning basic movement and object interaction
- Episodes 500-1500: Developing pyramid completion strategy
- Episodes 1500-3000: Optimizing efficiency and consistency

## Training Environment

- **Environment**: ML-Agents-Pyramids
- **Framework**: Custom PyTorch implementation with ML-Agents compatibility
- **Training date**: 2025-09-05
- **Course**: Hugging Face Deep RL Course Unit 5

This model was trained as part of the [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course).