---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
---
# PPO-SnowballTarget Reinforcement Learning Model

## Model Description

This model is a Proximal Policy Optimization (PPO) agent trained to play the SnowballTarget environment from Unity ML-Agents. The agent, named Julien the Bear 🐻, learns to accurately throw snowballs at spawning targets to maximize rewards.

## Model Details

### Model Architecture
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Framework**: Unity ML-Agents with PyTorch backend
- **Agent**: Julien the Bear (3D character)
- **Policy Network**: Actor-Critic architecture
  - Actor: Outputs action probabilities
  - Critic: Estimates state values for advantage calculation

### Environment: SnowballTarget

SnowballTarget is an environment created at Hugging Face using assets from Kay Lousberg. In it, you train an agent called Julien the Bear 🐻 to hit targets with snowballs.

**Environment Details:**
- **Objective**: Train Julien the Bear to accurately throw snowballs at targets
- **Setting**: 3D winter environment with spawning targets
- **Agent**: Single agent (Julien the Bear)
- **Targets**: Dynamically spawning targets that need to be hit with snowballs

### Observation Space
The agent observes:
- Agent's position and rotation
- Target positions and states
- Snowball trajectory information
- Environmental spatial relationships
- Ray-cast sensors for spatial awareness

### Action Space
- **Continuous Actions**: Aiming direction and throw force
- **Action Dimensions**: Typically 2-3 continuous values
  - Horizontal aiming angle
  - Vertical aiming angle  
  - Throw force/power

### Reward Structure
- **Positive Rewards**: 
  - +1.0 for hitting a target
  - Distance-based reward bonuses for accurate shots
- **Negative Rewards**:
  - Small time penalty to encourage efficiency
  - Penalty for missing targets

## Training Configuration

### PPO Hyperparameters
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Training Framework**: Unity ML-Agents
- **Batch Size**: Typical ML-Agents default (1024-2048)
- **Learning Rate**: Adaptive (typically 3e-4)
- **Entropy Coefficient**: Encourages exploration
- **Value Function Coefficient**: Balances actor-critic training
- **PPO Clipping**: ε = 0.2 (standard PPO clipping range)
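
The clipped surrogate objective referenced above can be sketched numerically. This is a generic PPO illustration, not code from this repository; `ratio` is the probability ratio between the new and old policies for a sampled action:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective (to be maximized).

    Clipping the ratio to [1 - eps, 1 + eps] and taking the minimum
    removes the incentive to move the policy far from the old one.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)
```

With ε = 0.2, a ratio of 1.5 on a positive advantage is capped at 1.2 × advantage, which is what keeps policy updates small.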

### Training Process
- **Environment**: Unity ML-Agents SnowballTarget
- **Training Method**: Parallel environment instances
- **Episode Length**: Variable (until all targets hit or timeout)
- **Success Criteria**: Consistent target hitting accuracy
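
The exact settings used ship in `configuration.yaml`. For orientation, an ML-Agents PPO config for this environment follows the schema below; the values here are illustrative examples, not the actual settings of this run:

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128          # example value
      buffer_size: 2048
      learning_rate: 3.0e-4
      epsilon: 0.2             # PPO clipping range
      lambd: 0.95              # GAE lambda
      num_epoch: 3
    network_settings:
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 200000
    time_horizon: 64
    summary_freq: 10000
```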

## Performance Metrics

The model is evaluated based on:
- **Hit Accuracy**: Percentage of targets successfully hit
- **Average Reward**: Cumulative reward per episode
- **Training Stability**: Consistent improvement over training steps
- **Efficiency**: Time to hit targets (faster is better)

### Expected Performance
- **Target Hit Rate**: >80% accuracy on target hitting
- **Convergence**: Stable policy after sufficient training episodes
- **Generalization**: Ability to hit targets in various positions

## Usage

### Loading the Model
```python
# Connect to a SnowballTarget build from Python (path is illustrative;
# the package exposes UnityEnvironment, not a "UnityToPythonWrapper").
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name="./SnowballTarget", side_channels=[channel])
env.reset()
# Model files should include the .onnx policy file and configuration.yaml
```

### Resuming Training
```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```

### Running Inference
```python
# The trained .onnx policy can be attached to the agent's Behavior Parameters
# in Unity for real-time, in-engine inference, or deployed in Unity builds.
```

## Technical Implementation

### PPO Algorithm Features
- **Policy Clipping**: Prevents large policy updates
- **Advantage Estimation**: GAE (Generalized Advantage Estimation)
- **Value Function**: Shared network with actor for efficiency
- **Batch Training**: Multiple parallel environments for sample efficiency
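
Generalized Advantage Estimation, mentioned above, can be sketched for a single episode. This is a generic illustration (not this repository's code); `values` carries one extra entry for the value estimate of the final state:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one episode.

    rewards: list of T rewards; values: list of T + 1 value estimates.
    Accumulates discounted TD errors backward through time.
    """
    advantages = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        advantages[t] = last
    return advantages
```

With `gamma = lam = 1` and zero value estimates, each advantage reduces to the undiscounted return-to-go, which is a quick sanity check on the recursion.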

### Unity ML-Agents Integration
- **Python API**: Training through Python interface
- **Unity Side**: Real-time environment simulation
- **Observation Collection**: Automated sensor data gathering
- **Action Execution**: Smooth character animation and physics

## Files Structure

```
├── SnowballTarget.onnx    # Trained policy network
├── configuration.yaml     # Training configuration
├── run_logs/              # Training metrics and logs
└── results/               # Training results and statistics
```

## Limitations and Considerations

1. **Environment Specific**: Model is trained specifically for SnowballTarget environment
2. **Unity Dependency**: Requires Unity ML-Agents framework for deployment
3. **Physics Sensitivity**: Performance may vary with different physics settings
4. **Target Patterns**: May not generalize to significantly different target spawn patterns

## Applications

- **Game AI**: Can be integrated into Unity games as intelligent NPC behavior
- **Educational**: Demonstrates reinforcement learning in 3D environments
- **Research**: Benchmark for continuous control and aiming tasks
- **Interactive Demos**: Can be deployed in web builds for demonstrations

## Ethical Considerations

This model represents a benign gaming scenario with no ethical concerns:
- **Content**: Family-friendly winter sports theme
- **Violence**: Non-violent snowball throwing activity
- **Educational Value**: Suitable for learning about AI and reinforcement learning

## Unity ML-Agents Version Compatibility

- **ML-Agents**: Compatible with Unity ML-Agents toolkit
- **Unity Version**: Works with Unity 2021.3+ LTS
- **Python Package**: Requires `mlagents` Python package

## Training Environment

- **Unity Editor**: 3D environment simulation
- **ML-Agents**: Python training interface
- **Hardware**: GPU-accelerated training recommended
- **Parallel Environments**: Multiple instances for efficient training

## Citation

If you use this model, please cite:

```bibtex
@misc{ppo-snowballtarget-2024,
  title={PPO-SnowballTarget: Reinforcement Learning Agent for Unity ML-Agents},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/Adilbai/ppo-SnowballTarget}
}
```

## References

- Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- Unity Technologies. Unity ML-Agents Toolkit. https://github.com/Unity-Technologies/ml-agents
- Hugging Face Deep RL Course: https://huggingface.co/learn/deep-rl-course
- Kay Lousberg (Environment Assets): https://www.kaylousberg.com/