---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
---
# PPO-SnowballTarget Reinforcement Learning Model
## Model Description
This model is a Proximal Policy Optimization (PPO) agent trained to play the SnowballTarget environment from Unity ML-Agents. The agent, named Julien the Bear 🐻, learns to accurately throw snowballs at spawning targets to maximize rewards.
## Model Details
### Model Architecture
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Framework**: Unity ML-Agents with PyTorch backend
- **Agent**: Julien the Bear (3D character)
- **Policy Network**: Actor-Critic architecture
- Actor: Outputs action probabilities
- Critic: Estimates state values for advantage calculation
### Environment: SnowballTarget
SnowballTarget is an environment created at Hugging Face, using assets from Kay Lousberg, in which you train an agent called Julien the Bear 🐻 to hit targets with snowballs.
**Environment Details:**
- **Objective**: Train Julien the Bear to accurately throw snowballs at targets
- **Setting**: 3D winter environment with spawning targets
- **Agent**: Single agent (Julien the Bear)
- **Targets**: Dynamically spawning targets that need to be hit with snowballs
### Observation Space
The agent observes:
- Agent's position and rotation
- Target positions and states
- Snowball trajectory information
- Environmental spatial relationships
- Ray-cast sensors for spatial awareness
### Action Space
- **Continuous Actions**: Aiming direction and throw force
- **Action Dimensions**: Typically 2-3 continuous values
- Horizontal aiming angle
- Vertical aiming angle
- Throw force/power
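As a concrete illustration, the continuous action vector above could be decoded into aim angles and throw force as follows. The index assignments and value ranges here are assumptions for illustration, not the environment's actual mapping:

```python
import numpy as np

def decode_action(action: np.ndarray) -> tuple[float, float, float]:
    """Hypothetical decoding of a 3-dimensional continuous action in [-1, 1]."""
    yaw = float(action[0]) * 90.0            # horizontal aiming angle, degrees
    pitch = float(action[1]) * 45.0          # vertical aiming angle, degrees
    force = (float(action[2]) + 1.0) / 2.0   # throw force, normalized to [0, 1]
    return yaw, pitch, force

yaw, pitch, force = decode_action(np.array([0.5, -0.5, 1.0]))
```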
### Reward Structure
- **Positive Rewards**:
- +1.0 for hitting a target
- Distance-based reward bonuses for accurate shots
- **Negative Rewards**:
- Small time penalty to encourage efficiency
- Penalty for missing targets
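The reward structure above can be sketched as a per-step function. Only the +1.0 hit reward is stated in this card; the penalty magnitudes below are placeholder assumptions:

```python
def snowball_reward(hit_target: bool, missed_shot: bool,
                    time_penalty: float = 0.001, miss_penalty: float = 0.1) -> float:
    """Illustrative per-step reward matching the structure described above.

    The +1.0 hit reward comes from the card; the time and miss penalty
    magnitudes are assumed values for illustration.
    """
    reward = -time_penalty            # small time penalty to encourage efficiency
    if hit_target:
        reward += 1.0                 # +1.0 for hitting a target
    elif missed_shot:
        reward -= miss_penalty        # penalty for a missed throw
    return reward

r = snowball_reward(hit_target=True, missed_shot=False)
```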
## Training Configuration
### PPO Hyperparameters
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Training Framework**: Unity ML-Agents
- **Batch Size**: Typical ML-Agents default (1024-2048)
- **Learning Rate**: Adaptive (typically 3e-4)
- **Entropy Coefficient**: Encourages exploration
- **Value Function Coefficient**: Balances actor-critic training
- **PPO Clipping**: ε = 0.2 (standard PPO clipping range)
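A trainer configuration of the kind passed to `mlagents-learn` might look like the sketch below. These values follow common ML-Agents defaults for this environment and are illustrative; the actual values used for this run are in the repository's `configuration.yaml`:

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      epsilon: 0.2          # PPO clipping range
      lambd: 0.95           # GAE lambda
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 200000
    time_horizon: 64
    summary_freq: 10000
```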
### Training Process
- **Environment**: Unity ML-Agents SnowballTarget
- **Training Method**: Parallel environment instances
- **Episode Length**: Variable (until all targets hit or timeout)
- **Success Criteria**: Consistent target hitting accuracy
## Performance Metrics
The model is evaluated based on:
- **Hit Accuracy**: Percentage of targets successfully hit
- **Average Reward**: Cumulative reward per episode
- **Training Stability**: Consistent improvement over training steps
- **Efficiency**: Time to hit targets (faster is better)
### Expected Performance
- **Target Hit Rate**: >80% accuracy on target hitting
- **Convergence**: Stable policy after sufficient training episodes
- **Generalization**: Ability to hit targets in various positions
## Usage
### Loading the Model
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

# Connect to a SnowballTarget Unity build (path is an example).
# Model files should include the .onnx policy file and the training configuration.
channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name="./SnowballTarget", side_channels=[channel])
env.reset()
```
### Resume the training
```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
### Running Inference
```python
# The model can be used directly in Unity ML-Agents environments
# or deployed to Unity builds for real-time inference
```
## Technical Implementation
### PPO Algorithm Features
- **Policy Clipping**: Prevents large policy updates
- **Advantage Estimation**: GAE (Generalized Advantage Estimation)
- **Value Function**: Shared network with actor for efficiency
- **Batch Training**: Multiple parallel environments for sample efficiency
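The two core pieces listed above, the clipped surrogate objective and GAE, can be sketched in NumPy. This is a minimal illustration of the math (with the ε = 0.2 clipping range stated above), not the ML-Agents implementation:

```python
import numpy as np

def clipped_surrogate(ratio: np.ndarray, advantage: np.ndarray,
                      eps: float = 0.2) -> np.ndarray:
    """PPO clipped surrogate objective (Schulman et al., 2017).

    ratio = pi_new(a|s) / pi_old(a|s); taking the elementwise minimum
    caps how much a single update can move the policy once the ratio
    leaves [1 - eps, 1 + eps].
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

def gae(rewards: np.ndarray, values: np.ndarray,
        gamma: float = 0.99, lam: float = 0.95) -> np.ndarray:
    """Generalized Advantage Estimation over one episode.

    `values` holds state-value estimates and has len(rewards) + 1 entries
    (the last entry is the bootstrap value for the final state).
    """
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

# With a positive advantage, a ratio of 1.5 is clipped to 1 + eps = 1.2:
obj = clipped_surrogate(np.array([1.5]), np.array([2.0]))
```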
### Unity ML-Agents Integration
- **Python API**: Training through Python interface
- **Unity Side**: Real-time environment simulation
- **Observation Collection**: Automated sensor data gathering
- **Action Execution**: Smooth character animation and physics
## Files Structure
```
β”œβ”€β”€ SnowballTarget.onnx # Trained policy network
β”œβ”€β”€ configuration.yaml # Training configuration
β”œβ”€β”€ run_logs/ # Training metrics and logs
└── results/ # Training results and statistics
```
## Limitations and Considerations
1. **Environment Specific**: Model is trained specifically for SnowballTarget environment
2. **Unity Dependency**: Requires Unity ML-Agents framework for deployment
3. **Physics Sensitivity**: Performance may vary with different physics settings
4. **Target Patterns**: May not generalize to significantly different target spawn patterns
## Applications
- **Game AI**: Can be integrated into Unity games as intelligent NPC behavior
- **Educational**: Demonstrates reinforcement learning in 3D environments
- **Research**: Benchmark for continuous control and aiming tasks
- **Interactive Demos**: Can be deployed in web builds for demonstrations
## Ethical Considerations
This model represents a benign gaming scenario with no ethical concerns:
- **Content**: Family-friendly winter sports theme
- **Violence**: Non-violent snowball throwing activity
- **Educational Value**: Suitable for learning about AI and reinforcement learning
## Unity ML-Agents Version Compatibility
- **ML-Agents**: Compatible with Unity ML-Agents toolkit
- **Unity Version**: Works with Unity 2021.3+ LTS
- **Python Package**: Requires `mlagents` Python package
## Training Environment
- **Unity Editor**: 3D environment simulation
- **ML-Agents**: Python training interface
- **Hardware**: GPU-accelerated training recommended
- **Parallel Environments**: Multiple instances for efficient training
## Citation
If you use this model, please cite:
```bibtex
@misc{ppo-snowballtarget-2024,
  title={PPO-SnowballTarget: Reinforcement Learning Agent for Unity ML-Agents},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/Adilbai/ppo-SnowballTarget}
}
```
## References
- Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- Unity Technologies. Unity ML-Agents Toolkit. https://github.com/Unity-Technologies/ml-agents
- Hugging Face Deep RL Course: https://huggingface.co/learn/deep-rl-course
- Kay Lousberg (Environment Assets): https://www.kaylousberg.com/