|
|
--- |
|
|
library_name: ml-agents |
|
|
tags: |
|
|
- SnowballTarget |
|
|
- deep-reinforcement-learning |
|
|
- reinforcement-learning |
|
|
- ML-Agents-SnowballTarget |
|
|
--- |
|
|
# PPO-SnowballTarget Reinforcement Learning Model |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is a Proximal Policy Optimization (PPO) agent trained to play the SnowballTarget environment from Unity ML-Agents. The agent, named Julien the Bear 🐻, learns to accurately throw snowballs at spawning targets to maximize rewards.
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Architecture |
|
|
- **Algorithm**: Proximal Policy Optimization (PPO) |
|
|
- **Framework**: Unity ML-Agents with PyTorch backend |
|
|
- **Agent**: Julien the Bear (3D character) |
|
|
- **Policy Network**: Actor-Critic architecture |
|
|
- Actor: Outputs action probabilities |
|
|
- Critic: Estimates state values for advantage calculation |
|
|
|
|
|
### Environment: SnowballTarget |
|
|
|
|
|
SnowballTarget is an environment created at Hugging Face, using assets from Kay Lousberg, in which you train an agent called Julien the Bear 🐻 to hit targets with snowballs.
|
|
|
|
|
**Environment Details:** |
|
|
- **Objective**: Train Julien the Bear to accurately throw snowballs at targets |
|
|
- **Setting**: 3D winter environment with spawning targets |
|
|
- **Agent**: Single agent (Julien the Bear) |
|
|
- **Targets**: Dynamically spawning targets that need to be hit with snowballs |
|
|
|
|
|
### Observation Space |
|
|
The agent observes: |
|
|
- Agent's position and rotation |
|
|
- Target positions and states |
|
|
- Snowball trajectory information |
|
|
- Environmental spatial relationships |
|
|
- Ray-cast sensors for spatial awareness |
|
|
|
|
|
### Action Space |
|
|
- **Continuous Actions**: Aiming direction and throw force |
|
|
- **Action Dimensions**: Typically 2-3 continuous values |
|
|
- Horizontal aiming angle |
|
|
- Vertical aiming angle |
|
|
- Throw force/power |
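
As an illustration of how such a continuous action vector might be decoded, here is a hypothetical mapping (the ranges and layout are assumptions for exposition, not the environment's actual code):

```python
def decode_action(action):
    """Map a 3-vector in [-1, 1] to aim angles and throw force.

    The angle ranges and normalization here are illustrative assumptions.
    """
    yaw = action[0] * 45.0           # horizontal aiming angle, degrees
    pitch = action[1] * 30.0         # vertical aiming angle, degrees
    force = (action[2] + 1.0) / 2.0  # throw force normalized to [0, 1]
    return yaw, pitch, force
```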
|
|
|
|
|
### Reward Structure |
|
|
- **Positive Rewards**: |
|
|
- +1.0 for hitting a target |
|
|
- Distance-based reward bonuses for accurate shots |
|
|
- **Negative Rewards**: |
|
|
- Small time penalty to encourage efficiency |
|
|
- Penalty for missing targets |
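
The reward shaping above can be sketched as a simple per-step function (the constants mirror the list and are assumptions, not the environment's exact values):

```python
def step_reward(hit_target: bool, missed: bool, time_penalty: float = 0.001):
    """Illustrative per-step reward mirroring the structure above."""
    reward = -time_penalty  # small time penalty to encourage efficiency
    if hit_target:
        reward += 1.0       # sparse reward for hitting a target
    elif missed:
        reward -= 0.1       # assumed penalty for a missed throw
    return reward
```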
|
|
|
|
|
## Training Configuration |
|
|
|
|
|
### PPO Hyperparameters |
|
|
- **Algorithm**: Proximal Policy Optimization (PPO) |
|
|
- **Training Framework**: Unity ML-Agents |
|
|
- **Batch Size**: Typical ML-Agents default (1024-2048) |
|
|
- **Learning Rate**: Adaptive (typically 3e-4) |
|
|
- **Entropy Coefficient**: Encourages exploration |
|
|
- **Value Function Coefficient**: Balances actor-critic training |
|
|
- **PPO Clipping**: ε = 0.2 (standard PPO clipping range)
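
A representative trainer configuration in the ML-Agents YAML schema might look like the following (the values are illustrative, not necessarily the exact ones used for this run):

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      beta: 5.0e-3        # entropy coefficient
      epsilon: 0.2        # PPO clipping range
      lambd: 0.95         # GAE lambda
      num_epoch: 3
    network_settings:
      hidden_units: 256
      num_layers: 2
    max_steps: 200000
```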
|
|
|
|
|
### Training Process |
|
|
- **Environment**: Unity ML-Agents SnowballTarget |
|
|
- **Training Method**: Parallel environment instances |
|
|
- **Episode Length**: Variable (until all targets hit or timeout) |
|
|
- **Success Criteria**: Consistent target hitting accuracy |
|
|
|
|
|
## Performance Metrics |
|
|
|
|
|
The model is evaluated based on: |
|
|
- **Hit Accuracy**: Percentage of targets successfully hit |
|
|
- **Average Reward**: Cumulative reward per episode |
|
|
- **Training Stability**: Consistent improvement over training steps |
|
|
- **Efficiency**: Time to hit targets (faster is better) |
|
|
|
|
|
### Expected Performance |
|
|
- **Target Hit Rate**: >80% accuracy on target hitting |
|
|
- **Convergence**: Stable policy after sufficient training episodes |
|
|
- **Generalization**: Ability to hit targets in various positions |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Loading the Model |
|
|
```python
# Example of driving a SnowballTarget build from Python via the
# ML-Agents low-level API (the build path is an assumption; adjust
# it to your local setup).
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name="./SnowballTarget", side_channels=[channel])
env.reset()
# Model files include the .onnx policy file and the training configuration.
```
|
|
### Resuming Training
|
|
```bash |
|
|
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume |
|
|
``` |
|
|
|
|
|
### Running Inference |
|
|
```python
# The .onnx policy runs natively inside Unity builds for real-time
# inference; from Python, a build can be driven through the low-level
# API (the build path below is an assumption):
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="./SnowballTarget")
env.reset()
behavior_name = list(env.behavior_specs)[0]
env.close()
```
|
|
|
|
|
## Technical Implementation |
|
|
|
|
|
### PPO Algorithm Features |
|
|
- **Policy Clipping**: Prevents large policy updates |
|
|
- **Advantage Estimation**: GAE (Generalized Advantage Estimation) |
|
|
- **Value Function**: Shared network with actor for efficiency |
|
|
- **Batch Training**: Multiple parallel environments for sample efficiency |
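
The two core mechanisms above, the clipped surrogate objective and GAE, can be sketched in a few lines (illustrative, not ML-Agents' internal implementation):

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped * advantage)

def gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE(lambda) for one episode; `values` has a bootstrap entry appended,
    so len(values) == len(rewards) + 1."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```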
|
|
|
|
|
### Unity ML-Agents Integration |
|
|
- **Python API**: Training through Python interface |
|
|
- **Unity Side**: Real-time environment simulation |
|
|
- **Observation Collection**: Automated sensor data gathering |
|
|
- **Action Execution**: Smooth character animation and physics |
|
|
|
|
|
## Files Structure |
|
|
|
|
|
```
├── SnowballTarget.onnx     # Trained policy network
├── configuration.yaml      # Training configuration
├── run_logs/               # Training metrics and logs
└── results/                # Training results and statistics
```
|
|
|
|
|
## Limitations and Considerations |
|
|
|
|
|
1. **Environment Specific**: Model is trained specifically for SnowballTarget environment |
|
|
2. **Unity Dependency**: Requires Unity ML-Agents framework for deployment |
|
|
3. **Physics Sensitivity**: Performance may vary with different physics settings |
|
|
4. **Target Patterns**: May not generalize to significantly different target spawn patterns |
|
|
|
|
|
## Applications |
|
|
|
|
|
- **Game AI**: Can be integrated into Unity games as intelligent NPC behavior |
|
|
- **Educational**: Demonstrates reinforcement learning in 3D environments |
|
|
- **Research**: Benchmark for continuous control and aiming tasks |
|
|
- **Interactive Demos**: Can be deployed in web builds for demonstrations |
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
This model represents a benign gaming scenario with minimal ethical concerns:
|
|
- **Content**: Family-friendly winter sports theme |
|
|
- **Violence**: Non-violent snowball throwing activity |
|
|
- **Educational Value**: Suitable for learning about AI and reinforcement learning |
|
|
|
|
|
## Unity ML-Agents Version Compatibility |
|
|
|
|
|
- **ML-Agents**: Compatible with Unity ML-Agents toolkit |
|
|
- **Unity Version**: Works with Unity 2021.3+ LTS |
|
|
- **Python Package**: Requires `mlagents` Python package |
|
|
|
|
|
## Training Environment |
|
|
|
|
|
- **Unity Editor**: 3D environment simulation |
|
|
- **ML-Agents**: Python training interface |
|
|
- **Hardware**: GPU-accelerated training recommended |
|
|
- **Parallel Environments**: Multiple instances for efficient training |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{ppo-snowballtarget-2024, |
|
|
title={PPO-SnowballTarget: Reinforcement Learning Agent for Unity ML-Agents}, |
|
|
author={Adilbai}, |
|
|
year={2024}, |
|
|
publisher={Hugging Face Hub}, |
|
|
url={https://huggingface.co/Adilbai/ppo-SnowballTarget} |
|
|
} |
|
|
``` |
|
|
|
|
|
## References |
|
|
|
|
|
- Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347. |
|
|
- Unity Technologies. Unity ML-Agents Toolkit. https://github.com/Unity-Technologies/ml-agents |
|
|
- Hugging Face Deep RL Course: https://huggingface.co/learn/deep-rl-course |
|
|
- Kay Lousberg (Environment Assets): https://www.kaylousberg.com/ |
|
|
|