---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
---

# PPO-SnowballTarget Reinforcement Learning Model

## Model Description

This model is a Proximal Policy Optimization (PPO) agent trained to play the SnowballTarget environment from Unity ML-Agents. The agent, Julien the Bear 🐻, learns to throw snowballs accurately at spawning targets to maximize reward.

## Model Details

### Model Architecture

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Framework**: Unity ML-Agents with PyTorch backend
- **Agent**: Julien the Bear (3D character)
- **Policy Network**: Actor-Critic architecture
  - Actor: outputs action probabilities
  - Critic: estimates state values for advantage calculation

### Environment: SnowballTarget

SnowballTarget is an environment created at Hugging Face using assets from Kay Lousberg, in which you train an agent called Julien the Bear 🐻 to hit targets with snowballs.

**Environment Details:**

- **Objective**: Train Julien the Bear to throw snowballs accurately at targets
- **Setting**: 3D winter environment with spawning targets
- **Agent**: Single agent (Julien the Bear)
- **Targets**: Dynamically spawning targets that must be hit with snowballs

### Observation Space

The agent observes:

- Agent's position and rotation
- Target positions and states
- Snowball trajectory information
- Environmental spatial relationships
- Ray-cast sensors for spatial awareness

### Action Space

- **Continuous Actions**: Aiming direction and throw force
- **Action Dimensions**: Typically 2-3 continuous values:
  - Horizontal aiming angle
  - Vertical aiming angle
  - Throw force/power

### Reward Structure

- **Positive Rewards**:
  - +1.0 for hitting a target
  - Distance-based reward bonuses for accurate shots
- **Negative Rewards**:
  - Small time penalty to encourage efficiency
  - Penalty for missing targets

## Training Configuration

### PPO Hyperparameters

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Training Framework**: Unity ML-Agents
- **Batch Size**: Typical ML-Agents default (1024-2048)
- **Learning Rate**: Adaptive (typically 3e-4)
- **Entropy Coefficient** (`beta` in ML-Agents): Encourages exploration
- **Value Function Coefficient**: Balances actor-critic training
- **PPO Clipping** (`epsilon` in ML-Agents): ε = 0.2 (standard PPO clipping range)

### Training Process

- **Environment**: Unity ML-Agents SnowballTarget
- **Training Method**: Parallel environment instances
- **Episode Length**: Variable (until all targets are hit or a timeout occurs)
- **Success Criteria**: Consistent target-hitting accuracy

## Performance Metrics

The model is evaluated on:

- **Hit Accuracy**: Percentage of targets successfully hit
- **Average Reward**: Cumulative reward per episode
- **Training Stability**: Consistent improvement over training steps
- **Efficiency**: Time to hit targets (faster is better)

### Expected Performance

- **Target Hit Rate**: More than 80% of targets hit
- **Convergence**: Stable policy after sufficient training episodes
- **Generalization**: Ability to hit targets in various positions

## Usage

### Loading the Model

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

# Connect to a SnowballTarget build; replace the placeholder path with your local build
channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name="path/to/SnowballTarget", side_channels=[channel])
channel.set_configuration_parameters(time_scale=1.0)  # real-time speed for watching the agent
env.reset()
# The trained policy (SnowballTarget.onnx) is executed by Unity, not loaded in Python
env.close()
```

### Resume the training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```

### Running Inference

```python
# Inference runs inside Unity: assign SnowballTarget.onnx to the agent's Behavior Parameters
# (Model field) and set Behavior Type to "Inference Only" in the Unity Editor.
# The same .onnx file can also be bundled into a Unity build for real-time inference.
```

## Technical Implementation

### PPO Algorithm Features

- **Policy Clipping**: Clips the ratio between new and old action probabilities to [1 − ε, 1 + ε], preventing large policy updates
- **Advantage Estimation**: GAE (Generalized Advantage Estimation)
- **Value Function**: Shared network with actor for efficiency
- **Batch Training**: Multiple parallel environments for sample efficiency

### Unity ML-Agents Integration

- **Python API**: Training through the Python interface
- **Unity Side**: Real-time environment simulation
- **Observation Collection**: Automated sensor data gathering
- **Action Execution**: Smooth character animation and physics

## File Structure

```
├── SnowballTarget.onnx     # Trained policy network
├── configuration.yaml      # Training configuration
├── run_logs/               # Training metrics and logs
└── results/                # Training results and statistics
```

## Limitations and Considerations

1. **Environment Specific**: The model is trained specifically for the SnowballTarget environment
2. **Unity Dependency**: Requires the Unity ML-Agents framework for deployment
3. **Physics Sensitivity**: Performance may vary with different physics settings
4. **Target Patterns**: May not generalize to significantly different target spawn patterns

## Applications

- **Game AI**: Can be integrated into Unity games as intelligent NPC behavior
- **Educational**: Demonstrates reinforcement learning in 3D environments
- **Research**: Benchmark for continuous control and aiming tasks
- **Interactive Demos**: Can be deployed in web builds for demonstrations

## Ethical Considerations

This model represents a benign gaming scenario with no notable ethical concerns:

- **Content**: Family-friendly winter sports theme
- **Violence**: Non-violent snowball-throwing activity
- **Educational Value**: Suitable for learning about AI and reinforcement learning

## Unity ML-Agents Version Compatibility

- **ML-Agents**: Compatible with the Unity ML-Agents toolkit
- **Unity Version**: Works with Unity 2021.3+ LTS
- **Python Package**: Requires the `mlagents` Python package

## Training Environment

- **Unity Editor**: 3D environment simulation
- **ML-Agents**: Python training interface
- **Hardware**: GPU-accelerated training recommended
- **Parallel Environments**: Multiple instances for efficient training

## Citation

If you use this model, please cite:

```bibtex
@misc{ppo-snowballtarget-2024,
  title={PPO-SnowballTarget: Reinforcement Learning Agent for Unity ML-Agents},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/Adilbai/ppo-SnowballTarget}
}
```

## References

- Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- Unity Technologies. Unity ML-Agents Toolkit. https://github.com/Unity-Technologies/ml-agents
- Hugging Face. Deep RL Course. https://huggingface.co/learn/deep-rl-course
- Kay Lousberg. Environment assets. https://www.kaylousberg.com/
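
## PPO Update Sketch

The PPO features listed under Technical Implementation are GAE advantage estimation and a clipped policy update with ε = 0.2. The minimal pure-Python sketch below illustrates both. It is not the ML-Agents trainer code. The function names `compute_gae` and `clipped_surrogate` are hypothetical. The defaults γ = 0.99 and λ = 0.95 are assumed example values, and they are not confirmed for this model.

```python
"""Minimal PPO building blocks (illustrative sketch, not ML-Agents internals)."""
import math
from typing import List


def compute_gae(rewards: List[float], values: List[float], dones: List[bool],
                last_value: float, gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation over one trajectory segment.

    values[t] is the critic's estimate V(s_t); last_value bootstraps V(s_T).
    dones[t] is True when the episode ended after step t.
    gamma and lam are assumed example values, not this model's settings.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), with V = 0 after a terminal step
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode boundaries
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_surrogate(new_log_probs: List[float], old_log_probs: List[float],
                      advantages: List[float], epsilon: float = 0.2) -> float:
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # The minimum removes any gain from pushing the ratio outside [1 - eps, 1 + eps]
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Toy trajectory: a single target hit (+1.0) on the final step of an episode
    adv = compute_gae(rewards=[0.0, 0.0, 1.0], values=[0.5, 0.6, 0.8],
                      dones=[False, False, True], last_value=0.0)
    print("advantages:", adv)
    # Doubling an action's probability is clipped at 1 + epsilon = 1.2 for a positive advantage
    print("objective:", clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
```

In ML-Agents these quantities are not written by hand. They are set in the trainer YAML instead: `epsilon` and `lambd` under `hyperparameters`, and `gamma` under `reward_signals.extrinsic`. The sketch only shows what those settings control.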