---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
---
# PPO-SnowballTarget Reinforcement Learning Model

## Model Description

This model is a Proximal Policy Optimization (PPO) agent trained to play the SnowballTarget environment from Unity ML-Agents. The agent, named Julien the Bear 🐻, learns to accurately throw snowballs at spawning targets to maximize rewards.

## Model Details

### Model Architecture
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Framework**: Unity ML-Agents with PyTorch backend
- **Agent**: Julien the Bear (3D character)
- **Policy Network**: Actor-Critic architecture
  - Actor: Outputs action probabilities
  - Critic: Estimates state values for advantage calculation

### Environment: SnowballTarget

SnowballTarget is an environment created at Hugging Face using assets from Kay Lousberg. In it, you train an agent called Julien the Bear 🐻 to hit targets with snowballs.

**Environment Details:**
- **Objective**: Train Julien the Bear to accurately throw snowballs at targets
- **Setting**: 3D winter environment with spawning targets
- **Agent**: Single agent (Julien the Bear)
- **Targets**: Dynamically spawning targets that need to be hit with snowballs

### Observation Space
The agent observes:
- Agent's position and rotation
- Target positions and states
- Snowball trajectory information
- Environmental spatial relationships
- Ray-cast sensors for spatial awareness

### Action Space
- **Continuous Actions**: Aiming direction and throw force
- **Action Dimensions**: Typically 2-3 continuous values
  - Horizontal aiming angle
  - Vertical aiming angle  
  - Throw force/power

### Reward Structure
- **Positive Rewards**: 
  - +1.0 for hitting a target
  - Distance-based reward bonuses for accurate shots
- **Negative Rewards**:
  - Small time penalty to encourage efficiency
  - Penalty for missing targets

## Training Configuration

### PPO Hyperparameters
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Training Framework**: Unity ML-Agents
- **Batch Size**: Typical ML-Agents default (1024-2048)
- **Learning Rate**: Adaptive (typically 3e-4)
- **Entropy Coefficient**: Encourages exploration
- **Value Function Coefficient**: Balances actor-critic training
- **PPO Clipping**: ε = 0.2 (standard PPO clipping range)
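
The clipped surrogate objective referenced above can be sketched numerically. This is a generic PPO illustration, not code from this repository; `ratio` is the probability ratio between the new and old policies for a sampled action:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective (to be maximized).

    Clipping the ratio to [1 - eps, 1 + eps] and taking the minimum
    removes the incentive to move the policy far from the old one.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)
```

With ε = 0.2, a ratio of 1.5 on a positive advantage is capped at 1.2 × advantage, which is what keeps policy updates small.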

### Training Process
- **Environment**: Unity ML-Agents SnowballTarget
- **Training Method**: Parallel environment instances
- **Episode Length**: Variable (until all targets hit or timeout)
- **Success Criteria**: Consistent target hitting accuracy
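
The exact settings used ship in `configuration.yaml`. For orientation, an ML-Agents PPO config for this environment follows the schema below; the values here are illustrative examples, not the actual settings of this run:

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128          # example value
      buffer_size: 2048
      learning_rate: 3.0e-4
      epsilon: 0.2             # PPO clipping range
      lambd: 0.95              # GAE lambda
      num_epoch: 3
    network_settings:
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 200000
    time_horizon: 64
    summary_freq: 10000
```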

## Performance Metrics

The model is evaluated based on:
- **Hit Accuracy**: Percentage of targets successfully hit
- **Average Reward**: Cumulative reward per episode
- **Training Stability**: Consistent improvement over training steps
- **Efficiency**: Time to hit targets (faster is better)

### Expected Performance
- **Target Hit Rate**: >80% accuracy on target hitting
- **Convergence**: Stable policy after sufficient training episodes
- **Generalization**: Ability to hit targets in various positions

## Usage

### Loading the Model
```python
# Connect to a SnowballTarget build from Python (path is illustrative;
# the package exposes UnityEnvironment, not a "UnityToPythonWrapper").
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name="./SnowballTarget", side_channels=[channel])
env.reset()
# Model files should include the .onnx policy file and configuration.yaml
```

### Resuming Training
```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```

### Running Inference
```python
# The trained .onnx policy can be attached to the agent's Behavior Parameters
# in Unity for real-time, in-engine inference, or deployed in Unity builds.
```

## Technical Implementation

### PPO Algorithm Features
- **Policy Clipping**: Prevents large policy updates
- **Advantage Estimation**: GAE (Generalized Advantage Estimation)
- **Value Function**: Shared network with actor for efficiency
- **Batch Training**: Multiple parallel environments for sample efficiency
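
Generalized Advantage Estimation, mentioned above, can be sketched for a single episode. This is a generic illustration (not this repository's code); `values` carries one extra entry for the value estimate of the final state:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one episode.

    rewards: list of T rewards; values: list of T + 1 value estimates.
    Accumulates discounted TD errors backward through time.
    """
    advantages = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        advantages[t] = last
    return advantages
```

With `gamma = lam = 1` and zero value estimates, each advantage reduces to the undiscounted return-to-go, which is a quick sanity check on the recursion.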

### Unity ML-Agents Integration
- **Python API**: Training through Python interface
- **Unity Side**: Real-time environment simulation
- **Observation Collection**: Automated sensor data gathering
- **Action Execution**: Smooth character animation and physics

## Files Structure

```
├── SnowballTarget.onnx    # Trained policy network
├── configuration.yaml     # Training configuration
├── run_logs/              # Training metrics and logs
└── results/               # Training results and statistics
```

## Limitations and Considerations

1. **Environment Specific**: Model is trained specifically for SnowballTarget environment
2. **Unity Dependency**: Requires Unity ML-Agents framework for deployment
3. **Physics Sensitivity**: Performance may vary with different physics settings
4. **Target Patterns**: May not generalize to significantly different target spawn patterns

## Applications

- **Game AI**: Can be integrated into Unity games as intelligent NPC behavior
- **Educational**: Demonstrates reinforcement learning in 3D environments
- **Research**: Benchmark for continuous control and aiming tasks
- **Interactive Demos**: Can be deployed in web builds for demonstrations

## Ethical Considerations

This model represents a benign gaming scenario with no ethical concerns:
- **Content**: Family-friendly winter sports theme
- **Violence**: Non-violent snowball throwing activity
- **Educational Value**: Suitable for learning about AI and reinforcement learning

## Unity ML-Agents Version Compatibility

- **ML-Agents**: Compatible with Unity ML-Agents toolkit
- **Unity Version**: Works with Unity 2021.3+ LTS
- **Python Package**: Requires `mlagents` Python package

## Training Environment

- **Unity Editor**: 3D environment simulation
- **ML-Agents**: Python training interface
- **Hardware**: GPU-accelerated training recommended
- **Parallel Environments**: Multiple instances for efficient training

## Citation

If you use this model, please cite:

```bibtex
@misc{ppo-snowballtarget-2024,
  title={PPO-SnowballTarget: Reinforcement Learning Agent for Unity ML-Agents},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/Adilbai/ppo-SnowballTarget}
}
```

## References

- Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- Unity Technologies. Unity ML-Agents Toolkit. https://github.com/Unity-Technologies/ml-agents
- Hugging Face Deep RL Course: https://huggingface.co/learn/deep-rl-course
- Kay Lousberg (Environment Assets): https://www.kaylousberg.com/