---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
license: apache-2.0
---

![8s6tgwmc.png](https://cdn-uploads.huggingface.co/production/uploads/6493577a357b252af725bf67/wQNbXcvUaoEuV6FtWu9rS.png)

# **ppo** Agent playing **SnowballTarget**

This is a trained model of a **ppo** agent playing **SnowballTarget** using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Usage (with ML-Agents)

The Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

### Watch the Agent play

You can watch the agent **playing directly in your browser**:

1. Go to https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget
2. Step 1: Find the model_id: Francesco-A/ppo-SnowballTarget-v1
3. Step 2: Select the `*.nn` / `*.onnx` file
4. Click on Watch the agent play

## Training hyperparameters

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 55000
    max_steps: 250000
    time_horizon: 64
    threaded: true
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
```

## Training details

| Step    | Time Elapsed | Mean Reward | Std of Reward | Status    |
|---------|--------------|-------------|---------------|-----------|
| 10000   | 29.079 s     | 3.636       | 1.746         | Training  |
| 20000   | 55.042 s     | 7.164       | 2.661         | Training  |
| 30000   | 77.884 s     | 9.818       | 2.534         | Training  |
| 40000   | 103.229 s    | 11.509      | 2.263         | Training  |
| 50000   | 127.046 s    | 14.659      | 2.495         | Training  |
| 60000   | 150.811 s    | 15.655      | 2.414         | Training  |
| 70000   | 174.292 s    | 16.955      | 2.540         | Training  |
| 80000   | 198.938 s    | 18.091      | 2.481         | Training  |
| 90000   | 221.915 s    | 19.182      | 3.143         | Training  |
| 100000  | 246.203 s    | 21.182      | 2.724         | Training  |
| 110000  | 271.024 s    | 22.463      | 2.250         | Training  |
| 120000  | 292.551 s    | 24.044      | 2.190         | Training  |
| 130000  | 317.539 s    | 24.291      | 2.103         | Training  |
| 140000  | 340.057 s    | 24.455      | 4.423         | Training  |
| 150000  | 366.645 s    | 25.236      | 2.358         | Training  |
| 160000  | 390.192 s    | 25.000      | 1.895         | Training  |
| 170000  | 414.326 s    | 25.273      | 2.482         | Training  |
| 180000  | 438.103 s    | 25.750      | 1.798         | Training  |
| 190000  | 462.837 s    | 25.673      | 1.888         | Training  |
| 200000  | 485.258 s    | 25.295      | 2.380         | Training  |
| 210000  | 509.542 s    | 25.855      | 2.066         | Training  |
| 220000  | 535.202 s    | 26.111      | 1.931         | Training  |
| 230000  | 556.965 s    | 25.644      | 2.252         | Training  |
| 240000  | 582.135 s    | 26.018      | 2.673         | Training  |
| 250000  | 604.248 s    | 26.091      | 1.917         | Training  |
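
To relate the configuration to the training log, the sketch below derives a few quantities from the values reported in this card: how many minibatch updates each filled buffer would produce, where periodic checkpoints would nominally fall, and the average throughput of the run. The function names are hypothetical and not part of ML-Agents. The reading of `buffer_size`, `batch_size` and `num_epoch` (each buffer split into `buffer_size / batch_size` minibatches and passed over `num_epoch` times) is an assumption about how the PPO trainer consumes them, and the actual checkpoint steps may differ slightly from exact multiples of `checkpoint_interval`.

```python
from __future__ import annotations

# Values copied from the config and the last row of the training table.
BUFFER_SIZE = 2048
BATCH_SIZE = 128
NUM_EPOCH = 3
CHECKPOINT_INTERVAL = 55_000
MAX_STEPS = 250_000
FINAL_ELAPSED_S = 604.248


def updates_per_buffer(buffer_size: int, batch_size: int, num_epoch: int) -> int:
    """Minibatch gradient updates per filled buffer (assumed interpretation)."""
    return (buffer_size // batch_size) * num_epoch


def nominal_checkpoint_steps(interval: int, max_steps: int) -> list[int]:
    """Exact multiples of the interval up to max_steps (nominal, not logged values)."""
    return list(range(interval, max_steps + 1, interval))


def mean_throughput(steps: int, elapsed_s: float) -> float:
    """Average environment steps per second over the whole run."""
    return steps / elapsed_s


if __name__ == "__main__":
    print(updates_per_buffer(BUFFER_SIZE, BATCH_SIZE, NUM_EPOCH))    # 48
    print(nominal_checkpoint_steps(CHECKPOINT_INTERVAL, MAX_STEPS))  # [55000, 110000, 165000, 220000]
    print(round(mean_throughput(MAX_STEPS, FINAL_ELAPSED_S), 1))
```

Under these assumptions the run averages roughly 414 steps per second, and the four nominal periodic checkpoints stay below the `keep_checkpoints: 10` limit.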