---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
license: apache-2.0
---

![8s6tgwmc.png](https://cdn-uploads.huggingface.co/production/uploads/6493577a357b252af725bf67/wQNbXcvUaoEuV6FtWu9rS.png)

# PPO Agent playing SnowballTarget
This is a trained model of a PPO agent playing SnowballTarget,
built with the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Usage (with ML-Agents)
ML-Agents documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

### Watch the Agent play
You can watch the agent play directly in your browser:

1. Go to https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget
2. Enter the model ID: `Francesco-A/ppo-SnowballTarget-v1`
3. Select the `.nn` or `.onnx` file
4. Click "Watch the agent play"

## Training hyperparameters

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 55000
    max_steps: 250000
    time_horizon: 64
    threaded: true
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
```
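Several quantities that govern training are implied by this config rather than stated in it. The sketch below is plain Python with the numbers copied from the YAML. It assumes that each PPO update consumes one full buffer of `buffer_size` steps, split into `batch_size` minibatches and passed over `num_epoch` times; that `learning_rate_schedule: linear` decays the rate toward a small floor at `max_steps` (the `1e-10` floor is an assumption); and that `gamma` and `lambd` enter through generalized advantage estimation (GAE). All function names are hypothetical illustrations, not ML-Agents APIs, and ML-Agents' internal implementation may differ in detail.

```python
"""Derived quantities for the SnowballTarget PPO config (illustrative sketch).

Constants are copied from the YAML config. The helper functions are
hypothetical and only approximate what ML-Agents does internally.
"""

MAX_STEPS = 250_000
BUFFER_SIZE = 2048
BATCH_SIZE = 128
NUM_EPOCH = 3
LEARNING_RATE = 3e-4
GAMMA = 0.99
LAMBD = 0.95


def gradient_steps_per_update(buffer_size: int, batch_size: int, num_epoch: int) -> int:
    """Minibatch gradient steps per update: (buffer / batch) minibatches, num_epoch times."""
    return (buffer_size // batch_size) * num_epoch


def approx_num_updates(max_steps: int, buffer_size: int) -> int:
    """Rough count of PPO updates, assuming one update per full buffer of experience."""
    return max_steps // buffer_size


def linear_lr(step: int, initial_lr: float = LEARNING_RATE,
              max_steps: int = MAX_STEPS, min_lr: float = 1e-10) -> float:
    """Linear decay from initial_lr at step 0 to min_lr at max_steps (min_lr is assumed)."""
    frac = min(step / max_steps, 1.0)
    return (initial_lr - min_lr) * (1.0 - frac) + min_lr


def gae(rewards: list[float], values: list[float], last_value: float,
        gamma: float = GAMMA, lam: float = LAMBD) -> list[float]:
    """GAE advantages for one trajectory chunk that contains no episode end.

    last_value bootstraps the value of the state after the final step, as when a
    trajectory is cut off at time_horizon rather than at a terminal state.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running              # discounted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    print(gradient_steps_per_update(BUFFER_SIZE, BATCH_SIZE, NUM_EPOCH))  # 48
    print(approx_num_updates(MAX_STEPS, BUFFER_SIZE))                    # 122
    print(linear_lr(MAX_STEPS // 2))                                      # about 1.5e-4
```

Under these assumptions, each update performs 48 minibatch gradient steps on the same 2,048 transitions (32 chunks of `time_horizon: 64`), and the clipping range `epsilon: 0.2` is what keeps those repeated passes from moving the policy too far from the one that collected the data.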

## Training details

Mean and standard deviation of the reward, logged every 10,000 steps (`summary_freq: 10000`):

| Step | Time Elapsed (s) | Mean Reward | Std of Reward | Status |
|-------:|--------:|-------:|------:|----------|
| 10000 | 29.079 | 3.636 | 1.746 | Training |
| 20000 | 55.042 | 7.164 | 2.661 | Training |
| 30000 | 77.884 | 9.818 | 2.534 | Training |
| 40000 | 103.229 | 11.509 | 2.263 | Training |
| 50000 | 127.046 | 14.659 | 2.495 | Training |
| 60000 | 150.811 | 15.655 | 2.414 | Training |
| 70000 | 174.292 | 16.955 | 2.540 | Training |
| 80000 | 198.938 | 18.091 | 2.481 | Training |
| 90000 | 221.915 | 19.182 | 3.143 | Training |
| 100000 | 246.203 | 21.182 | 2.724 | Training |
| 110000 | 271.024 | 22.463 | 2.250 | Training |
| 120000 | 292.551 | 24.044 | 2.190 | Training |
| 130000 | 317.539 | 24.291 | 2.103 | Training |
| 140000 | 340.057 | 24.455 | 4.423 | Training |
| 150000 | 366.645 | 25.236 | 2.358 | Training |
| 160000 | 390.192 | 25.000 | 1.895 | Training |
| 170000 | 414.326 | 25.273 | 2.482 | Training |
| 180000 | 438.103 | 25.750 | 1.798 | Training |
| 190000 | 462.837 | 25.673 | 1.888 | Training |
| 200000 | 485.258 | 25.295 | 2.380 | Training |
| 210000 | 509.542 | 25.855 | 2.066 | Training |
| 220000 | 535.202 | 26.111 | 1.931 | Training |
| 230000 | 556.965 | 25.644 | 2.252 | Training |
| 240000 | 582.135 | 26.018 | 2.673 | Training |
| 250000 | 604.248 | 26.091 | 1.917 | Training |
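To read the training log numerically rather than by eye, the sketch below copies the rows of the table above into Python and computes the average throughput and the first logged step at which the mean reward reaches a given fraction of its final value. It assumes the elapsed time is measured from the start of training; the helper names are hypothetical and not part of ML-Agents.

```python
"""Summarize the SnowballTarget training log (rows copied from the table above)."""

# (step, time elapsed in seconds, mean reward, std of reward)
ROWS = [
    (10000, 29.079, 3.636, 1.746),
    (20000, 55.042, 7.164, 2.661),
    (30000, 77.884, 9.818, 2.534),
    (40000, 103.229, 11.509, 2.263),
    (50000, 127.046, 14.659, 2.495),
    (60000, 150.811, 15.655, 2.414),
    (70000, 174.292, 16.955, 2.540),
    (80000, 198.938, 18.091, 2.481),
    (90000, 221.915, 19.182, 3.143),
    (100000, 246.203, 21.182, 2.724),
    (110000, 271.024, 22.463, 2.250),
    (120000, 292.551, 24.044, 2.190),
    (130000, 317.539, 24.291, 2.103),
    (140000, 340.057, 24.455, 4.423),
    (150000, 366.645, 25.236, 2.358),
    (160000, 390.192, 25.000, 1.895),
    (170000, 414.326, 25.273, 2.482),
    (180000, 438.103, 25.750, 1.798),
    (190000, 462.837, 25.673, 1.888),
    (200000, 485.258, 25.295, 2.380),
    (210000, 509.542, 25.855, 2.066),
    (220000, 535.202, 26.111, 1.931),
    (230000, 556.965, 25.644, 2.252),
    (240000, 582.135, 26.018, 2.673),
    (250000, 604.248, 26.091, 1.917),
]


def throughput(rows):
    """Average environment steps per wall-clock second over the whole run."""
    last_step, last_time, _, _ = rows[-1]
    return last_step / last_time


def first_step_reaching(rows, fraction):
    """First logged step whose mean reward is at least `fraction` of the final mean reward."""
    target = fraction * rows[-1][2]
    for step, _, mean_reward, _ in rows:
        if mean_reward >= target:
            return step
    return None


if __name__ == "__main__":
    print(f"throughput: {throughput(ROWS):.1f} steps/s")
    print(f"90% of final mean reward first reached at step {first_step_reaching(ROWS, 0.9)}")
```

For this run, throughput works out to roughly 414 steps per second, the mean reward first reaches 90% of its final value (26.091) at step 120,000, and the highest logged mean reward (26.111) occurs at step 220,000, so most of the improvement happens in the first half of the 250,000-step run.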