# Unity ML-Agents PettingZoo Wrapper

With the increasing interest in multi-agent training with a gym-like API, we provide a
PettingZoo wrapper around the [PettingZoo API](https://www.pettingzoo.ml/). Our wrapper
provides interfaces on top of our `UnityEnvironment` class, which is the default way of
interfacing with a Unity environment via Python.
## Installation and Examples

The PettingZoo wrapper is part of the `mlagents_envs` package. Please refer to the
[mlagents_envs installation instructions](ML-Agents-Envs-README.md).

[[Colab] PettingZoo Wrapper Example](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/develop-python-api-ga/ml-agents-envs/colabs/Colab_PettingZoo.ipynb)

This colab notebook demonstrates example usage of the wrapper, including installation,
basic usage, and an example with our
[Striker vs Goalie environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#strikers-vs-goalie),
which is a multi-agent environment with multiple different behavior names.
## API Interface

This wrapper is compatible with the PettingZoo API. Please check out the
[PettingZoo API page](https://www.pettingzoo.ml/api) for more details.

Here's an example of interacting with a wrapped environment:
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs import UnityToPettingZooWrapper

unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityToPettingZooWrapper(unity_env)
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    action = policy(observation, agent)
    env.step(action)
```
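The `policy` in the snippet above is user-supplied and not part of the wrapper. As a rough sketch, a random baseline can simply sample from the agent's action space. The `_StubSpace` class and `random_policy` helper below are hypothetical, standing in for a real Gym-style space such as the one PettingZoo environments expose per agent:

```python
import random

class _StubSpace:
    """Illustration-only stand-in for a discrete Gym-style action space.
    A real wrapped environment exposes actual action spaces instead."""
    def __init__(self, n: int):
        self.n = n

    def sample(self) -> int:
        # Uniformly pick one of the n discrete actions.
        return random.randrange(self.n)

def random_policy(observation, agent, action_space):
    # A trivial baseline policy: ignore the observation and act randomly.
    return action_space.sample()

action = random_policy(None, "striker_0", _StubSpace(5))
```

In a real training loop, `policy` would typically be a learned function mapping the observation (and possibly the agent name) to an action valid for that agent's action space.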
## Notes

- There is support for both the [AEC](https://www.pettingzoo.ml/api#interacting-with-environments)
  and [Parallel](https://www.pettingzoo.ml/api#parallel-api) PettingZoo APIs.
- The AEC wrapper is compatible with the PettingZoo (PZ) API interface but works in a slightly
  different way under the hood. Instead of stepping the environment on every `env.step(action)`
  call, the PZ wrapper stores the action and only performs an environment step once all the
  agents requesting actions in the current step have been assigned one. This is done for
  performance, since communication between Unity and Python is more efficient when data is
  sent in batches.
- Since actions for the AEC wrapper are stored without being applied to the environment until
  all the actions are queued, some components of the API might behave in unexpected ways. For
  example, a call to `env.reward` should return the instantaneous reward for that particular
  step, but the true reward only becomes available when an actual environment step is performed.
  As long as you follow the API definition for training (access rewards from `env.last()` instead
  of `env.reward`), the underlying mechanism shouldn't affect training results.
- The environment will automatically reset when it is done, so `env.agent_iter(max_step)` will
  keep going until the specified maximum number of steps is reached (default: `2**63`). There is
  no need to call `env.reset()` except at the very beginning, right after instantiating the
  environment.
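The action-queueing behavior described in the notes can be illustrated with a toy sketch. The `BatchedActionQueue` class below is not the real wrapper; it only demonstrates the idea of deferring the actual (expensive) environment step until every agent that needs an action this step has supplied one:

```python
from typing import Dict, List

class BatchedActionQueue:
    """Toy illustration of deferred stepping: actions are queued per agent,
    and the backend is stepped once, in a single batch, only when all
    agents have acted. The real AEC wrapper does this to keep Unity-Python
    communication batched."""
    def __init__(self, agents: List[str]):
        self.agents = agents
        self.pending: Dict[str, int] = {}
        self.batch_steps = 0  # number of real (batched) environment steps

    def step(self, agent: str, action: int) -> bool:
        # Queue the action; only step the backend when the batch is full.
        self.pending[agent] = action
        if len(self.pending) == len(self.agents):
            self.batch_steps += 1  # one real step for the whole batch
            self.pending.clear()
            return True
        return False  # action stored; no backend communication yet

queue = BatchedActionQueue(["striker_0", "striker_1", "goalie_0"])
stepped = [queue.step(agent, 0) for agent in queue.agents]
# Only the final call triggers an actual batched step: [False, False, True]
```

This is also why `env.reward` can lag behind: until the batch is flushed, the queued actions have not actually been applied, so the instantaneous reward is not yet known.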