# Unity ML-Agents PettingZoo Wrapper

With the increasing interest in multi-agent training with a gym-like API, we provide a
PettingZoo wrapper around the [PettingZoo API](https://www.pettingzoo.ml/). Our wrapper
provides interfaces on top of our `UnityEnvironment` class, which is the default way of
interfacing with a Unity environment via Python.
## Installation and Examples

The PettingZoo wrapper is part of the `mlagents_envs` package. Please refer to the
[mlagents_envs installation instructions](ML-Agents-Envs-README.md).

[[Colab] PettingZoo Wrapper Example](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/develop-python-api-ga/ml-agents-envs/colabs/Colab_PettingZoo.ipynb)

This colab notebook demonstrates example usage of the wrapper, including installation,
basic usage, and an example with our
[Striker vs Goalie environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#strikers-vs-goalie),
which is a multi-agent environment with multiple different behavior names.
## API Interface

This wrapper is compatible with the PettingZoo API. Please check out the
[PettingZoo API page](https://www.pettingzoo.ml/api) for more details.

Here's an example of interacting with a wrapped environment:
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs import UnityToPettingZooWrapper

unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityToPettingZooWrapper(unity_env)
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    action = policy(observation, agent)
    env.step(action)
```
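The `policy` in the snippet above is user-supplied and not part of the wrapper. As a rough sketch, a random baseline can simply sample from the agent's action space. The `_StubSpace` class and `random_policy` helper below are hypothetical, standing in for a real Gym-style space such as the one PettingZoo environments expose per agent:

```python
import random

class _StubSpace:
    """Illustration-only stand-in for a discrete Gym-style action space.
    A real wrapped environment exposes actual action spaces instead."""
    def __init__(self, n: int):
        self.n = n

    def sample(self) -> int:
        # Uniformly pick one of the n discrete actions.
        return random.randrange(self.n)

def random_policy(observation, agent, action_space):
    # A trivial baseline policy: ignore the observation and act randomly.
    return action_space.sample()

action = random_policy(None, "striker_0", _StubSpace(5))
```

In a real training loop, `policy` would typically be a learned function mapping the observation (and possibly the agent name) to an action valid for that agent's action space.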
## Notes

- There is support for both the [AEC](https://www.pettingzoo.ml/api#interacting-with-environments)
  and [Parallel](https://www.pettingzoo.ml/api#parallel-api) PettingZoo APIs.
- The AEC wrapper is compatible with the PettingZoo (PZ) API interface but works in a slightly
  different way under the hood. Instead of stepping the environment on every `env.step(action)`
  call, the PZ wrapper stores the action and only performs an environment step once all the
  agents requesting actions in the current step have been assigned one. This is done for
  performance, since communication between Unity and Python is more efficient when data is
  sent in batches.
- Since actions for the AEC wrapper are stored without being applied to the environment until
  all the actions are queued, some components of the API might behave in unexpected ways. For
  example, a call to `env.reward` should return the instantaneous reward for that particular
  step, but the true reward only becomes available when an actual environment step is performed.
  As long as you follow the API definition for training (access rewards from `env.last()` instead
  of `env.reward`), the underlying mechanism shouldn't affect training results.
- The environment will automatically reset when it is done, so `env.agent_iter(max_step)` will
  keep going until the specified maximum number of steps is reached (default: `2**63`). There is
  no need to call `env.reset()` except at the very beginning, right after instantiating the
  environment.
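The action-queueing behavior described in the notes can be illustrated with a toy sketch. The `BatchedActionQueue` class below is not the real wrapper; it only demonstrates the idea of deferring the actual (expensive) environment step until every agent that needs an action this step has supplied one:

```python
from typing import Dict, List

class BatchedActionQueue:
    """Toy illustration of deferred stepping: actions are queued per agent,
    and the backend is stepped once, in a single batch, only when all
    agents have acted. The real AEC wrapper does this to keep Unity-Python
    communication batched."""
    def __init__(self, agents: List[str]):
        self.agents = agents
        self.pending: Dict[str, int] = {}
        self.batch_steps = 0  # number of real (batched) environment steps

    def step(self, agent: str, action: int) -> bool:
        # Queue the action; only step the backend when the batch is full.
        self.pending[agent] = action
        if len(self.pending) == len(self.agents):
            self.batch_steps += 1  # one real step for the whole batch
            self.pending.clear()
            return True
        return False  # action stored; no backend communication yet

queue = BatchedActionQueue(["striker_0", "striker_1", "goalie_0"])
stepped = [queue.step(agent, 0) for agent in queue.agents]
# Only the final call triggers an actual batched step: [False, False, True]
```

This is also why `env.reward` can lag behind: until the batch is flushed, the queued actions have not actually been applied, so the instantaneous reward is not yet known.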