Use at your own risk. These agents were trained in an unusual way: the training was the same as in the previous iteration, but the player 2 actions were overridden by different pretrained opponents. As a result, in half of the environments the agent's actions did not influence the trajectory.
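
The override described above can be pictured with a minimal sketch. This is not the actual training code: `override_player2`, `fixed_opponent`, and the integer action encoding are hypothetical names chosen for illustration, and the sketch assumes each environment takes a joint action `(player 1, player 2)`. It only shows why whatever the agent outputs for player 2 has no effect on what is executed.

```python
from typing import Callable, List, Tuple

Action = int
JointAction = Tuple[Action, Action]  # (player 1 action, player 2 action)


def override_player2(
    agent_joint_actions: List[JointAction],
    opponent_policy: Callable[[int], Action],
) -> List[JointAction]:
    """Keep the agent's player 1 action and replace player 2 with the opponent's."""
    return [
        (p1, opponent_policy(env_idx))
        for env_idx, (p1, _discarded_p2) in enumerate(agent_joint_actions)
    ]


# Hypothetical stand-in for a pretrained opponent: always plays action 0.
def fixed_opponent(env_idx: int) -> Action:
    return 0


# Two batches that differ only in what the agent chose for player 2.
batch_a = [(1, 2), (3, 4)]
batch_b = [(1, 9), (3, 7)]

executed_a = override_player2(batch_a, fixed_opponent)
executed_b = override_player2(batch_b, fixed_opponent)

# The executed joint actions are identical: the agent's player 2 choices were discarded.
print(executed_a == executed_b)
```

Under this assumption, any experience collected from the player 2 side pairs the agent's chosen action with a transition that action did not cause, which is the reason for the warning.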