This is a REINFORCE agent trained on Pixelcopter from the PyGame Learning Environment (PLE), using a custom environment wrapper.
Mean reward: 4.30 ± 4.03
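
A figure in this form is usually the mean and standard deviation of the total reward per evaluation episode. The sketch below shows how such a summary could be computed. It is only an illustration: the `summarize_rewards` helper and the sample rewards are hypothetical, and the number of evaluation episodes and the choice of population (rather than sample) standard deviation are assumptions, not details stated in this card.

```python
import statistics


def summarize_rewards(episode_rewards):
    """Return (mean, std) of per-episode total rewards.

    Uses the population standard deviation (an assumption; it matches
    the default behaviour of numpy.std).
    """
    mean = statistics.fmean(episode_rewards)
    std = statistics.pstdev(episode_rewards)
    return mean, std


# Hypothetical episode totals, for illustration only
# (these are NOT this agent's actual evaluation results).
rewards = [2.0, 4.0, 6.0, 8.0]
mean, std = summarize_rewards(rewards)
print(f"Mean reward: {mean:.2f} ± {std:.2f}")
```

A standard deviation close to the mean, as reported here, suggests that episode scores vary widely from one run to the next.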