REINFORCE Agent on Pixelcopter-PLE-v0
This repository contains a REINFORCE (policy gradient) agent trained on Pixelcopter-PLE-v0.
Evaluation
- Mean reward: 48.95 ± 42.79 (the ± value is presumably the standard deviation across evaluation episodes)
- Episodes: 20
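
A figure of this form is commonly obtained by running the trained policy for a fixed number of episodes and reporting the mean and standard deviation of the per-episode total rewards. The sketch below assumes that is how the number above was computed; `summarize_rewards` and the sample rewards are hypothetical and are not taken from this repository.

```python
import statistics

def summarize_rewards(episode_rewards):
    """Return (mean, std) of per-episode total rewards."""
    mean = statistics.fmean(episode_rewards)
    # Population standard deviation (divides by N, like numpy.std's default).
    std = statistics.pstdev(episode_rewards)
    return mean, std

# Hypothetical rewards from four evaluation episodes (illustrative only).
mean, std = summarize_rewards([10.0, 20.0, 30.0, 40.0])
print(f"Mean reward: {mean:.2f} ± {std:.2f}")
```

If the ± value is indeed a standard deviation, a spread nearly as large as the mean suggests that episode scores vary widely from run to run.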
Algorithm
- Monte Carlo Policy Gradient
- Stochastic policy
- PyTorch implementation
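
The points above can be sketched as follows. This is a minimal illustration of the standard REINFORCE update, not the code in this repository: the function names, the discount factors, and the sample episode are all assumptions. It uses plain Python floats so that it runs without dependencies; in a PyTorch implementation the log-probabilities would typically be tensors produced by the policy network, and the loss would be backpropagated through them.

```python
import math
import random

def sample_action(probs, rng=random):
    """Stochastic policy: sample an action index and return its log-probability."""
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns: G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

def reinforce_loss(log_probs, returns):
    """Policy gradient loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))

# Hypothetical three-step episode with a fixed two-action policy.
probs = [0.5, 0.5]
log_probs = [sample_action(probs)[1] for _ in range(3)]
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
loss = reinforce_loss(log_probs, returns)
```

Because the returns are computed only after the episode ends, each update uses the full sampled trajectory, which is what makes this a Monte Carlo method. Minimizing the loss raises the probability of actions that were followed by high returns.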