REINFORCE Agent - Pixelcopter-PLE-v0
This agent was trained on Pixelcopter-PLE-v0 with the REINFORCE (Monte Carlo policy gradient) algorithm as part of the Hugging Face Deep RL Course.
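
REINFORCE weights each action's log-probability by the discounted return that followed it. The snippet below is a minimal sketch of that return computation, not the training code behind this agent; the function name `discounted_returns` and the sample rewards are hypothetical, and the default `gamma` matches the value listed under Hyperparameters.

```python
from collections import deque


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode.

    `rewards` is the list of per-step rewards from a single finished episode.
    """
    returns = deque()
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.appendleft(running)
    return list(returns)


# Hypothetical three-step episode, for illustration only.
example_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(example_returns)

# In REINFORCE, the policy loss for the episode is then
#   -sum(log_prob_t * G_t for each step t),
# where log_prob_t is the log-probability of the action taken at step t.
```

Computing the returns backwards keeps the pass linear in episode length, which matters when `max_t` is large.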
Results
| Mean reward | Std reward |
|---|---|
| 27.50 | 21.88 |
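
A mean and standard deviation like these are usually computed from the total reward of each evaluation episode. The sketch below shows one way to do that, assuming the population standard deviation (the same convention as NumPy's default `np.std`); the helper `summarize` and the sample scores are hypothetical and are not the episodes behind this table.

```python
from statistics import fmean, pstdev


def summarize(episode_rewards):
    """Return (mean, population standard deviation) of per-episode total rewards."""
    return fmean(episode_rewards), pstdev(episode_rewards)


# Hypothetical episode totals for illustration only; not this agent's scores.
example_rewards = [10.0, 20.0, 30.0]
mean_reward, std_reward = summarize(example_rewards)
print(f"{mean_reward:.2f} +/- {std_reward:.2f}")
```

With only a handful of evaluation episodes, a standard deviation close to the mean suggests that individual episode scores vary widely.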
Hyperparameters
{
"h_size": 64,
"n_training_episodes": 50000,
"n_evaluation_episodes": 10,
"max_t": 10000,
"gamma": 0.99,
"lr": 0.0001,
"env_id": "Pixelcopter-PLE-v0",
"state_space": 7,
"action_space": 2
}
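
To make the size parameters concrete, here is a dependency-free sketch of a policy forward pass with the shapes listed above: 7 state features, 64 hidden units, 2 actions. The architecture is an assumption (one ReLU hidden layer followed by a softmax); this card does not state the actual network, and every function name below is hypothetical.

```python
import math
import random

STATE_SPACE, H_SIZE, ACTION_SPACE = 7, 64, 2  # values from the hyperparameters


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small uniform weights."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply a dense layer: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(state, hidden_layer, output_layer):
    """Map a state vector to a probability for each action (assumed architecture)."""
    hidden = [max(0.0, v) for v in linear(state, hidden_layer)]  # ReLU
    return softmax(linear(hidden, output_layer))


rng = random.Random(0)
hidden_layer = init_layer(STATE_SPACE, H_SIZE, rng)
output_layer = init_layer(H_SIZE, ACTION_SPACE, rng)

# Hypothetical observation; a real one would come from the environment.
state = [rng.uniform(-1.0, 1.0) for _ in range(STATE_SPACE)]
probs = action_probs(state, hidden_layer, output_layer)
action = rng.choices(range(ACTION_SPACE), weights=probs)[0]  # sample, do not argmax
```

Sampling from the probabilities rather than taking the most likely action is what lets a policy-gradient agent explore during training.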
Evaluation results
- mean_reward on Pixelcopter-PLE-v0 (self-reported): 27.50 +/- 21.88