REINFORCE Agent – Pixelcopter-PLE-v0

This agent was trained with the REINFORCE (Monte-Carlo policy gradient) algorithm as part of the Hugging Face Deep RL Course.
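For readers unfamiliar with REINFORCE, the core computation can be sketched in a few lines of plain Python. This is a minimal illustration, not the training code used for this agent: the function names `discounted_returns` and `reinforce_loss` are hypothetical, and the sketch assumes the textbook form of the algorithm, with no return normalization or baseline. Only the default `gamma` of 0.99 comes from this card.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: List[float], returns: List[float]) -> float:
    """Monte-Carlo policy-gradient surrogate: -sum_t log pi(a_t | s_t) * G_t."""
    # Minimizing this value raises the log-probability of actions that were
    # followed by high returns.
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]  # hypothetical rewards, for illustration
    print(discounted_returns(episode_rewards, gamma=0.5))
```

In a real training loop the log-probabilities would be tensors produced by the policy network, so that the loss can be differentiated with respect to the network weights.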

Results

Mean reward    Std reward
27.50          21.88
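As a rough guide to how such figures are typically obtained, the sketch below computes the mean and standard deviation of per-episode rewards. It assumes the population standard deviation is used and that one total reward is recorded per evaluation episode. The helper name `summarize_rewards` and the sample rewards are hypothetical and are not this agent's evaluation data.

```python
import statistics
from typing import List, Tuple


def summarize_rewards(episode_rewards: List[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of per-episode rewards."""
    mean_reward = statistics.fmean(episode_rewards)
    # pstdev divides by N (not N - 1); this choice is an assumption.
    std_reward = statistics.pstdev(episode_rewards)
    return mean_reward, std_reward


if __name__ == "__main__":
    # Hypothetical totals from three evaluation episodes, for illustration only.
    mean_reward, std_reward = summarize_rewards([10.0, 20.0, 30.0])
    print(f"mean={mean_reward:.2f} std={std_reward:.2f}")
```

With only 10 evaluation episodes, as configured for this agent, a standard deviation close to the mean suggests that individual episode scores vary widely.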

Hyperparameters

{
  "h_size": 64,
  "n_training_episodes": 50000,
  "n_evaluation_episodes": 10,
  "max_t": 10000,
  "gamma": 0.99,
  "lr": 0.0001,
  "env_id": "Pixelcopter-PLE-v0",
  "state_space": 7,
  "action_space": 2
}
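To make the sizes above concrete, the sketch below loads the same hyperparameters and counts the weights of a policy network with a single hidden layer. The architecture is an assumption: the card states only `state_space`, `h_size` and `action_space`, not how the layers are arranged. The helper name `mlp_parameter_count` is hypothetical.

```python
import json

# Hyperparameters copied from the model card.
HYPERPARAMETERS = json.loads("""
{
  "h_size": 64,
  "n_training_episodes": 50000,
  "n_evaluation_episodes": 10,
  "max_t": 10000,
  "gamma": 0.99,
  "lr": 0.0001,
  "env_id": "Pixelcopter-PLE-v0",
  "state_space": 7,
  "action_space": 2
}
""")


def mlp_parameter_count(state_space: int, h_size: int, action_space: int) -> int:
    """Count weights and biases of an assumed state -> hidden -> action MLP."""
    hidden_layer = state_space * h_size + h_size      # weights + biases
    output_layer = h_size * action_space + action_space
    return hidden_layer + output_layer


if __name__ == "__main__":
    n_params = mlp_parameter_count(
        HYPERPARAMETERS["state_space"],
        HYPERPARAMETERS["h_size"],
        HYPERPARAMETERS["action_space"],
    )
    print(n_params)
```

Under this assumption the policy maps a 7-dimensional observation to 2 action logits through 64 hidden units. A deeper network would have more parameters than this count.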