REINFORCE Agent – Pixelcopter-PLE-v0

This agent plays Pixelcopter-PLE-v0. It was trained with the REINFORCE (Monte Carlo policy gradient) algorithm as part of the Hugging Face Deep RL Course.
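
For readers unfamiliar with the Monte Carlo part of the name: REINFORCE waits until an episode ends, computes the discounted return from each time step, and weights the log-probability of each action taken by that return. The following is a minimal, dependency-free sketch of that computation, not the training code used for this model. The function names `discounted_returns` and `reinforce_loss` and the toy numbers are illustrative assumptions.

```python
import math


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one finished episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Monte Carlo policy-gradient loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Toy episode (made-up numbers): three steps, reward 1 each, gamma = 0.5.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)

# Log-probabilities of the actions taken; here each action had probability 0.5.
log_probs = [math.log(0.5)] * 3
loss = reinforce_loss(log_probs, returns)
```

In an autograd framework the log-probabilities would be tensors produced by the policy network, and minimizing this loss by gradient descent carries out the policy-gradient update.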

Results

Mean reward	Std reward
27.50	21.88
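
The two numbers summarize the rewards collected over the evaluation episodes. As a hedged illustration of how such a summary is typically produced, the sketch below computes the mean and the population standard deviation of a list of per-episode rewards. The reward values are invented for the example and are not this model's evaluation data; that the reported figure uses the population rather than the sample standard deviation is also an assumption.

```python
import statistics


def summarize_rewards(episode_rewards):
    """Return (mean, population standard deviation) of per-episode rewards."""
    mean_reward = statistics.fmean(episode_rewards)
    std_reward = statistics.pstdev(episode_rewards)
    return mean_reward, std_reward


# Hypothetical per-episode rewards, NOT the actual evaluation data for this model.
example_rewards = [10.0, 20.0, 30.0, 40.0]
mean_reward, std_reward = summarize_rewards(example_rewards)
print(f"{mean_reward:.2f} +/- {std_reward:.2f}")
```

A standard deviation nearly as large as the mean, as reported for this agent, indicates that episode outcomes vary widely from one run to the next.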

Hyperparameters

{
  "h_size": 64,
  "n_training_episodes": 50000,
  "n_evaluation_episodes": 10,
  "max_t": 10000,
  "gamma": 0.99,
  "lr": 0.0001,
  "env_id": "Pixelcopter-PLE-v0",
  "state_space": 7,
  "action_space": 2
}
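
To make the role of `gamma` concrete, the sketch below parses a subset of the configuration above and computes two rough quantities: the weight a reward receives `n` steps into the future (`gamma ** n`) and the rule-of-thumb effective horizon `1 / (1 - gamma)`. The helper names are illustrative, and the horizon is a heuristic, not something the training code is known to compute.

```python
import json

# Subset of the hyperparameters listed above.
config = json.loads('{"gamma": 0.99, "max_t": 10000, "lr": 0.0001}')


def discount_weight(gamma, n_steps):
    """Weight applied to a reward received n_steps in the future."""
    return gamma ** n_steps


def effective_horizon(gamma):
    """Rule-of-thumb number of steps over which future rewards still matter."""
    return 1.0 / (1.0 - gamma)


horizon = effective_horizon(config["gamma"])        # about 100 steps
weight_100 = discount_weight(config["gamma"], 100)  # about 0.37
```

With `gamma` at 0.99, rewards more than a few hundred steps ahead contribute very little to the return, even though `max_t` allows episodes of up to 10000 steps.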

Evaluation results

mean_reward on Pixelcopter-PLE-v0 (self-reported): 27.50 +/- 21.88