A2C Agent Playing PandaReachDense-v3

This A2C agent was trained locally with Stable-Baselines3 on the panda-gym PandaReachDense-v3 environment, following the Unit 6 setup of the Hugging Face Deep RL Course.

Results

  • Mean reward: -0.19 +/- 0.12
  • Evaluation episodes: 10
  • Training timesteps: 1,000,000

Hyperparameters

env_id: PandaReachDense-v3
repo_id: jnforja/a2c-PandaReachDense-v3
model_name: a2c-PandaReachDense-v3
seed: 42
n_envs: 4
total_timesteps: 1000000
policy: MultiInputPolicy
model_architecture: A2C
norm_obs: true
norm_reward: true
clip_obs: 10.0
eval_episodes: 10
video_episodes: 1
min_video_seconds: 3
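
The hyperparameters above map onto a Stable-Baselines3 training run roughly as follows. This is a minimal sketch and not the exact training script: the names CONFIG, vec_normalize_kwargs and train are hypothetical, the output file names are taken from the Files section, and every A2C setting not listed above is assumed to be the library default.

```python
# Values copied from the hyperparameter list above.
CONFIG = {
    "env_id": "PandaReachDense-v3",
    "model_name": "a2c-PandaReachDense-v3",
    "seed": 42,
    "n_envs": 4,
    "total_timesteps": 1_000_000,
    "policy": "MultiInputPolicy",
    "norm_obs": True,
    "norm_reward": True,
    "clip_obs": 10.0,
}


def vec_normalize_kwargs(cfg):
    """Pick out the settings that are passed to VecNormalize."""
    return {
        "norm_obs": cfg["norm_obs"],
        "norm_reward": cfg["norm_reward"],
        "clip_obs": cfg["clip_obs"],
    }


def train(cfg=CONFIG):
    """Hypothetical training entry point (assumes default A2C settings)."""
    # Imported here so the config can be inspected without the RL stack installed.
    import panda_gym  # noqa: F401  (registers the Panda environments)
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import VecNormalize

    # n_envs parallel copies of the environment, wrapped with running
    # observation/reward normalization.
    env = make_vec_env(cfg["env_id"], n_envs=cfg["n_envs"], seed=cfg["seed"])
    env = VecNormalize(env, **vec_normalize_kwargs(cfg))

    model = A2C(cfg["policy"], env, seed=cfg["seed"], verbose=1)
    model.learn(total_timesteps=cfg["total_timesteps"])

    # Save the checkpoint and the normalization statistics side by side.
    model.save(cfg["model_name"])
    env.save("vec_normalize.pkl")
    return model, env
```

Calling train() starts the full 1,000,000-step run. MultiInputPolicy is used because the environment returns dictionary observations.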

Files

  • a2c-PandaReachDense-v3.zip: Stable-Baselines3 A2C checkpoint.
  • vec_normalize.pkl: VecNormalize observation/reward statistics.
  • replay.mp4: rendered preview episode.
  • results.json: evaluation output.
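
The checkpoint and the normalization statistics are meant to be loaded together. The sketch below shows one way to do that and re-run the evaluation. It assumes both files are already in the working directory and that the reported reward is the unnormalized one, so reward normalization is switched off at test time. The helper names load_agent, evaluate and format_result are hypothetical.

```python
def load_agent(model_path="a2c-PandaReachDense-v3.zip",
               stats_path="vec_normalize.pkl",
               env_id="PandaReachDense-v3"):
    """Rebuild the evaluation environment and load the trained agent."""
    import gymnasium as gym
    import panda_gym  # noqa: F401  (registers the Panda environments)
    from stable_baselines3 import A2C
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

    env = DummyVecEnv([lambda: gym.make(env_id)])
    # Restore the running statistics collected during training.
    env = VecNormalize.load(stats_path, env)
    env.training = False      # do not update the statistics at test time
    env.norm_reward = False   # assumption: rewards are reported unnormalized
    model = A2C.load(model_path, env=env)
    return model, env


def evaluate(model, env, n_eval_episodes=10):
    """Return (mean_reward, std_reward) over n_eval_episodes episodes."""
    from stable_baselines3.common.evaluation import evaluate_policy

    return evaluate_policy(model, env, n_eval_episodes=n_eval_episodes,
                           deterministic=True)


def format_result(mean_reward, std_reward):
    """Format a result the same way as the Results section."""
    return f"{mean_reward:.2f} +/- {std_reward:.2f}"
```

A typical session would be model, env = load_agent() followed by print(format_result(*evaluate(model, env))). The numbers can differ from the reported ones because evaluation episodes are randomized.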