A2C Agent playing PandaReachDense-v3

This is an A2C (Advantage Actor-Critic) agent trained on PandaReachDense-v3 with the stable-baselines3 library and panda-gym.
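As a rough illustration of how such an agent can be trained and evaluated, the sketch below uses stable-baselines3 with its default A2C settings. It is not the author's training script: the function name train_and_evaluate and the total_timesteps value are assumptions made for the example, and the hyperparameters actually used for this model are not given here.

```python
def train_and_evaluate(total_timesteps=100_000, n_eval_episodes=10):
    """Train A2C on PandaReachDense-v3 and return (mean_reward, std_reward).

    Hypothetical helper: default SB3 hyperparameters, assumed training budget.
    """
    # Imports are local so the heavy dependencies load only when training runs.
    import gymnasium as gym
    import panda_gym  # noqa: F401  (importing registers the Panda environments)
    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("PandaReachDense-v3")

    # The observation space is a dictionary, so the multi-input policy is needed.
    model = A2C("MultiInputPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    # Average the episode return over a fixed number of evaluation episodes.
    result = evaluate_policy(model, env, n_eval_episodes=n_eval_episodes)
    env.close()
    return result
```

Calling train_and_evaluate() requires gymnasium, panda-gym and stable-baselines3 to be installed. Results will vary with the random seed and the training budget.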

Environment Description

The PandaReachDense-v3 environment features a Franka Emika Panda robotic arm that must move its end-effector to a target position (shown as a green ball). This is a continuous control task with:

  • Observation space: Dictionary containing achieved_goal, desired_goal, and observation (end-effector position + velocity)
  • Action space: 3-dimensional continuous control (x, y, z displacement of the end-effector)
  • Reward: Dense reward equal to the negative distance between the end-effector and the target, so values closer to 0 are better
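To make the observation layout and the reward concrete, here is a minimal sketch in plain Python. It assumes the dense reward is the negative Euclidean distance between achieved_goal and desired_goal, and that observation holds three position values followed by three velocity values. The function name dense_reward and all sample numbers are illustrative and do not come from the environment.

```python
import math


def dense_reward(achieved_goal, desired_goal):
    """Negative Euclidean distance to the target (assumed form of the reward)."""
    return -math.dist(achieved_goal, desired_goal)


# Illustrative observation with the three keys listed above (values are made up).
obs = {
    "observation": [0.0, 0.0, 0.2, 0.0, 0.0, 0.0],  # position (3) + velocity (3)
    "achieved_goal": [0.0, 0.0, 0.2],  # current end-effector position
    "desired_goal": [0.3, 0.4, 0.2],  # target position
}

reward = dense_reward(obs["achieved_goal"], obs["desired_goal"])
print(reward)
```

Under this assumption the reward is never positive and reaches 0 only when the end-effector sits exactly on the target.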

Training Results

Metric               | Value
Mean Reward          | -0.35
Std Reward           | ± 0.12
Evaluation Episodes  | 10
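The mean and standard deviation above summarize per-episode returns over the evaluation episodes. The sketch below shows that calculation, assuming the population standard deviation (as NumPy's default np.std computes, which is what stable-baselines3's evaluate_policy reports). The helper name summarize_returns and the sample returns are hypothetical and are not this model's evaluation data.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


# Hypothetical returns from five evaluation episodes (made-up numbers).
example_returns = [-0.2, -0.4, -0.3, -0.5, -0.1]

mean_reward, std_reward = summarize_returns(example_returns)
print(f"{mean_reward:.2f} +/- {std_reward:.2f}")
```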

Hyperparameters
