A2C Agent playing PandaReachDense-v3

This is a trained model of an A2C (Advantage Actor-Critic) agent playing PandaReachDense-v3, built with the stable-baselines3 library and the panda-gym environments.
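The sketch below shows one way to load and evaluate the agent. It assumes the checkpoint is hosted on the Hugging Face Hub and that `gymnasium`, `panda-gym`, `stable-baselines3` and `huggingface_sb3` are installed. This card does not give the repository id or the checkpoint filename, so both are left as parameters. If the model was trained with observation normalization (`VecNormalize`), the saved statistics would also need to be loaded. This sketch does not handle that case.

```python
def load_and_evaluate(repo_id: str, filename: str, n_eval_episodes: int = 10):
    """Download an SB3 A2C checkpoint from the Hugging Face Hub and evaluate it.

    repo_id and filename are hypothetical placeholders: the model card does not state them.
    Returns (mean_reward, std_reward) over n_eval_episodes episodes.
    """
    # Imports are done lazily so that defining this function does not require the RL stack.
    import gymnasium as gym
    import panda_gym  # noqa: F401  (importing registers the Panda environments with Gymnasium)
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy

    # Download the checkpoint file and get its local path.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    model = A2C.load(checkpoint_path)

    env = gym.make("PandaReachDense-v3")
    try:
        # evaluate_policy wraps the env in a DummyVecEnv and uses deterministic actions.
        mean_reward, std_reward = evaluate_policy(
            model, env, n_eval_episodes=n_eval_episodes, deterministic=True
        )
    finally:
        env.close()
    return mean_reward, std_reward
```

A call such as `load_and_evaluate("<user>/<repo>", "<checkpoint>.zip")` returns the mean and standard deviation of the episode returns. The names in that call are hypothetical and must be replaced with the real ones.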

Environment Description

The PandaReachDense-v3 environment features a Franka Emika Panda robotic arm that must place its end-effector at a target position (green ball). This is a continuous control task with:

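Observation space: dictionary containing achieved_goal (current end-effector position), desired_goal (target position), and observation (end-effector position and velocity)
Action space: 3-dimensional continuous control (x, y, z displacement of the end-effector), each component in [-1, 1]
Reward: dense reward equal to the negative Euclidean distance between the end-effector and the target, so it approaches 0 as the arm reaches the goal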
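The dense reward takes only a few lines to write out. The sketch below uses plain Python lists in place of the NumPy arrays the environment actually returns. The example observation values are made up for illustration.

```python
import math
from typing import Mapping, Sequence


def dense_reward(achieved_goal: Sequence[float], desired_goal: Sequence[float]) -> float:
    """Negative Euclidean distance between the end-effector and the target (always <= 0)."""
    return -math.dist(achieved_goal, desired_goal)


def reward_from_obs(obs: Mapping[str, Sequence[float]]) -> float:
    # Only the two goal entries of the dict observation enter the reward;
    # obs["observation"] (position + velocity) is not used here.
    return dense_reward(obs["achieved_goal"], obs["desired_goal"])


# Hypothetical observation; the values are illustrative, not taken from the environment.
example_obs = {
    "observation": [0.0, 0.0, 0.2, 0.0, 0.0, 0.0],
    "achieved_goal": [0.0, 0.0, 0.2],
    "desired_goal": [0.3, 0.4, 0.2],
}
print(reward_from_obs(example_obs))
```

Because the reward is a negative distance, an episode return close to 0 means the end-effector stayed close to the target throughout the episode.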

Evaluation Results

Metric	Value
Mean Reward	-0.35
Std Reward	0.12
Evaluation Episodes	10
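These figures may come from stable-baselines3's `evaluate_policy`, the helper usually used for SB3 model cards. If so, the mean and standard deviation are computed over the per-episode returns with NumPy's defaults, which gives the population standard deviation. The standard-library sketch below reproduces that arithmetic. The returns passed to it are hypothetical and do not come from this model's episodes.

```python
import statistics
from typing import Sequence, Tuple


def summarize_returns(episode_returns: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of per-episode returns.

    Matches np.mean / np.std with the default ddof=0.
    """
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # population std (divides by N, not N - 1)
    return mean, std


# Hypothetical episode returns, used only to illustrate the computation.
mean, std = summarize_returns([-0.1, -0.3, -0.5, -0.7])
print(f"{mean:.2f} +/- {std:.2f}")
```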

Hyperparameters




Evaluation results

mean_reward on PandaReachDense-v3 (self-reported): -0.35 +/- 0.12