A2C PandaReachJointsDense-v3

This repository contains an Advantage Actor-Critic (A2C) agent, implemented with Stable Baselines3 and trained on the PandaReachJointsDense-v3 environment from panda-gym. The agent was trained for 500,000 timesteps to move the Panda robot's end effector to target positions in 3D space by controlling the arm's joints.
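
The "Dense" part of the environment name refers to the reward signal: instead of a sparse success/failure reward, the agent receives the negative distance between the end effector and the target at every step. The following is a minimal, standard-library-only sketch of that idea, not panda-gym's actual code. The function names `dense_reward` and `is_success`, the example coordinates, and the 0.05 success threshold are illustrative assumptions.

```python
import math


def dense_reward(achieved, desired):
    """Return the negative Euclidean distance between two 3D positions.

    Hypothetical helper for illustration: the closer the end effector
    is to the target, the closer the reward is to zero.
    """
    return -math.dist(achieved, desired)


def is_success(achieved, desired, threshold=0.05):
    """Return True when the end effector is within `threshold` of the target.

    The default threshold is an assumed value chosen for this sketch.
    """
    return math.dist(achieved, desired) < threshold


# Made-up positions (x, y, z) used only to exercise the helpers.
target = (0.1, 0.0, 0.2)
far = (0.4, 0.0, 0.6)
near = (0.1, 0.0, 0.23)

print("reward when far: ", dense_reward(far, target))
print("reward when near:", dense_reward(near, target))
print("success when near:", is_success(near, target))
```

Because the reward improves smoothly as the arm approaches the target, an on-policy method such as A2C gets a useful learning signal from every step rather than only from successful episodes.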