DDPG Panda Reach Model

This is a DDPG (Deep Deterministic Policy Gradient) model trained to control a Panda robotic arm in a reaching task. The model was trained using Stable-Baselines3.

Task Description

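The task is to control a 7-DOF Panda robotic arm so that its end-effector reaches a target position in 3D space. The policy commands the arm's joints directly, and the environment returns a dense reward based on the distance between the end-effector and the target, so the reward improves as the end-effector gets closer.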
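
As a rough illustration of a distance-based dense reward, the sketch below assumes the reward is the negative Euclidean distance between the end-effector and the target. The function name `dense_reach_reward` and the example coordinates are hypothetical; the environment computes its own reward internally.

```python
import math


def dense_reach_reward(ee_position, target_position):
    """Illustrative dense reward: negative Euclidean distance to the target.

    Assumption: the reward is exactly -distance; this is a sketch, not the
    environment's actual implementation.
    """
    return -math.dist(ee_position, target_position)


# Hypothetical end-effector and target positions, in metres.
far = dense_reach_reward([0.0, 0.0, 0.0], [0.3, 0.0, 0.4])   # 0.5 m away
near = dense_reach_reward([0.3, 0.0, 0.3], [0.3, 0.0, 0.4])  # 0.1 m away
```

Under this assumption the reward is never positive and rises toward zero as the arm approaches the target, which gives the agent a learning signal at every step rather than only on success.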

Training Details

  • Environment: PandaReachJointsDense-v3
  • Algorithm: DDPG with HER (Hindsight Experience Replay)
  • Training Steps: 10,000
  • Policy: MultiInputPolicy
  • Training Framework: Stable-Baselines3
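
The settings above could be reproduced with a script along the following lines. This is a minimal sketch, not the author's training script: the HER parameters (`n_sampled_goal`, `goal_selection_strategy`) and the `save_path` name are assumptions, since the card does not list them.

```python
# Values taken from the training details; entries marked "assumed" are not documented.
TRAIN_CONFIG = {
    "env_id": "PandaReachJointsDense-v3",
    "policy": "MultiInputPolicy",
    "total_timesteps": 10_000,
    "her_kwargs": {  # assumed HER settings
        "n_sampled_goal": 4,
        "goal_selection_strategy": "future",
    },
}


def train(config=TRAIN_CONFIG, save_path="ddpg_panda_reach"):
    """Train DDPG with a HER replay buffer and save it (save_path is hypothetical)."""
    # Imported lazily so the configuration can be inspected without the RL stack installed.
    import gymnasium as gym
    import panda_gym  # noqa: F401  (importing registers the Panda environments)
    from stable_baselines3 import DDPG, HerReplayBuffer

    env = gym.make(config["env_id"])
    model = DDPG(
        config["policy"],
        env,
        replay_buffer_class=HerReplayBuffer,
        replay_buffer_kwargs=config["her_kwargs"],
        verbose=1,
    )
    model.learn(total_timesteps=config["total_timesteps"])
    model.save(save_path)
    env.close()
    return model
```

Calling `train()` runs the full 10,000 steps. `MultiInputPolicy` is needed because the environment returns a dictionary observation (observation, achieved goal, desired goal), which HER also relies on.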

Usage

import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG

# Create environment
env = gym.make("PandaReachJointsDense-v3", render_mode="human")

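# Load the trained model. DDPG.load expects a local checkpoint path, so this
# assumes the file was downloaded to StevanLS/ddpg-panda-reach-10(.zip).
# Models trained with HER must be loaded together with their environment.
model = DDPG.load("StevanLS/ddpg-panda-reach-10", env=env)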

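# Test the model (runs until interrupted, e.g. with Ctrl+C)
obs, _ = env.reset()
while True:
    # deterministic=True is the usual choice when evaluating a trained policy
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()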
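
To turn the loop above into a number such as a mean reward, a bounded variant can accumulate per-episode returns. The helper below is a sketch; the name `mean_episode_reward` and the default of 10 episodes are assumptions. It only relies on the Gymnasium `reset`/`step` interface and the model's `predict` method.

```python
def mean_episode_reward(model, env, n_episodes=10):
    """Run a fixed number of episodes and return the mean undiscounted return."""
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        episode_return = 0.0
        finished = False
        while not finished:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            episode_return += float(reward)
            # An episode ends on success/failure (terminated) or on the time limit (truncated).
            finished = terminated or truncated
        returns.append(episode_return)
    return sum(returns) / len(returns)
```

With the objects from the usage snippet this would be called as `mean_episode_reward(model, env)`. Stable-Baselines3 also ships `evaluate_policy` in `stable_baselines3.common.evaluation`, which serves the same purpose and additionally reports the standard deviation.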

Author

  • StevanLS

Citations

@misc{gymnasium2023,
    author={Farama Foundation},
    title={Gymnasium},
    year={2023},
    howpublished={GitHub repository},
    publisher={GitHub},
    url={https://github.com/Farama-Foundation/Gymnasium}
}

@article{raffin2021stable,
    title={Stable-baselines3: Reliable reinforcement learning implementations},
    author={Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah},
    journal={Journal of Machine Learning Research},
    year={2021}
}

@article{gallouedec2021pandagym,
    title={panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning},
    author={Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
    journal={arXiv preprint arXiv:2106.13687},
    year={2021}
}
