Update README.md

README.md
---
tags:
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
- DDPG
- robot-manipulation
model-index:
- name: DDPG Panda Reach 100k
  results:
  - task:
      type: reinforcement-learning
      name: Robot Arm Reaching
    dataset:
      name: PandaReachJointsDense-v3
      type: panda-gym
    metrics:
    - type: mean_reward
      value: REPLACE_WITH_ACTUAL_MEAN # Replace with your evaluation mean_reward
      name: mean_reward
    - type: std_reward
      value: REPLACE_WITH_ACTUAL_STD # Replace with your evaluation std_reward
      name: std_reward
---

# DDPG Panda Reach Model

This is a DDPG (Deep Deterministic Policy Gradient) model trained to control a Franka Emika Panda robot arm in a reaching task with dense rewards. The model was trained using Stable-Baselines3 with Hindsight Experience Replay (HER).
## Task Description

In this task, a 7-DOF Panda robotic arm must reach a randomly positioned target in 3D space. The environment provides dense rewards based on the distance between the end-effector and the target position. The task is considered successful when the end-effector reaches within a small threshold distance of the target.
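The dense reward described above can be sketched as the negative end-effector-to-target distance (panda-gym's Reach environments use essentially this form); the `0.05` m success threshold below is illustrative, not necessarily the environment's exact value:

```python
import numpy as np

def dense_reach_reward(end_effector_pos, target_pos):
    """Dense reaching reward: negative Euclidean distance, so the reward
    increases toward 0 as the arm approaches the target."""
    diff = np.asarray(end_effector_pos) - np.asarray(target_pos)
    return -float(np.linalg.norm(diff))

def is_success(end_effector_pos, target_pos, threshold=0.05):
    """Success when the end-effector is within `threshold` metres of the
    target (threshold chosen for illustration)."""
    diff = np.asarray(end_effector_pos) - np.asarray(target_pos)
    return bool(np.linalg.norm(diff) < threshold)

# A position right on the target yields reward 0; farther positions are more negative.
print(dense_reach_reward([0.1, 0.0, 0.2], [0.1, 0.0, 0.2]))  # -0.0
print(is_success([0.1, 0.0, 0.2], [0.1, 0.0, 0.24]))         # True (0.04 < 0.05)
```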
## Training Details

- **Environment**: PandaReachJointsDense-v3 from panda-gym
- **Algorithm**: DDPG with HER
- **Policy**: MultiInputPolicy
- **Training Steps**: 100,000
- **Framework**: Stable-Baselines3
- **Training Monitoring**: Weights & Biases
### Hyperparameters

```python
{
    "policy": "MultiInputPolicy",
    "replay_buffer_class": "HerReplayBuffer",
    "tensorboard_log": True,
    "verbose": 1,
    "total_timesteps": 100000
}
```
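As a sketch of how these settings wire into Stable-Baselines3 (not the author's actual training script): `total_timesteps` is passed to `learn()`, the rest go to the `DDPG` constructor, and note that `tensorboard_log` takes a log-directory path rather than `True` (the path below is assumed for illustration):

```python
import gymnasium as gym
import panda_gym  # noqa: F401 - registers the Panda environments
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaReachJointsDense-v3")

# HER requires a goal-conditioned (dict-observation) env, hence MultiInputPolicy.
model = DDPG(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    tensorboard_log="./tb_logs",  # assumed path, for illustration
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ddpg-panda-reach")
```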
## Usage

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG

# Create the environment
env = gym.make("PandaReachJointsDense-v3", render_mode="human")

# Load the trained model
model = DDPG.load("StevanLS/ddpg-panda-reach-100")

# Run the model
obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```
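The `mean_reward`/`std_reward` placeholders in the metadata are conventionally filled with `stable_baselines3.common.evaluation.evaluate_policy(model, env, n_eval_episodes=10)`, which returns exactly that pair. The underlying computation is just the mean and (population) standard deviation of per-episode returns; a library-free sketch with hypothetical episode returns:

```python
import statistics

def summarize_returns(episode_returns):
    """Reduce per-episode returns to the (mean_reward, std_reward) pair
    reported in the model card metadata."""
    mean_reward = statistics.mean(episode_returns)
    std_reward = statistics.pstdev(episode_returns)  # population std, matching np.std
    return mean_reward, std_reward

# Hypothetical returns from 5 evaluation episodes (illustrative numbers only)
returns = [-1.8, -2.1, -1.5, -2.4, -1.7]
mean_r, std_r = summarize_returns(returns)
print(f"mean_reward: {mean_r:.2f} +/- {std_r:.2f}")  # mean_reward: -1.90 +/- 0.32
```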
## Limitations

- The model is trained specifically for the reaching task and may not generalize to other manipulation tasks
- Performance may vary depending on the random target positions
- The model uses dense rewards, which might not be available in real-world scenarios
## Author

- StevanLS
## Citations

```bibtex
@article{raffin2021stable,
  title={Stable-Baselines3: Reliable reinforcement learning implementations},
  author={Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah},
  journal={Journal of Machine Learning Research},
  year={2021}
}

@article{gallouedec2021pandagym,
  title={panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning},
  author={Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
  journal={arXiv preprint arXiv:2106.13687},
  year={2021}
}

@misc{gymnasium2023,
  author={Farama Foundation},
  title={Gymnasium},
  year={2023},
  howpublished={GitHub repository},
  publisher={GitHub},
  url={https://github.com/Farama-Foundation/Gymnasium}
}
```