StevanLS committed on
Commit 9d2aba6 · verified · 1 Parent(s): 8be8774

Update README.md

Files changed (1): README.md (+93 -15)
README.md CHANGED
@@ -5,33 +5,111 @@ tags:
  - deep-reinforcement-learning
  - reinforcement-learning
  - stable-baselines3
  model-index:
- - name: DDPG
  results:
  - task:
  type: reinforcement-learning
- name: reinforcement-learning
  dataset:
  name: PandaReachJointsDense-v3
- type: PandaReachJointsDense-v3
  metrics:
- - type: mean_reward
- value: -21.06 +/- 6.60
- name: mean_reward
- verified: false
  ---

- # **DDPG** Agent playing **PandaReachJointsDense-v3**
- This is a trained model of a **DDPG** agent playing **PandaReachJointsDense-v3**
- using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

- ## Usage (with Stable-baselines3)
- TODO: Add your code

  ```python
- from stable_baselines3 import ...
- from huggingface_sb3 import load_from_hub

- ...
  ```
  - deep-reinforcement-learning
  - reinforcement-learning
  - stable-baselines3
+ - DDPG
+ - robot-manipulation
  model-index:
+ - name: DDPG Panda Reach 100k
  results:
  - task:
  type: reinforcement-learning
+ name: Robot Arm Reaching
  dataset:
  name: PandaReachJointsDense-v3
+ type: panda-gym
  metrics:
+ - type: mean_reward
+ value: REPLACE_WITH_ACTUAL_MEAN # Replace with your evaluation mean_reward
+ name: mean_reward
+ - type: std_reward
+ value: REPLACE_WITH_ACTUAL_STD # Replace with your evaluation std_reward
+ name: std_reward
  ---
+ # DDPG Panda Reach Model
+
+ This is a DDPG (Deep Deterministic Policy Gradient) model trained to control a Franka Emika Panda robot arm in a reaching task with dense rewards. The model was trained using Stable-Baselines3 with Hindsight Experience Replay (HER).
+
+ ## Task Description
+
+ In this task, a 7-DOF Panda robotic arm must reach a randomly positioned target in 3D space. The environment provides dense rewards based on the distance between the end-effector and the target position. The task is considered successful when the end-effector comes within a small threshold distance of the target.
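In panda-gym's dense setting, the reward described above is the negative Euclidean distance between the end-effector and the target, and success means that distance falls below a fixed threshold. A minimal sketch of that reward shape (the 0.05 m threshold here is an illustrative assumption, not a value read from the environment):

```python
import math

def dense_reward(achieved_goal, desired_goal):
    """Dense reward: negative Euclidean distance from end-effector to target."""
    return -math.dist(achieved_goal, desired_goal)

def is_success(achieved_goal, desired_goal, threshold=0.05):
    """Success when the end-effector is within `threshold` metres of the target."""
    return math.dist(achieved_goal, desired_goal) < threshold

# The closer the end-effector gets, the less negative the reward becomes.
r_far = dense_reward([0.0, 0.0, 0.0], [0.0, 0.0, 1.0])
r_near = dense_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.1])
```

This is why dense rewards speed up early learning here: every step gives a gradient signal toward the goal, unlike the sparse variant, which only rewards reaching the threshold.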
+ ## Training Details
+
+ - **Environment**: PandaReachJointsDense-v3 from panda-gym
+ - **Algorithm**: DDPG with HER
+ - **Policy**: MultiInputPolicy
+ - **Training Steps**: 100,000
+ - **Framework**: Stable-Baselines3
+ - **Training Monitoring**: Weights & Biases
+
+ ### Hyperparameters
+
+ ```python
+ {
+     "policy": "MultiInputPolicy",
+     "replay_buffer_class": "HerReplayBuffer",
+     "tensorboard_log": True,
+     "verbose": 1,
+     "total_timesteps": 100000
+ }
+ ```
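Note that in Stable-Baselines3 the `tensorboard_log` argument expects a log-directory path rather than a boolean. A hedged sketch of how the hyperparameters above map onto the SB3 API — the HER settings shown (`n_sampled_goal`, `goal_selection_strategy`) are SB3 defaults made explicit, not values taken from the actual training run:

```python
import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaReachJointsDense-v3")

# DDPG with Hindsight Experience Replay, following the listed hyperparameters.
model = DDPG(
    policy="MultiInputPolicy",
    env=env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,                  # SB3 default, shown for clarity
        goal_selection_strategy="future",  # SB3 default, shown for clarity
    ),
    tensorboard_log="./tb_logs/",  # a directory path, not True
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ddpg-panda-reach-100")
```

`MultiInputPolicy` is required because the goal-conditioned observation is a dict (`observation`, `achieved_goal`, `desired_goal`), which is also what `HerReplayBuffer` relies on for goal relabeling.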
+ ## Usage
+
  ```python
+ import gymnasium as gym
+ import panda_gym
+ from stable_baselines3 import DDPG
+
+ # Create environment
+ env = gym.make("PandaReachJointsDense-v3", render_mode="human")
+
+ # Load the trained model
+ model = DDPG.load("StevanLS/ddpg-panda-reach-100")
+
+ # Run the model
+ obs, _ = env.reset()
+ while True:
+     action, _ = model.predict(obs, deterministic=True)
+     obs, reward, terminated, truncated, info = env.step(action)
+     if terminated or truncated:
+         obs, _ = env.reset()
+ ```
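The `mean_reward`/`std_reward` placeholders in the metadata can be filled with SB3's evaluation helper. A sketch that first fetches the checkpoint from the Hub via `huggingface_sb3` (the `.zip` filename passed to `load_from_hub` is an assumption about how the checkpoint is stored in the repo):

```python
import gymnasium as gym
import panda_gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import DDPG
from stable_baselines3.common.evaluation import evaluate_policy

# Download the checkpoint from the Hub (filename is assumed, not confirmed).
checkpoint = load_from_hub(
    repo_id="StevanLS/ddpg-panda-reach-100",
    filename="ddpg-panda-reach-100.zip",
)
model = DDPG.load(checkpoint)

# Evaluate over several episodes and report mean +/- std of episodic return.
env = gym.make("PandaReachJointsDense-v3")
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=10, deterministic=True
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

Downloading first matters because `DDPG.load` expects a local path; passing the bare repo id would fail.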
+ ## Limitations
+
+ - The model is trained specifically for the reaching task and may not generalize to other manipulation tasks
+ - Performance may vary depending on the random target positions
+ - The model uses dense rewards, which might not be available in real-world scenarios
+
+ ## Author
+
+ - StevanLS
+ ## Citations
+
+ ```bibtex
+ @article{raffin2021stable,
+   title={Stable-Baselines3: Reliable Reinforcement Learning Implementations},
+   author={Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah},
+   journal={Journal of Machine Learning Research},
+   year={2021}
+ }
+
+ @article{gallouedec2021pandagym,
+   title={panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning},
+   author={Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
+   journal={arXiv preprint arXiv:2106.13687},
+   year={2021}
+ }
+
+ @misc{gymnasium2023,
+   author={Farama Foundation},
+   title={Gymnasium},
+   year={2023},
+   publisher={GitHub},
+   url={https://github.com/Farama-Foundation/Gymnasium}
+ }
  ```