---
datasets:
- lerobot/pusht
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- pusht
- imitation-learning
- benchmark
---

# 🦾 Diffusion Policy for Push-T (200k Steps)

[](https://github.com/huggingface/lerobot)
[](https://huggingface.co/datasets/lerobot/pusht)
[](https://www.uestc.edu.cn/)
[](https://www.apache.org/licenses/LICENSE-2.0)

> **Summary:** This model demonstrates the capabilities of **Diffusion Policy** on the precision-demanding **Push-T** task. It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework as part of a thesis research project benchmarking Imitation Learning algorithms.

- **🧩 Task**: Push-T (Simulated)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **🔄 Training Steps**: 200,000 (Fine-tuned via Resume)
- **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
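Concretely, Diffusion Policy generates actions by running the DDPM reverse process: a network predicts the noise in a noised action sample, and the sampler denoises step by step starting from pure Gaussian noise. The toy 1-D sketch below only illustrates the update rule; the function names, the linear beta schedule, and the clipping are assumptions made for this sketch, not the LeRobot implementation:

```python
import math
import random

def ddpm_sample(predict_noise, timesteps=100, beta_start=1e-4, beta_end=0.02, seed=0):
    """Toy 1-D DDPM ancestral sampler: start from Gaussian noise and
    repeatedly apply the epsilon-prediction denoising update."""
    rng = random.Random(seed)
    betas = [beta_start + (beta_end - beta_start) * t / (timesteps - 1)
             for t in range(timesteps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = rng.gauss(0.0, 1.0)  # x_T: pure noise
    for t in reversed(range(timesteps)):
        eps = predict_noise(x, t)  # model's estimate of the injected noise
        # DDPM posterior mean: (x - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        if t > 0:  # add sampling noise at every step except the last
            x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
        x = max(-1.0, min(1.0, x))  # clip samples to the normalized action range
    return x

# A stand-in "model" that treats the whole sample as noise contracts it toward 0.
action = ddpm_sample(lambda x, t: x)
```

In the real policy the sample is a whole action trajectory and `predict_noise` is the observation-conditioned U-Net, but the loop structure is the same.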
---
## 🔬 Benchmark Results (vs ACT)

Compared to the ACT baseline (which achieved a **0%** success rate in our controlled experiments), this Diffusion Policy model demonstrates significantly better control precision and trajectory stability.
### 📊 Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **14.0%** | **Significant Improvement** (ACT: 0%) | 🏆 |
| **Avg Max Reward** | **0.81** | **+58% Higher Precision** (ACT: ~0.51) | 📈 |
| **Avg Sum Reward** | **130.46** | **+147% More Stable** (ACT: ~52.7) | ✅ |

> **Note:** The Push-T environment requires **>95% target coverage** for success. An average max reward of `0.81` indicates the policy consistently moves the block very close to the target position, proving strong manipulation capabilities despite the strict success threshold.
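For reference, the three metrics above aggregate per-episode reward traces, where each per-step reward in Push-T is the fraction of the T block covering the target. A small sketch of that aggregation over hypothetical traces (the episode data below is made up for illustration, not taken from our evaluation logs):

```python
def aggregate_metrics(episodes, success_threshold=0.95):
    """Aggregate per-episode reward traces into the three reported metrics.

    `episodes` is a list of per-step reward lists; each reward is the
    fraction of the T block covering the target (0.0 to 1.0).  An episode
    counts as a success once coverage exceeds the threshold.
    """
    n = len(episodes)
    successes = sum(1 for ep in episodes if max(ep) > success_threshold)
    return {
        "pc_success": 100.0 * successes / n,
        "avg_max_reward": sum(max(ep) for ep in episodes) / n,
        "avg_sum_reward": sum(sum(ep) for ep in episodes) / n,
    }

# Hypothetical traces: one near-miss (peaks at 0.81) and one success (0.96).
metrics = aggregate_metrics([[0.1, 0.5, 0.81], [0.2, 0.7, 0.96]])
```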
---
## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
- **Training Strategy**:
  - Phase 1: Initial training (100,000 steps) -> Model: `Lemon-03/DP_PushT_test`
  - Phase 2: Resume/Fine-tuning (+100,000 steps) -> Model: `Lemon-03/DP_PushT_test_Resume`
  - **Total**: 200,000 steps
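The horizon settings above combine into a receding-horizon control loop: condition on the last 2 observations, predict a 16-step action trajectory, execute only the first 8 actions, then re-plan. A minimal sketch of that loop with a toy policy and environment (`plan` and `step_env` are hypothetical stand-ins, not the LeRobot API):

```python
from collections import deque

def rollout(plan, first_obs, step_env, episode_len=24,
            n_obs_steps=2, horizon=16, n_action_steps=8):
    """Receding-horizon execution: predict `horizon` actions from the last
    `n_obs_steps` observations, execute the first `n_action_steps`,
    observe the results, and re-plan."""
    obs_history = deque([first_obs] * n_obs_steps, maxlen=n_obs_steps)
    executed = []
    while len(executed) < episode_len:
        trajectory = plan(list(obs_history))        # length-`horizon` action chunk
        assert len(trajectory) == horizon
        for action in trajectory[:n_action_steps]:  # only the head is executed
            obs_history.append(step_env(action))
            executed.append(action)
            if len(executed) == episode_len:
                break
    return executed

# Toy policy/env: plan 16 constant actions derived from the latest observation.
actions = rollout(
    plan=lambda hist: [hist[-1] + 1] * 16,
    first_obs=0,
    step_env=lambda a: a,  # next observation mirrors the action
    episode_len=24,
)
```

Executing only the head of each predicted chunk is what lets the policy correct itself with fresh observations every 8 steps.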
---
## 🔧 Training Configuration (Reference)

For reproducibility, here are the key parameters used during the training session:

- **Batch Size**: 64
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with random crop (84x84)
- **Precision**: Mixed Precision (AMP) enabled
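The cosine-with-warmup schedule ramps the learning rate linearly from zero over the warmup steps, then decays it along a half cosine toward zero at the end of training. A sketch of that shape (the 500-step warmup here is an assumed value for illustration):

```python
import math

def cosine_with_warmup(step, total_steps=200_000, warmup_steps=500, base_lr=1e-4):
    """Linear warmup to `base_lr`, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice LeRobot configures this through its scheduler settings rather than a hand-written function; the sketch just shows the curve the optimizer follows.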
#### Original Training Command (My Resume Mode)

```bash
python -m lerobot.scripts.lerobot_train \
    --policy.type diffusion \
    --env.type pusht \
    --dataset.repo_id lerobot/pusht \
    --wandb.enable true \
    --eval.batch_size 8 \
    --job_name DP_PushT_Resume \
    --policy.repo_id Lemon-03/DP_PushT_test_Resume \
    --policy.pretrained_path outputs/train/2025-12-02/14-33-35_DP_PushT/checkpoints/last/pretrained_model \
    --steps 100000
```
---
## 🚀 Evaluate (My Evaluation Mode)
Run the following command in your terminal to evaluate the model for 50 episodes and save the visualization videos:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type diffusion \
    --policy.pretrained_path outputs/train/2025-12-04/14-47-37_DP_PushT_Resume/checkpoints/last/pretrained_model \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```
To evaluate this model locally, run the following command:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type diffusion \
    --policy.pretrained_path Lemon-03/DP_PushT_test_Resume \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```

---