---
datasets:
- lerobot/aloha_sim_insertion_human
tags:
- benchmark
---

# 🦾 Diffusion Policy for Aloha Insertion (200k Steps)

[LeRobot](https://github.com/huggingface/lerobot) · [Dataset: lerobot/aloha_sim_insertion_human](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human) · [UESTC](https://www.uestc.edu.cn/) · [License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

> **Summary:** This model is a benchmark experiment for **Diffusion Policy** on the challenging simulated **Aloha Insertion** task. It was trained with the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate how the algorithm performs on complex, high-dimensional 3D manipulation compared with baseline methods.

- **🧩 Task**: Aloha Insertion (simulated, 3D)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **🔄 Training Steps**: 200,000
- **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)

---

## 🔬 Benchmark Results (vs. ACT)

This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (batch size 8). The ACT baseline achieved a **2%** success rate (1/50 episodes). Diffusion Policy learned movement trajectories but never completed the final insertion alignment (0/50 episodes).

### 📊 Evaluation Metrics (50 Episodes)

| Metric | Value | Notes | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **0.0%** | **Slightly lower** than ACT (ACT: 2.0%) | 📉 |
| **Avg Max Reward** | **0.10** | **Partial success** (grasping achieved in some episodes, but alignment failed) | 🚧 |
| **Avg Sum Reward** | **8.20** | **Stable trajectories**, but imprecise at the goal state | ✅ |

> **Note:** The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results suggest that with a small batch size (8), ACT's deterministic policy may converge faster than Diffusion Policy. Diffusion Policy likely needs longer training or larger batches in this domain.

---

## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (vision backbone) + U-Net (diffusion head) |
| **Input** | 3 camera views (top, left, right) |
| **Image Resolution** | 480x640 (cropped to 420x560) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
| **Total Parameters** | ~263 million |

---

## 🔧 Training Configuration

For reproducibility, the key parameters used during training are listed below.

- **Source**: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
- **Batch Size**: 8 (limited by 8 GB of VRAM)
- **Optimizer**: AdamW (`lr=1e-4`, `weight_decay=1e-6`)
- **Scheduler**: Cosine with 500 warmup steps
- **Vision**: ResNet18 with GroupNorm (cropped to 420x560)

<details>
<summary>📄 <strong>Click to view the full <code>diffusion_aloha.yaml</code> used for training</strong></summary>

```yaml
# @package _global_

# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion

# Training parameters
steps: 200000     # The original file specifies 200k steps (Aloha is relatively hard to train)
eval_freq: 20000  # Evaluate a bit more frequently to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8     # ⚠️ Key: Aloha needs a small batch, otherwise 8 GB of VRAM is not enough

# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human

# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8   # Kept the same as training

# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50

# Policy configuration
policy:
  type: diffusion

  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, so a specific crop size is used
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null  # The original config does not load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32

  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true

  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16

  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_steps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0

  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  # grad_clip_norm: 10

  scheduler_name: cosine
  scheduler_warmup_steps: 500

  use_amp: true
```

</details>
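
The two schedules named in the config can be sketched in plain Python. This is a minimal illustration, not LeRobot's code. It assumes two things:

- `scheduler_name: cosine` follows the diffusers-style shape: linear warmup, then a half-cosine decay to zero.
- `beta_schedule: squaredcos_cap_v2` is the capped cosine schedule computed by diffusers' `betas_for_alpha_bar`.

If those assumptions hold, `beta_start`/`beta_end` have no effect on this particular beta schedule.

```python
import math

# Values copied from diffusion_aloha.yaml.
PEAK_LR = 1e-4             # optimizer_lr
WARMUP_STEPS = 500         # scheduler_warmup_steps
TOTAL_STEPS = 200_000      # steps
NUM_TRAIN_TIMESTEPS = 100  # num_train_timesteps


def cosine_lr_with_warmup(step: int) -> float:
    """Assumed shape: linear warmup to PEAK_LR, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def squaredcos_cap_v2_betas(num_timesteps: int, max_beta: float = 0.999) -> list[float]:
    """Capped cosine beta schedule, written to match diffusers' betas_for_alpha_bar as we understand it."""

    def alpha_bar(t: float) -> float:
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_timesteps):
        t1, t2 = i / num_timesteps, (i + 1) / num_timesteps
        # beta_i = 1 - alpha_bar(t2) / alpha_bar(t1), capped to avoid beta == 1 at the end.
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


if __name__ == "__main__":
    for s in (0, 250, 500, 100_250, 200_000):
        print(f"step {s:>7}: lr = {cosine_lr_with_warmup(s):.2e}")
    betas = squaredcos_cap_v2_betas(NUM_TRAIN_TIMESTEPS)
    print(f"beta[0] = {betas[0]:.2e}, beta[-1] = {betas[-1]:.3f}")
```

The learning rate peaks at step 500 and reaches zero at step 200,000. The betas grow monotonically from roughly 6e-4 to the 0.999 cap.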

---

## 🚀 Evaluation

Run the following command to evaluate the model for 50 episodes and save the visualization videos:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
  --eval.n_episodes 50 \
  --eval.batch_size 8 \
  --env.type aloha \
  --env.task AlohaInsertion-v0
```
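
During evaluation, each policy call predicts a chunk of `horizon` actions, but only part of that chunk is executed before the policy re-plans. The sketch below illustrates this windowing for the values in the config. It rests on two assumptions:

- Execution starts at index `n_obs_steps - 1`, the step aligned with the latest observation. We believe this is the convention LeRobot's diffusion policy uses.
- `horizon` must be divisible by `2 ** len(down_dims)`.

`ChunkingConfig` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingConfig:
    """Hypothetical helper holding the temporal settings from the `policy` section."""

    horizon: int = 16        # horizon: actions predicted per call
    n_obs_steps: int = 2     # n_obs_steps: observation history length
    n_action_steps: int = 8  # n_action_steps: actions executed per call
    down_dims: tuple[int, ...] = (512, 1024, 2048)  # U-Net channel widths

    def validate(self) -> None:
        # Assumed divisibility rule (mirrors what we believe LeRobot's DiffusionConfig checks).
        factor = 2 ** len(self.down_dims)
        if self.horizon % factor != 0:
            raise ValueError(f"horizon={self.horizon} must be a multiple of {factor}")
        # The executed slice must lie inside the predicted chunk.
        if self.n_obs_steps - 1 + self.n_action_steps > self.horizon:
            raise ValueError("executed action window does not fit inside the horizon")

    def executed_window(self) -> range:
        """Indices of the predicted chunk that are sent to the environment."""
        start = self.n_obs_steps - 1  # step aligned with the most recent observation
        return range(start, start + self.n_action_steps)

    def policy_calls(self, n_env_steps: int) -> int:
        """Number of chunks (model samplings) needed to cover n_env_steps actions."""
        return -(-n_env_steps // self.n_action_steps)  # ceiling division


if __name__ == "__main__":
    cfg = ChunkingConfig()
    cfg.validate()
    print("executed indices:", list(cfg.executed_window()))
    # 400 is a hypothetical episode length, used for illustration only.
    print("calls for a 400-step episode:", cfg.policy_calls(400))
```

With these values, each call runs 100 denoising steps (`num_inference_steps: 100`) and covers 8 environment steps. A hypothetical 400-step episode would therefore need 50 calls.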
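
With only 50 evaluation episodes, a success rate of 0/50 is hard to distinguish from 1/50. The sketch below makes this concrete using the Wilson score interval. This is a standard binomial confidence interval chosen here for illustration; it is not part of LeRobot's evaluation output. The success counts are the ones reported in the metrics table of this model card.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    dp = wilson_interval(0, 50)   # Diffusion Policy: 0/50 successes
    act = wilson_interval(1, 50)  # ACT baseline: 1/50 successes
    print(f"Diffusion Policy 95% CI: [{dp[0]:.3f}, {dp[1]:.3f}]")
    print(f"ACT              95% CI: [{act[0]:.3f}, {act[1]:.3f}]")
    print("Intervals overlap:", intervals_overlap(dp, act))
```

Both intervals span roughly 0% to between 7% and 10%, so the one-episode gap is well within sampling noise. Separating the two methods would need more evaluation episodes or several training seeds.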