Lemon-03 committed on
Commit
4a06465
·
verified ·
1 Parent(s): a62b23a

Update README.md

  1. README.md +135 -75
README.md CHANGED
@@ -1,3 +1,12 @@
 
 
 
 
 
 
 
 
 
1
  ---
2
  datasets:
3
  - lerobot/aloha_sim_insertion_human
@@ -14,115 +23,166 @@ tags:
14
  - benchmark
15
  ---
16
 
17
- # Diffusion Policy for Aloha Insertion (200k Steps)
18
 
19
- **Task**: Aloha Insertion (Simulated, 3D Manipulation)
20
- **Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
21
- **Training Steps**: 200,000
22
- **Author**: Graduate Student, UESTC (University of Electronic Science and Technology of China)
23
 
24
- This model represents a benchmark experiment for **Diffusion Policy** on the challenging **Aloha Insertion** task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods (ACT).
25
 
26
- ### 🔬 Benchmark Results
 
 
 
27
 
28
- This experiment highlights the significant difficulty of the Aloha Insertion task for generative policies under limited compute constraints (Batch Size=8).
29
-
30
- **Evaluation Metrics (50 Episodes):**
31
 
32
- | Metric | Value | Note |
33
- | :--- | :--- | :--- |
34
- | **Success Rate** | **0.0%** | Task difficulty is extremely high; model struggled to complete final insertion. |
35
- | **Avg Max Reward** | **0.10** | Indicates partial success in grasping/moving, but failed alignment. |
36
- | **Avg Sum Reward** | **8.20** | Shows the model learned valid trajectories but lacked precision at the goal state. |
37
 
38
- **Analysis:**
39
- Unlike the 2D Push-T task where Diffusion Policy excelled, the high-dimensional visual input (3 cameras) and precise 3D spatial reasoning required for Aloha Insertion proved challenging. While the ACT baseline achieved a 2% success rate (1/50), Diffusion Policy (at 200k steps) demonstrated trajectory learning but failed to finalize the insertion, suggesting a need for longer training or larger batch sizes for this specific domain.
40
 
41
- ---
42
 
43
- ## Model Details
 
 
 
 
44
 
45
- - **Architecture**: ResNet18 (Vision Backbone) + U-Net (Diffusion Head)
46
- - **Input**: 3 Camera Views (Top, Left, Right)
47
- - **Prediction Horizon**: 16 steps
48
- - **Observation History**: 2 steps
49
- - **Action Steps**: 8 steps
50
- - **Image Resolution**: 480x640 (Cropped to 420x560)
51
- - **Total Parameters**: ~263 Million
52
 
53
  ---
54
 
55
- ## How to Use This Model
56
 
57
- You can evaluate this model or visualize its performance using `lerobot`.
 
 
 
 
 
 
58
 
59
- ### 1. Installation
60
 
61
- ```bash
62
- # Install LeRobot
63
- pip install lerobot
64
- ```
65
 
66
- ### 2. Evaluate / Visualize
67
 
68
- Run the following command in your terminal to evaluate the model for 50 episodes and save the visualization videos:
 
 
 
 
69
 
70
- ```bash
71
- python -m lerobot.scripts.lerobot_eval \
72
- --policy.type diffusion \
73
- --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
74
- --eval.n_episodes 50 \
75
- --eval.batch_size 8 \
76
- --env.type aloha \
77
- --env.task AlohaInsertion-v0
78
- ```
79
-
80
- -----
81
 
82
- ## Training Configuration (Reference)
 
83
 
84
- For reproducibility, here are the key parameters used during the training session:
 
 
85
 
86
- - **Batch Size**: 8 (Limited by 8GB VRAM)
87
- - **Optimizer**: Adam (lr=1e-4, betas=[0.95, 0.999])
88
- - **Scheduler**: Cosine with warmup (500 steps)
89
- - **Vision**: ResNet18 with GroupNorm
90
- - **Precision**: Mixed Precision (AMP) enabled
 
91
 
92
- <!-- end list -->
 
 
93
 
94
- ```bash
95
- # Original Training Command
96
- python -m lerobot.scripts.lerobot_train \
97
- --config_path diffusion_aloha.yaml \
98
- --env.type aloha \
99
- --env.task AlohaInsertion-v0 \
100
- --dataset.repo_id lerobot/aloha_sim_insertion_human \
101
- --wandb.enable true \
102
- --job_name DP_Aloha_Insertion \
103
- --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
104
- ```
105
 
106
- ### Config File (`diffusion_aloha.yaml`) Snippet:
 
 
 
 
107
 
108
- ```yaml
109
- # Training
110
- steps: 200000
111
- eval_freq: 20000
112
- save_freq: 20000
113
- batch_size: 8
114
-
115
- # Policy
116
  policy:
117
  type: diffusion
 
 
118
  vision_backbone: resnet18
 
119
  crop_shape: [420, 560]
 
 
 
 
 
 
 
 
 
 
 
 
 
120
  n_action_steps: 8
121
  n_obs_steps: 2
122
  horizon: 16
 
 
 
 
123
  num_inference_steps: 100
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
124
  ```
 
 
 
 
 
125
 
 
 
 
 
 
 
 
 
 
 
 
 
126
  ```
127
 
128
- ---
 
 
 
 
 
 
 
 
 
 
 
1
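+ No problem! This is a very rigorous research practice. Citing your reference sources (Acknowledgement) and publishing the complete configuration file greatly increases the **credibility** and **reproducibility** of your paper and project.
2
+
3
+ I have placed the complete `diffusion_aloha.yaml` code you provided in a **collapsible details block** (so it does not take up too much space and keeps the page tidy), and added an **acknowledgement link** in a prominent position.
4
+
5
+ Please **select all and copy** the content below and use it to overwrite the `README.md` of `Lemon-03/DP_Aloha_Insertion_test`.
6
+
7
+ ### 📋 Final Polished Version (with full config code and acknowledgement)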
8
+
9
+ ````markdown
10
  ---
11
  datasets:
12
  - lerobot/aloha_sim_insertion_human
 
23
  - benchmark
24
  ---
25
 
26
+ # 🦾 Diffusion Policy for Aloha Insertion (200k Steps)
27
 
28
+ [![LeRobot](https://img.shields.io/badge/Library-LeRobot-yellow)](https://github.com/huggingface/lerobot)
29
+ [![Task](https://img.shields.io/badge/Task-Aloha_Insertion-blue)](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human)
30
+ [![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
31
+ [![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)
32
 
33
+ > **Summary:** This model represents a benchmark experiment for **Diffusion Policy** on the challenging **Aloha Insertion** task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.
34
 
35
+ - **🧩 Task**: Aloha Insertion (Simulated, 3D)
36
+ - **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
37
+ - **🔄 Training Steps**: 200,000
38
+ - **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
39
 
40
+ ---
 
 
41
 
42
+ ## 🔬 Benchmark Results (vs ACT)
 
 
 
 
43
 
44
+ This experiment highlights the significant difficulty of the Aloha Insertion task for generative policies under limited compute constraints (Batch Size=8). While the ACT baseline achieved a **2%** success rate (1/50), Diffusion Policy completed no insertions (0/50): it learned plausible trajectories but struggled with the final insertion alignment.
 
45
 
46
+ ### 📊 Evaluation Metrics (50 Episodes)
47
 
48
+ | Metric | Value | Comparison to ACT Baseline | Status |
49
+ | :--- | :---: | :--- | :---: |
50
+ | **Success Rate** | **0.0%** | **Slightly Lower** (ACT: 2.0%) | 📉 |
51
+ | **Avg Max Reward** | **0.10** | **Partial Success** (Grasping achieved) | 🚧 |
52
+ | **Avg Sum Reward** | **8.20** | **Stable Trajectories** | ✅ |
53
 
54
+ > **Note:** The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results indicate that under low batch-size constraints (Batch Size=8), ACT's deterministic policy may converge faster than Diffusion Policy, which likely requires longer training or larger batches for this specific domain.
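The gap between 0/50 and 1/50 is small relative to the sampling noise of a 50-episode evaluation. As a rough illustration that was not part of the original evaluation, the sketch below computes 95% Wilson score intervals for both success rates. The function name `wilson_interval` is hypothetical and introduced only for this example.

```python
import math


def wilson_interval(successes: int, episodes: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p = successes / episodes
    denom = 1 + z * z / episodes
    center = (p + z * z / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z * z / (4 * episodes**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Counts taken from the table above: Diffusion Policy 0/50, ACT 1/50.
dp_low, dp_high = wilson_interval(0, 50)
act_low, act_high = wilson_interval(1, 50)
print(f"Diffusion Policy: [{dp_low:.3f}, {dp_high:.3f}]")
print(f"ACT:              [{act_low:.3f}, {act_high:.3f}]")
```

The two intervals overlap, so with 50 episodes the success rates are better read as comparable than as a clear ranking.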
 
 
 
 
 
 
55
 
56
  ---
57
 
58
+ ## ⚙️ Model Details
59
 
60
+ | Parameter | Description |
61
+ | :--- | :--- |
62
+ | **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
63
+ | **Input** | 3 Camera Views (Top, Left, Right) |
64
+ | **Prediction Horizon** | 16 steps |
65
+ | **Observation History** | 2 steps |
66
+ | **Action Steps** | 8 steps |
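How the three step counts interact can be sketched as follows. This is a minimal illustration of the receding-horizon convention commonly used by Diffusion Policy, assuming the predicted chunk starts at the oldest observation step; the helper `executed_action_window` is hypothetical and not a LeRobot function.

```python
def executed_action_window(
    horizon: int = 16, n_obs_steps: int = 2, n_action_steps: int = 8
) -> tuple[int, int]:
    """Return the [start, end) slice of the predicted chunk that is executed.

    Assumption: the chunk of `horizon` actions is aligned with the oldest
    observation, so the action for the current step sits at n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the prediction horizon")
    return start, end


# Placeholder chunk: indices 0..15 stand in for 16 predicted actions.
predicted = list(range(16))
start, end = executed_action_window()
executed = predicted[start:end]
print(start, end, executed)
```

Under this assumption the policy predicts 16 actions, executes 8 of them, and then replans from fresh observations.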
67
 
68
+ ---
69
 
70
+ ## 🔧 Training Configuration
 
 
 
71
 
72
+ For reproducibility, here are the key parameters used during the training session.
73
 
74
+ - **Source**: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
75
+ - **Batch Size**: 8 (Limited by 8GB VRAM)
76
+ - **Optimizer**: AdamW (`lr=1e-4`)
77
+ - **Scheduler**: Cosine with warmup
78
+ - **Vision**: ResNet18 with GroupNorm (Cropped to 420x560)
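The learning-rate schedule listed above can be sketched in a few lines. This is a minimal sketch of a linear warmup followed by cosine decay, using the 500 warmup steps and 200,000 total steps from the config in this commit. The function `lr_at_step` is hypothetical, and the exact formula used by the training framework is assumed rather than confirmed.

```python
import math


def lr_at_step(
    step: int,
    base_lr: float = 1e-4,
    warmup_steps: int = 500,
    total_steps: int = 200_000,
) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 250, 500, 100_000, 200_000):
    print(s, lr_at_step(s))
```

With these values the warmup covers only 0.25% of training, so nearly the whole run is spent on the cosine decay.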
79
 
80
+ <details>
81
+ <summary>📄 <strong>Click to view full <code>diffusion_aloha.yaml</code> used for training</strong></summary>
 
 
 
 
 
 
 
 
 
82
 
83
+ ```yaml
84
+ # @package _global_
85
 
86
+ # Random seed
87
+ seed: 100000
88
+ job_name: Diffusion-Aloha-Insertion
89
 
90
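+ # Training parameters
91
+ steps: 200000 # the original file specifies 200k steps (Aloha is relatively hard to train)
92
+ eval_freq: 20000 # evaluate somewhat more often to make progress easier to follow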
93
+ save_freq: 20000
94
+ log_freq: 200
95
+ batch_size: 8 # ⚠️ Key: Aloha requires a small batch; otherwise 8 GB of VRAM is not enough
96
 
97
+ # Dataset
98
+ dataset:
99
+ repo_id: lerobot/aloha_sim_insertion_human
100
 
101
+ # Evaluation settings
102
+ eval:
103
+ n_episodes: 50
104
+ batch_size: 8 # kept consistent with training
 
 
 
 
 
 
 
105
 
106
+ # Environment settings
107
+ env:
108
+ type: aloha
109
+ task: AlohaInsertion-v0
110
+ fps: 50
111
 
112
+ # Policy configuration
 
 
 
 
 
 
 
113
  policy:
114
  type: diffusion
115
+
116
+ # --- Vision processing ---
117
  vision_backbone: resnet18
118
+ # Aloha images are rectangular, so a specific crop size is used here
119
  crop_shape: [420, 560]
120
+ crop_is_random: true
121
+ pretrained_backbone_weights: null # the original config specifies not loading pretrained weights
122
+ use_group_norm: true
123
+ spatial_softmax_num_keypoints: 32
124
+
125
+ # --- Diffusion core architecture (U-Net) ---
126
+ down_dims: [512, 1024, 2048]
127
+ kernel_size: 5
128
+ n_groups: 8
129
+ diffusion_step_embed_dim: 128
130
+ use_film_scale_modulation: true
131
+
132
+ # --- Action prediction parameters ---
133
  n_action_steps: 8
134
  n_obs_steps: 2
135
  horizon: 16
136
+
137
+ # --- Noise scheduler (DDPM) ---
138
+ noise_scheduler_type: DDPM
139
+ num_train_timesteps: 100
140
  num_inference_steps: 100
141
+ beta_schedule: squaredcos_cap_v2
142
+ beta_start: 0.0001
143
+ beta_end: 0.02
144
+ prediction_type: epsilon
145
+ clip_sample: true
146
+ clip_sample_range: 1.0
147
+
148
+ # --- Optimizer ---
149
+ optimizer_lr: 1e-4
150
+ optimizer_weight_decay: 1e-6
151
+ #grad_clip_norm: 10
152
+
153
+ scheduler_name: cosine
154
+ scheduler_warmup_steps: 500
155
+
156
+ use_amp: true
157
  ```
158
+ ````
159
+
160
+ </details>
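The `beta_schedule: squaredcos_cap_v2` entry in the config refers to the squared-cosine noise schedule. The sketch below reproduces that schedule in plain Python for the 100 training timesteps used here. It follows the commonly published formulation (offset 0.008, cap 0.999) and is an illustration, not the library's own code; the function name `squaredcos_cap_v2_betas` is hypothetical. As far as can be told from that formulation, `beta_start` and `beta_end` do not enter the computation for this schedule type.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps: int = 100, max_beta: float = 0.999) -> list[float]:
    """Betas derived from a squared-cosine cumulative alpha curve, capped at max_beta."""

    def alpha_bar(t: float) -> float:
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_cap_v2_betas(100)
print(len(betas), betas[0], betas[-1])
```

The noise added per step starts very small and grows toward the cap, which is why the final timestep is clipped to `max_beta`.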
161
+
162
+ -----
163
 
164
+ ## 🚀 Evaluate (My Evaluation Mode)
165
+
166
+ Run the following command in your terminal to evaluate the model for 50 episodes and save the visualization videos:
167
+
168
+ ```bash
169
+ python -m lerobot.scripts.lerobot_eval \
170
+ --policy.type diffusion \
171
+ --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
172
+ --eval.n_episodes 50 \
173
+ --eval.batch_size 8 \
174
+ --env.type aloha \
175
+ --env.task AlohaInsertion-v0
176
  ```
177
 
+ ```