---
datasets:
- lerobot/aloha_sim_insertion_human
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- aloha
- imitation-learning
- benchmark
---

# 🦾 Diffusion Policy for Aloha Insertion (200k Steps)

[![LeRobot](https://img.shields.io/badge/Library-LeRobot-yellow)](https://github.com/huggingface/lerobot)
[![Task](https://img.shields.io/badge/Task-Aloha_Insertion-blue)](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human)
[![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)

## 🎯 Research Purpose

**Important Note:** This model was trained primarily for **academic comparison**: it evaluates the performance difference between the **Diffusion Policy** and **ACT** algorithms under identical training conditions (the `lerobot/aloha_sim_insertion_human` dataset, Batch Size=8). It is a benchmark experiment designed to analyze how well each algorithm learns a complex 3D manipulation task under limited computational resources, **not an attempt to train a practical, high-success model**.

> **Summary:** This model represents a benchmark experiment for **Diffusion Policy** on the challenging **Aloha Insertion** task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.

- **🧩 Task**: Aloha Insertion (Simulated, 3D)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **🔄 Training Steps**: 200,000
- **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)

---

## 🔬 Benchmark Results (vs ACT)

This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (Batch Size=8). The ACT baseline achieved a **2%** success rate (1/50), while the Diffusion Policy achieved **0%** (0/50): it produced stable trajectories but struggled with the final insertion alignment.

### 📊 Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **0.0%** | **Slightly Lower** (ACT: 2.0%) | 📉 |
| **Avg Max Reward** | **0.10** | **Partial Success** (Grasping achieved) | 🚧 |
| **Avg Sum Reward** | **8.20** | **Stable Trajectories** | ✅ |

> **Note:** The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results indicate that under low batch-size constraints (Batch Size=8), ACT's deterministic policy may converge faster than Diffusion Policy, which likely requires longer training or larger batches for this specific domain.
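
The three metrics in the table can be read as per-episode aggregates. The following is a minimal sketch, assuming each evaluation episode yields a list of per-step rewards and a success flag. The function name `summarize_episodes` and the toy data are hypothetical and are not part of LeRobot's API or of this model's actual evaluation logs.

```python
from statistics import mean


def summarize_episodes(episode_rewards, episode_successes):
    """Aggregate per-episode rewards into the three metrics reported above."""
    return {
        # Percentage of episodes flagged as successful.
        "success_rate_pct": 100.0 * sum(episode_successes) / len(episode_successes),
        # Highest reward reached in each episode, averaged over episodes.
        "avg_max_reward": mean(max(r) for r in episode_rewards),
        # Total reward collected in each episode, averaged over episodes.
        "avg_sum_reward": mean(sum(r) for r in episode_rewards),
    }


# Toy data only: 50 episodes, exactly one success (the ACT baseline's 1/50).
successes = [True] + [False] * 49
rewards = [[0, 1, 1]] * 50
summary = summarize_episodes(rewards, successes)
print(summary["success_rate_pct"])  # 2.0
```

With 50 episodes, each success moves the success rate by 2 percentage points, so the gap between 0.0% and 2.0% corresponds to a single episode.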

---

## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Input** | 3 Camera Views (Top, Left, Right) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
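
To illustrate how the three step counts interact, here is a minimal sketch of receding-horizon execution. It assumes the convention from the Diffusion Policy paper, in which the predicted sequence starts at the oldest observation, so the first action to execute sits at index `n_obs_steps - 1`. The helper `executed_slice` is hypothetical and is shown only for illustration.

```python
HORIZON = 16        # actions predicted per diffusion sample
N_OBS_STEPS = 2     # observations fed to the policy
N_ACTION_STEPS = 8  # actions executed before the policy re-plans


def executed_slice(horizon, n_obs_steps, n_action_steps):
    """Return (start, end) indices of the predicted actions that get executed.

    Assumption: the prediction window begins at the oldest observation,
    so the action aligned with the current step is at n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the horizon")
    return start, end


start, end = executed_slice(HORIZON, N_OBS_STEPS, N_ACTION_STEPS)
predicted = list(range(HORIZON))  # stand-in for 16 predicted actions
print(predicted[start:end])       # [1, 2, 3, 4, 5, 6, 7, 8]
```

Under this assumption, only half of each 16-step prediction is executed; the remainder is discarded when the policy re-plans from fresh observations.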

---

## 🔧 Training Configuration

For reproducibility, here are the key parameters used during the training session.

- **Source**: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
- **Batch Size**: 8 (Limited by 8 GB VRAM)
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with GroupNorm (Cropped to 420x560)
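
As a rough picture of the "cosine with warmup" scheduler, the sketch below computes the learning rate at a given step. It assumes a linear warmup over the first 500 steps (the `scheduler_warmup_steps` value in the YAML config) followed by a single half-cosine decay to zero at 200,000 steps. The exact curve used by the training script may differ, and `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, base_lr=1e-4, warmup_steps=500, total_steps=200_000):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 100_250, 200_000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at `1e-4` right after warmup and falls to half of that at the midpoint of the decay phase.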

### Original Training Command (My Resume Mode)

```bash
python -m lerobot.scripts.lerobot_train \
  --config_path diffusion_aloha.yaml \
  --env.type aloha \
  --env.task AlohaInsertion-v0 \
  --dataset.repo_id lerobot/aloha_sim_insertion_human \
  --wandb.enable true \
  --job_name DP_Aloha_Insertion \
  --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
```

### diffusion_aloha.yaml
<details>
<summary>📄 <strong>Click to view full <code>diffusion_aloha.yaml</code> used for training</strong></summary>

```yaml
# @package _global_

# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion

# Training parameters
steps: 200000            # Original file states 200k steps (Aloha is difficult to train)
eval_freq: 20000         # Slightly increased frequency to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8            # ⚠️ Crucial: Aloha requires a small batch size, otherwise 8 GB VRAM is insufficient

# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human

# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8          # Keep consistent with training

# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50

# Policy configuration
policy:
  type: diffusion

  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, using specific crop dimensions here
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null  # Original config specifies not to load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32

  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true

  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16

  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0

  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  #grad_clip_norm: 10
  
  scheduler_name: cosine
  scheduler_warmup_steps: 500

  use_amp: true
```
</details>
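
For readers unfamiliar with the `squaredcos_cap_v2` beta schedule named in the config, here is a minimal pure-Python sketch of how its betas are derived. It assumes the scheduler follows the definition used in Hugging Face `diffusers` (a squared-cosine cumulative alpha with each beta capped at 0.999). The function name is hypothetical. Under that definition the betas come entirely from the cosine curve, so `beta_start` and `beta_end` would not affect them.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps=100, max_beta=0.999):
    """Betas for a squared-cosine schedule, each capped at max_beta."""

    def alpha_bar(t):
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        # Beta is the per-step drop in alpha_bar, capped for stability.
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_cap_v2_betas()
print(len(betas), betas[0], betas[-1])
```

The noise added per step starts very small and grows toward the cap at the final timestep, which is where the capping in the name comes from.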

---

## 🚀 Evaluate (My Evaluation Mode)

To evaluate this model locally, run the following command:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
  --eval.n_episodes 50 \
  --eval.batch_size 8 \
  --env.type aloha \
  --env.task AlohaInsertion-v0
```