---
datasets:
- lerobot/aloha_sim_insertion_human
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- aloha
- imitation-learning
- benchmark
---
# Diffusion Policy for Aloha Insertion (200k Steps)
[LeRobot](https://github.com/huggingface/lerobot) | [Dataset](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human) | [UESTC](https://www.uestc.edu.cn/) | [License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
## Research Purpose
**Important Note:** This model was trained primarily for **academic comparison**: it evaluates the performance difference between the **Diffusion Policy** and **ACT** algorithms under identical training conditions (both trained on the `lerobot/aloha_sim_insertion_human` dataset). It is a benchmark experiment designed to analyze how well each algorithm learns a complex 3D manipulation task under limited computational resources (batch size 8), **not an attempt to train a highly successful practical model**.
> **Summary:** This model represents a benchmark experiment for **Diffusion Policy** on the challenging **Aloha Insertion** task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.
- **Task**: Aloha Insertion (Simulated, 3D)
- **Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **Training Steps**: 200,000
- **Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)
---
## Benchmark Results (vs ACT)
This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (batch size 8). The ACT baseline achieved a **2%** success rate (1/50), while the Diffusion Policy produced stable trajectories and reached the grasp but struggled with the final insertion alignment, finishing at **0%** (0/50).
### Evaluation Metrics (50 Episodes)
| Metric | Value | Notes (vs. ACT Baseline) |
| :--- | :---: | :--- |
| **Success Rate** | **0.0%** | **Slightly Lower** (ACT: 2.0%) |
| **Avg Max Reward** | **0.10** | **Partial Success** (Grasping achieved) |
| **Avg Sum Reward** | **8.20** | **Stable Trajectories** |
> **Note:** The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results indicate that under low batch-size constraints (Batch Size=8), ACT's deterministic policy may converge faster than Diffusion Policy, which likely requires longer training or larger batches for this specific domain.
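
With only 50 episodes per policy, the gap between 0/50 and 1/50 is small compared to the sampling noise. The sketch below computes a 95% Wilson score interval for both success rates. The Wilson interval and the `wilson_interval` helper are my own illustrative choices, not part of the original evaluation pipeline.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Counts reported in this card: 0/50 for Diffusion Policy, 1/50 for ACT.
dp_low, dp_high = wilson_interval(0, 50)
act_low, act_high = wilson_interval(1, 50)

print(f"Diffusion Policy: [{dp_low:.3f}, {dp_high:.3f}]")
print(f"ACT:              [{act_low:.3f}, {act_high:.3f}]")
# Overlapping intervals mean 50 episodes cannot separate the two policies.
print("Intervals overlap:", dp_high >= act_low and act_high >= dp_low)
```

The two intervals overlap, so "slightly lower" should be read as "not distinguishable at this sample size" rather than as a measured ranking.
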
---
## Model Details
| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Input** | 3 Camera Views (Top, Left, Right) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
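
The three step counts in the table interact as a receding-horizon scheme: each diffusion sample predicts 16 actions, of which 8 are executed before the policy re-plans. A minimal sketch of that bookkeeping follows. It assumes the predicted sequence starts at the oldest observation, so the first executable action sits at index `n_obs_steps - 1`. That indexing reflects my understanding of LeRobot's diffusion policy and should be checked against the installed version. The `executed_slice` helper is hypothetical.

```python
# Values from the Model Details table; FPS comes from the training config (env.fps).
HORIZON = 16        # actions predicted per diffusion sample
N_OBS_STEPS = 2     # observations fed to the policy
N_ACTION_STEPS = 8  # actions executed before re-planning
FPS = 50            # simulated control rate


def executed_slice(n_obs_steps: int, n_action_steps: int, horizon: int) -> range:
    """Indices of the predicted actions that are actually executed.

    Assumption: index 0 of the prediction aligns with the oldest observation,
    so the action for the current step is at index n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the horizon")
    return range(start, end)


steps = executed_slice(N_OBS_STEPS, N_ACTION_STEPS, HORIZON)
print("Executed indices:", list(steps))
print(f"Re-plan interval: {N_ACTION_STEPS / FPS:.2f} s of simulated time")
```
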
---
## Training Configuration
For reproducibility, here are the key parameters used during the training session.
- **Source**: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
- **Batch Size**: 8 (Limited by 8GB VRAM)
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with GroupNorm (Cropped to 420x560)
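
To make "cosine with warmup" concrete, here is a minimal sketch of the learning-rate curve implied by the values in this card (`lr=1e-4`, 500 warmup steps, 200,000 total steps). It assumes a linear warmup from zero followed by a half-cosine decay to zero, which is the usual shape of this schedule. The exact implementation inside the training framework may differ in detail, and `lr_at` is a hypothetical helper.

```python
import math

BASE_LR = 1e-4         # optimizer_lr
WARMUP_STEPS = 500     # scheduler_warmup_steps
TOTAL_STEPS = 200_000  # steps


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 100_000, 200_000):
    print(f"step {step:>7}: lr = {lr_at(step):.2e}")
```
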
### Original Training Command (My Resume Mode)
```bash
python -m lerobot.scripts.lerobot_train \
--config_path diffusion_aloha.yaml \
--env.type aloha \
--env.task AlohaInsertion-v0 \
--dataset.repo_id lerobot/aloha_sim_insertion_human \
--wandb.enable true \
--job_name DP_Aloha_Insertion \
--policy.repo_id Lemon-03/DP_Aloha_Insertion_test
```
### diffusion_aloha.yaml
<details>
<summary><strong>Click to view full <code>diffusion_aloha.yaml</code> used for training</strong></summary>
```yaml
# @package _global_
# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion
# Training parameters
steps: 200000 # Original file states 200k steps (Aloha is difficult to train)
eval_freq: 20000 # Slightly increased frequency to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8 # Crucial: Aloha requires small batch size, otherwise 8GB VRAM is insufficient
# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human
# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8 # Keep consistent with training
# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50
# Policy configuration
policy:
  type: diffusion
  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, using specific crop dimensions here
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null # Original config specifies not to load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32
  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true
  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16
  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0
  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  #grad_clip_norm: 10
  scheduler_name: cosine
  scheduler_warmup_steps: 500
  use_amp: true
```
</details>
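
The `squaredcos_cap_v2` entry in the config refers to the squared-cosine noise schedule. As a rough illustration, the sketch below rebuilds that schedule for `num_train_timesteps: 100` in plain Python, following the formula used by the `diffusers` library as I understand it (offset 0.008, betas capped at 0.999). Function names here are hypothetical. As far as I know, `beta_start` and `beta_end` are only used by the linear-style schedules and have no effect with this one.

```python
import math

NUM_TRAIN_TIMESTEPS = 100  # num_train_timesteps from the config
MAX_BETA = 0.999           # the "cap" in squaredcos_cap_v2


def alpha_bar(t: float) -> float:
    """Cumulative signal level at normalized time t in [0, 1]."""
    return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2


def squaredcos_betas(num_steps: int, max_beta: float = MAX_BETA) -> list[float]:
    """Per-step noise variances derived from the squared-cosine alpha_bar curve."""
    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_betas(NUM_TRAIN_TIMESTEPS)

# Running product of (1 - beta): the fraction of the clean action signal left.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)

print(f"first beta: {betas[0]:.5f}, last beta: {betas[-1]:.3f}")
print(f"signal kept after the final step: {alphas_cumprod[-1]:.2e}")
```

Because `num_inference_timesteps` equals `num_train_timesteps` in this config, every action chunk costs 100 denoising passes through the U-Net at evaluation time.
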
-----
## Evaluate (My Evaluation Mode)
To evaluate this model locally, run the following command:
```bash
python -m lerobot.scripts.lerobot_eval \
--policy.type diffusion \
--policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
--eval.n_episodes 50 \
--eval.batch_size 8 \
--env.type aloha \
--env.task AlohaInsertion-v0
```