---
datasets:
- lerobot/aloha_sim_insertion_human
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- aloha
- imitation-learning
- benchmark
---

# Diffusion Policy for Aloha Insertion (200k Steps)

[LeRobot](https://github.com/huggingface/lerobot)
[Dataset](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human)
[UESTC](https://www.uestc.edu.cn/)
[License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Research Purpose

**Important Note:** This model was trained primarily for **academic comparison**: evaluating the performance difference between **Diffusion Policy** and **ACT** under identical training conditions (both trained on the `lerobot/aloha_sim_insertion_human` dataset). It is a benchmark experiment designed to analyze how well each algorithm learns a complex 3D manipulation task under limited computational resources (batch size 8), **not an attempt to train a highly successful practical model**.

> **Summary:** This model is a benchmark run of **Diffusion Policy** on the challenging simulated **Aloha Insertion** task. It was trained with the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm on a complex, high-dimensional 3D manipulation task against the ACT baseline.

- **Task**: Aloha Insertion (Simulated, 3D)
- **Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **Training Steps**: 200,000
- **Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)

---

## Benchmark Results (vs ACT)

This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (batch size 8). The ACT baseline reached a **2%** success rate (1/50), while the Diffusion Policy reached **0%** (0/50): it learned stable trajectories and achieved grasping, but struggled with the final insertion alignment.

### Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline |
| :--- | :---: | :--- |
| **Success Rate** | **0.0%** | **Slightly Lower** (ACT: 2.0%) |
| **Avg Max Reward** | **0.10** | **Partial Success** (Grasping achieved) |
| **Avg Sum Reward** | **8.20** | **Stable Trajectories** |

> **Note:** The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results suggest that at a batch size of 8, ACT's deterministic policy may converge faster than Diffusion Policy, which likely needs longer training or larger batches in this domain.
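
With only 50 evaluation episodes, a 0/50 versus 1/50 gap carries little statistical weight. The sketch below is an illustration added to this card, not part of the original evaluation; the helper name `wilson_interval` is hypothetical. It computes approximate 95% Wilson score intervals for both success rates so the overlap can be inspected directly.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial success rate."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Success counts reported in this card (50 evaluation episodes each).
results = {"Diffusion Policy": (0, 50), "ACT": (1, 50)}

for name, (successes, trials) in results.items():
    low, high = wilson_interval(successes, trials)
    rate = successes / trials
    print(f"{name}: {successes}/{trials} = {rate:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

The two intervals overlap, which fits the cautious wording of the note above: more evaluation episodes would be needed before ranking the two policies on success rate alone.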

---

## Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Input** | 3 Camera Views (Top, Left, Right) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
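
The three step counts in the table interact: the policy predicts 16 actions per denoising pass but executes only 8 of them before re-planning. The sketch below illustrates that receding-horizon bookkeeping. It assumes the convention of the Diffusion Policy reference implementation (and, to our understanding, of LeRobot) that the predicted sequence is aligned to the oldest observation, so execution starts at index `n_obs_steps - 1`. The helper `executed_slice` is hypothetical, and the 50 fps value is taken from the `env` section of the training config.

```python
# Values from the Model Details table and the env section of diffusion_aloha.yaml.
HORIZON = 16
N_OBS_STEPS = 2
N_ACTION_STEPS = 8
ENV_FPS = 50


def executed_slice(horizon: int, n_obs_steps: int, n_action_steps: int) -> tuple[int, int]:
    """Return [start, end) indices of the predicted actions that get executed.

    Assumption: index 0 of the prediction corresponds to the oldest observation,
    so the first action for the current step sits at n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the prediction horizon")
    return start, end


start, end = executed_slice(HORIZON, N_OBS_STEPS, N_ACTION_STEPS)
executed = list(range(HORIZON))[start:end]
replan_period_s = N_ACTION_STEPS / ENV_FPS

print(f"Executed action indices: {executed}")
print(f"Re-planning every {replan_period_s:.2f} s of simulated time")
```

Under these assumptions the policy runs a new denoising pass every 8 environment steps, and the remaining predicted actions are discarded.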

---

## Training Configuration

For reproducibility, here are the key parameters used during the training session.

- **Source**: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
- **Batch Size**: 8 (limited by 8GB VRAM)
- **Optimizer**: AdamW (`lr=1e-4`, `weight_decay=1e-6`)
- **Scheduler**: Cosine with 500 warmup steps
- **Vision**: ResNet18 with GroupNorm (images cropped to 420x560)
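
To make the scheduler entry concrete, the sketch below computes the learning rate at a few training steps. It assumes the common "cosine with warmup" shape (linear warmup from 0, then a half-cosine decay to 0 over the remaining steps); the exact curve LeRobot produces may differ in detail. The warmup length (500) and total steps (200,000) come from `diffusion_aloha.yaml`, and the function name `lr_at` is hypothetical.

```python
import math

BASE_LR = 1e-4          # optimizer_lr
WARMUP_STEPS = 500      # scheduler_warmup_steps
TOTAL_STEPS = 200_000   # steps


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 50_000, 100_000, 200_000):
    print(f"step {step:>7}: lr = {lr_at(step):.3e}")
```

Under this assumed shape, the learning rate peaks at `1e-4` right after warmup and has dropped to roughly half of that by the middle of the run.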

### Original Training Command (My Resume Mode)

```bash
python -m lerobot.scripts.lerobot_train \
  --config_path diffusion_aloha.yaml \
  --env.type aloha \
  --env.task AlohaInsertion-v0 \
  --dataset.repo_id lerobot/aloha_sim_insertion_human \
  --wandb.enable true \
  --job_name DP_Aloha_Insertion \
  --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
```

### diffusion_aloha.yaml
<details>
<summary><strong>Click to view full <code>diffusion_aloha.yaml</code> used for training</strong></summary>

```yaml
# @package _global_

# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion

# Training parameters
steps: 200000      # Original file states 200k steps (Aloha is difficult to train)
eval_freq: 20000   # Slightly increased frequency to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8      # Crucial: Aloha requires a small batch size, otherwise 8GB VRAM is insufficient

# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human

# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8    # Keep consistent with training

# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50

# Policy configuration
policy:
  type: diffusion

  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, using specific crop dimensions here
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null  # Original config specifies not to load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32

  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true

  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16

  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0

  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  # grad_clip_norm: 10

  scheduler_name: cosine
  scheduler_warmup_steps: 500

  use_amp: true
```
</details>
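
The `beta_schedule: squaredcos_cap_v2` entry in the config above names the cosine noise schedule. The sketch below reproduces it in plain Python for the 100 training timesteps used here. It follows the cosine formula as implemented, to our understanding, in Hugging Face `diffusers` (each beta capped at 0.999); the function names are hypothetical. Under that implementation `beta_start` and `beta_end` do not appear to influence this particular schedule, which is an assumption worth checking against the installed version.

```python
import math

NUM_TRAIN_TIMESTEPS = 100  # num_train_timesteps
MAX_BETA = 0.999           # cap that gives the schedule its "cap" name


def alpha_bar(t: float) -> float:
    """Cumulative signal level at normalized time t in [0, 1] (cosine schedule)."""
    return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2


def squaredcos_betas(num_steps: int, max_beta: float = MAX_BETA) -> list[float]:
    """Per-step noise variances derived from consecutive alpha_bar ratios."""
    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_betas(NUM_TRAIN_TIMESTEPS)

# Cumulative product of (1 - beta): fraction of the clean action signal remaining.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1 - beta
    alphas_cumprod.append(running)

for t in (0, 25, 50, 75, 99):
    print(f"t={t:>2}: beta={betas[t]:.5f}, remaining signal={alphas_cumprod[t]:.5f}")
```

The printout shows the noise level rising slowly at first and reaching the cap at the final timestep, by which point almost none of the clean action sequence remains.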

-----

## Evaluate (My Evaluation Mode)

To evaluate this model locally, run the following command:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
  --eval.n_episodes 50 \
  --eval.batch_size 8 \
  --env.type aloha \
  --env.task AlohaInsertion-v0
```