Lemon-03/DP_Aloha_Insertion_test

@@ -12,7 +12,6 @@ tags:
 - aloha
 - imitation-learning
 - benchmark
 ---
 # 🦾 Diffusion Policy for Aloha Insertion (200k Steps)
@@ -22,6 +21,10 @@ tags:
 [![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
 [![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)
 > **Summary:** This model represents a benchmark experiment for **Diffusion Policy** on the challenging **Aloha Insertion** task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.
 - **🧩 Task**: Aloha Insertion (Simulated, 3D)
@@ -89,58 +92,58 @@ python -m lerobot.scripts.lerobot_train \
 ```yaml
 # @package _global_
-# 随机种子
 seed: 100000
 job_name: Diffusion-Aloha-Insertion
-# 训练参数
-steps: 200000            # 原文件写的是 20万步 (Aloha 比较难练)
-eval_freq: 20000         # 稍微改频一点，方便看进度
 save_freq: 20000
 log_freq: 200
-batch_size: 8            # ⚠️ 关键：Aloha 必须用小 Batch，否则 8G 显存不够
-# 数据集
 dataset:
   repo_id: lerobot/aloha_sim_insertion_human
-# 评估设置
 eval:
   n_episodes: 50
-  batch_size: 8          # 保持与训练一致
-# 环境设置
 env:
   type: aloha
   task: AlohaInsertion-v0
   fps: 50
-# 策略配置
 policy:
   type: diffusion
-  # --- 视觉处理 ---
   vision_backbone: resnet18
-  # Aloha 的图片是矩形的，这里使用特定的裁剪尺寸
   crop_shape: [420, 560]
   crop_is_random: true
-  pretrained_backbone_weights: null  # 原配置指定不加载预训练权重
   use_group_norm: true
   spatial_softmax_num_keypoints: 32
-  # --- Diffusion 核心架构 (U-Net) ---
   down_dims: [512, 1024, 2048]
   kernel_size: 5
   n_groups: 8
   diffusion_step_embed_dim: 128
   use_film_scale_modulation: true
-  # --- 动作预测参数 ---
   n_action_steps: 8
   n_obs_steps: 2
   horizon: 16
-  # --- 噪声调度器 (DDPM) ---
   noise_scheduler_type: DDPM
   num_train_timesteps: 100
   num_inference_timesteps: 100
@@ -151,7 +154,7 @@ policy:
   clip_sample: true
   clip_sample_range: 1.0
-  # --- 优化器 ---
   optimizer_lr: 1e-4
   optimizer_weight_decay: 1e-6
   #grad_clip_norm: 10
@@ -189,4 +192,4 @@ python -m lerobot.scripts.lerobot_eval \
   --eval.batch_size 8 \
   --env.type aloha \
   --env.task AlohaInsertion-v0
-```

 - aloha
 - imitation-learning
 - benchmark
 ---
 # 🦾 Diffusion Policy for Aloha Insertion (200k Steps)
 [![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
 [![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)
+## 🎯 Research Purpose
+**Important Note:** This model was trained primarily for **academic comparison**: it evaluates the performance difference between the **Diffusion Policy** and **ACT** algorithms under identical training conditions (using the `lerobot/aloha_sim_insertion_human` dataset). It is a benchmark experiment designed to analyze how well different algorithms learn complex 3D manipulation tasks under limited computational resources (batch size 8), **not an attempt to train a high-success-rate model for practical use**.
 > **Summary:** This model represents a benchmark experiment for **Diffusion Policy** on the challenging **Aloha Insertion** task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.
 - **🧩 Task**: Aloha Insertion (Simulated, 3D)
 ```yaml
 # @package _global_
+# Random seed
 seed: 100000
 job_name: Diffusion-Aloha-Insertion
+# Training parameters
+steps: 200000            # The original file specifies 200k steps (Aloha is hard to train)
+eval_freq: 20000         # Evaluate somewhat more often so progress is easier to track
 save_freq: 20000
 log_freq: 200
+batch_size: 8            # ⚠️ Crucial: Aloha needs a small batch size; larger batches do not fit in 8 GB of VRAM
+# Dataset
 dataset:
   repo_id: lerobot/aloha_sim_insertion_human
+# Evaluation settings
 eval:
   n_episodes: 50
+  batch_size: 8          # Kept consistent with the training batch size
+# Environment settings
 env:
   type: aloha
   task: AlohaInsertion-v0
   fps: 50
+# Policy configuration
 policy:
   type: diffusion
+  # --- Vision processing ---
   vision_backbone: resnet18
+  # Aloha images are rectangular, so a matching rectangular crop size is used here
   crop_shape: [420, 560]
   crop_is_random: true
+  pretrained_backbone_weights: null  # Original config specifies not to load pretrained weights
   use_group_norm: true
   spatial_softmax_num_keypoints: 32
+  # --- Diffusion core architecture (U-Net) ---
   down_dims: [512, 1024, 2048]
   kernel_size: 5
   n_groups: 8
   diffusion_step_embed_dim: 128
   use_film_scale_modulation: true
+  # --- Action prediction parameters ---
   n_action_steps: 8
   n_obs_steps: 2
   horizon: 16
+  # --- Noise scheduler (DDPM) ---
   noise_scheduler_type: DDPM
   num_train_timesteps: 100
   num_inference_timesteps: 100
   clip_sample: true
   clip_sample_range: 1.0
+  # --- Optimizer ---
   optimizer_lr: 1e-4
   optimizer_weight_decay: 1e-6
   #grad_clip_norm: 10
   --eval.batch_size 8 \
   --env.type aloha \
   --env.task AlohaInsertion-v0
+```
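
For a quick sense of what the configuration above implies, the following Python sketch derives a few quantities directly from the YAML values (`steps`, `eval_freq`, `save_freq`, `batch_size`, `fps`, `horizon`, `n_action_steps`, `num_inference_timesteps`). It is illustrative arithmetic only and does not call LeRobot. It assumes that evaluation and checkpointing trigger whenever the step count is a multiple of the respective frequency, and that one full reverse-diffusion pass is run per executed action chunk. The function name `summarize_config` is hypothetical.

```python
# Values copied from the YAML config shown in this model card.
STEPS = 200_000
EVAL_FREQ = 20_000
SAVE_FREQ = 20_000
BATCH_SIZE = 8
FPS = 50
HORIZON = 16
N_ACTION_STEPS = 8
NUM_INFERENCE_TIMESTEPS = 100


def summarize_config():
    """Derive simple budget and timing figures from the config values.

    Assumptions (not taken from the model card): evaluation and saving
    happen at every multiple of their frequency, and each executed action
    chunk costs one full reverse-diffusion pass.
    """
    return {
        # Number of evaluation rounds and checkpoints over the whole run.
        "evaluations": STEPS // EVAL_FREQ,
        "checkpoints": STEPS // SAVE_FREQ,
        # Training samples drawn in total (with repetition across epochs).
        "samples_seen": STEPS * BATCH_SIZE,
        # Wall-clock span of the predicted horizon and of the executed chunk.
        "horizon_seconds": HORIZON / FPS,
        "executed_chunk_seconds": N_ACTION_STEPS / FPS,
        # Average denoising network passes per environment step.
        "denoise_passes_per_env_step": NUM_INFERENCE_TIMESTEPS / N_ACTION_STEPS,
    }


if __name__ == "__main__":
    for name, value in summarize_config().items():
        print(f"{name}: {value}")
```

Under these assumptions the run covers 10 evaluation rounds and 1.6 million sampled training windows, and each inference call predicts 0.32 s of actions of which 0.16 s are executed before re-planning.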