---
datasets:
- lerobot/aloha_sim_insertion_human
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- aloha
- imitation-learning
- benchmark
---

# 🦾 Diffusion Policy for Aloha Insertion (200k Steps)

[![LeRobot](https://img.shields.io/badge/Library-LeRobot-yellow)](https://github.com/huggingface/lerobot)
[![Task](https://img.shields.io/badge/Task-Aloha_Insertion-blue)](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human)
[![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)

## 🎯 Research Purpose

**Important Note:** This model was trained primarily for **academic comparison**: evaluating the performance difference between the **Diffusion Policy** and **ACT** algorithms under identical training conditions (using the `lerobot/aloha_sim_insertion_human` dataset). This is a benchmark experiment designed to analyze different algorithms' learning capabilities for complex 3D manipulation tasks under limited computational resources (Batch Size=8), **not to train a highly successful practical model**.

> **Summary:** This model represents a benchmark experiment for **Diffusion Policy** on the challenging **Aloha Insertion** task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.
- **🧩 Task**: Aloha Insertion (Simulated, 3D)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **🔄 Training Steps**: 200,000
- **🎓 Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)

---

## 🔬 Benchmark Results (vs ACT)

This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (Batch Size=8). While the ACT baseline achieved a **2%** success rate (1/50), the Diffusion Policy learned stable trajectories but struggled with the final insertion alignment and did not complete the task in any of the 50 evaluation episodes.

### 📊 Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **0.0%** | **Slightly Lower** (ACT: 2.0%) | 📉 |
| **Avg Max Reward** | **0.10** | **Partial Success** (Grasping achieved) | 🚧 |
| **Avg Sum Reward** | **8.20** | **Stable Trajectories** | ✅ |

> **Note:** The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results indicate that under low batch-size constraints (Batch Size=8), ACT's deterministic policy may converge faster than Diffusion Policy, which likely requires longer training or larger batches for this specific domain.

---

## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Input** | 3 Camera Views (Top, Left, Right) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |

---

## 🔧 Training Configuration

For reproducibility, here are the key parameters used during the training session.

- **Source**: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
- **Batch Size**: 8 (Limited by 8GB VRAM)
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with GroupNorm (Cropped to 420x560)

### Original Training Command (My Resume Mode)

```bash
python -m lerobot.scripts.lerobot_train \
  --config_path diffusion_aloha.yaml \
  --env.type aloha \
  --env.task AlohaInsertion-v0 \
  --dataset.repo_id lerobot/aloha_sim_insertion_human \
  --wandb.enable true \
  --job_name DP_Aloha_Insertion \
  --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
```

### diffusion_aloha.yaml
Full `diffusion_aloha.yaml` used for training:

```yaml
# @package _global_

# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion

# Training parameters
steps: 200000  # Original file states 200k steps (Aloha is difficult to train)
eval_freq: 20000  # Slightly increased frequency to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8  # ⚠️ Crucial: Aloha requires small batch size, otherwise 8GB VRAM is insufficient

# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human

# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8  # Keep consistent with training

# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50

# Policy configuration
policy:
  type: diffusion

  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, using specific crop dimensions here
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null  # Original config specifies not to load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32

  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true

  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16

  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0

  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  #grad_clip_norm: 10
  scheduler_name: cosine
  scheduler_warmup_steps: 500

  use_amp: true
```
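To make the relationship between `horizon`, `n_obs_steps`, and `n_action_steps` more concrete, here is a minimal sketch of receding-horizon execution. It assumes the convention from the Diffusion Policy paper, which to my understanding LeRobot also follows: the predicted sequence starts at the oldest observation in the history, and execution begins at the index of the most recent observation. The function name `executed_action_indices` is hypothetical and is not part of the LeRobot API.

```python
def executed_action_indices(horizon: int, n_obs_steps: int, n_action_steps: int) -> range:
    """Return the indices of the predicted action sequence that get executed.

    Assumption: index 0 of the prediction aligns with the oldest observation,
    so the first executed action sits at index n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError(
            f"n_action_steps={n_action_steps} does not fit in horizon={horizon} "
            f"with n_obs_steps={n_obs_steps}"
        )
    return range(start, end)


# Values taken from this model's config.
indices = executed_action_indices(horizon=16, n_obs_steps=2, n_action_steps=8)
print(list(indices))  # [1, 2, 3, 4, 5, 6, 7, 8]

# At fps=50, each executed chunk of 8 actions covers 8 / 50 seconds of control
# before the policy runs its denoising loop again.
seconds_per_chunk = 8 / 50
print(seconds_per_chunk)
```

Under these assumptions, only half of each 16-step prediction is executed before the policy re-plans, and every re-plan costs `num_inference_timesteps` (100 here) denoising passes.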
-----

## 🚀 Evaluate (My Evaluation Mode)

To evaluate this model locally, run the following command:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
  --eval.n_episodes 50 \
  --eval.batch_size 8 \
  --env.type aloha \
  --env.task AlohaInsertion-v0
```
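For readers who want to reproduce the three aggregate numbers in the benchmark table (success rate, average max reward, average sum reward) from their own runs, the sketch below shows one way to compute them from per-episode results. The `EpisodeResult` container, the `summarize` function, and the dictionary keys are hypothetical names chosen for illustration, not LeRobot's output format, and the data is a toy example rather than this model's evaluation log.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass
class EpisodeResult:
    rewards: Sequence[float]  # per-step rewards for one episode
    success: bool             # whether the environment reported success


def summarize(episodes: Sequence[EpisodeResult]) -> Dict[str, float]:
    """Aggregate per-episode results into the three table metrics."""
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "success_rate_pct": 100.0 * sum(e.success for e in episodes) / len(episodes),
        "avg_max_reward": mean(max(e.rewards) for e in episodes),
        "avg_sum_reward": mean(sum(e.rewards) for e in episodes),
    }


# Toy data: 50 episodes, exactly one of which succeeds.
toy = [EpisodeResult(rewards=[0.0, 0.0], success=False) for _ in range(49)]
toy.append(EpisodeResult(rewards=[0.0, 4.0], success=True))

stats = summarize(toy)
print(stats["success_rate_pct"])  # 2.0
```

With 50 episodes each success moves the rate by 2 percentage points, so the gap between 0.0% and 2.0% in the table corresponds to a single episode.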