Lemon-03/DP_Aloha_Insertion_test

@@ -1,5 +1,7 @@
 ---
-datasets: lerobot/aloha_sim_insertion_human
 library_name: lerobot
 license: apache-2.0
 model_name: diffusion
@@ -8,55 +10,120 @@ tags:
 - lerobot
 - robotics
 - diffusion
 ---
-# Model Card for diffusion
-<!-- Provide a quick summary of what the model is/does. -->
-[Diffusion Policy](https://huggingface.co/papers/2303.04137) treats visuomotor control as a generative diffusion process, producing smooth, multi-step action trajectories that excel at contact-rich manipulation.
-This policy has been trained and pushed to the Hub using [LeRobot](https://github.com/huggingface/lerobot).
-See the full documentation at [LeRobot Docs](https://huggingface.co/docs/lerobot/index).
 ---
-## How to Get Started with the Model
-For a complete walkthrough, see the [training guide](https://huggingface.co/docs/lerobot/il_robots#train-a-policy).
-Below is the short version on how to train and run inference/eval:
-### Train from scratch
 ```bash
-lerobot-train \
-  --dataset.repo_id=${HF_USER}/<dataset> \
-  --policy.type=act \
-  --output_dir=outputs/train/<desired_policy_repo_id> \
-  --job_name=lerobot_training \
-  --policy.device=cuda \
-  --policy.repo_id=${HF_USER}/<desired_policy_repo_id>
-  --wandb.enable=true
-```
-_Writes checkpoints to `outputs/train/<desired_policy_repo_id>/checkpoints/`._
-### Evaluate the policy/run inference
 ```bash
-lerobot-record \
-  --robot.type=so100_follower \
-  --dataset.repo_id=<hf_user>/eval_<dataset> \
-  --policy.path=<hf_user>/<desired_policy_repo_id> \
-  --episodes=10
 ```
-Prefix the dataset repo with **eval\_** and supply `--policy.path` pointing to a local or hub checkpoint.
----
-## Model Details
-- **License:** apache-2.0

 ---
+datasets:
+- lerobot/aloha_sim_insertion_human
 library_name: lerobot
 license: apache-2.0
 model_name: diffusion
 - lerobot
 - robotics
 - diffusion
+- aloha
+- imitation-learning
+- benchmark
 ---
+# Diffusion Policy for Aloha Insertion (200k Steps)
+- **Task**: Aloha Insertion (Simulated, 3D Manipulation)
+- **Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
+- **Training Steps**: 200,000
+- **Author**: Graduate Student, UESTC (University of Electronic Science and Technology of China)
+This model is a benchmark run of **Diffusion Policy** on the simulated **Aloha Insertion** task. It was trained with the [LeRobot](https://github.com/huggingface/lerobot) framework to measure how the algorithm handles high-dimensional 3D manipulation relative to an ACT baseline.
+### 🔬 Benchmark Results
+This run shows how hard Aloha Insertion is for a generative policy trained under a tight compute budget (batch size 8).
+**Evaluation Metrics (50 Episodes):**
+| Metric | Value | Note |
+| :--- | :--- | :--- |
+| **Success Rate** | **0.0%** | No episode completed the final insertion. |
+| **Avg Max Reward** | **0.10** | Some episodes earned partial reward for grasping/moving, but none achieved alignment. |
+| **Avg Sum Reward** | **8.20** | The model learned plausible trajectories but lacked precision at the goal state. |
+**Analysis:**
+Unlike the 2D Push-T task, where Diffusion Policy excelled, Aloha Insertion combines high-dimensional visual input (3 cameras) with precise 3D spatial reasoning, and this proved challenging. The ACT baseline reached a 2% success rate (1/50). Diffusion Policy at 200k steps learned to produce trajectories but never completed the insertion, which suggests that this domain needs longer training or a larger batch size.
+---
+## Model Details
+- **Architecture**: ResNet18 (Vision Backbone) + U-Net (Diffusion Head)
+- **Input**: 3 Camera Views (Top, Left, Right)
+- **Prediction Horizon**: 16 steps
+- **Observation History**: 2 steps
+- **Action Steps**: 8 steps
+- **Image Resolution**: 480x640 (Cropped to 420x560)
+- **Total Parameters**: ~263 Million
 ---
+## How to Use This Model
+You can evaluate this model or visualize its performance using `lerobot`.
+### 1. Installation
 ```bash
+# Install LeRobot with the Aloha simulation extra (provides the gym-aloha environment)
+pip install "lerobot[aloha]"
+````
+### 2. Evaluate / Visualize
+Run the following command in your terminal to evaluate the model for 50 episodes and save the visualization videos:
 ```bash
+python -m lerobot.scripts.lerobot_eval \
+  --policy.type diffusion \
+  --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
+  --eval.n_episodes 50 \
+  --eval.batch_size 8 \
+  --env.type aloha \
+  --env.task AlohaInsertion-v0
 ```
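The three metrics in the benchmark table can be derived from per-episode reward traces. The sketch below is a minimal illustration, not LeRobot's evaluation code: the function name `summarize_episodes`, the dictionary keys, and the toy reward traces are all assumptions made for the example.

```python
from statistics import mean


def summarize_episodes(episode_rewards, episode_successes):
    """Aggregate per-episode reward traces into the three table metrics.

    episode_rewards: one list of per-step rewards per episode.
    episode_successes: one boolean per episode.
    """
    sum_rewards = [sum(trace) for trace in episode_rewards]
    max_rewards = [max(trace) for trace in episode_rewards]
    return {
        "success_rate_pct": 100.0 * sum(episode_successes) / len(episode_successes),
        "avg_max_reward": mean(max_rewards),
        "avg_sum_reward": mean(sum_rewards),
    }


# Toy traces for two hypothetical episodes (NOT data from the reported run).
toy_rewards = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
toy_successes = [False, False]
summary = summarize_episodes(toy_rewards, toy_successes)
print(summary)

# The ACT baseline figure quoted in the analysis: 1 success in 50 episodes.
act_success_rate_pct = 100.0 * 1 / 50
print(act_success_rate_pct)
```

Read this way, a low average max reward means that few episodes ever reached a rewarded stage, while the average sum reward also depends on how long the policy held that stage.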
+---
+## Training Configuration (Reference)
+For reproducibility, here are the key parameters used during the training session:
+  - **Batch Size**: 8 (Limited by 8GB VRAM)
+  - **Optimizer**: Adam (lr=1e-4, betas=[0.95, 0.999])
+  - **Scheduler**: Cosine with warmup (500 steps)
+  - **Vision**: ResNet18 with GroupNorm
+  - **Precision**: Mixed Precision (AMP) enabled
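To make the scheduler setting concrete, here is a rough sketch of a cosine schedule with warmup using the values listed above (lr=1e-4, 500 warmup steps, 200,000 total steps). It assumes linear warmup from zero and cosine decay to zero; the exact shape used by the training run is not stated in this card, and `lr_at_step` is a hypothetical helper, not a LeRobot function.

```python
import math


def lr_at_step(step, base_lr=1e-4, warmup_steps=500, total_steps=200_000):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to base_lr, then cosine decay to 0.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few points in training.
for step in (0, 250, 500, 100_250, 200_000):
    print(step, lr_at_step(step))
```

Under these assumptions the learning rate peaks at step 500 and has dropped to half of its peak value midway through the cosine phase (step 100,250).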
+```bash
+# Original Training Command
+python -m lerobot.scripts.lerobot_train \
+  --config_path diffusion_aloha.yaml \
+  --env.type aloha \
+  --env.task AlohaInsertion-v0 \
+  --dataset.repo_id lerobot/aloha_sim_insertion_human \
+  --wandb.enable true \
+  --job_name DP_Aloha_Insertion \
+  --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
+```
+### Config File Snippet (`diffusion_aloha.yaml`)
+```yaml
+# Training
+steps: 200000
+eval_freq: 20000
+save_freq: 20000
+batch_size: 8
+# Policy
+policy:
+  type: diffusion
+  vision_backbone: resnet18
+  crop_shape: [420, 560]
+  n_action_steps: 8
+  n_obs_steps: 2
+  horizon: 16
+  num_inference_steps: 100
+```
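The interplay of `horizon`, `n_obs_steps`, and `n_action_steps` can be sketched as follows. This assumes the receding-horizon convention from the Diffusion Policy paper, in which the first executed action is aligned with the most recent observation. The helper names and the 400-step episode length are hypothetical and are used only for illustration.

```python
import math


def executed_action_indices(horizon=16, n_obs_steps=2, n_action_steps=8):
    """Indices of the predicted trajectory that are actually executed.

    Assumption: execution starts at the step aligned with the latest
    observation, i.e. index n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the horizon")
    return list(range(start, end))


def denoising_passes_per_episode(episode_length, n_action_steps=8,
                                 num_inference_steps=100):
    """Denoising-network passes needed to control one episode."""
    policy_calls = math.ceil(episode_length / n_action_steps)
    return policy_calls * num_inference_steps


indices = executed_action_indices()
print(indices)

# episode_length=400 is an assumed value, not taken from this model card.
passes = denoising_passes_per_episode(episode_length=400)
print(passes)
```

With the values from the config snippet, each 16-step prediction contributes 8 executed actions before the policy re-plans, and every re-plan costs `num_inference_steps` denoising passes.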
+---