---

license: apache-2.0
library_name: lerobot
tags:
  - robotics
  - imitation-learning
  - aloha
  - diffusion-policy
  - lerobot
  - baseline
datasets:
  - lerobot/aloha_sim_insertion_human_image
pipeline_tag: robotics
---


# Diffusion Policy for ALOHA Insertion Task (Baseline)

⚠️ **Note: This model underperforms ACT on this task. Published for comparison purposes.**

A Diffusion Policy model trained on the ALOHA simulation Insertion task. It is published as a **baseline** for comparison with ACT, which reached a higher success rate on the ALOHA bimanual tasks evaluated here.

## Key Finding

| Model | Steps | Success Rate | Task Difficulty |
|-------|-------|--------------|-----------------|
| **ACT** | 200K | **15%** | Hard |
| Diffusion Policy | 200K | 10% | Hard |

The 10% figure for Diffusion Policy comes from the 10-episode evaluation run during training; the independent 20-episode evaluation yielded 0% (see Results).

**Conclusion: ACT is the recommended approach for ALOHA tasks.**

## Model Description

| Property | Value |
|----------|-------|
| Architecture | Diffusion Policy |
| Parameters | ~100M |
| Task | `AlohaInsertion-v0` (ALOHA simulation) |
| Training Steps | 200,000 |
| Batch Size | 32 |
| Success Rate | 0-10% |

## Training Data

- **Dataset**: [lerobot/aloha_sim_insertion_human_image](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human_image)
- **Episodes**: 50 human demonstrations
- **Frames**: 20,000
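
To put the dataset size in context, the short sketch below relates it to the training configuration listed in this card (200,000 steps, batch size 32). It assumes every training sample corresponds to one dataset frame, which is a simplification: the actual sampler may drop or pad frames near episode boundaries, so the number of passes is only approximate.

```python
# Values copied from this model card.
episodes = 50
frames = 20_000
training_steps = 200_000
batch_size = 32

# Average episode length in frames.
frames_per_episode = frames // episodes

# Total samples drawn during training, and roughly how many times
# the dataset is traversed (assumes one sample per frame).
samples_seen = training_steps * batch_size
approx_passes = samples_seen / frames

print(f"frames per episode: {frames_per_episode}")
print(f"samples seen:       {samples_seen:,}")
print(f"approx. passes:     {approx_passes:.0f}")
```

With only 50 demonstrations, the policy revisits the same frames many times, which is relevant to the data-efficiency point discussed later in this card.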

## Task Description

The Insertion task requires a bimanual robot to:
1. Pick up a socket with the left arm
2. Pick up a peg with the right arm
3. Insert the peg into the socket in mid-air

⚠️ **This is a difficult task** requiring precise bimanual coordination.

## Demo Video

<video controls src="eval_episode_3.mp4" title="Insertion Diffusion Policy Demo"></video>

## Training Environment

- **GPU**: RTX A6000
- **Framework**: LeRobot 0.4.3
- **Training Time**: Around 12 hours

## Usage

### Installation
```bash
pip install lerobot gym-aloha
```

### Training
```bash
lerobot-train \
    --policy.type=diffusion \
    --dataset.repo_id=lerobot/aloha_sim_insertion_human_image \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0 \
    --batch_size=32 \
    --steps=200000 \
    --eval.n_episodes=10 \
    --eval_freq=20000 \
    --save_freq=20000 \
    --output_dir=./outputs/dp_aloha_insertion \
    --wandb.enable=false \
    --policy.push_to_hub=false
```

### Evaluation
```bash
lerobot-eval \
    --policy.path=LeTau/diffusion_aloha_insertion \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0 \
    --eval.batch_size=1 \
    --eval.n_episodes=20
```

## Results

| Evaluation | Episodes | Success Rate | Avg Sum Reward |
|------------|----------|--------------|----------------|
| During training (step 200K) | 10 | 10% | 25.0 |
| Independent (`lerobot-eval`) | 20 | 0% | 17.4 |

**Expected success rate: 0-10%**
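
Both evaluations use few episodes, so the measured rates carry wide uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval for the two results in the table above (1 success in 10 episodes, 0 in 20). The helper `wilson_interval` is written for this card and is not part of LeRobot; treating episodes as independent Bernoulli trials is an assumption.

```python
import math


def wilson_interval(successes: int, episodes: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p = successes / episodes
    denom = 1 + z**2 / episodes
    center = (p + z**2 / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z**2 / (4 * episodes**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Success counts taken from the Results table in this card.
for label, k, n in [("during training", 1, 10), ("independent", 0, 20)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} -> [{low:.1%}, {high:.1%}]")
```

The intervals overlap substantially, which is consistent with reporting the expected success rate as a range rather than a single number.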

## Detailed Evaluation Results (Independent)
```
Sum Rewards: [0.0, 0.0, 37.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
              0.0, 0.0, 0.0, 311.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Successes: 0/20 episodes
```
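
The per-episode rewards above can be checked against the summary table with a few lines of Python. This is a small sanity check on the numbers printed in this card and does not call any LeRobot API.

```python
# Per-episode sum rewards from the independent 20-episode evaluation above.
sum_rewards = [0.0, 0.0, 37.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
               0.0, 0.0, 0.0, 311.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Mean reward across episodes; should match "Avg Sum Reward" in the table.
avg_sum_reward = sum(sum_rewards) / len(sum_rewards)

# Episodes in which the policy earned any reward at all.
nonzero_episodes = [i for i, r in enumerate(sum_rewards) if r > 0]

print(f"episodes:          {len(sum_rewards)}")
print(f"avg sum reward:    {avg_sum_reward:.1f}")
print(f"nonzero episodes:  {nonzero_episodes}")
```

Only two of the 20 episodes earned any reward, so the average is dominated by a single episode with a sum reward of 311.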


## Comparison: ACT vs Diffusion Policy on ALOHA Tasks

| Task | ACT | Diffusion Policy |
|------|-----|------------------|
| TransferCube (Easy) | **42%** | 10% |
| Insertion (Hard) | **15%** | 0% |

**In these evaluations, ACT outperformed Diffusion Policy on both ALOHA bimanual tasks.**

## Why Does Diffusion Policy Underperform?

1. **ACT is designed for ALOHA**: ACT was introduced together with the ALOHA hardware (Zhao et al., 2023) and targets fine-grained bimanual manipulation
2. **Data efficiency**: Diffusion Policy may need more than 50 demonstrations to learn this task effectively
3. **Task characteristics**: ALOHA tasks may favor precise, near-deterministic actions over the multi-modal action distributions that Diffusion Policy is built to model

## Recommendation

For ALOHA bimanual tasks, use **ACT** instead:
- [LeTau/act_aloha_transfer_cube](https://huggingface.co/LeTau/act_aloha_transfer_cube) - 42% success rate
- [LeTau/act_aloha_insertion](https://huggingface.co/LeTau/act_aloha_insertion) - 15% success rate

## Citation
```bibtex
@article{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  journal={arXiv preprint arXiv:2304.13705},
  year={2023}
}

@article{chi2023diffusion,
  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  journal={arXiv preprint arXiv:2303.04137},
  year={2023}
}
```

## Acknowledgments

- [LeRobot](https://github.com/huggingface/lerobot) framework by Hugging Face
- [ALOHA](https://tonyzhaozh.github.io/aloha/) project by Stanford
- [Diffusion Policy](https://diffusion-policy.cs.columbia.edu/) project by Columbia