---
license: apache-2.0
library_name: lerobot
tags:
- robotics
- imitation-learning
- aloha
- diffusion-policy
- lerobot
- baseline
datasets:
- lerobot/aloha_sim_transfer_cube_human_image
pipeline_tag: robotics
---
# Diffusion Policy for ALOHA TransferCube Task (Baseline)
⚠️ **Note: This model underperforms ACT on this task. Published for comparison purposes.**
This is a Diffusion Policy model trained on the ALOHA simulation TransferCube task. It is published as a **baseline comparison**: in these runs, ACT reached a much higher success rate than Diffusion Policy on this task (42% vs. 10%), with fewer training steps and fewer parameters.
## Key Finding
| Model | Steps | Success Rate | Parameters |
|-------|-------|--------------|------------|
| **ACT** | 60K | **42%** | 52M |
| Diffusion Policy | 200K | 10% | ~100M |
**Conclusion: ACT is the recommended approach for ALOHA tasks.**
## Model Description
| Property | Value |
|----------|-------|
| Architecture | Diffusion Policy |
| Parameters | ~100M |
| Task | ALOHA TransferCube-v0 |
| Training Steps | 200,000 |
| Batch Size | 32 |
| Success Rate | ~10% |
## Training Data
- **Dataset**: [lerobot/aloha_sim_transfer_cube_human_image](https://huggingface.co/datasets/lerobot/aloha_sim_transfer_cube_human_image)
- **Episodes**: 50 human demonstrations
- **Frames**: 20,000
## Task Description
The TransferCube task requires a bimanual robot to:
1. Pick up a red cube with the right arm
2. Transfer the cube to the left gripper
## Demo Video
<video controls src="eval_episode_3.mp4" title="TransferCube Diffusion Policy Demo"></video>
## Training Environment
- **GPU**: RTX A6000
- **Framework**: LeRobot 0.4.3
- **Training Time**: Around 12 hours
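
As a rough sanity check on training scale, the figures in this card (200,000 steps, batch size 32, 20,000 frames, around 12 hours) can be combined into approximate dataset passes and throughput. This is a back-of-the-envelope sketch: it assumes each training sample corresponds to one dataset frame and that the 12-hour figure is spent entirely on training steps, neither of which is confirmed by the training logs. The variable names are illustrative.

```python
# Back-of-the-envelope training-scale estimate from the numbers in this card.
# Assumptions (not taken from training logs): one sample per dataset frame,
# and the ~12 h wall-clock time is spent entirely on training steps.
TRAINING_STEPS = 200_000
BATCH_SIZE = 32
DATASET_FRAMES = 20_000
TRAINING_HOURS = 12

samples_seen = TRAINING_STEPS * BATCH_SIZE
approx_epochs = samples_seen / DATASET_FRAMES
steps_per_second = TRAINING_STEPS / (TRAINING_HOURS * 3600)

print(f"samples seen:   {samples_seen:,}")
print(f"approx. epochs: {approx_epochs:.0f}")
print(f"steps/second:   {steps_per_second:.2f}")
```

Under these assumptions the policy sees each frame roughly 320 times, so the run is unlikely to be limited by the number of optimization steps alone.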
## Usage
### Installation
```bash
pip install lerobot gym-aloha
```
### Training
```bash
lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=lerobot/aloha_sim_transfer_cube_human_image \
--env.type=aloha \
--env.task=AlohaTransferCube-v0 \
--batch_size=32 \
--steps=200000 \
--eval.n_episodes=10 \
--eval_freq=20000 \
--save_freq=20000 \
--output_dir=./outputs/dp_aloha_transfer_cube \
--wandb.enable=false \
--policy.push_to_hub=false
```
### Evaluation
```bash
lerobot-eval \
--policy.path=LeTau/diffusion_aloha_transfer_cube \
--env.type=aloha \
--env.task=AlohaTransferCube-v0 \
--eval.batch_size=1 \
--eval.n_episodes=20
```
## Results
| Evaluation | Episodes | Success Rate | Avg Sum Reward |
|------------|----------|--------------|----------------|
| During training (100K steps) | 10 | 10% | 23.7 |
| During training (200K steps) | 10 | 10% | 23.3 |
| Independent (`lerobot-eval`) | 20 | 10% | 28.3 |
**Expected success rate: ~10%**
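
Because each evaluation uses only 10 or 20 episodes, the ~10% figure carries considerable uncertainty. The sketch below estimates a 95% Wilson score interval for the reported success counts. The helper `wilson_interval` is a hypothetical name written for this example and is not part of LeRobot; the counts 1/10 and 2/20 are derived from the 10% rates in the table.

```python
import math


def wilson_interval(successes: int, episodes: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p = successes / episodes
    z2 = z * z
    denom = 1 + z2 / episodes
    center = (p + z2 / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z2 / (4 * episodes**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Success counts implied by the results table: 10% of 10 and 10% of 20 episodes.
for label, k, n in [("Training (200K)", 1, 10), ("Independent", 2, 20)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} -> 95% CI [{low:.1%}, {high:.1%}]")
```

With so few episodes the interval is wide, so the success rate is best read as a coarse estimate rather than a precise benchmark number.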
## Detailed Evaluation Results (Independent)
```
Sum Rewards: [0.0, 0.0, 253.0, 4.0, 0.0, 0.0, 0.0, 81.0, 21.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 207.0, 0.0, 0.0, 0.0, 0.0]
Successes: 2/20 episodes
```
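
The summary numbers in the results table can be re-derived from the per-episode rewards listed above. This is a minimal sketch: the success count is copied from the card rather than inferred from the rewards, since the mapping from sum reward to success is not stated here.

```python
# Per-episode sum rewards copied from the independent evaluation above.
sum_rewards = [0.0, 0.0, 253.0, 4.0, 0.0, 0.0, 0.0, 81.0, 21.0, 0.0,
               0.0, 0.0, 0.0, 0.0, 0.0, 207.0, 0.0, 0.0, 0.0, 0.0]
reported_successes = 2  # taken from the card, not derived from the rewards

n_episodes = len(sum_rewards)
avg_sum_reward = sum(sum_rewards) / n_episodes
nonzero_episodes = sum(1 for r in sum_rewards if r > 0)
success_rate = reported_successes / n_episodes

print(f"episodes: {n_episodes}")
print(f"avg sum reward: {avg_sum_reward:.1f}")
print(f"episodes with non-zero reward: {nonzero_episodes}")
print(f"success rate: {success_rate:.0%}")
```

More episodes earned some reward than were counted as successes, which suggests the policy makes partial progress in some failed episodes.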
## Why Does Diffusion Policy Underperform?
1. **ACT is designed for ALOHA**: ACT was introduced alongside the ALOHA hardware specifically for fine-grained bimanual manipulation
2. **Data efficiency**: Diffusion Policy may need more than the 50 available demonstrations to learn this task effectively
3. **Task characteristics**: TransferCube appears to reward precise, near-deterministic actions more than the ability to model multi-modal action distributions
## Recommendation
For ALOHA bimanual tasks, use **ACT** instead:
- [LeTau/act_aloha_transfer_cube](https://huggingface.co/LeTau/act_aloha_transfer_cube) - 42% success rate
## Citation
```bibtex
@article{zhao2023learning,
title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
journal={arXiv preprint arXiv:2304.13705},
year={2023}
}
@article{chi2023diffusion,
title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
journal={arXiv preprint arXiv:2303.04137},
year={2023}
}
```
## Acknowledgments
- [LeRobot](https://github.com/huggingface/lerobot) framework by Hugging Face
- [ALOHA](https://tonyzhaozh.github.io/aloha/) project by Stanford
- [Diffusion Policy](https://diffusion-policy.cs.columbia.edu/) project by Columbia