	---
	license: apache-2.0
	library_name: lerobot
	tags:
	- robotics
	- imitation-learning
	- aloha
	- diffusion-policy
	- lerobot
	- baseline
	datasets:
	- lerobot/aloha_sim_insertion_human_image
	pipeline_tag: robotics
	---

	# Diffusion Policy for ALOHA Insertion Task (Baseline)

	⚠️ Note: This model underperforms ACT on this task. It is published for comparison purposes.

	This is a Diffusion Policy model trained on the ALOHA simulation Insertion task. It is published as a baseline to show that ACT outperforms Diffusion Policy on the ALOHA bimanual tasks tested here.

	## Key Finding

	| Model | Steps | Success Rate | Task Difficulty |
	|-------|-------|--------------|-----------------|
	| ACT | 200K | 15% | Hard |
	| Diffusion Policy | 200K | 0-10% | Hard |

	Conclusion: ACT is the recommended approach for ALOHA tasks.

	## Model Description

	| Property | Value |
	|----------|-------|
	| Architecture | Diffusion Policy |
	| Parameters | ~100M |
	| Task | ALOHA Insertion-v0 |
	| Training Steps | 200,000 |
	| Batch Size | 32 |
	| Success Rate | 0-10% |

	## Training Data

	- Dataset: [lerobot/aloha_sim_insertion_human_image](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human_image)
	- Episodes: 50 human demonstrations
	- Frames: 20,000

	## Task Description

	The Insertion task requires a bimanual robot to:
	1. Pick up a socket with the left arm
	2. Pick up a peg with the right arm
	3. Insert the peg into the socket in mid-air

	⚠️ This is a difficult task requiring precise bimanual coordination.

	## Demo Video

	<video controls src="eval_episode_3.mp4" title="Insertion Diffusion Policy Demo"></video>

	## Training Environment

	- GPU: RTX A6000
	- Framework: LeRobot 0.4.3
	- Training Time: Around 12 hours

	## Usage

	### Installation
	```bash
	pip install lerobot gym-aloha
	```

	### Training
	```bash
	lerobot-train \
	--policy.type=diffusion \
	--dataset.repo_id=lerobot/aloha_sim_insertion_human_image \
	--env.type=aloha \
	--env.task=AlohaInsertion-v0 \
	--batch_size=32 \
	--steps=200000 \
	--eval.n_episodes=10 \
	--eval_freq=20000 \
	--save_freq=20000 \
	--output_dir=./outputs/dp_aloha_insertion \
	--wandb.enable=false \
	--policy.push_to_hub=false
	```
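
As a rough sanity check on the training budget, the flags above can be combined with the dataset size stated under Training Data. This is a back-of-the-envelope sketch, not output from LeRobot: it assumes every frame counts as one training sample and that one checkpoint is written per `save_freq` steps, and the variable names are illustrative only.

```python
# Values copied from the training command and the Training Data section.
steps = 200_000
batch_size = 32
save_freq = 20_000
dataset_frames = 20_000  # as stated in this card
episodes = 50

# Total samples drawn over the whole run.
samples_seen = steps * batch_size

# Approximate passes over the dataset (assumption: one sample per frame).
approx_epochs = samples_seen / dataset_frames

# Average demonstration length implied by the stated counts.
frames_per_episode = dataset_frames / episodes

# Checkpoints written (assumption: one per save_freq steps).
n_checkpoints = steps // save_freq

print(f"samples seen:       {samples_seen:,}")
print(f"approx. epochs:     {approx_epochs:.0f}")
print(f"frames per episode: {frames_per_episode:.0f}")
print(f"checkpoints:        {n_checkpoints}")
```

Under these assumptions the run makes several hundred passes over the 50 demonstrations, which is worth keeping in mind when reading the results.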

	### Evaluation
	```bash
	lerobot-eval \
	--policy.path=LeTau/diffusion_aloha_insertion \
	--env.type=aloha \
	--env.task=AlohaInsertion-v0 \
	--eval.batch_size=1 \
	--eval.n_episodes=20
	```

	## Results

	| Evaluation | Episodes | Success Rate | Avg Sum Reward |
	|------------|----------|--------------|----------------|
	| Training (200K) | 10 | 10% | 25.0 |
	| Independent | 20 | 0% | 17.4 |

	Expected success rate: 0-10%
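
With only 10 and 20 evaluation episodes, both success rates carry wide uncertainty. The sketch below computes a Wilson score interval for each evaluation. It is an illustrative calculation, not part of the LeRobot evaluation output, and it assumes the 10% training-time figure corresponds to 1 success in 10 episodes.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# (label, successes, episodes); 1/10 is an assumption derived from the 10% figure.
evaluations = [("training-time eval", 1, 10), ("independent eval", 0, 20)]

for label, k, n in evaluations:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} successes, ~95% interval {low:.1%} to {high:.1%}")
```

The two intervals overlap, so the 10% and 0% figures are compatible with the same underlying success rate. More evaluation episodes would be needed to narrow the estimate.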

	## Detailed Evaluation Results (Independent)
	```
	Sum Rewards: [0.0, 0.0, 37.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
	0.0, 0.0, 0.0, 311.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

	Successes: 0/20 episodes
	```
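
The summary row for the independent evaluation can be recomputed from the per-episode values above. This is a minimal sketch using only the numbers listed in this card; the variable names are illustrative.

```python
# Per-episode sum rewards copied from the independent evaluation log.
sum_rewards = [0.0, 0.0, 37.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
               0.0, 0.0, 0.0, 311.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
successes = 0  # reported as 0/20 episodes

n_episodes = len(sum_rewards)
avg_sum_reward = sum(sum_rewards) / n_episodes
success_rate = successes / n_episodes

# Episodes where the policy collected any reward at all (0-indexed).
nonzero_episodes = [i for i, r in enumerate(sum_rewards) if r > 0]

print(f"episodes:         {n_episodes}")
print(f"avg sum reward:   {avg_sum_reward:.1f}")
print(f"success rate:     {success_rate:.0%}")
print(f"non-zero rewards: episodes {nonzero_episodes}")
```

Only two of the 20 episodes earned any reward, and a single episode accounts for most of the average.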


	## Comparison: ACT vs Diffusion Policy on ALOHA Tasks

	| Task | ACT | Diffusion Policy |
	|------|-----|------------------|
	| TransferCube (Easy) | 42% | 10% |
	| Insertion (Hard) | 15% | 0% |

	ACT outperforms Diffusion Policy on both ALOHA bimanual tasks tested.

	## Why Does Diffusion Policy Underperform?

	1. ACT is designed for ALOHA: ACT was introduced together with the ALOHA hardware, specifically for fine-grained bimanual manipulation
	2. Data efficiency: with only 50 demonstrations, Diffusion Policy may need more data to learn effectively
	3. Task characteristics: ALOHA tasks may favor precise, near-deterministic actions over the ability to model multi-modal action distributions

	## Recommendation

	For ALOHA bimanual tasks, use ACT instead:
	- [LeTau/act_aloha_transfer_cube](https://huggingface.co/LeTau/act_aloha_transfer_cube) - 42% success rate
	- [LeTau/act_aloha_insertion](https://huggingface.co/LeTau/act_aloha_insertion) - 15% success rate

	## Citation
	```bibtex
	@article{zhao2023learning,
	title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
	author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
	journal={arXiv preprint arXiv:2304.13705},
	year={2023}
	}

	@article{chi2023diffusion,
	title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
	author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
	journal={arXiv preprint arXiv:2303.04137},
	year={2023}
	}
	```

	## Acknowledgments

	- [LeRobot](https://github.com/huggingface/lerobot) framework by Hugging Face
	- [ALOHA](https://tonyzhaozh.github.io/aloha/) project by Stanford
	- [Diffusion Policy](https://diffusion-policy.cs.columbia.edu/) project by Columbia