
	---
	license: apache-2.0
	library_name: lerobot
	tags:
	- robotics
	- imitation-learning
	- aloha
	- act
	- lerobot
	datasets:
	- lerobot/aloha_sim_insertion_human_image
	pipeline_tag: robotics
	---

	# ACT Model for ALOHA Insertion Task

	A lightweight Action Chunking with Transformers (ACT) model trained on the ALOHA simulation Insertion task. Insertion is a difficult bimanual coordination task, and the success rate here is lower than on TransferCube.

	## Model Description

	| Property | Value |
	|----------|-------|
	| Architecture | ACT (Action Chunking with Transformers) |
	| Parameters | 52M |
	| Task | ALOHA Insertion-v0 |
	| Training Steps | 200,000 |
	| Batch Size | 32 |
	| Success Rate | ~15% |

	## Training Data

	- Dataset: [lerobot/aloha_sim_insertion_human_image](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human_image)
	- Episodes: 50 human demonstrations
	- Frames: 20,000

	## Task Description

	The Insertion task requires a bimanual robot to:
	1. Pick up a socket with the left arm
	2. Pick up a peg with the right arm
	3. Insert the peg into the socket in mid-air

	⚠️ This is a difficult task requiring precise bimanual coordination. Success rate is significantly lower than TransferCube.

	## Demo Video

	<video controls src="eval_episode_3.mp4" title="Insertion Demo"></video>

	## Training Environment

	- GPU: RTX A6000
	- Framework: LeRobot 0.4.3
	- Training Time: Around 13 hours

	## Usage

	### Installation
	```bash
	pip install lerobot gym-aloha
	```

	### Training
	```bash
	lerobot-train \
	--policy.type=act \
	--dataset.repo_id=lerobot/aloha_sim_insertion_human_image \
	--env.type=aloha \
	--env.task=AlohaInsertion-v0 \
	--batch_size=32 \
	--steps=200000 \
	--eval.n_episodes=10 \
	--eval_freq=20000 \
	--save_freq=20000 \
	--output_dir=./outputs/act_aloha_insertion \
	--wandb.enable=false \
	--policy.push_to_hub=false
	```
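As a rough sanity check on the training budget, the flags above can be turned into a few derived numbers. This is a minimal sketch that uses only values stated in this card (200,000 steps, batch size 32, 20,000 frames, about 13 hours). It assumes each sample in a batch corresponds to one dataset frame, and that checkpoints and evaluations fire on every multiple of `save_freq` and `eval_freq`; neither assumption is confirmed here.

```python
# Values copied from this model card.
steps = 200_000
batch_size = 32
dataset_frames = 20_000
save_freq = 20_000
eval_freq = 20_000
eval_episodes = 10
training_hours = 13  # approximate, as reported under "Training Environment"

# Assumption: one batch element corresponds to one dataset frame.
samples_seen = steps * batch_size
approx_epochs = samples_seen / dataset_frames

# Assumption: a checkpoint / evaluation happens at every multiple of the frequency.
n_checkpoints = steps // save_freq
n_eval_rollouts = (steps // eval_freq) * eval_episodes

seconds_per_step = training_hours * 3600 / steps

print(f"samples seen:      {samples_seen:,}")
print(f"approx. epochs:    {approx_epochs:.0f}")
print(f"checkpoints:       {n_checkpoints}")
print(f"eval rollouts:     {n_eval_rollouts}")
print(f"seconds per step:  {seconds_per_step:.3f}")
```

Under these assumptions the model passes over the 50 demonstrations several hundred times, which is consistent with the small-dataset limitation noted in this card.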

	### Evaluation
	```bash
	lerobot-eval \
	--policy.path=LeTau/act_aloha_insertion \
	--env.type=aloha \
	--env.task=AlohaInsertion-v0 \
	--eval.batch_size=1 \
	--eval.n_episodes=20
	```
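To run the same evaluation from a script, the command can be assembled as an argument list. This is only a sketch: `build_eval_command` is a hypothetical helper, not part of LeRobot, and it uses only the flags shown in the command above.

```python
import shlex


def build_eval_command(policy_path, task="AlohaInsertion-v0", n_episodes=20, batch_size=1):
    """Return the lerobot-eval argument list used in this card (hypothetical helper)."""
    return [
        "lerobot-eval",
        f"--policy.path={policy_path}",
        "--env.type=aloha",
        f"--env.task={task}",
        f"--eval.batch_size={batch_size}",
        f"--eval.n_episodes={n_episodes}",
    ]


cmd = build_eval_command("LeTau/act_aloha_insertion")
print(shlex.join(cmd))

# To actually launch the evaluation (requires lerobot and gym-aloha installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues and makes it straightforward to vary `n_episodes` across runs.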

	### Fine-tuning
	```bash
	lerobot-train \
	--resume=true \
	--config_path=LeTau/act_aloha_insertion/train_config.json \
	--steps=300000
	```

	## Results

	| Evaluation | Episodes | Success Rate | Avg Sum Reward |
	|------------|----------|--------------|----------------|
	| Training (120K) | 10 | 10% | 40.3 |
	| Training (200K) | 10 | 20% | 40.4 |
	| Independent | 20 | 15% | 51.2 |

	Expected success rate: 15-20%
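With only 10 or 20 episodes per evaluation, these percentages carry wide uncertainty. The sketch below computes a Wilson score interval for the two most recent rows of the table; the success counts (2 of 10 and 3 of 20) are derived from the reported percentages, and the choice of the Wilson interval is an illustration, not something the original evaluation used.

```python
import math


def wilson_interval(successes, episodes, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / episodes
    denom = 1 + z**2 / episodes
    center = (p + z**2 / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z**2 / (4 * episodes**2)) / denom
    return center - half, center + half


# Success counts implied by the table: 20% of 10 episodes, 15% of 20 episodes.
for label, k, n in [("Training (200K)", 2, 10), ("Independent", 3, 20)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} successes, ~95% interval {low:.1%} to {high:.1%}")
```

The intervals are much wider than the 15-20% range quoted above, so a larger `--eval.n_episodes` is advisable before comparing checkpoints.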

	### Task Difficulty Comparison

	| Task | Difficulty | Success Rate |
	|------|------------|--------------|
	| TransferCube | Easy | 35-42% |
	| Insertion | Hard | 15-20% |

	## Detailed Evaluation Results (Independent)
	```
	Sum Rewards: [0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
	256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0]

	Successes: 3/20 episodes
	```
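The summary figures in the Results table can be reproduced from the per-episode values above. This is a minimal sketch; the success count is taken as reported, since this card does not restate the per-episode success criterion.

```python
# Per-episode sum rewards from the independent evaluation above.
sum_rewards = [0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
               256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0]
successes = 3  # as reported above, not recomputed from the rewards

episodes = len(sum_rewards)
avg_sum_reward = sum(sum_rewards) / episodes
success_rate = successes / episodes
nonzero_episodes = sum(r > 0 for r in sum_rewards)

print(f"episodes:               {episodes}")
print(f"avg sum reward:         {avg_sum_reward:.1f}")
print(f"success rate:           {success_rate:.0%}")
print(f"episodes with reward>0: {nonzero_episodes}")
```

More episodes earn some reward than are counted as successes, which suggests partial progress on some attempts, although this card does not break the reward down by stage.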

	## Limitations

	- Difficult task: Insertion requires precise bimanual coordination
	- Limited training data: Only 50 demonstration episodes available
	- Low success rate: This is a baseline model for a challenging task
	- Single task: Only trained on Insertion, no multi-task capability


	## Citation
	```bibtex
	@article{zhao2023learning,
	title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
	author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
	journal={arXiv preprint arXiv:2304.13705},
	year={2023}
	}
	```

	## Acknowledgments

	- [LeRobot](https://github.com/huggingface/lerobot) framework by HuggingFace
	- [ALOHA](https://tonyzhaozh.github.io/aloha/) project by Stanford