---

license: apache-2.0
library_name: lerobot
tags:
  - robotics
  - imitation-learning
  - aloha
  - act
  - lerobot
datasets:
  - lerobot/aloha_sim_insertion_human_image
pipeline_tag: robotics
---


# ACT Model for ALOHA Insertion Task

A lightweight Action Chunking with Transformers (ACT) model trained on the ALOHA simulation Insertion task. This is a **difficult bimanual coordination task**: the model succeeds in roughly 15-20% of evaluation episodes, compared with 35-42% for TransferCube.

## Model Description

| Property | Value |
|----------|-------|
| Architecture | ACT (Action Chunking with Transformers) |
| Parameters | 52M |
| Task | ALOHA Insertion-v0 |
| Training Steps | 200,000 |
| Batch Size | 32 |
| Success Rate | ~15% (3/20 independent evaluation episodes) |

## Training Data

- **Dataset**: [lerobot/aloha_sim_insertion_human_image](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human_image)
- **Episodes**: 50 human demonstrations
- **Frames**: 20,000
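
For a sense of scale, these figures imply about 400 frames per episode, and the training configuration listed in Model Description (200,000 steps at batch size 32) corresponds to roughly 320 passes over the dataset. A minimal arithmetic sketch, assuming each training step draws exactly `batch_size` frames and that the frame count above is exact:

```python
# Figures taken from this model card.
episodes = 50
frames = 20_000
training_steps = 200_000
batch_size = 32

# Average episode length in frames.
frames_per_episode = frames / episodes

# Total frames sampled during training, and how many times that covers the dataset.
samples_seen = training_steps * batch_size
passes_over_dataset = samples_seen / frames

print(f"{frames_per_episode:.0f} frames per episode")
print(f"{passes_over_dataset:.0f} passes over the dataset")
```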

## Task Description

The Insertion task requires a bimanual robot to:
1. Pick up a socket with the left arm
2. Pick up a peg with the right arm
3. Insert the peg into the socket in mid-air

⚠️ **This is a difficult task** requiring precise bimanual coordination. The success rate is significantly lower than for TransferCube.

## Demo Video

<video controls src="eval_episode_3.mp4" title="Insertion Demo"></video>

## Training Environment

- **GPU**: RTX A6000
- **Framework**: LeRobot 0.4.3
- **Training Time**: Around 13 hours

## Usage

### Installation
```bash
pip install lerobot gym-aloha
```

### Training
```bash
lerobot-train \
    --policy.type=act \
    --dataset.repo_id=lerobot/aloha_sim_insertion_human_image \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0 \
    --batch_size=32 \
    --steps=200000 \
    --eval.n_episodes=10 \
    --eval_freq=20000 \
    --save_freq=20000 \
    --output_dir=./outputs/act_aloha_insertion \
    --wandb.enable=false \
    --policy.push_to_hub=false
```

### Evaluation
```bash
lerobot-eval \
    --policy.path=LeTau/act_aloha_insertion \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0 \
    --eval.batch_size=1 \
    --eval.n_episodes=20
```

### Fine-tuning
```bash
lerobot-train \
    --resume=true \
    --config_path=LeTau/act_aloha_insertion/train_config.json \
    --steps=300000
```

## Results

| Evaluation | Episodes | Success Rate | Avg Sum Reward |
|------------|----------|--------------|----------------|
| Training (120K) | 10 | 10% | 40.3 |
| Training (200K) | 10 | 20% | 40.4 |
| Independent | 20 | 15% | 51.2 |

**Expected success rate: 15-20%**
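
With only 10 to 20 episodes per evaluation, these rates carry wide uncertainty. As a rough illustration (not part of the original evaluation), the sketch below computes a Wilson score interval for the independent run of 3 successes in 20 episodes. The helper `wilson_interval` is a hypothetical name introduced here, and the calculation assumes episodes are independent trials.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Independent evaluation from the table above: 3 successes in 20 episodes.
low, high = wilson_interval(3, 20)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans roughly 5% to 36%, so the difference between the 10% and 20% training checkpoints is well within sampling noise.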

### Task Difficulty Comparison

| Task | Difficulty | Success Rate |
|------|------------|--------------|
| TransferCube | Easy | 35-42% |
| **Insertion** | **Hard** | **15-20%** |

## Detailed Evaluation Results (Independent)
```
Sum Rewards: [0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
              256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0]

Successes: 3/20 episodes
```
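
The summary figures in the Results table can be reproduced from this list. A small sketch; the success count is copied from the log rather than derived, since per-episode success flags are not shown here:

```python
# Per-episode sum of rewards, copied from the independent evaluation log.
sum_rewards = [0.0, 0.0, 0.0, 240.0, 121.0, 0.0, 0.0, 0.0, 43.0, 0.0,
               256.0, 0.0, 0.0, 321.0, 0.0, 0.0, 0.0, 0.0, 43.0, 0.0]
successes = 3  # reported by the evaluation; not derivable from the sums alone

n_episodes = len(sum_rewards)
avg_sum_reward = sum(sum_rewards) / n_episodes
success_rate = successes / n_episodes

# Episodes that earned any reward at all, including partial progress.
episodes_with_reward = sum(r > 0 for r in sum_rewards)

print(f"episodes: {n_episodes}")
print(f"avg sum reward: {avg_sum_reward:.1f}")
print(f"success rate: {success_rate:.0%}")
print(f"episodes with nonzero reward: {episodes_with_reward}")
```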

## Limitations

- **Difficult task**: Insertion requires precise bimanual coordination
- **Limited training data**: Only 50 demonstration episodes available
- **Low success rate**: This is a baseline model for a challenging task
- **Single task**: Only trained on Insertion, no multi-task capability


## Citation
```bibtex
@article{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  journal={arXiv preprint arXiv:2304.13705},
  year={2023}
}
```

## Acknowledgments

- [LeRobot](https://github.com/huggingface/lerobot) framework by Hugging Face
- [ALOHA](https://tonyzhaozh.github.io/aloha/) project by Stanford