---
license: apache-2.0
tags:
- robotics
- act
- lerobot
- manipulation
- imitation-learning
datasets:
- gpudad/so101_pick_cube_chunked
library_name: lerobot
pipeline_tag: robotics
---
# ACT Model for SO-101 Pick Cube Task
This is an Action Chunking Transformer (ACT) policy trained on the SO-101 robot arm for a cube-picking task.
## Demo

*Visualization showing ground truth (green) vs predicted actions (blue) with mean absolute error per frame.*
## Model Details
| Parameter | Value |
|-----------|-------|
| Architecture | ACT (Action Chunking Transformer) |
| Vision Backbone | ResNet18 |
| Training Steps | 500,000 |
| Chunk Size | 100 |
| N Action Steps | 1 (with temporal ensembling) |
| Temporal Ensemble Coeff | 0.01 |
| KL Weight | 10.0 |
| Batch Size | 16 |
| Learning Rate | 3e-5 |
| Parameters | 51.6M |
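With a chunk size of 100 and `n_action_steps = 1`, the policy predicts a fresh 100-step action chunk at every control step and executes only one action, blending the overlapping predictions with temporal ensembling. A minimal sketch of ACT-style exponential weighting (the `ensemble_actions` helper and its calling convention are illustrative, not the exact LeRobot implementation):

```python
import numpy as np

def ensemble_actions(predictions, coeff=0.01):
    """Blend the actions that overlapping chunks predicted for the current step.

    predictions: (k, action_dim) array, ordered oldest prediction first.
    Weights decay as w_i = exp(-coeff * i), so with coeff > 0 older
    predictions are weighted slightly higher, smoothing the commanded motion.
    """
    predictions = np.asarray(predictions, dtype=np.float64)
    weights = np.exp(-coeff * np.arange(len(predictions)))
    weights /= weights.sum()  # normalize so the blend is a weighted average
    return weights @ predictions  # shape (action_dim,)

# Example: three overlapping chunks disagree slightly about this timestep.
preds = [[0.10, 0.20], [0.12, 0.18], [0.11, 0.22]]
blended = ensemble_actions(preds, coeff=0.01)
```

With the small coefficient used here (0.01), the weights are nearly uniform, so the blend is close to a plain average of the overlapping predictions.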
## Evaluation Metrics
Evaluated on a sample episode from the training set:
| Joint | MAE | MSE |
|-------|-----|-----|
| Joint 0 | 0.0374 | 0.0034 |
| Joint 1 | 0.0342 | 0.0042 |
| Joint 2 | 0.0394 | 0.0025 |
| Joint 3 | 0.0216 | 0.0011 |
| Joint 4 | 0.0264 | 0.0009 |
| Joint 5 (gripper) | 0.0020 | 0.00001 |
| **Overall** | **0.0268** | **0.0020** |
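The per-joint numbers above compare predicted joint commands against ground-truth actions frame by frame, then average per joint. A small sketch of how such metrics can be computed (array names are illustrative):

```python
import numpy as np

def joint_errors(pred, gt):
    """Per-joint MAE and MSE for (num_frames, num_joints) arrays."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    err = pred - gt
    mae = np.abs(err).mean(axis=0)  # mean absolute error per joint
    mse = (err ** 2).mean(axis=0)   # mean squared error per joint
    return mae, mse

# Toy example: 2 frames, 2 joints.
pred = np.array([[0.1, 0.5], [0.2, 0.4]])
gt = np.array([[0.0, 0.5], [0.2, 0.6]])
mae, mse = joint_errors(pred, gt)
overall_mae, overall_mse = mae.mean(), mse.mean()  # "Overall" row
```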
## Training Dataset
Trained on [gpudad/so101_pick_cube_chunked](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked) - a chunked version of the SO-101 pick cube dataset with episode-level video files for efficient loading.
- ~11k episodes
- 3 camera views (front, overhead, wrist)
- 30 FPS
## Camera Views
The model uses 3 camera inputs:
- **Front camera** - Main observation view
- **Overhead camera** - Top-down perspective
- **Wrist camera** - End-effector mounted camera
## Training Command
```bash
python -m roboport.train act \
/path/to/so101_pick_cube_chunked \
-o /path/to/output \
--steps 500000 \
--chunk-size 100 \
--n-action-steps 1 \
--temporal-ensemble 0.01 \
--kl-weight 10.0 \
--batch-size 16 \
--lr 3e-5 \
--vision-backbone resnet18 \
--save-freq 50000 \
--gpu 0
```
## Usage
```python
import torch

from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("gpudad/act_so101_pick_cube")
policy.eval()

# `observation` is a dict of torch tensors matching the training features:
# the robot's joint state plus the three camera images (front, overhead,
# wrist), with keys and shapes defined by the policy's input configuration.
with torch.inference_mode():
    action = policy.select_action(observation)
```
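Since `n_action_steps` is 1, the policy is queried once per control step, and the loop should run at the dataset's 30 FPS. A hedged sketch of such a fixed-rate loop (`get_observation` and `send_action` are hypothetical robot I/O placeholders, not LeRobot APIs):

```python
import time

CONTROL_HZ = 30  # matches the 30 FPS training data

def control_loop(policy, get_observation, send_action, duration_s=10.0):
    """Query the policy once per step, holding a fixed control rate."""
    period = 1.0 / CONTROL_HZ
    t_end = time.monotonic() + duration_s
    while time.monotonic() < t_end:
        t0 = time.monotonic()
        action = policy.select_action(get_observation())
        send_action(action)
        # Sleep off the remainder of the control period, if any.
        time.sleep(max(0.0, period - (time.monotonic() - t0)))
```

Running slower or faster than the training frame rate can degrade behavior, since the learned action chunks implicitly assume 30 Hz spacing between steps.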
## Framework
Trained using [roboport](https://github.com/DreamwareInc/roboport) with LeRobot backend.