---
license: apache-2.0
tags:
- robotics
- act
- lerobot
- manipulation
- imitation-learning
datasets:
- gpudad/so101_pick_cube_chunked
library_name: lerobot
pipeline_tag: robotics
---

# ACT Model for SO-101 Pick Cube Task

This is an Action Chunking Transformer (ACT) model trained on the SO-101 robot arm for a cube-picking task.

## Demo

![Model Evaluation](https://huggingface.co/gpudad/act_so101_pick_cube/resolve/main/act_eval_500k.gif)

*Visualization showing ground-truth (green) vs. predicted actions (blue), with mean absolute error per frame.*

## Environment

![Environment Preview](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked/resolve/main/camera_angles.png)

## Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | ACT (Action Chunking Transformer) |
| Vision Backbone | ResNet18 |
| Training Steps | 500,000 |
| Chunk Size | 100 |
| N Action Steps | 1 (with temporal ensembling) |
| Temporal Ensemble Coeff | 0.01 |
| KL Weight | 10.0 |
| Batch Size | 16 |
| Learning Rate | 3e-5 |
| Parameters | 51.6M |

## Evaluation Metrics

Evaluated on a sample episode drawn from the training set (so these numbers measure fit to the demonstrations, not generalization):

| Joint | MAE | MSE |
|-------|-----|-----|
| Joint 0 | 0.0374 | 0.0034 |
| Joint 1 | 0.0342 | 0.0042 |
| Joint 2 | 0.0394 | 0.0025 |
| Joint 3 | 0.0216 | 0.0011 |
| Joint 4 | 0.0264 | 0.0009 |
| Joint 5 (gripper) | 0.0020 | 0.00001 |
| **Overall** | **0.0268** | **0.0020** |

## Training Dataset

Trained on [gpudad/so101_pick_cube_chunked](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked), a chunked version of the SO-101 pick-cube dataset with episode-level video files for efficient loading.
- ~11k episodes
- 3 camera views (front, overhead, wrist)
- 30 FPS

## Camera Views

The model uses 3 camera inputs:

- **Front camera** - Main observation view
- **Overhead camera** - Top-down perspective
- **Wrist camera** - End-effector mounted camera

## Training Command

```bash
python -m roboport.train act \
  /path/to/so101_pick_cube_chunked \
  -o /path/to/output \
  --steps 500000 \
  --chunk-size 100 \
  --n-action-steps 1 \
  --temporal-ensemble 0.01 \
  --kl-weight 10.0 \
  --batch-size 16 \
  --lr 3e-5 \
  --vision-backbone resnet18 \
  --save-freq 50000 \
  --gpu 0
```

## Usage

```python
from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("gpudad/act_so101_pick_cube")
policy.eval()

# Run inference
action = policy.select_action(observation)
```

## Framework

Trained using [roboport](https://github.com/DreamwareInc/roboport) with the LeRobot backend.
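## Building an Observation

The `observation` passed to `policy.select_action` is a dict of batched tensors. A minimal sketch of packing the three camera frames and the 6-D joint state into that shape — note the dict keys below are assumptions matching this model's camera views; check `policy.config` for the exact input feature names your checkpoint expects:

```python
import numpy as np
import torch

def build_observation(front, overhead, wrist, joint_state, device="cpu"):
    """Pack raw frames and joint state into an ACT-style observation dict.

    front/overhead/wrist: (H, W, 3) uint8 camera frames.
    joint_state: 6 floats (5 arm joints + gripper).
    Keys are hypothetical; verify them against the policy config.
    """
    def img(frame):
        # HWC uint8 -> batched CHW float32 in [0, 1]
        t = torch.from_numpy(frame).permute(2, 0, 1).unsqueeze(0)
        return t.float().div(255.0).to(device)

    return {
        "observation.images.front": img(front),
        "observation.images.overhead": img(overhead),
        "observation.images.wrist": img(wrist),
        "observation.state": torch.as_tensor(
            np.asarray(joint_state), dtype=torch.float32
        ).unsqueeze(0).to(device),
    }
```

At 30 FPS you would call this once per control step with fresh frames, then feed the result to `policy.select_action`.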
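## Note on Temporal Ensembling

With N Action Steps = 1 and temporal ensembling, the policy predicts a fresh 100-step chunk at every control step but executes only one action, averaging the overlapping predictions that different chunks made for the current timestep. A minimal sketch of the weighting, assuming the ACT-style exponential scheme `w_i = exp(-coeff * i)` with `i = 0` the oldest prediction (so at coeff = 0.01 older predictions get slightly more weight):

```python
import numpy as np

def ensemble_actions(predictions, coeff=0.01):
    """Weighted average of chunk predictions for one timestep.

    predictions: list of action vectors for the same timestep,
    ordered oldest-first. Weights decay as exp(-coeff * i).
    """
    preds = np.stack(predictions)                      # (k, action_dim)
    weights = np.exp(-coeff * np.arange(len(predictions)))
    weights /= weights.sum()                           # normalize to sum to 1
    return (weights[:, None] * preds).sum(axis=0)
```

A small coefficient like 0.01 keeps the weights nearly uniform, smoothing the executed trajectory without lagging far behind the newest prediction.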