---
license: apache-2.0
tags:
- robotics
- lerobot
- act
- imitation-learning
- so101
datasets:
- abdul004/so101_ball_in_cup_v5
pipeline_tag: robotics
---
SO-101 Ball-in-Cup ACT Policy
A trained ACT (Action Chunking with Transformers) policy for the ball-in-cup task on the SO-101 robot arm.
Task Description
Goal: Pick up an orange ball from the table and place it into a pink cup.
Robot: SO-101, a 6-servo robot arm (5 arm joints plus gripper)
Cameras: Dual camera setup (overhead + wrist-mounted)
Training Details
| Parameter | Value |
|---|---|
| Dataset | abdul004/so101_ball_in_cup_v5 |
| Episodes | 72 teleoperated demonstrations |
| Frames | 25,045 |
| Training Steps | 100,000 |
| Batch Size | 32 |
| Policy Type | ACT (Action Chunking with Transformers) |
| Hardware | RTX 3080 Ti / RTX 4090 on Vast.ai |
| Training Time | ~8 hours |
| Cost | ~$2-3 USD |
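As a rough sanity check on the table above, the step count and batch size imply how many passes training made over the dataset. The sketch below only does that arithmetic. It assumes every step consumes one full batch of 32 individual frames, which this card does not state explicitly.

```python
# Values copied from the training table above.
frames = 25_045
episodes = 72
training_steps = 100_000
batch_size = 32

# Assumption: each step draws one full batch of individual frames.
samples_seen = training_steps * batch_size
approx_epochs = samples_seen / frames
avg_frames_per_episode = frames / episodes

print(f"samples seen: {samples_seen:,}")
print(f"approx. epochs: {approx_epochs:.1f}")
print(f"avg frames per episode: {avg_frames_per_episode:.1f}")
```

Under that assumption, training made roughly 128 passes over the 72 demonstrations, which is worth keeping in mind when reading the generalization caveat under Limitations.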
Evaluation Results
Evaluated with custom metrics plus a visual assessment by a VLM (Gemini):
| Session | VLM Score | Grasp | Lift | Transport | Final Position |
|---|---|---|---|---|---|
| s1 | 70 | Yes | Yes | Yes | on_table (dropped) |
| base | 50 | Yes | Yes | Yes | on_table (dropped) |
| s2 | 30 | Yes | Yes | Partial | on_table |
Key Findings:
- The policy learns the grasp and lift behaviors in all three sessions
- It struggles with final placement and often drops the ball during transport
- A low pause rate (~7%) suggests confident, continuous movements
- High gripper activity suggests active grasping attempts
Demo
Side-by-side: Overhead camera (left) + Wrist camera (right)
Sample Evaluation
5-frame composite showing: Start → Approach → Grasp → Transport → Final
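A composite like this can be assembled by concatenating the selected frames left to right. The following is a minimal sketch, not the script used for this card: the `make_composite` helper, the dummy frame size, and the frame indices are all hypothetical, and how the key moments were actually chosen is not documented here.

```python
import numpy as np

def make_composite(frames, indices):
    """Concatenate the selected frames left to right into one image."""
    picked = [frames[i] for i in indices]
    heights = {f.shape[0] for f in picked}
    if len(heights) != 1:
        raise ValueError("all frames must share the same height")
    return np.concatenate(picked, axis=1)

# Dummy episode: 100 RGB frames of 48x64 pixels (placeholder data).
episode = np.zeros((100, 48, 64, 3), dtype=np.uint8)
# Hypothetical indices for Start, Approach, Grasp, Transport, Final.
key_frames = [0, 20, 45, 70, 99]
composite = make_composite(episode, key_frames)
print(composite.shape)
```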
Usage
```python
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Load policy
policy = ACTPolicy.from_pretrained("abdul004/so101_act_policy_v5")

# Run inference
action = policy.select_action(observation)
```
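The snippet above leaves `observation` undefined. As a hedged sketch, the helper below shows one way to shape raw camera frames and joint readings into the batched, channel-first, `[0, 1]`-scaled arrays that LeRobot policies typically expect. The key names (`observation.state`, `observation.images.overhead`, `observation.images.wrist`) and the image size are assumptions, so check them against the dataset's feature names before use.

```python
import numpy as np

def to_model_image(frame_hwc_uint8):
    """Convert an HxWx3 uint8 frame to a 1x3xHxW float32 array in [0, 1]."""
    chw = np.transpose(frame_hwc_uint8, (2, 0, 1)).astype(np.float32) / 255.0
    return chw[None, ...]  # add a batch dimension

def build_observation(joint_positions, overhead_frame, wrist_frame):
    # Key names are assumptions; verify them against the dataset features.
    return {
        "observation.state": np.asarray(joint_positions, dtype=np.float32)[None, :],
        "observation.images.overhead": to_model_image(overhead_frame),
        "observation.images.wrist": to_model_image(wrist_frame),
    }

# Placeholder inputs: six joint values and two 480x640 RGB frames.
obs = build_observation(
    [0.0] * 6,
    np.zeros((480, 640, 3), dtype=np.uint8),
    np.full((480, 640, 3), 255, dtype=np.uint8),
)
print({k: v.shape for k, v in obs.items()})
```

Each array would then be converted with `torch.from_numpy(...)` and moved to the policy's device before calling `policy.select_action`. Calling `policy.reset()` at the start of each episode clears any actions queued from the previous one.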
Comparison with DOT Policy
A DOT (Decoder-Only Transformer) policy was also trained on the same dataset:
| Policy | Steps | Grasp | Lift | VLM Score |
|---|---|---|---|---|
| ACT | 100K | Yes | Yes | 70 |
| DOT | 14K | β | β | 30 |
DOT training is still ongoing, so the two rows compare unequal step counts (14K vs. 100K); the decoder-only architecture may also require more steps to converge.
Infrastructure Notes
Cloud Training Setup:
- Platform: Vast.ai (interruptible instances for cost savings)
- Checkpoint sync: Automatic upload to HF Hub every 1K steps
- Resume capability: Training can resume from any checkpoint after interruption
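The sync-and-resume logic described above can be sketched as follows. This is an illustration under assumptions, not the training script used here: the `step_<N>` directory naming and both helper functions are hypothetical, and the actual upload (for example via `huggingface_hub`'s `upload_folder`) is left out so the example runs offline.

```python
import re
import tempfile
from pathlib import Path

SYNC_EVERY = 1_000  # from this card: checkpoints are uploaded every 1K steps

def should_sync(step, every=SYNC_EVERY):
    """True on the steps where a checkpoint should be uploaded."""
    return step > 0 and step % every == 0

def latest_checkpoint_step(checkpoint_dir):
    """Return the highest step with a saved checkpoint, or None if empty."""
    steps = []
    for path in Path(checkpoint_dir).iterdir():
        match = re.fullmatch(r"step_(\d+)", path.name)  # hypothetical naming
        if path.is_dir() and match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None

# Simulate an interrupted run that left three checkpoints behind.
with tempfile.TemporaryDirectory() as tmp:
    for step in (1_000, 2_000, 13_000):
        (Path(tmp) / f"step_{step:06d}").mkdir()
    resume_from = latest_checkpoint_step(tmp)

print(resume_from)  # 13000
```

On an interruptible instance, a new machine would first download the checkpoint folder from the Hub, call something like `latest_checkpoint_step`, and continue training from that step.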
Evaluation Pipeline:
- Automated metrics from joint/action data (pause%, movement variance)
- VLM-based visual assessment using composite images
- Dual-camera frame capture at key moments
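The automated metrics can be illustrated with a small sketch. The exact definitions used for this card are not documented, so the ones below are assumptions: a step counts as a pause when no joint moves more than `threshold` (in the same units as the recorded positions), and movement variance is the variance of per-step displacement averaged over joints.

```python
import numpy as np

def pause_rate(positions, threshold=0.5):
    """Fraction of steps in which no joint moves more than `threshold`."""
    deltas = np.abs(np.diff(positions, axis=0))  # shape (T-1, num_joints)
    paused = deltas.max(axis=1) < threshold
    return float(paused.mean())

def movement_variance(positions):
    """Variance of per-step displacement, averaged over joints."""
    deltas = np.diff(positions, axis=0)
    return float(deltas.var(axis=0).mean())

# Toy trajectory: 5 frames, 2 joints (placeholder data).
traj = np.array([[0, 0], [1, 0], [1, 0], [2, 1], [3, 1]], dtype=float)
print(pause_rate(traj))         # 0.25: one of four steps has no movement
print(movement_variance(traj))  # 0.1875
```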
Limitations
- None of the three evaluation sessions reported above ended with the ball in the cup; the policy tends to drop the ball during the transport phase
- Sensitive to ball/cup initial positioning
- 72 episodes may be insufficient for robust generalization
Citation
```bibtex
@misc{so101_ball_in_cup,
  author = {Abdul},
  title = {SO-101 Ball-in-Cup Policy Training},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/abdul004/so101_act_policy_v5}
}
```
Acknowledgments
- LeRobot by Hugging Face
- ACT Policy architecture
- SO-101 robot design community