Paper: [Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware](https://arxiv.org/abs/2304.13705)
A trained ACT (Action Chunking with Transformers) policy for the ball-in-cup task on the SO-101 robot arm.

- **Goal:** Pick up an orange ball from the table and place it into a pink cup.
- **Robot:** SO-101, a 6-DOF robot arm with gripper
- **Cameras:** Dual camera setup (overhead + wrist-mounted)
| Parameter | Value |
|---|---|
| Dataset | abdul004/so101_ball_in_cup_v5 |
| Episodes | 72 teleoperated demonstrations |
| Frames | 25,045 |
| Training Steps | 100,000 |
| Batch Size | 32 |
| Policy Type | ACT (Action Chunking with Transformers) |
| Hardware | RTX 3080 Ti / RTX 4090 on Vast.ai |
| Training Time | ~8 hours |
| Cost | ~$2-3 USD |
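At inference time, ACT predicts a chunk of future actions at every step and blends the overlapping predictions with an exponential weighting scheme (temporal ensembling). A minimal NumPy sketch of that blending, with an illustrative smoothing constant `m` (the chunk size and `m` used for this policy are not stated here):

```python
import numpy as np

def ensembled_action(t, chunks, m=0.01):
    """Blend all live chunk predictions that cover timestep t.

    chunks: {start_t: (horizon, action_dim) ndarray} of predicted
    action chunks, keyed by the timestep they were produced at.
    Weights follow w_i = exp(-m * i), with i = 0 for the oldest
    still-live prediction (the scheme described in the ACT paper).
    """
    # Keep predictions whose chunk still covers timestep t, oldest first.
    valid = sorted(
        (start, chunk[t - start])
        for start, chunk in chunks.items()
        if 0 <= t - start < len(chunk)
    )
    w = np.exp(-m * np.arange(len(valid)))
    w /= w.sum()
    acts = np.stack([a for _, a in valid])
    return (acts * w[:, None]).sum(axis=0)
```

With `m` small, recent and old predictions are weighted almost equally, which smooths out jitter between consecutive chunks without adding latency.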
Evaluated using custom per-stage metrics plus a VLM (Gemini) visual assessment:
| Session | VLM Score | Grasp | Lift | Transport | Final Position |
|---|---|---|---|---|---|
| s1 | 70 | ✅ Yes | ✅ Yes | ✅ Yes | on_table (dropped) |
| base | 50 | ✅ Yes | ✅ Yes | ✅ Yes | on_table (dropped) |
| s2 | 30 | ✅ Yes | ✅ Yes | ⚠️ Partial | on_table |
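One simple way to turn per-stage outcomes like those above into a single number is a weighted checklist. The weights below are illustrative assumptions, not the actual rubric scored by Gemini:

```python
def stage_score(grasp, lift, transport, in_cup):
    """Illustrative success score from per-stage booleans.

    The stage weights here are assumed for the sketch; the model card
    does not specify how the VLM score was actually computed.
    """
    weights = {"grasp": 20, "lift": 20, "transport": 30, "in_cup": 30}
    stages = {"grasp": grasp, "lift": lift, "transport": transport, "in_cup": in_cup}
    return sum(w for name, w in weights.items() if stages[name])
```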
Key Findings:
Side-by-side: Overhead camera (left) + Wrist camera (right)
5-frame composite showing: Start β Approach β Grasp β Transport β Final
```python
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Load the pretrained policy from the Hugging Face Hub
policy = ACTPolicy.from_pretrained("abdul004/so101_act_policy_v5")

# Run inference: observation is a dict of camera images and joint state
action = policy.select_action(observation)
```
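Wrapping `select_action` in a fixed-rate control loop might look like the sketch below. `get_observation` and `send_action` are hypothetical stand-ins for the SO-101 driver calls, and the control rate is an assumption:

```python
import time

def run_episode(policy, get_observation, send_action, fps=30, steps=300):
    """Closed-loop rollout: query the policy at a fixed rate and stream
    commands to the arm. get_observation/send_action are placeholders
    for the real robot interface (cameras + joint state in, joints out).
    """
    dt = 1.0 / fps
    for _ in range(steps):
        obs = get_observation()           # camera images + joint state
        action = policy.select_action(obs)
        send_action(action)
        time.sleep(dt)                    # crude pacing; a real loop
                                          # would account for compute time
```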
Also trained a DOT (Decoder-Only Transformer) policy on the same dataset:
| Policy | Steps | Grasp | Lift | VLM Score |
|---|---|---|---|---|
| ACT | 100K | ✅ | ✅ | 70 |
| DOT | 14K | ✅ | ✅ | 30 |
DOT training is ongoing; the decoder-only architecture may need more steps to converge.
Cloud Training Setup:
Evaluation Pipeline:
```bibtex
@misc{so101_ball_in_cup,
  author    = {Abdul},
  title     = {SO-101 Ball-in-Cup Policy Training},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/abdul004/so101_act_policy_v5}
}
```