SO-101 Ball-in-Cup DOT Policy (Experimental)

A trained DOT (Decoder-Only Transformer) policy for the ball-in-cup task using the SO-101 robot arm.

Status: 🔬 Experimental - Training in progress

What is DOT?

DOT (Decoder-Only Transformer) is an alternative to ACT that uses:

  • Decoder-only architecture (GPT-style) instead of encoder-decoder
  • Multi-step observation history (30 lookback steps)
  • LoRA regularization on visual backbone
  • No VAE - simpler architecture

Based on Ilia Larchenko's DOT implementation.
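Decoder-only means each timestep attends only to itself and earlier tokens, as in GPT, rather than cross-attending from a separate decoder to an encoder. A minimal illustration of the causal attention mask this implies (pure Python sketch; LeRobot's DOT builds this inside the transformer, not with this helper):

```python
# Illustrative sketch: the causal (lower-triangular) attention mask used by
# GPT-style decoder-only models such as DOT. Token i may attend to tokens 0..i.
def causal_mask(seq_len):
    """Return a seq_len x seq_len boolean mask: True = attention allowed."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(4)
# Row 0 sees only token 0; row 3 sees tokens 0..3.
```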

Task Description

Goal: Pick up an orange ball from the table and place it into a pink cup.

Robot: SO-101 - 6-DOF robot arm with gripper

Training Details

| Parameter | Value |
|---|---|
| Dataset | abdul004/so101_ball_in_cup_v5 |
| Episodes | 72 teleoperated demonstrations |
| Current Steps | 63,000 / 100,000 (63%) |
| Batch Size | 12 |
| Train Horizon | 150 steps |
| Inference Horizon | 100 steps |
| Lookback Obs Steps | 30 |
| LoRA Rank | 20 |
| Hardware | RTX 3080 Ti on Vast.ai |
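How the history parameters combine is easiest to see with frame indices. This is an assumed scheme for illustration only (the exact sampling logic lives in LeRobot's DOT dataloader): the model sees the `n_obs_steps` most recent frames plus one older "lookback" frame from `lookback_obs_steps` in the past.

```python
# Assumed indexing scheme, for illustration - not LeRobot's exact sampling code.
def history_indices(t, n_obs_steps=3, lookback_obs_steps=30):
    """Frame indices fed to the policy at timestep t (clamped at episode start)."""
    recent = [max(t - k, 0) for k in range(n_obs_steps - 1, -1, -1)]
    lookback = max(t - lookback_obs_steps, 0)
    return [lookback] + recent

print(history_indices(100))  # → [70, 98, 99, 100]
```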

Current Results (63K Steps)

| Checkpoint | VLM Score | Grasp | Lift | Pause % | Notes |
|---|---|---|---|---|---|
| 11K | 30 | ❌ | ❌ | ~80% | Arm positioning only, no ball approach |
| 14K | 30 | ❌ | ❌ | 83% | Similar to 11K, arm barely moves |
| 63K | 30 | ❌ | ❌ | 53% | Actively approaches ball, attempts grasp, fails to secure |

Key Observations (14K → 63K):

  • ✅ Significant improvement in approach behavior - the arm now actively moves toward the ball
  • ✅ Less hesitation - pause rate dropped from 83% to 53%
  • ✅ Grasp attempts - the gripper closes at the correct moment (reaches minimum position 6.3)
  • ❌ Still fails to secure the ball - the grasp attempt doesn't capture the ball in the gripper

The policy has learned the approach phase but struggles with precise grasp execution.
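The pause percentages reported above come from an evaluation metric; a minimal version of such a metric (the threshold and function names here are hypothetical - the actual evaluation pipeline is not shown in this card) would count timesteps where the joints barely move:

```python
def pause_rate(joint_trajectory, threshold=0.01):
    """Fraction of transitions where the largest joint delta is below threshold.

    joint_trajectory: list of per-step joint position vectors.
    threshold: hypothetical motion cutoff distinguishing "paused" steps.
    """
    if len(joint_trajectory) < 2:
        return 0.0
    pauses = 0
    for prev, cur in zip(joint_trajectory, joint_trajectory[1:]):
        if max(abs(a - b) for a, b in zip(prev, cur)) < threshold:
            pauses += 1
    return pauses / (len(joint_trajectory) - 1)

traj = [[0.0, 0.0], [0.0, 0.0], [0.5, 0.1], [0.5, 0.1]]
# Two of three transitions show no motion.
```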

Demo (63K Checkpoint)

DOT 63K grasp attempt - side-by-side overhead (left) and wrist (right) views showing the approach and grasp attempt at ~36s

Sample Evaluation

63K Checkpoint

DOT 63K Evaluation, T1→T5: Start → Pre-Grasp → Grasp (attempt) → Drop → End. Ball approached but not secured.

14K Checkpoint

DOT 14K Evaluation: the ball remains stationary throughout the episode - the arm barely moves toward the target.

Comparison with ACT

| Metric | ACT (100K) | DOT (14K) | DOT (63K) |
|---|---|---|---|
| Grasp | ✅ Yes | ❌ No | ❌ Attempts |
| Lift | ✅ Yes | ❌ No | ❌ No |
| VLM Score | 70 | 30 | 30 |
| Pause % | 7% | 83% | 53% |
| Approach | ✅ Yes | ❌ Minimal | ✅ Active |
| Training Time | ~8 hrs | ~4 hrs | ~18 hrs |

DOT Configuration

```python
from lerobot.policies.dot.configuration_dot import DOTConfig

DOTConfig(
    n_obs_steps=3,
    train_horizon=150,
    inference_horizon=100,
    lookback_obs_steps=30,
    lookback_aug=5,
    lora_rank=20,
    crop_scale=0.8,
    state_noise=0.01,
    optimizer_lr=3e-5,
    optimizer_min_lr=1e-5,
)
```
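With train_horizon=150 and inference_horizon=100, the policy is trained to predict long action chunks but only a shorter horizon is trusted at inference before re-planning. A rough sketch of open-loop chunked execution (DOT actually blends overlapping predictions rather than draining a simple queue; `predict_chunk` and the class name are illustrative):

```python
from collections import deque

class ChunkedController:
    """Execute actions from a predicted chunk, re-querying the policy
    when the queue runs out. `predict_chunk` stands in for the policy."""

    def __init__(self, predict_chunk, inference_horizon=100):
        self.predict_chunk = predict_chunk
        self.inference_horizon = inference_horizon
        self.queue = deque()

    def select_action(self, observation):
        if not self.queue:
            chunk = self.predict_chunk(observation)
            # Keep only the first inference_horizon actions of the chunk.
            self.queue.extend(chunk[: self.inference_horizon])
        return self.queue.popleft()

# Stub policy that "predicts" 150 dummy actions (the train horizon).
controller = ChunkedController(lambda obs: list(range(150)), inference_horizon=100)
first = [controller.select_action(None) for _ in range(101)]
# 100 actions come from the first chunk; the 101st call triggers a re-plan.
```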

Known Issues

  1. Data loading bottleneck: DOT loads 13 observation frames per sample vs ACT's 1, causing ~10x slower data loading (CPU-bound video decoding)
  2. Grasp precision: At 63K, policy approaches ball correctly but fails to secure it in gripper
  3. Slower learning curve: DOT requires more training steps than ACT for comparable behavior
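A common mitigation for CPU-bound repeated video decoding is caching decoded frames across samples, since consecutive samples share most of their observation history. A toy sketch with a stub decoder (cache size and the decode function are hypothetical, not part of LeRobot):

```python
from functools import lru_cache

DECODE_CALLS = {"count": 0}

@lru_cache(maxsize=4096)
def decode_frame(episode, frame_index):
    """Stub for an expensive video-decode call; counts real decodes."""
    DECODE_CALLS["count"] += 1
    return (episode, frame_index)  # placeholder for pixel data

# A DOT-style sample touches many frames; overlapping histories hit the cache.
sample_a = [decode_frame(0, i) for i in range(13)]
sample_b = [decode_frame(0, i + 1) for i in range(13)]  # shifted by one step
# Only one new decode is needed for the second sample.
```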

Usage

```python
from lerobot.policies.dot.modeling_dot import DOTPolicy

# Load policy
policy = DOTPolicy.from_pretrained("abdul004/so101_dot_policy_v5")

# Run inference
action = policy.select_action(observation)
```

Training Infrastructure

Challenges encountered:

  • DataLoader Bus errors with num_workers > 0 (solved with batch_size=12)
  • Resume functionality required patches for DOT's internal normalization
  • Interruptible instances on Vast.ai require checkpoint sync to HF Hub

Checkpoint sync script: Automatically uploads checkpoints every 1K steps to prevent data loss on interruption.
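The sync script itself isn't included here; its core is just an upload hook fired every 1K steps. A sketch with a stubbed uploader (the real script would call the Hugging Face Hub API; the checkpoint naming and `upload` callback are illustrative):

```python
def train_with_sync(total_steps, sync_every=1000, upload=print):
    """Run a dummy training loop, calling `upload` every sync_every steps
    so an interruption loses at most sync_every steps of progress."""
    for step in range(1, total_steps + 1):
        # ... one optimization step would happen here ...
        if step % sync_every == 0:
            upload(f"checkpoint-{step}")

uploads = []
train_with_sync(3500, sync_every=1000, upload=uploads.append)
# → uploads == ["checkpoint-1000", "checkpoint-2000", "checkpoint-3000"]
```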

Next Steps

  • Continue training past the original 50K-step target (done - now at 63K)
  • Evaluate learning curve (14K → 63K shows progress in approach, not grasp)
  • Continue to 100K to see if grasp precision improves
  • Investigate hyperparameter tuning (action horizon, LoRA rank)
  • Try Pi Zero / Pi0.5 as alternative VLA approach

Acknowledgments

DOT architecture and reference implementation by Ilia Larchenko.
