ACT Policy for SO101 Robot Arm

An Action Chunking Transformer (ACT) policy trained for manipulation tasks on the SO101 robot arm.

Training Environment

Left: front camera view | Right: wrist camera view (128x128 each)

Model Details

Parameter        Value
Architecture     ACT (Action Chunking Transformer)
Vision Backbone  ResNet50 (ImageNet V2 pretrained)
Parameters       65M
Chunk Size       40
N Action Steps   15
KL Weight        1.0
Training Steps   500,000
Batch Size       64
Learning Rate    3e-5
Backbone LR      1e-5

Training Data

  • Dataset: SO101 Safe Worker 1
  • Episodes: 21,557
  • Total Frames: 1.89M
  • Cameras: Front + Wrist (128x128)
  • Action Space: 4D
  • State Space: 10D
  • FPS: 10
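Given the specs above, one input to the policy is a dict of batched tensors. A minimal sketch follows; the key layout matches the common LeRobot convention ("observation.images.<camera>", "observation.state"), but the exact camera names ("front", "wrist") are assumptions, not taken from the dataset itself:

```python
import torch

# Dummy observation matching the dataset specs above:
# two 128x128 RGB cameras plus a 10D proprioceptive state,
# each with a leading batch dimension of 1.
observation = {
    "observation.images.front": torch.zeros(1, 3, 128, 128),  # front camera, CHW, values in [0, 1]
    "observation.images.wrist": torch.zeros(1, 3, 128, 128),  # wrist camera
    "observation.state": torch.zeros(1, 10),                  # joint/proprioceptive state
}
```

The policy then returns one action per `select_action` call; with a 4D action space, that is a tensor of shape (1, 4).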

Usage

from lerobot.policies.act.modeling_act import ACTPolicy

# Load the policy
policy = ACTPolicy.from_pretrained("gpudad/act-so101-chunk40-500k")

# Run inference
action = policy.select_action(observation)
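Under the hood, `select_action` serves actions from an internal queue: one forward pass predicts a 40-step chunk, 15 of those actions are executed one call at a time, and the model only runs again when the queue empties. A stand-in class sketching that contract (the numbers match this model's config; `StubPolicy` is illustrative, not part of LeRobot):

```python
from collections import deque

class StubPolicy:
    """Stand-in mimicking ACT's action queueing: predict a chunk,
    serve n_action_steps of it, then re-predict on the next call."""

    def __init__(self, chunk_size=40, n_action_steps=15, action_dim=4):
        self.chunk_size = chunk_size
        self.n_action_steps = n_action_steps
        self.action_dim = action_dim
        self._queue = deque()
        self.model_calls = 0  # counts forward passes, not control steps

    def select_action(self, observation):
        if not self._queue:
            # One (simulated) forward pass predicts a full chunk;
            # only the first n_action_steps are queued for execution.
            self.model_calls += 1
            chunk = [[0.0] * self.action_dim for _ in range(self.chunk_size)]
            self._queue.extend(chunk[: self.n_action_steps])
        return self._queue.popleft()

policy = StubPolicy()
for _ in range(45):  # 45 control steps
    action = policy.select_action({})
print(policy.model_calls)  # 45 steps / 15 per chunk -> 3 forward passes
```

This is why inference is cheap relative to control rate: at 10 FPS the model only runs every 1.5 s.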

With LeRobot Evaluation

from lerobot.scripts.eval import eval_policy

eval_policy(
    policy_path="gpudad/act-so101-chunk40-500k",
    env_name="so101_pick_cube",
    n_episodes=50,
)

Training Configuration

from lerobot.policies.act.configuration_act import ACTConfig

policy_cfg = ACTConfig(
    chunk_size=40,              # predict 40 future actions per forward pass
    n_action_steps=15,          # execute 15 of them before re-planning
    kl_weight=1.0,              # low KL weight for decisive actions
    vision_backbone="resnet50",
    pretrained_backbone_weights="ResNet50_Weights.IMAGENET1K_V2",
    optimizer_lr=3e-5,
    optimizer_lr_backbone=1e-5,
    use_amp=True,
)

Performance Notes

  • Chunk size 40 spans a large fraction of each episode (episodes run ~90-120 steps)
  • N action steps 15 allows frequent re-planning for error correction
  • KL weight 1.0 produces more decisive, less hesitant actions
  • ResNet50 provides stronger visual features than ResNet18
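At 10 FPS, the chunking settings above translate into concrete time horizons. A quick sanity check using only the numbers from this card:

```python
# All values taken from the Model Details table above.
FPS = 10
chunk_size = 40
n_action_steps = 15

chunk_horizon_s = chunk_size / FPS      # seconds of motion predicted per forward pass
replan_period_s = n_action_steps / FPS  # seconds between re-plans

print(chunk_horizon_s, replan_period_s)  # 4.0 1.5
```

So each forward pass plans 4 s ahead while the policy corrects course every 1.5 s, against episodes lasting roughly 9-12 s.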

Framework

Trained using LeRobot v0.4.2 with Roboport.

License

MIT
