# ACT Policy for SO101 Robot Arm

An Action Chunking Transformer (ACT) policy trained for manipulation tasks on the SO101 robot arm.
## Training Environment

*Left: Front camera view | Right: Wrist camera view (128x128 each)*
## Model Details
| Parameter | Value |
|---|---|
| Architecture | ACT (Action Chunking Transformer) |
| Vision Backbone | ResNet50 (ImageNet V2 pretrained) |
| Parameters | 65M |
| Chunk Size | 40 |
| N Action Steps | 15 |
| KL Weight | 1.0 |
| Training Steps | 500,000 |
| Batch Size | 64 |
| Learning Rate | 3e-5 |
| Backbone LR | 1e-5 |
## Training Data
- Dataset: SO101 Safe Worker 1
- Episodes: 21,557
- Total Frames: 1.89M
- Cameras: Front + Wrist (128x128)
- Action Space: 4D
- State Space: 10D
- FPS: 10
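The numbers above imply an average episode length, which can be back-checked with a quick arithmetic sketch (figures taken directly from the list above):

```python
# Back-of-envelope check of the dataset statistics listed above.
total_frames = 1_890_000   # "1.89M" from the card
episodes = 21_557
fps = 10

avg_len = total_frames / episodes   # frames per episode
avg_seconds = avg_len / fps         # wall-clock length at 10 FPS

print(f"~{avg_len:.0f} frames/episode (~{avg_seconds:.1f} s each)")
```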
## Usage

```python
from lerobot.policies.act.modeling_act import ACTPolicy

# Load the policy
policy = ACTPolicy.from_pretrained("gpudad/act-so101-chunk40-500k")

# Run inference
action = policy.select_action(observation)
```
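`select_action` expects a batched observation dict. A minimal sketch of the expected shapes is below; the key names follow common LeRobot conventions and are assumptions, since the exact camera keys depend on how the dataset was recorded:

```python
import torch

# Hypothetical observation dict: key names are assumptions based on
# LeRobot conventions, not confirmed for this specific dataset.
observation = {
    "observation.state": torch.zeros(1, 10),                  # 10D state, batch of 1
    "observation.images.front": torch.zeros(1, 3, 128, 128),  # front camera, CHW, values in [0, 1]
    "observation.images.wrist": torch.zeros(1, 3, 128, 128),  # wrist camera
}

# Each call returns one 4D action drawn from the current chunk:
# action = policy.select_action(observation)
```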
### With LeRobot Evaluation

```python
from lerobot.scripts.eval import eval_policy

eval_policy(
    policy_path="gpudad/act-so101-chunk40-500k",
    env_name="so101_pick_cube",
    n_episodes=50,
)
```
## Training Configuration

```python
policy_cfg = ACTConfig(
    chunk_size=40,        # Predict 40 future actions
    n_action_steps=15,    # Execute 15 before re-planning
    kl_weight=1.0,        # Low KL weight for decisive actions
    vision_backbone="resnet50",
    pretrained_backbone_weights="ResNet50_Weights.IMAGENET1K_V2",
    optimizer_lr=3e-5,
    optimizer_lr_backbone=1e-5,
    use_amp=True,
)
```
## Performance Notes
- Chunk size 40 covers most episode trajectories (episodes are ~90-120 steps)
- N action steps 15 allows frequent re-planning for error correction
- KL weight 1.0 produces more decisive, less hesitant actions
- ResNet50 provides stronger visual features than ResNet18
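The interplay of `chunk_size=40` and `n_action_steps=15` described above can be sketched as a receding-horizon loop. This is a simplified illustration, not the real API: `plan` stands in for the policy's chunk prediction.

```python
def run_episode(plan, episode_len=100, chunk_size=40, n_action_steps=15):
    """Receding-horizon execution: predict a chunk, execute a prefix, re-plan."""
    executed = []
    t = 0
    while t < episode_len:
        chunk = plan(t, chunk_size)            # predict chunk_size future actions
        for action in chunk[:n_action_steps]:  # execute only the first n_action_steps
            executed.append(action)
            t += 1
            if t >= episode_len:
                break
    return executed

# Dummy planner: action i of the chunk planned at time t is labeled (t, i).
actions = run_episode(lambda t, n: [(t, i) for i in range(n)])
# The policy re-plans at t = 0, 15, 30, ... so errors are corrected every 15 steps.
```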
## Framework
Trained using LeRobot v0.4.2 with Roboport.
## License
MIT