X-VLA SO-101 Phase II - All Checkpoints

Fine-tuned X-VLA model checkpoints for a pick-and-place task on the SO-101 robot arm.

Model Details

Base model: lerobot/xvla-base
Training steps: 200,000 total
Task: Pick up cube and place in bin
Robot: SO-101 single arm
Action space: Delta position control (4D: x, y, z, gripper)
Domain ID: 0 (WidowX-compatible)
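
To illustrate the 4D delta position action listed above, here is a minimal sketch of how one action could be applied to the current end-effector target. The `EndEffectorTarget` class, the `apply_delta_action` helper, and the 0.5 gripper threshold are hypothetical and not part of this repository or of LeRobot. The sketch also assumes the gripper value is a probability that the gripper should be closed.

```python
from dataclasses import dataclass


@dataclass
class EndEffectorTarget:
    """Hypothetical container for a Cartesian target plus gripper state."""
    x: float
    y: float
    z: float
    gripper_closed: bool


def apply_delta_action(current, action, gripper_threshold=0.5):
    """Add the (dx, dy, dz) offsets to the current target.

    Assumption: the fourth component is a probability that the gripper
    should be closed, thresholded at `gripper_threshold`.
    """
    dx, dy, dz, grip = action
    return EndEffectorTarget(
        x=current.x + dx,
        y=current.y + dy,
        z=current.z + dz,
        gripper_closed=grip > gripper_threshold,
    )
```

Because the actions are deltas, each new target depends on the previous one, so small errors can accumulate over a long rollout.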

Available Checkpoints

Checkpoint	Steps	Path
020000	20,000	`020000/pretrained_model/`
040000	40,000	`040000/pretrained_model/`
060000	60,000	`060000/pretrained_model/`
080000	80,000	`080000/pretrained_model/`
100000	100,000	`100000/pretrained_model/`
120000	120,000	`120000/pretrained_model/`
140000	140,000	`140000/pretrained_model/`
160000	160,000	`160000/pretrained_model/`
180000	180,000	`180000/pretrained_model/`
200000	200,000	`200000/pretrained_model/`
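
The paths in the table follow a regular pattern: a six-digit, zero-padded step count saved every 20,000 steps. A small sketch that generates them is shown here. The helper name `checkpoint_subfolders` is hypothetical.

```python
def checkpoint_subfolders(first_step=20_000, last_step=200_000, interval=20_000):
    """Return the subfolder of every saved checkpoint, oldest first."""
    return [
        f"{step:06d}/pretrained_model"
        for step in range(first_step, last_step + 1, interval)
    ]


if __name__ == "__main__":
    for subfolder in checkpoint_subfolders():
        print(subfolder)
```

Each returned string has the same form as the `subfolder` values used in the Usage section.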

Training Configuration

Frozen: Vision encoder, Language encoder
Trained: Policy transformer, Soft prompts, Action heads
Loss: L1 for XYZ, BCE for gripper
LR: 1e-4 → 1e-5 with warmup
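
The loss and learning-rate entries above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code used for these checkpoints. It assumes the gripper head outputs a logit, that the two loss terms are weighted equally, and that the schedule is a linear warmup followed by cosine decay. The 1,000-step warmup length is a placeholder, since the source only says "with warmup". All function names are hypothetical.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error over the XYZ components."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def action_loss(pred_xyz, target_xyz, gripper_logit, gripper_target,
                gripper_weight=1.0):
    """L1 on XYZ plus BCE on the gripper (equal weighting is an assumption)."""
    return (l1_loss(pred_xyz, target_xyz)
            + gripper_weight * bce_with_logit(gripper_logit, gripper_target))


def learning_rate(step, total_steps=200_000, warmup_steps=1_000,
                  peak_lr=1e-4, final_lr=1e-5):
    """Linear warmup to peak_lr, then cosine decay to final_lr (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine
```

Treating the gripper as a binary classification target is what motivates BCE for that dimension, while the continuous XYZ offsets use a regression loss.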

Best Checkpoint

The 200000 checkpoint is recommended. Its status on each phase of the task:

Phase	Status
Approach cube	✅ Works
Grasp cube	✅ Works
Place in bin	⚠️ Partial

Usage

from lerobot.common.policies.xvla.modeling_xvla import XVLAPolicy

# Load best checkpoint (200k)
policy = XVLAPolicy.from_pretrained(
    "gpudad/xvla-so101-phase2-checkpoints",
    subfolder="200000/pretrained_model"
)

# Or load an earlier checkpoint
policy = XVLAPolicy.from_pretrained(
    "gpudad/xvla-so101-phase2-checkpoints",
    subfolder="100000/pretrained_model"
)

Evaluation Tips

Use n_action_steps=4 so the policy is re-queried more frequently, which improves performance
The model works best with 128x128 images from the front and wrist cameras
Language instruction: "pick up the cube and place it in the bin"
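
The effect of n_action_steps can be illustrated with a generic control loop: the policy predicts a chunk of actions, only the first n_action_steps are executed, and the policy is then queried again on a fresh observation. The sketch below shows that idea only and is not LeRobot's implementation. The callables `predict_chunk`, `get_observation`, and `apply_action` are hypothetical placeholders for the policy, the camera and state reader, and the robot interface.

```python
from collections import deque


def run_with_requerying(predict_chunk, get_observation, apply_action,
                        total_steps, n_action_steps=4):
    """Execute `total_steps` actions, re-querying every `n_action_steps`.

    Returns the number of times the policy was queried.
    """
    queue = deque()
    n_queries = 0
    for _ in range(total_steps):
        if not queue:
            # Queue is empty: predict a new chunk from the latest observation
            # and keep only its first n_action_steps actions.
            chunk = predict_chunk(get_observation())
            if len(chunk) == 0:
                raise ValueError("predict_chunk returned no actions")
            queue.extend(chunk[:n_action_steps])
            n_queries += 1
        apply_action(queue.popleft())
    return n_queries
```

A smaller n_action_steps means more policy queries per episode, so the executed actions are based on more recent observations at the cost of extra inference time.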

File Structure

├── 020000/
│   └── pretrained_model/
│       ├── model.safetensors
│       ├── config.json
│       └── ...
├── 040000/
│   └── pretrained_model/
├── ...
└── 200000/
    └── pretrained_model/

Citation

Based on X-VLA from LeRobot.
