# RoboTwin2 Checkpoints

ACT and Pi0.5 single-task fine-tuning on the RoboTwin2.0 dataset, trained on an H200 GPU.

The policies were trained on the following tasks:
- place_phone_stand
- place_a2b_left
- move_can_pot
- handover_block
- put_bottles_dustbin
## Data

- Demonstrations: 50 demo_clean episodes per task
- Embodiment: aloha-agilex (dual-arm)
- Action dim: 14 (6 DOF × 2 arms + 2 grippers)
- Cameras: cam_high, cam_right_wrist, cam_left_wrist
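To make the embodiment concrete, here is a minimal sketch of the per-timestep data layout. The image resolution and dictionary keys are placeholders for illustration; the exact schema is defined by the RoboTwin2.0 loader:

```python
import numpy as np

# Illustrative shapes for one aloha-agilex timestep (resolution is assumed).
CAMERAS = ["cam_high", "cam_right_wrist", "cam_left_wrist"]

obs = {name: np.zeros((480, 640, 3), dtype=np.uint8) for name in CAMERAS}
qpos = np.zeros(14)    # 6 DOF x 2 arms + 2 grippers
action = np.zeros(14)  # same 14-dim layout as qpos
```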
## ACT (Action Chunking with Transformers)

### Architecture
| Param | Value |
|---|---|
| Backbone | ResNet-18 |
| Hidden dim | 512 |
| Feedforward dim | 3200 |
| Attention heads | 8 |
| Encoder layers | 4 |
| Decoder layers | 7 |
| Chunk size | 50 |
| KL weight | 10 |
| Action dim | 14 |
| Dropout | 0.1 |
| Parameters | ~83.9M |
### Training
| Param | Value |
|---|---|
| Batch size | 8 |
| Epochs | 6000 |
| Learning rate | 1e-5 |
| LR backbone | 1e-5 |
| Weight decay | 1e-4 |
| Optimizer | AdamW |
| Save freq | every 2000 epochs |
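The separate `Learning rate` and `LR backbone` rows above suggest a two-group optimizer, as in the original ACT training code. The sketch below illustrates that split in PyTorch with toy stand-in modules, not the real policy:

```python
import torch
import torch.nn as nn

# Toy stand-ins: the real model is the ACT policy (ResNet-18 backbone +
# transformer encoder/decoder). This only illustrates the optimizer split.
model = nn.ModuleDict({
    "backbone": nn.Linear(512, 512),
    "transformer": nn.Linear(512, 14),
})

param_groups = [
    {"params": model["backbone"].parameters(), "lr": 1e-5},     # LR backbone
    {"params": model["transformer"].parameters(), "lr": 1e-5},  # Learning rate
]
optimizer = torch.optim.AdamW(param_groups, weight_decay=1e-4)
```

Here both groups use the same learning rate, but the split allows the backbone to be tuned independently of the transformer head.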
### Checkpoints
| Path | Seed | Val Loss |
|---|---|---|
| ACT/act-place_phone_stand/demo_clean-50/ | 0 | — |
| ACT/act-place_phone_stand-run2/demo_clean-50/ | 1 | 0.038 |
| ACT/act-place_a2b_left/demo_clean-50/ | 0 | — |
| ACT/act-place_a2b_left-run2/demo_clean-50/ | 1 | 0.059 |
| ACT/act-move_can_pot/demo_clean-50/ | 0 | — |
| ACT/act-move_can_pot-run2/demo_clean-50/ | 1 | 0.036 |
| ACT/act-handover_block-run2/demo_clean-50/ | 1 | 0.030 |
| ACT/act-put_bottles_dustbin-run2/demo_clean-50/ | 1 | 0.032 |
Each checkpoint directory contains:

- policy_best.ckpt — best validation loss checkpoint
- policy_last.ckpt — final epoch checkpoint
- policy_epoch_{2000,4000,5000,6000}_seed_{0,1}.ckpt — intermediate checkpoints
- dataset_stats.pkl — normalization statistics
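A minimal sketch of consuming these artifacts at inference time. The key names inside `dataset_stats.pkl` are assumptions for illustration; consult the ACT codebase for the actual schema:

```python
import numpy as np

# Assumed stats schema: per-dimension mean/std for the 14-dim action space.
# In practice: stats = pickle.load(open("dataset_stats.pkl", "rb"))
stats = {"action_mean": np.zeros(14), "action_std": np.ones(14)}

def unnormalize_action(a_norm, stats):
    """Map a normalized 14-dim action chunk back to robot units."""
    return a_norm * stats["action_std"] + stats["action_mean"]

# The policy predicts a chunk of 50 normalized actions per query.
chunk = unnormalize_action(np.zeros((50, 14)), stats)
```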
## Pi0.5 LoRA (place_phone_stand only)
Fine-tuned from gs://openpi-assets/checkpoints/pi05_base/params using the openpi framework.
### Architecture
| Param | Value |
|---|---|
| Base model | Pi0.5 (3B params) |
| PaliGemma variant | gemma_2b_lora |
| Action expert variant | gemma_300m_lora |
| Fine-tuning method | LoRA |
### Training
| Param | Value |
|---|---|
| Batch size | 32 |
| Total steps | 20,000 (trained to 9,000) |
| Save interval | 200 steps |
| XLA memory fraction | 0.45 (64 GB pool on H200) |
| GPU | NVIDIA H200 (143 GB VRAM) |
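The `XLA memory fraction` row above corresponds to JAX's `XLA_PYTHON_CLIENT_MEM_FRACTION` environment variable. A hedged shell sketch follows; the surrounding training invocation is omitted, since the exact openpi command is not reproduced here:

```shell
# Cap JAX's preallocated GPU pool at 45% of VRAM
# (0.45 * 143 GB ≈ 64 GB on an H200), leaving headroom
# for the simulator and data-loading processes.
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.45
```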
### Checkpoints
| Path | Step |
|---|---|
| pi05_lora/place_phone_stand/step_5000/ | 5,000 |
| pi05_lora/place_phone_stand/step_9000/ | 9,000 |
## Environment
- Framework: RoboTwin2.0
- Simulator: SAPIEN with Vulkan rendering
- GPU: NVIDIA H200 SXM (143 GB VRAM)
- CUDA: 12.8