# RoboTwin2 Checkpoints

ACT and Pi0.5 single-task fine-tuning checkpoints, trained on an NVIDIA H200 GPU using the [RoboTwin2.0](https://github.com/TianxingChen/RoboTwin) dataset.

## Tasks

The policies were trained on the following tasks:

- `place_phone_stand`
- `place_a2b_left`
- `move_can_pot`
- `handover_block`
- `put_bottles_dustbin`

## Data

- **Demonstrations:** 50 `demo_clean` episodes per task
- **Embodiment:** aloha-agilex (dual-arm)
- **Action dim:** 14 (6 DOF × 2 arms + 2 grippers)
- **Cameras:** `cam_high`, `cam_right_wrist`, `cam_left_wrist`

---

## ACT (Action Chunking Transformers)

### Architecture

| Param | Value |
|---|---|
| Backbone | ResNet-18 |
| Hidden dim | 512 |
| Feedforward dim | 3200 |
| Attention heads | 8 |
| Encoder layers | 4 |
| Decoder layers | 7 |
| Chunk size | 50 |
| KL weight | 10 |
| Action dim | 14 |
| Dropout | 0.1 |
| Parameters | ~83.9M |

### Training

| Param | Value |
|---|---|
| Batch size | 8 |
| Epochs | 6000 |
| Learning rate | 1e-5 |
| LR (backbone) | 1e-5 |
| Weight decay | 1e-4 |
| Optimizer | AdamW |
| Save freq | every 2000 epochs |

### Checkpoints

| Path | Seed | Val Loss |
|---|---|---|
| `ACT/act-place_phone_stand/demo_clean-50/` | 0 | — |
| `ACT/act-place_phone_stand-run2/demo_clean-50/` | 1 | 0.038 |
| `ACT/act-place_a2b_left/demo_clean-50/` | 0 | — |
| `ACT/act-place_a2b_left-run2/demo_clean-50/` | 1 | 0.059 |
| `ACT/act-move_can_pot/demo_clean-50/` | 0 | — |
| `ACT/act-move_can_pot-run2/demo_clean-50/` | 1 | 0.036 |
| `ACT/act-handover_block-run2/demo_clean-50/` | 1 | 0.030 |
| `ACT/act-put_bottles_dustbin-run2/demo_clean-50/` | 1 | 0.032 |

Each checkpoint directory contains:

- `policy_best.ckpt` — best validation loss checkpoint
- `policy_last.ckpt` — final epoch checkpoint
- `policy_epoch_{2000,4000,5000,6000}_seed_{0,1}.ckpt` — intermediate checkpoints
- `dataset_stats.pkl` — normalization statistics

---

## Pi0.5 LoRA (place_phone_stand only)

Fine-tuned from `gs://openpi-assets/checkpoints/pi05_base/params` using the
[openpi](https://github.com/Physical-Intelligence/openpi) framework.

### Architecture

| Param | Value |
|---|---|
| Base model | Pi0.5 (3B params) |
| PaliGemma variant | `gemma_2b_lora` |
| Action expert variant | `gemma_300m_lora` |
| Fine-tuning method | LoRA |

### Training

| Param | Value |
|---|---|
| Batch size | 32 |
| Total steps | 20,000 (trained to 9,000) |
| Save interval | 200 steps |
| XLA memory fraction | 0.45 (64 GB pool on H200) |
| GPU | NVIDIA H200 (143 GB VRAM) |

### Checkpoints

| Path | Step |
|---|---|
| `pi05_lora/place_phone_stand/step_5000/` | 5,000 |
| `pi05_lora/place_phone_stand/step_9000/` | 9,000 |

---

## Environment

- **Framework:** [RoboTwin2.0](https://github.com/TianxingChen/RoboTwin)
- **Simulator:** SAPIEN with Vulkan rendering
- **GPU:** NVIDIA H200 SXM (143 GB VRAM)
- **CUDA:** 12.8
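---

## Usage note: consuming chunked actions

At inference, an ACT policy with the configuration above predicts a chunk of 50 future actions (each 14-dimensional) per query. ACT-style rollouts typically re-query every control step and blend the overlapping predictions for the current timestep with exponential weights. The sketch below illustrates that temporal-ensembling step only; the function name, history format, and weight constant `m` are illustrative assumptions, not the RoboTwin or ACT API.

```python
import numpy as np

CHUNK_SIZE = 50   # chunk size from the ACT config above
ACTION_DIM = 14   # 6 DOF x 2 arms + 2 grippers

def temporal_ensemble(history, t, m=0.01):
    """Blend every prediction made for timestep t.

    history: list of (t0, chunk) pairs, oldest first, where chunk is a
    (CHUNK_SIZE, ACTION_DIM) array predicted at timestep t0.
    Weights w_i = exp(-m * i) favor older queries, as in ACT-style
    temporal ensembling (m is a smoothing constant, assumed here).
    """
    preds = []
    for t0, chunk in history:
        if t0 <= t < t0 + CHUNK_SIZE:      # chunk covers timestep t
            preds.append(chunk[t - t0])
    preds = np.stack(preds)                # (num_queries, ACTION_DIM)
    w = np.exp(-m * np.arange(len(preds)))
    w /= w.sum()
    return (w[:, None] * preds).sum(axis=0)
```

In a rollout loop, each step would append the newest `(t, chunk)` prediction to `history` and execute `temporal_ensemble(history, t)` as the 14-dim joint command.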