	# RoboTwin2 Checkpoints

	ACT and pi0.5 single-task fine-tuning checkpoints, trained on an NVIDIA H200 GPU with the [RoboTwin2.0](https://github.com/TianxingChen/RoboTwin) dataset.

	## Tasks

	The policies were trained on the following tasks:
	- `place_phone_stand`
	- `place_a2b_left`
	- `move_can_pot`
	- `handover_block`
	- `put_bottles_dustbin`

	## Data
	- Demonstrations: 50 `demo_clean` episodes per task
	- Embodiment: aloha-agilex (dual-arm)
	- Action dim: 14 (6 DOF × 2 arms + 2 grippers)
	- Cameras: `cam_high`, `cam_right_wrist`, `cam_left_wrist`
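The 14-dimensional action can be read as two 7-value halves, one per arm. The sketch below is a minimal illustration, assuming the common ALOHA-style ordering (left arm joints, left gripper, right arm joints, right gripper). That ordering and the `split_action` helper are assumptions, not taken from this repository, so check the RoboTwin data loader before relying on them.

```python
from typing import Dict, List, Sequence

ARM_DOF = 6      # joints per arm (from the Data section)
NUM_ARMS = 2
GRIPPERS = 2     # one gripper value per arm
ACTION_DIM = ARM_DOF * NUM_ARMS + GRIPPERS  # 6 * 2 + 2 = 14


def split_action(action: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat action vector into per-arm parts.

    ASSUMPTION: the layout is [left joints (6), left gripper (1),
    right joints (6), right gripper (1)].
    """
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    per_arm = ARM_DOF + 1
    left, right = list(action[:per_arm]), list(action[per_arm:])
    return {
        "left_arm": left[:ARM_DOF],
        "left_gripper": left[ARM_DOF:],
        "right_arm": right[:ARM_DOF],
        "right_gripper": right[ARM_DOF:],
    }
```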

	---

	## ACT (Action Chunking with Transformers)

	### Architecture
	| Param | Value |
	|---|---|
	| Backbone | ResNet-18 |
	| Hidden dim | 512 |
	| Feedforward dim | 3200 |
	| Attention heads | 8 |
	| Encoder layers | 4 |
	| Decoder layers | 7 |
	| Chunk size | 50 |
	| KL weight | 10 |
	| Action dim | 14 |
	| Dropout | 0.1 |
	| Parameters | ~83.9M |
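For scripting, the architecture values above can be collected in one place. The following is only a sketch: `ACTConfig` and its field names are hypothetical and do not correspond to any class in the training code. It also checks that the hidden dimension divides evenly across the attention heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACTConfig:
    """Hypothetical container; field names are illustrative only."""
    backbone: str = "resnet18"
    hidden_dim: int = 512
    dim_feedforward: int = 3200
    nheads: int = 8
    enc_layers: int = 4
    dec_layers: int = 7
    chunk_size: int = 50
    kl_weight: float = 10.0
    action_dim: int = 14
    dropout: float = 0.1

    def __post_init__(self) -> None:
        # Multi-head attention needs an integer per-head dimension.
        if self.hidden_dim % self.nheads != 0:
            raise ValueError("hidden_dim must be divisible by nheads")

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.nheads


cfg = ACTConfig()
print(cfg.head_dim)  # 512 // 8 = 64
```

With a chunk size of 50 and an action dimension of 14, each forward pass predicts a 50 × 14 block of actions.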

	### Training
	| Param | Value |
	|---|---|
	| Batch size | 8 |
	| Epochs | 6000 |
	| Learning rate | 1e-5 |
	| LR backbone | 1e-5 |
	| Weight decay | 1e-4 |
	| Optimizer | AdamW |
	| Save freq | every 2000 epochs |
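A separate backbone learning rate is usually implemented with optimizer parameter groups. The sketch below shows one way to build such groups. It assumes backbone parameters carry `"backbone"` in their name, as in DETR-style code, and `build_param_groups` is a hypothetical helper, not a function from the training scripts. Since both learning rates are 1e-5 here, the two groups behave identically, but the split keeps them independently tunable.

```python
from typing import Any, Dict, Iterable, List, Tuple

LR = 1e-5
LR_BACKBONE = 1e-5
WEIGHT_DECAY = 1e-4


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    lr: float = LR,
    lr_backbone: float = LR_BACKBONE,
) -> List[Dict[str, Any]]:
    """Return two groups: non-backbone params, then backbone params.

    ASSUMPTION: backbone parameters have "backbone" in their name;
    adjust the check for other naming schemes.
    """
    main, backbone = [], []
    for name, param in named_params:
        (backbone if "backbone" in name else main).append(param)
    return [
        {"params": main, "lr": lr},
        {"params": backbone, "lr": lr_backbone},
    ]


# With PyTorch installed, the groups could be used like this:
#   groups = build_param_groups(model.named_parameters())
#   optimizer = torch.optim.AdamW(groups, lr=LR, weight_decay=WEIGHT_DECAY)
```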

	### Checkpoints
	| Path | Seed | Val Loss |
	|---|---|---|
	| `ACT/act-place_phone_stand/demo_clean-50/` | 0 | n/a |
	| `ACT/act-place_phone_stand-run2/demo_clean-50/` | 1 | 0.038 |
	| `ACT/act-place_a2b_left/demo_clean-50/` | 0 | n/a |
	| `ACT/act-place_a2b_left-run2/demo_clean-50/` | 1 | 0.059 |
	| `ACT/act-move_can_pot/demo_clean-50/` | 0 | n/a |
	| `ACT/act-move_can_pot-run2/demo_clean-50/` | 1 | 0.036 |
	| `ACT/act-handover_block-run2/demo_clean-50/` | 1 | 0.030 |
	| `ACT/act-put_bottles_dustbin-run2/demo_clean-50/` | 1 | 0.032 |

	Each checkpoint directory contains:
	- `policy_best.ckpt` — best validation loss checkpoint
	- `policy_last.ckpt` — final epoch checkpoint
	- `policy_epoch_{2000,4000,5000,6000}_seed_{0,1}.ckpt` — intermediate checkpoints
	- `dataset_stats.pkl` — normalization statistics
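The file list above can be turned into a quick completeness check for a downloaded run directory. This is a minimal sketch: the helper names are hypothetical, and it assumes every run contains all four listed epoch files, which may not hold for each directory. The contents of `dataset_stats.pkl` are not documented here, so `load_stats` simply returns whatever object was pickled.

```python
import pickle
from pathlib import Path
from typing import Any, List

SAVED_EPOCHS = (2000, 4000, 5000, 6000)  # from the filename pattern above


def expected_files(seed: int) -> List[str]:
    """Filenames a run directory is expected to contain for one seed."""
    names = ["policy_best.ckpt", "policy_last.ckpt"]
    names += [f"policy_epoch_{e}_seed_{seed}.ckpt" for e in SAVED_EPOCHS]
    names.append("dataset_stats.pkl")
    return names


def missing_files(run_dir: Path, seed: int) -> List[str]:
    """Expected files that are absent from run_dir."""
    return [n for n in expected_files(seed) if not (run_dir / n).is_file()]


def load_stats(run_dir: Path) -> Any:
    """Load the normalization statistics object."""
    # pickle can execute arbitrary code: only load files you trust.
    with open(run_dir / "dataset_stats.pkl", "rb") as f:
        return pickle.load(f)
```

For example, `missing_files(Path("ACT/act-move_can_pot-run2/demo_clean-50"), seed=1)` returns an empty list when the directory is complete.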

	---

	## Pi0.5 LoRA (place_phone_stand only)

	Fine-tuned from `gs://openpi-assets/checkpoints/pi05_base/params` using the [openpi](https://github.com/Physical-Intelligence/openpi) framework.

	### Architecture
	| Param | Value |
	|---|---|
	| Base model | Pi0.5 (3B params) |
	| PaliGemma variant | `gemma_2b_lora` |
	| Action expert variant | `gemma_300m_lora` |
	| Fine-tuning method | LoRA |

	### Training
	| Param | Value |
	|---|---|
	| Batch size | 32 |
	| Total steps | 20,000 configured (trained to 9,000) |
	| Save interval | 200 steps |
	| XLA memory fraction | 0.45 (64 GB pool on H200) |
	| GPU | NVIDIA H200 (143 GB VRAM) |
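The 64 GB pool follows from the memory fraction: 0.45 × 143 GB ≈ 64 GB. In JAX, this fraction is normally controlled with the `XLA_PYTHON_CLIENT_MEM_FRACTION` environment variable. Whether this run set it exactly this way is an assumption, and `pool_size_gb` is a hypothetical helper used only for the arithmetic.

```python
import os

H200_VRAM_GB = 143   # as reported in the table above
MEM_FRACTION = 0.45


def pool_size_gb(total_gb: float, fraction: float) -> float:
    """Size of the preallocated GPU memory pool in GB."""
    return total_gb * fraction


# Must be set before JAX initializes its GPU backend,
# i.e. before the first `import jax` in the training process.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(MEM_FRACTION)

print(f"about {pool_size_gb(H200_VRAM_GB, MEM_FRACTION):.0f} GB")
```

Capping the fraction below JAX's default leaves the remaining GPU memory free for other processes on the same device, such as a simulator.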

	### Checkpoints
	| Path | Step |
	|---|---|
	| `pi05_lora/place_phone_stand/step_5000/` | 5,000 |
	| `pi05_lora/place_phone_stand/step_9000/` | 9,000 |

	---

	## Environment
	- Framework: [RoboTwin2.0](https://github.com/TianxingChen/RoboTwin)
	- Simulator: SAPIEN with Vulkan rendering
	- GPU: NVIDIA H200 SXM (143 GB VRAM)
	- CUDA: 12.8
