Hebbian-Robotics
/

pi05_subtask

+---
+license: apache-2.0
+language:
+- en
+tags:
+- robotics
+- vla
+- pi05
+- subtask
+- openpi
+- lerobot
+- orbax
+datasets:
+- physical-intelligence/libero
+pipeline_tag: robotics
+---
+# pi0.5 subtask fine-tune
+A 100-step fine-tune of `pi05_base` for the subtask generation described in the original [pi0.5 paper](https://www.pi.website/download/pi05.pdf).
+We followed the steps from [#701](https://github.com/Physical-Intelligence/openpi/issues/701), a community issue thread on openpi about reproducing subtask generation.
+## TL;DR
+- **Start weights**: `gs://openpi-assets/checkpoints/pi05_base/params`
+- **Config**: `pi05_subtask_libero` (adds `Pi05Subtask` head: joint flow-matching + CE-on-subtask-tokens loss)
+- **Training**: 100 steps × batch 8 on 30 LIBERO episodes, 1× H100 on Modal
+- **Training loss**: 3.04 → 0.23 over the 100 steps
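The joint objective named in the config bullet above (flow-matching on actions plus cross-entropy on subtask tokens) can be sketched in NumPy as follows. This is a minimal illustration, not the `Pi05Subtask` implementation: the function names, the `ce_weight` parameter, and the tensor shapes are assumptions, and the sketch does not model how gradients are routed between the two terms.

```python
import numpy as np

def flow_matching_loss(pred_velocity, target_velocity):
    """Mean squared error between predicted and target velocity fields."""
    return float(np.mean((pred_velocity - target_velocity) ** 2))

def subtask_ce_loss(logits, token_ids, subtask_mask):
    """Cross-entropy averaged over positions where subtask_mask is True."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    token_ll = np.take_along_axis(log_probs, token_ids[..., None], axis=-1)[..., 0]
    return float(-(token_ll * subtask_mask).sum() / max(subtask_mask.sum(), 1))

def joint_loss(pred_velocity, target_velocity, logits, token_ids, subtask_mask,
               ce_weight=1.0):
    """Sum of both terms; ce_weight is a hypothetical knob, not from the card."""
    fm = flow_matching_loss(pred_velocity, target_velocity)
    ce = subtask_ce_loss(logits, token_ids, subtask_mask)
    return fm + ce_weight * ce

# Toy batch: 2 samples, 3 action steps of dim 4, 5 text positions, vocab of 4.
pred_v = np.zeros((2, 3, 4))
target_v = np.zeros((2, 3, 4))
logits = np.zeros((2, 5, 4))                      # uniform over the vocab
token_ids = np.array([[0, 1, 2, 3, 0], [1, 1, 2, 0, 3]])
subtask_mask = np.array([[False, False, True, True, True],
                         [False, True, True, True, False]])
loss = joint_loss(pred_v, target_v, logits, token_ids, subtask_mask)
```

With a perfect velocity prediction and uniform logits, the toy loss reduces to the cross-entropy of a uniform distribution over four tokens.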
+## Loading
+```python
+import tarfile
+from pathlib import Path
+
+import flax.nnx as nnx
+import jax
+import jax.numpy as jnp
+from huggingface_hub import hf_hub_download
+from openpi.models import model as _model
+from openpi.models.pi0_config import Pi0Config
+
+# 1. Download the checkpoint archive and extract it into the current directory
+tar = hf_hub_download("swatery/pi05-subtask",
+                     "jax/pi05_subtask.tar")
+with tarfile.open(tar) as archive:
+    archive.extractall(".")
+ckpt = Path("99")  # extracted checkpoint directory
+
+# 2. Build a pi0.5 model, then restore the fine-tuned weights into it
+config = Pi0Config(pi05=True)
+model = config.create(jax.random.key(0))  # random init, overwritten below
+params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
+nnx.update(model, nnx.State(params))
+model.eval()  # switch to inference mode
+```
+For end-to-end subtask generation (JIT-compiled autoregressive decoding with an ASCII vocabulary mask over PaliGemma's LM head), see the `SubtaskGenerator` implementation in `src/hosting/subtask_generator.py` of [openpi/hosting](https://github.com/Hebbian-Robotics/openpi).
+That module loads a checkpoint like this one and exposes `.generate(prompt, images)`.
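To illustrate what an ASCII vocabulary mask does during decoding, here is a minimal NumPy sketch of one greedy step. It is not the `SubtaskGenerator` code: the helper names and the toy vocabulary are hypothetical, and the real module works on PaliGemma's tokenizer and runs under JIT.

```python
import numpy as np

def ascii_vocab_mask(vocab):
    """Boolean array that is True for tokens made only of ASCII characters."""
    return np.array([tok.isascii() for tok in vocab], dtype=bool)

def masked_greedy_step(logits, mask):
    """Pick the highest-scoring allowed token; disallowed tokens get -inf."""
    masked = np.where(mask, logits, -np.inf)
    return int(np.argmax(masked))

# Toy vocabulary (hypothetical, not PaliGemma's real vocabulary).
vocab = ["pick", "up", "拿", "the", "杯", "<eos>"]
mask = ascii_vocab_mask(vocab)

# Without the mask, argmax would select index 2 (a non-ASCII token).
logits = np.array([1.0, 0.5, 3.0, 0.2, 2.5, 0.1])
next_id = masked_greedy_step(logits, mask)
```

An autoregressive decoder would repeat this step, appending `next_id` to the sequence until an end-of-sequence token or a length limit is reached.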
+## Training details
+| | |
+|---|---|
+| Architecture | pi0.5 — PaliGemma + Gemma action expert, with `Pi05Subtask` head |
+| Loss | Flow-matching (action) + cross-entropy (subtask tokens) |
+| Knowledge insulation | Yes — LM backbone receives only CE gradients |
+| Steps | 100 |
+| Batch size | 8 (global, single device) |
+| Optimizer | AdamW, cosine schedule, peak LR 5e-5, 10k warmup steps (only 100 steps were run, so training stays inside the warmup ramp and never reaches the peak LR) |
+| EMA decay | 0.999 |
+| Precision | bfloat16 |
+| Hardware | 1× NVIDIA H100 80GB (Modal) |
+| Wall-clock | ~10 min training + ~5 min data/weight fetch |
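The optimizer row implies that a 100-step run only sees the very start of the schedule. The sketch below makes that concrete with a generic linear-warmup plus cosine-decay function. It assumes the warmup starts at 0 and that the schedule spans 30,000 steps in total; neither value is stated in this card, and the actual openpi schedule may differ in both.

```python
import math

def warmup_cosine_lr(step, peak_lr, warmup_steps, total_steps,
                     init_lr=0.0, end_lr=0.0):
    """Linear warmup from init_lr to peak_lr, then cosine decay to end_lr."""
    if step < warmup_steps:
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    span = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / span)
    return end_lr + 0.5 * (peak_lr - end_lr) * (1.0 + math.cos(math.pi * progress))

PEAK_LR = 5e-5         # from the table
WARMUP_STEPS = 10_000  # from the table
TOTAL_STEPS = 30_000   # assumption: not stated in this card

# Learning rate at the last training step (step index 99).
lr_last = warmup_cosine_lr(99, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS)
fraction_of_peak = lr_last / PEAK_LR
```

Under these assumptions the learning rate at the final step is roughly 1% of the peak value, so the cosine decay phase plays no role in this run.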
+### Data
+- **Dataset**: first 30 episodes of `physical-intelligence/libero` chunk-000 (~8,294 frames)
+- **Norm stats**: reused `pi05_libero`'s precomputed full-dataset stats from `gs://openpi-assets/checkpoints/pi05_libero/assets/`
+- **Subtask annotation**: **identity** — `high_prompt = low_prompt = task_prompt`
+  (real hierarchical subtask annotations for LIBERO are not publicly available)
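The identity annotation and the amount of data actually seen can be sketched as below. The helper name and the example prompt are hypothetical, and the coverage estimate assumes one frame per training sample, which this card does not state.

```python
def identity_subtask_annotation(task_prompt):
    """Use the task prompt as both the high-level prompt and the subtask label."""
    return {"high_prompt": task_prompt, "low_prompt": task_prompt}

# Hypothetical example prompt, not taken from the dataset.
sample = identity_subtask_annotation("put the bowl on the plate")

# Rough data coverage, using the figures from this card.
STEPS = 100
BATCH_SIZE = 8
NUM_FRAMES = 8294  # approximate frame count from the list above

samples_seen = STEPS * BATCH_SIZE
coverage = samples_seen / NUM_FRAMES  # fraction of one pass over the frames
```

With identity labels the subtask target is just the task prompt repeated back, and under the one-frame-per-sample assumption the run covers about a tenth of a single pass over the 30 episodes.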
+## References
+- https://www.pi.website/blog/pi05 (pi0.5 blog post)
+- https://github.com/Physical-Intelligence/openpi (upstream pi0.5 implementation)
+- https://github.com/Physical-Intelligence/openpi/issues/701 (community issue thread reproducing subtask generation)
+- https://github.com/LisavilaLee/openpi_with_subtask (fork with training example)
+## License
+- Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
+- Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma. Usage is subject to the Gemma Terms of Use in addition to Apache 2.0.