# Bootstrap backbones

DA3-Giant backbone weights used by `DA3GiantEncoder.__init__` to instantiate
the Stage 1 model before our finetuned `student_da3` state_dict is loaded
on top. This is the same file referenced as `stage_1.ckpt_path` in
every training config in this lineage.

## Files

- `track4world_da3.pth` (~5.2 GB): DA3-Giant multi-view backbone weights.
  Load with `torch.load(path, map_location="cpu")`. Used only at model
  instantiation; the finetuned `student_da3` weights inside any
  `franka_multitask_v1/*/0XXXXXX.pt` checkpoint override these on
  `load_state_dict`.

## Other dependencies (NOT in this repo; fetch from public HF)

- `google-t5/t5-base` (~900 MB): language encoder used by the shallow12 AR
  predictor (`predictor.language_encoder_type: t5`).
- `openai/clip-vit-large-patch14` (~1.7 GB): only referenced in the config;
  the multi-task finetune actually routes through T5, so CLIP weights are
  loaded but unused at inference. Safe to skip on bandwidth-constrained
  deploy hosts.

Both download automatically on the first `transformers`/`huggingface_hub` call.
If the deploy host has no network access, point `HF_HOME` at a pre-populated
cache directory and set `HF_HUB_OFFLINE=1`.

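
Before taking a deploy host offline, it can help to confirm that the Hugging Face cache already holds the models listed above. The following is a minimal sketch, assuming the default Hub cache layout (`<HF_HOME>/hub/models--<org>--<name>/snapshots/<revision>/`); the helper names `hub_cache_dir` and `is_cached` are hypothetical, and the check ignores `XDG_CACHE_HOME` and only confirms that a snapshot directory exists, not that every file in it is complete.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None):
    """Resolve the Hub cache directory from HF_HUB_CACHE or HF_HOME.

    Assumption: falls back to ~/.cache/huggingface when neither is set.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


def is_cached(repo_id, env=None):
    """Return True if at least one snapshot of `repo_id` exists in the cache."""
    # Model repos are stored as models--<org>--<name> inside the hub cache.
    repo_dir = hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

Calling `is_cached("google-t5/t5-base")` on the deploy host reports whether the T5 encoder is already present under the configured `HF_HOME`; the same call with `openai/clip-vit-large-patch14` covers the CLIP entry.
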
## Deploy load order

```python
import torch

# DA3GiantEncoder comes from the training codebase (import path not shown here).

# 1. Instantiate DA3GiantEncoder with this backbone bootstrap.
encoder = DA3GiantEncoder(
    ckpt_path="/local/track4world_da3.pth",
    # ... remaining constructor arguments elided
)
# 2. Strict-load the finetuned student weights on top.
finetune = torch.load("/local/franka_multitask_0010000.pt", map_location="cpu")
encoder.load_state_dict(finetune["student_da3"], strict=True)
```

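
With `strict=True`, `load_state_dict` raises if the checkpoint's keys do not exactly match the model's. A quick key comparison before the load can make such a failure easier to read. This is a minimal sketch, not part of the deploy spec: `diff_state_dict_keys` is a hypothetical helper, and the key names in the example are invented for illustration. It compares names only, so a shape mismatch would still surface at load time.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Report key mismatches that would make a strict load fail.

    missing: expected by the model but absent from the checkpoint.
    unexpected: present in the checkpoint but unknown to the model.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical key names, for illustration only.
model_keys = ["patch_embed.weight", "blocks.0.attn.qkv.weight", "head.bias"]
ckpt_keys = ["patch_embed.weight", "blocks.0.attn.qkv.weight", "extra.scale"]
report = diff_state_dict_keys(model_keys, ckpt_keys)
print(report)  # {'missing': ['head.bias'], 'unexpected': ['extra.scale']}
```

In the load order above, the equivalent call would be `diff_state_dict_keys(encoder.state_dict().keys(), finetune["student_da3"].keys())`; both lists should come back empty before the strict load is attempted.
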
See `docs/realrobot-franka-deploy-handoff.md` in
[ONground-Korea/3DA](https://github.com/ONground-Korea/3DA) for the full
deploy spec.