Bootstrap backbones
DA3-Giant backbone weights used by DA3GiantEncoder.__init__ to instantiate
the Stage 1 model before our finetuned student_da3 state_dict is loaded
on top. This file is the same one referenced as stage_1.ckpt_path in
every training config in this lineage.
Files
track4world_da3.pth (~5.2 GB): DA3-Giant multi-view backbone weights. Load with torch.load(map_location='cpu'). Used only at model instantiation; the finetuned student_da3 weights inside any franka_multitask_v1/*/0XXXXXX.pt checkpoint override these on load_state_dict.
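The franka_multitask_v1/*/0XXXXXX.pt pattern suggests zero-padded step numbers in the file names. As a minimal sketch under that assumption, a deploy script could pick the highest-step checkpoint as shown here. The helper name latest_checkpoint and the seven-digit naming rule are illustrative assumptions, not something this repo defines.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are named with a zero-padded 7-digit step
# (e.g. 0010000.pt) and live one directory level below the run root.
_STEP_RE = re.compile(r"^(\d{7})\.pt$")


def latest_checkpoint(run_root: str) -> Optional[Path]:
    """Return the checkpoint with the highest step under run_root/*/, or None."""
    best_step, best_path = -1, None
    for path in Path(run_root).glob("*/*.pt"):
        match = _STEP_RE.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming rule
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

Calling latest_checkpoint("franka_multitask_v1") would then return a path that can be passed to torch.load, or None if no file matches.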
Other dependencies (NOT in this repo — fetch from public HF)
google-t5/t5-base (~900 MB): language encoder used by the shallow12 AR predictor (predictor.language_encoder_type: t5).
openai/clip-vit-large-patch14 (~1.7 GB): only referenced in the config; the multi-task finetune actually routes through T5, so CLIP weights are loaded but unused at inference. Safe to skip on bandwidth-constrained deploy hosts.
Both download automatically on the first transformers/huggingface_hub call.
If the deploy host needs an offline mirror, point HF_HOME at a pre-populated
cache directory and set HF_HUB_OFFLINE=1 to prevent network lookups.
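For hosts that need an offline cache, the following sketch builds the relevant environment variables and, on a machine with network access, pre-fetches both repositories. HF_HOME, HF_HUB_OFFLINE, and huggingface_hub.snapshot_download are existing Hugging Face names; the helper names hf_env and prefetch and any cache path passed to them are hypothetical.

```python
import os
from typing import Dict, Sequence

# Repos named in this README as external dependencies.
HF_REPOS: Sequence[str] = ("google-t5/t5-base", "openai/clip-vit-large-patch14")


def hf_env(cache_root: str, offline: bool) -> Dict[str, str]:
    """Environment variables that point the HF libraries at cache_root."""
    env = {"HF_HOME": cache_root}
    if offline:
        env["HF_HUB_OFFLINE"] = "1"  # use the local cache only, no network lookups
    return env


def prefetch(cache_root: str, repos: Sequence[str] = HF_REPOS) -> None:
    """Run on a connected machine to populate cache_root for later offline use."""
    os.environ.update(hf_env(cache_root, offline=False))
    # Imported only after HF_HOME is set: huggingface_hub reads it at import
    # time, so this must run before anything else imports the library.
    from huggingface_hub import snapshot_download

    for repo_id in repos:
        snapshot_download(repo_id=repo_id)
```

On the deploy host, exporting the variables from hf_env(cache_root, offline=True) before starting the process should make both libraries resolve from the copied cache. Note that snapshot_download fetches every file in a repo, so the download can be larger than the sizes listed above; its allow_patterns argument can narrow that.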
Deploy load order
import torch

# 1. Instantiate DA3GiantEncoder with this backbone bootstrap.
encoder = DA3GiantEncoder(
    ckpt_path="/local/track4world_da3.pth",
    ...,
)
# 2. Strict-load the finetuned student weights on top.
finetune = torch.load("/local/franka_multitask_0010000.pt", map_location="cpu")
encoder.load_state_dict(finetune["student_da3"], strict=True)
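Because strict=True makes load_state_dict raise a RuntimeError on any missing or unexpected key, it can help to compare the key sets before the load. The helper below is a sketch that works on plain collections of key names; diff_state_keys is a hypothetical name, not an API of torch or of this repo.

```python
from typing import Iterable, List, Tuple


def diff_state_keys(
    model_keys: Iterable[str], ckpt_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) as a strict load would classify them.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not
    """
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected
```

With the objects from the load order above, this would be called as diff_state_keys(encoder.state_dict().keys(), finetune["student_da3"].keys()). Two empty lists mean the strict load should not fail on key names, although shape mismatches are still possible.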
See docs/realrobot-franka-deploy-handoff.md in
ONground-Korea/3DA for the full
deploy spec.