	# Bootstrap backbones

	DA3-Giant backbone weights used by `DA3GiantEncoder.__init__` to instantiate
	the Stage 1 model before our finetuned `student_da3` state_dict is loaded
	on top. Every training config in this lineage references this file as
	`stage_1.ckpt_path`.

	## Files

	- `track4world_da3.pth` (~5.2 GB): DA3-Giant multi-view backbone weights.
	Load with `torch.load(path, map_location="cpu")`. Used only at model
	instantiation; the finetuned `student_da3` weights inside any
	`franka_multitask_v1/*/0XXXXXX.pt` checkpoint overwrite these when
	`load_state_dict` is called.

	## Other dependencies (NOT in this repo — fetch from public HF)

	- `google-t5/t5-base` (~900 MB): language encoder used by the shallow12 AR
	predictor (`predictor.language_encoder_type: t5`).
	- `openai/clip-vit-large-patch14` (~1.7 GB): only referenced in the config;
	the multi-task finetune actually routes through T5, so CLIP weights are
	loaded but unused at inference. Safe to skip on bandwidth-constrained
	deploy hosts.

	Both download automatically on the first `transformers`/`huggingface_hub` call.
	If the deploy host is offline, point `HF_HOME` at a pre-populated cache and
	set `HF_HUB_OFFLINE=1` so no network calls are attempted.
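
For a host without network access, one option is to fill the cache on a connected machine and copy the directory over. The following is a minimal sketch, assuming the default cache layout (`<HF_HOME>/hub`). `offline_env` and `prefetch` are hypothetical helper names, not part of this repo, and any cache path you pass in is your own choice.

```python
import os

# Public repos named in this README that are not shipped in this repo.
# Drop the CLIP entry if you decide to skip it.
PUBLIC_DEPS = ["google-t5/t5-base", "openai/clip-vit-large-patch14"]


def offline_env(hf_home: str) -> dict:
    """Environment variables pointing Hugging Face libraries at a local cache."""
    return {
        "HF_HOME": hf_home,     # cache root; hub files live under <HF_HOME>/hub
        "HF_HUB_OFFLINE": "1",  # fail fast instead of attempting network calls
    }


def prefetch(hf_home: str, repo_ids=PUBLIC_DEPS) -> None:
    """Run on a connected machine to fill the cache, then copy hf_home over."""
    # Imported lazily so offline_env works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in repo_ids:
        snapshot_download(repo_id=repo_id, cache_dir=os.path.join(hf_home, "hub"))
```

On the deploy host, apply the settings with `os.environ.update(offline_env(...))` before importing `transformers` or `huggingface_hub`, since those libraries read these variables when they are imported.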

	## Deploy load order

	```python
	import torch
	# DA3GiantEncoder is defined in the training codebase; import it from there.

	# 1. Instantiate DA3GiantEncoder with this backbone bootstrap.
	encoder = DA3GiantEncoder(
	    ckpt_path="/local/track4world_da3.pth",
	    # ... remaining constructor arguments elided
	)
	# 2. Strict-load the finetuned student weights on top.
	finetune = torch.load("/local/franka_multitask_0010000.pt", map_location="cpu")
	encoder.load_state_dict(finetune["student_da3"], strict=True)
	```
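
Because the load in step 2 is strict, any key mismatch between the encoder and the `student_da3` entry raises a `RuntimeError`. A small pre-flight comparison can report the offending keys in a readable form first. This is a sketch using a hypothetical helper, `diff_keys`. It compares key names only, whereas `load_state_dict` also checks tensor shapes.

```python
def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, the two sets a strict load rejects.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint carries but the model does not define
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected


# Toy key names for illustration only; real names would come from
# encoder.state_dict().keys() and finetune["student_da3"].keys().
missing, unexpected = diff_keys(
    ["blocks.0.weight", "blocks.0.bias", "head.weight"],
    ["blocks.0.weight", "blocks.0.bias", "extra.weight"],
)
print(missing)     # ['head.weight']
print(unexpected)  # ['extra.weight']
```

Both lists being empty means the key names line up and a strict load will not fail on naming.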

	See `docs/realrobot-franka-deploy-handoff.md` in
	[ONground-Korea/3DA](https://github.com/ONground-Korea/3DA) for the full
	deploy spec.