akrao9
/

VGGT-LACT

Depth Estimation

test-time-training

Model card Files Files and versions

VGGT-LACT / README.md

akrao9's picture

Upload README.md with huggingface_hub

f0826bb verified 12 days ago

|

history blame contribute delete

3.14 kB

	---
	license: apache-2.0
	library_name: pytorch
	base_model: facebook/VGGT-1B
	tags:
	- vggt
	- depth-estimation
	- 3d-vision
	- camera-pose
	- test-time-training
	- lact
	pipeline_tag: depth-estimation
	---

	# VGGT LaCT (stage 1) — slim adapter weights

	These files are LaCT-block weights only (~200 MB), not a full VGGT checkpoint. They plug into the public [facebook/VGGT-1B](https://huggingface.co/facebook/VGGT-1B) backbone: DINOv2 patch embed, frame-wise attention, and prediction heads stay at Meta’s pretrained VGGT-1B; only the global-attention layers are replaced by LaCT-style fast-weight GLU blocks trained with stage-1 distillation against the frozen teacher.

	Code: [github.com/Akrao9/vggt_ttt](https://github.com/Akrao9/vggt_ttt) (install `vggt` from [facebookresearch/vggt](https://github.com/facebookresearch/vggt) as in that README).

	## Files

	\| File \| Description \|
	\|------\|-------------\|
	\| `vggt_ttt_lact_stage1.pt` \| Stage 1 distilled LaCT state dict (`lact_state_dict()` format). Keys are prefixed with `aggregator.lact_blocks.`. \|

	## Load (Python)

	```python
	import torch
	from huggingface_hub import hf_hub_download

	# From the vggt_ttt repo (with `vggt` installed per upstream README):
	from model.vggt_ttt import VGGT_TTT
	from model.io_utils import torch_load_checkpoint

	ckpt_path = hf_hub_download("akrao9/VGGT-LACT", "vggt_ttt_lact_stage1.pt")
	device = "cuda"
	model = VGGT_TTT.from_pretrained("facebook/VGGT-1B", chunk_size=16).to(device).eval()
	state = torch_load_checkpoint(ckpt_path, map_location=device)
	model.load_lact_state_dict(state, strict=True)
	```

	Use a local path instead of `hf_hub_download` if you already downloaded the `.pt` file.

	## Inference CLI

	From the [vggt_ttt](https://github.com/Akrao9/vggt_ttt) repo, after downloading this checkpoint locally:

	```bash
	python scripts/run_inference.py \
	--input path/to/video.mp4 --fps 2 \
	--checkpoint ./vggt_ttt_lact_stage1.pt \
	--out ./out
	```

	(`--checkpoint` accepts this LaCT-only dict; see `scripts/run_inference.py`.)

	## Training summary

	- Stage 1: distillation from frozen `facebook/VGGT-1B` (pose / depth / world points), trainable parameters confined to the 24 LaCT blocks; `c_proj` zero-init for a near-identity start.
	- Checkpoints: saved with `torch.save(model.lact_state_dict(), path)` — same tensor layout as this Hub file.

	## Hardware / scaling

	LaCT path is aimed at longer frame sequences with more favorable VRAM scaling than full global attention; see the GitHub README for benchmark tables (DL3DV-style eval).

	## License and attribution

	- This adapter repository and the training code release are under Apache 2.0 (see project `LICENSE` / `NOTICE` on GitHub).
	- VGGT-1B is subject to Meta’s license and terms on its model card; you must comply with those when using the backbone.
	- Method builds on VGGT and LaCT-style components as described in the upstream README.

	## Citation

	If you use these weights or the [vggt_ttt](https://github.com/Akrao9/vggt_ttt) codebase, cite the original VGGT paper/repo and credit this adapter as appropriate for your venue.