metadata
license: mit
tags:
- 3d
- depth-estimation
- multilayer-depth
- point-cloud
- diffusion
- video
- dynamic-scene
library_name: torch
pipeline_tag: image-to-3d
extra_gated_heading: Request access to the World Tracing dynamic model
extra_gated_description: >
  These checkpoints are released for research and product experimentation under
  the **MIT license**. Please share a few details below so we can keep a light
  audit trail of how the weights are used in the wild. Requests are reviewed
  manually, typically within **1-3 business days**.
extra_gated_button_content: Submit access request
extra_gated_fields:
  Full name: text
  Affiliation (university / company): text
  Country: country
  Primary intended use:
    type: select
    options:
      - Academic research
      - Personal / hobbyist project
      - Industrial research
      - Commercial product
      - Other
  Brief description of your intended use: text
  I agree to cite the World Tracing paper in any publication or release that uses these weights: checkbox
World Tracing — Dynamic Model (16-frame video, r76)
Access
The checkpoints in this repo are released under the MIT license, but downloads are gated so we can keep a light audit trail of how the model is used. To download:
- Scroll up and fill in the "Submit access request" form (basic contact info + a short note on intended use).
- We review every request manually, usually within 1-3 business days. You will receive an email from Hugging Face once your request is approved.
- After approval, log in with `huggingface-cli login` (or set `HF_TOKEN`) and run any of the inference examples from the GitHub repo. The `wt` package picks the token up automatically, and `--ckpt r75b/r69e/r76` triggers a normal `hf_hub_download`.
Note: this is a manual review flow, not an auto-approve click-through. We read every request individually, so please give a one-line description of what you plan to use the weights for.
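For scripted downloads outside the `wt` package, the gated fetch described above can be sketched directly with `huggingface_hub`. This is a minimal sketch, not the repo's own loader: the helper names are made up for illustration, the `repo_id` passed in must be the Hub ID of this repo (it is not spelled out here, so the one in the usage comment is a hypothetical placeholder), and `model.pt` is the file listed under Files.

```python
import os


def download_kwargs(repo_id, filename="model.pt"):
    """Build keyword arguments for huggingface_hub.hf_hub_download.

    The token is read from HF_TOKEN when set. Otherwise it stays None, so
    huggingface_hub falls back to the token saved by `huggingface-cli login`.
    """
    return {
        "repo_id": repo_id,
        "filename": filename,
        "token": os.environ.get("HF_TOKEN") or None,
    }


def download_checkpoint(repo_id, filename="model.pt"):
    """Download one file from a gated repo and return its local cache path."""
    # Imported lazily so download_kwargs works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**download_kwargs(repo_id, filename))


# Usage (hypothetical repo ID, replace with the real one):
#   path = download_checkpoint("your-org/world-tracing-r76")
```

If access has not been approved yet, the download call is expected to fail with an authorization error rather than return a path.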
EMA-only release weights for the r76 dynamic-video model from World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible.
- Repo: https://github.com/haoz19/world-tracing
- Project page: https://haoz19.github.io/world-tracing-page/
- Config name: `r76`
- Architecture: `MultilayerXYZModel` with temporal attention blocks, 2.1 B params
- Input: 16 frames × 336 × 336 RGBA (single shared crop across the clip)
- Output: per-frame, per-layer XYZ; 16 stacked time-steps × 6 depth layers
- Training data: dynamic-object synthetic clips + curated real-world dynamic clips
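To make the input and output sizes listed above concrete, here is a small shape-bookkeeping sketch. The axis order and the assumption that the output is pixel-aligned at the 336 × 336 input resolution are illustrative guesses, not the model's documented tensor layout; only the counts (16 frames, 6 layers, 336 × 336, RGBA in, XYZ out) come from the list above.

```python
FRAMES = 16          # time-steps per clip
LAYERS = 6           # depth layers per frame
HEIGHT = WIDTH = 336

# Assumed layouts, chosen for illustration only; the real layout may differ.
input_shape = (FRAMES, 4, HEIGHT, WIDTH)            # RGBA channels
output_shape = (FRAMES, LAYERS, 3, HEIGHT, WIDTH)   # XYZ per pixel, per layer


def num_points(shape):
    """Count XYZ points in an output with the assumed (T, L, 3, H, W) layout."""
    n_frames, n_layers, n_channels, height, width = shape
    if n_channels != 3:
        raise ValueError("expected an XYZ channel axis of size 3")
    return n_frames * n_layers * height * width


print(num_points(output_shape))  # 10838016
```

Under these assumptions a single clip yields about 10.8 million candidate points before any masking of empty layers.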
Files
| File | Size | Format |
|---|---|---|
| `model.pt` | 7.80 GB | bare `state_dict`, float32 |
EMA weights only — ~26 % of the original training checkpoint.
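The listed file size is roughly what a float32 state dict with this parameter count should occupy. The sketch below checks the arithmetic and shows one way to load the file. It assumes 4 bytes per parameter, no extra buffers, and that the size in the table is binary (GiB); the 2.1 B figure is itself rounded, so only approximate agreement is expected. Both helper names are hypothetical.

```python
def float32_size_gib(n_params, bytes_per_param=4):
    """Approximate on-disk size in GiB of a dense float32 state dict."""
    return n_params * bytes_per_param / 2**30


def load_bare_state_dict(path):
    """Load a bare state_dict onto the CPU without unpickling arbitrary objects."""
    # Imported lazily so the size helper works without torch installed.
    import torch

    return torch.load(path, map_location="cpu", weights_only=True)


print(round(float32_size_gib(2.1e9), 2))  # 7.82
```

Because the file is a bare state dict, the loaded object is a plain mapping from parameter names to tensors; the model itself still has to be constructed from the `r76` config before the weights are applied.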
Usage
```bash
git clone https://github.com/haoz19/world-tracing
cd world-tracing
pip install -e ".[viz]"
python examples/infer_video.py \
    --image_dir examples/test_images/dynamic/davis__camel/ \
    --ckpt r76 \
    --config r76 \
    --out /tmp/wt_camel.rrd
```
A bare `--ckpt r76` triggers `huggingface_hub.hf_hub_download` against this repo. The clip directory must contain 16 frames (or pass `--frame_indices "0,2,4,..."` to subsample).
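For clips longer than 16 frames, the index string for `--frame_indices` can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the repo, and it assumes the flag expects exactly 16 zero-based indices separated by commas, as the example value suggests.

```python
def frame_indices_arg(n_available, n_select=16, stride=None):
    """Return a comma-separated index string for subsampling a clip.

    With `stride` set, take every `stride`-th frame starting at frame 0.
    Otherwise spread `n_select` indices evenly from the first to the last frame.
    """
    if n_select < 2:
        raise ValueError("n_select must be at least 2")
    if stride is None:
        if n_available < n_select:
            raise ValueError("clip has fewer frames than requested")
        # Integer arithmetic keeps the indices distinct and in range.
        indices = [i * (n_available - 1) // (n_select - 1) for i in range(n_select)]
    else:
        indices = list(range(0, n_available, stride))[:n_select]
        if len(indices) < n_select:
            raise ValueError("clip is too short for this stride")
    return ",".join(str(i) for i in indices)


print(frame_indices_arg(32, stride=2))
# 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
```

The result can be passed straight to the script, for example `--frame_indices "$(python make_indices.py)"`, where `make_indices.py` is a placeholder name for a file containing the helper.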
Citation
```bibtex
@misc{zhang2026worldtracing,
  title = {World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible},
  author = {Hao Zhang and Mohamed El Banani and Jen-Hao Cheng and Paul Zhang
            and Yi Hua and Ben Mildenhall and Christoph Lassner
            and Narendra Ahuja and Gengshan Yang},
  year = {2026},
  eprint = {TODO},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}
```
License
MIT — see the GitHub repo.