metadata
license: mit
tags:
  - 3d
  - depth-estimation
  - multilayer-depth
  - point-cloud
  - diffusion
  - scene-understanding
library_name: torch
pipeline_tag: image-to-3d
extra_gated_heading: Request access to the World Tracing scene model
extra_gated_description: >
  These checkpoints are released for research and product experimentation under
  the **MIT license**.  Please share a few details below so we can keep a light
  audit trail of how the weights are used in the wild.  Requests are reviewed
  manually, typically within **1-3 business days**.
extra_gated_button_content: Submit access request
extra_gated_fields:
  Full name: text
  Affiliation (university / company): text
  Country: country
  Primary intended use:
    type: select
    options:
      - Academic research
      - Personal / hobbyist project
      - Industrial research
      - Commercial product
      - Other
  Brief description of your intended use: text
  I agree to cite the World Tracing paper in any publication or release that uses these weights: checkbox

World Tracing — Scene Model (6-layer, r69e)

Access

The checkpoints in this repo are released under the MIT license, but downloads are gated so we can keep a light audit trail of how the model is used. To download:

  1. Scroll up and fill in the access request form (basic contact info + a short note on intended use), then click "Submit access request".
  2. We review every request manually, usually within 1-3 business days. You will receive an email from Hugging Face once your request is approved.
  3. After approval, log in with huggingface-cli login (or set HF_TOKEN) and run any of the inference examples from the GitHub repo. The wt package picks up the token automatically, and --ckpt r75b / r69e / r76 triggers a normal hf_hub_download.

Note: approval is manual, not an auto-approve click-through, so please include a one-line description of what you plan to use the weights for.
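For scripts that need the weights without going through the example CLI, the download in step 3 can be sketched directly with huggingface_hub. This is a minimal sketch, not the wt package's own code: the repo id is a placeholder (this page does not spell out the owner namespace), the filename model.pt is taken from the Files section, and resolve_token and download_checkpoint are hypothetical helper names.

```python
import os
from typing import Mapping, Optional

# Placeholder: replace <owner> with the namespace that hosts this repo.
REPO_ID = "<owner>/scene-model-6layer"
# Filename as listed in the Files section of this card.
FILENAME = "model.pt"


def resolve_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return HF_TOKEN if it is set and non-empty, else None.

    Passing None to hf_hub_download makes it fall back to the token
    stored by `huggingface-cli login`.
    """
    env = os.environ if env is None else env
    return env.get("HF_TOKEN") or None


def download_checkpoint(repo_id: str = REPO_ID, filename: str = FILENAME) -> str:
    """Download the checkpoint (or reuse the local cache) and return its path."""
    # Imported lazily so resolve_token can be used without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, token=resolve_token())


if __name__ == "__main__":
    print(download_checkpoint())
```

Until the access request has been approved for the account behind the token, the download call is expected to fail with an authorization error rather than return a path.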

EMA-only release weights for the r69e scene model from World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible.

  • Repo: https://github.com/haoz19/world-tracing
  • Project page: https://haoz19.github.io/world-tracing-page/
  • Config name: r69e (and r69g, same architecture, different XYZ normalisation mode)
  • Architecture: MultilayerXYZModel, 1.5 B params
  • Input: 504 × 504 full-frame RGB (no alpha)
  • Output: per-layer XYZ in camera space, 6 stacked depth maps
  • Training data: Evermotion + IT-Happy indoor renders, none of which contain sky. For outdoor inputs, mask out the sky externally before running inference.

Files

| File | Size | Format |
|---|---|---|
| model.pt | 5.59 GB | bare state_dict, float32 |

EMA weights only, about 26 % of the size of the original training checkpoint.
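As a rough consistency check, the listed file size can be compared with the stated parameter count: a float32 state_dict stores 4 bytes per value. The sketch below assumes that "1.5 B" is a rounded figure and that the file holds little besides the weights; count_parameters is a hypothetical helper that needs PyTorch and a local copy of model.pt.

```python
def float32_size_bytes(n_params: float) -> float:
    """Approximate on-disk size of n_params float32 values (4 bytes each)."""
    return n_params * 4


def count_parameters(path: str) -> int:
    """Count the values stored in a bare state_dict file."""
    # Imported lazily so the arithmetic helper works without PyTorch.
    import torch

    state_dict = torch.load(path, map_location="cpu", weights_only=True)
    # A state_dict also contains buffers, so this can slightly exceed
    # the number of trainable parameters.
    return sum(t.numel() for t in state_dict.values() if hasattr(t, "numel"))


if __name__ == "__main__":
    size = float32_size_bytes(1.5e9)
    print(f"{size / 1e9:.2f} GB (decimal), {size / 2**30:.2f} GiB (binary)")
```

With exactly 1.5e9 values this gives 6.00 decimal GB, or 5.59 when counted in binary gibibytes, which matches the listed size if "GB" is read as GiB. If the listed size is decimal instead, the file would hold closer to 1.4e9 float32 values.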

Usage

git clone https://github.com/haoz19/world-tracing
cd world-tracing
pip install -e ".[viz]"

python examples/infer_scene.py \
    --image  examples/test_images/scene/scene_indoor_01_modern_living_room__seed42.png \
    --ckpt   r69e \
    --config r69e \
    --out    /tmp/wt_scene.rrd

Bare --ckpt r69e (or --ckpt r69g) triggers huggingface_hub.hf_hub_download against this repo.
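To process a folder of images, the command above can be wrapped in a small driver. This is a sketch that only reuses the flags shown in the example (--image, --ckpt, --config, --out); build_infer_command and run_batch are hypothetical helpers, not part of the repo, and the script is assumed to be run from the root of the cloned world-tracing checkout.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(image, out, ckpt="r69e", config="r69e",
                        script="examples/infer_scene.py"):
    """Build the argv list for one infer_scene.py call."""
    return [
        sys.executable, script,
        "--image", str(image),
        "--ckpt", ckpt,
        "--config", config,
        "--out", str(out),
    ]


def run_batch(image_dir, out_dir):
    """Run inference on every PNG in image_dir, writing one .rrd per image."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob("*.png")):
        cmd = build_infer_command(image, out_dir / f"{image.stem}.rrd")
        # check=True stops the batch at the first failing image.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_batch("examples/test_images/scene", "/tmp/wt_scene_batch")
```

Each call starts a new process and therefore reloads the checkpoint, which is simple but slow for large batches.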

Citation

@misc{zhang2026worldtracing,
  title         = {World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible},
  author        = {Hao Zhang and Mohamed El Banani and Jen-Hao Cheng and Paul Zhang
                   and Yi Hua and Ben Mildenhall and Christoph Lassner
                   and Narendra Ahuja and Gengshan Yang},
  year          = {2026},
  eprint        = {TODO},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

License

MIT — see the GitHub repo.