	---
	license: cc-by-nc-4.0
	tags:
	- depth-estimation
	- surface-normals
	- panoramic-images
	- equirectangular
	- high-resolution
	- computer-vision
	- in-the-wild
	- zero-shot
	pipeline_tag: depth-estimation
	---

	<h1 align="center"> 📟 PaGeR — Unified Panoramic Geometry Estimation Model Card</h1>

	<p align="center">
	<a title="Github" href="https://github.com/prs-eth/PaGeR" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
	<img src="https://img.shields.io/github/stars/prs-eth/PaGeR?label=GitHub%20%E2%98%85&logo=github&color=C8C" alt="Github">
	</a>
	<a title="Website" href="https://prs-eth.github.io/PaGeR/" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
	<img src="https://img.shields.io/badge/%E2%99%A5%20Project%20-Website-blue" alt="Website">
	</a>
	<a title="arXiv" href="https://arxiv.org/abs/TBD" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
	<img src="https://img.shields.io/badge/%F0%9F%93%84%20Read%20-Paper-AF3436" alt="arXiv">
	</a>
	<a title="Hugging Face" href="https://huggingface.co/spaces/prs-eth/PaGeR" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
	<img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-FFD21E" alt="Hugging Face Spaces">
	</a>
	<a title="PanoInfinigen dataset" href="https://huggingface.co/datasets/prs-eth/PanoInfinigen" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
	<img src="https://img.shields.io/badge/Dataset-PanoInfinigen-7e57c2" alt="PanoInfinigen dataset">
	</a>
	<a title="License" href="LICENSE" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
	<img src="https://img.shields.io/badge/License-CC%20BY--NC%204.0-yellowgreen" alt="License">
	</a>
	</p>

	`PaGeR` is the unified geometry-estimation checkpoint released with our paper:

	- Paper: Unified Panoramic Geometry Estimation via Multi-View Foundation Models — [arXiv (TBD)](https://arxiv.org/abs/TBD)

	From a single equirectangular (ERP) panorama, one forward pass returns:

	- Scale-invariant depth at full panoramic resolution,
	- Metric depth in metres via a parallel coarse scale head,
	- Surface normals as unit vectors in the panorama's world frame,
	- Sky segmentation for masking unbounded depth regions.

	Indoor and outdoor scenes are served by twin scale heads selected at inference time by a lightweight Places365 classifier, so a single checkpoint covers both regimes.
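The routing and the metric conversion can be illustrated with a small sketch. It is not the repository's implementation: `select_scale_head` and `to_metric_depth` are hypothetical helper names, the 0.5 threshold on the classifier's indoor probability is an assumption, and so is the log-domain scale (suggested only by the `log_scale=` argument in the Usage snippet) and the `sky_value` fill.

```python
import numpy as np


def select_scale_head(scene_mode: str, p_indoor: float) -> str:
    """Return 'indoor' or 'outdoor'; 'auto' defers to the classifier output."""
    if scene_mode in ("indoor", "outdoor"):
        return scene_mode  # explicit user override
    if scene_mode != "auto":
        raise ValueError(f"unknown scene_mode: {scene_mode!r}")
    # Assumed decision rule: threshold the indoor probability at 0.5.
    return "indoor" if p_indoor >= 0.5 else "outdoor"


def to_metric_depth(depth_si, log_scale, sky_mask, sky_value=np.inf):
    """Rescale scale-invariant depth to metres and fill sky pixels.

    Assumes the selected head predicts a single log-domain scale factor.
    """
    metric = depth_si * np.exp(log_scale)
    return np.where(sky_mask, sky_value, metric)


head = select_scale_head("auto", p_indoor=0.9)
depth_m = to_metric_depth(
    depth_si=np.array([[1.0, 2.0]]),
    log_scale=np.log(3.0),
    sky_mask=np.array([[False, True]]),
)
```

Keeping the routing decision separate from the rescaling is what would let a user override the classifier without touching the depth branch.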

	You can also browse the rest of our [PaGeR HF collection](https://huggingface.co/collections/prs-eth/pager) or try the [interactive demo](https://huggingface.co/spaces/prs-eth/PaGeR).

	## Model Details

	- Developed by: [Vukasin Bozic](https://vulus98.github.io/), [Isidora Slavkovic](https://linkedin.com/in/isidora-slavkovic), [Dominik Narnhofer](https://scholar.google.com/citations?user=tFx8AhkAAAAJ&hl=en), [Nando Metzger](https://nandometzger.github.io/), [Denis Rozumny](https://rozumden.github.io/), [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ), [Nikolai Kalischek](https://scholar.google.com/citations?user=XwzlnZoAAAAJ&hl=de).
	- Model type: Feed-forward, multi-view foundation-model adaptation for single-image panoramic geometry estimation (depth + normals + sky + metric scale).
	- Backbone: [Depth Anything 3](https://github.com/ByteDance-Seed/Depth-Anything-3) (`da3-giant`, ViT-Giant), repurposed for cubemap-based multi-view processing of the panorama.
	- Inputs: A single ERP panorama, internally projected onto a 6-face cubemap at 504 px per face.
	- Outputs (in one forward pass):
	  - Scale-invariant depth map at panoramic resolution.
	  - Metric depth (metres), produced by combining the depth map with the selected indoor / outdoor scale head.
	  - Surface normals as unit vectors in the panorama's world frame.
	  - Sky mask for filling/masking unbounded regions in the depth and normal outputs.
	- Indoor / outdoor routing: A lightweight Places365 classifier auto-selects between the twin scale heads at inference time; the routing can be overridden by the user (`--scene_mode {auto,indoor,outdoor}`).
	- Resolution: Designed for high-resolution ERP inputs, up to 3K.
	- License: [CC BY-NC 4.0](LICENSE) — academic / non-commercial use only. The released weights are derivative works of the [Depth Anything 3](https://github.com/ByteDance-Seed/Depth-Anything-3) `da3-giant` backbone, released by ByteDance under CC BY-NC 4.0, and inherit that restriction. Commercial use is not permitted.
	- Resources for more information: [Project Website](https://prs-eth.github.io/PaGeR/), [Paper](https://arxiv.org/abs/TBD), [Code](https://github.com/prs-eth/PaGeR).
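To make the ERP-to-cubemap step concrete, here is a minimal NumPy sketch that resamples one cube face from an equirectangular image with nearest-neighbour lookup. It is an illustration only and not the repository's `erp_to_cubemap`: the function name `erp_to_face`, the axis convention (+z forward, +x right, +y up) and the face orientations are all assumptions.

```python
import numpy as np

# (forward, right, up) unit vectors per face. This convention is an assumption.
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "down": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
}


def erp_to_face(erp, face, face_w, fov_deg=90.0):
    """Sample one square cube face of width face_w from an (H, W[, C]) ERP array."""
    h, w = erp.shape[:2]
    fwd, right, up = (np.asarray(v, dtype=np.float64) for v in FACES[face])
    half = np.tan(np.radians(fov_deg) / 2.0)
    # Pixel centres on the image plane, spanning [-half, half].
    c = ((np.arange(face_w) + 0.5) / face_w * 2.0 - 1.0) * half
    a, b = np.meshgrid(c, c)  # a grows to the right, b grows downwards
    d = fwd + a[..., None] * right - b[..., None] * up
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Ray direction -> longitude / latitude -> ERP pixel indices.
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    col = ((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return erp[row, col]


erp = np.arange(8 * 16).reshape(8, 16)  # toy panorama: value encodes pixel index
front = erp_to_face(erp, "front", face_w=5)
right_face = erp_to_face(erp, "right", face_w=5)
```

With the card's settings, the call would use `face_w=504`. A real pipeline would normally interpolate bilinearly rather than take the nearest pixel.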

	### Other released checkpoints

	| Checkpoint | Hugging Face id | Depth | Normals | Sky |
	|---|---|---|---|---|
	| PaGeR (this card, recommended) | [`prs-eth/PaGeR`](https://huggingface.co/prs-eth/PaGeR) | ✅ | ✅ | ✅ |
	| PaGeR-Metric-Depth | [`prs-eth/PaGeR-metric-depth`](https://huggingface.co/prs-eth/PaGeR-metric-depth) | ✅ (metric) | | |
	| PaGeR-Normals | [`prs-eth/PaGeR-normals`](https://huggingface.co/prs-eth/PaGeR-normals) | | ✅ | |

	## Usage

	A minimal Python snippet that runs the unified model on a single panorama:

	```python
	from pathlib import Path

	import matplotlib.pyplot as plt
	import numpy as np
	import torch
	from huggingface_hub import hf_hub_download
	from omegaconf import OmegaConf
	from PIL import Image

	from src.pager import Pager
	from src.utils.geometry_utils import erp_to_cubemap
	from src.utils.utils import prepare_depth_for_logging, prepare_normals_for_logging

	checkpoint = "prs-eth/PaGeR" # or a local directory
	device = torch.device("cuda")

	config_path = hf_hub_download(repo_id=checkpoint, filename="config.yaml")
	cfg = OmegaConf.load(config_path)

	pager = Pager(checkpoint, cfg=cfg, device=device)
	pager.get_intrinsics_extrinsics(image_size=cfg.face_size, fov=getattr(cfg, "cube_fov", 90.0))
	pager.model.to(device).eval()

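	# Load the panorama as a CHW float tensor scaled to [-1, 1].
	panorama = np.array(Image.open("examples/example_1.jpg").convert("RGB")) / 255.0
	panorama = torch.from_numpy(panorama).permute(2, 0, 1).float() * 2 - 1
	# Project the ERP image onto the cubemap and add a batch dimension.
	rgb_cubemap = erp_to_cubemap(
	    panorama, face_w=cfg.face_size, fov=getattr(cfg, "cube_fov", 90.0)
	).unsqueeze(0).to(device)

	# Run a single forward pass; the indoor scale head is skipped here.
	with torch.inference_mode():
	    pred = pager(rgb_cubemap, dtype=torch.float16, skip_heads={"scale_indoor"})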

	cmap = plt.get_cmap("Spectral")
	H, W = panorama.shape[-2:]
	depth_metric, _ = prepare_depth_for_logging(
	    pager, pred["depth"][0], pred["sky"][0], (H, W), cmap,
	    log_scale=pred["scale"],
	)
	normals, _ = prepare_normals_for_logging(
	    pager, pred["normals"][0], pred["sky"][0], (H, W),
	)
	```

	`depth_metric` is a `(1, H, W)` float32 array of metric depth (metres); `normals` is a `(3, H, W)` unit-normal field. Both already have the predicted sky region filled in. See the [GitHub repository](https://github.com/prs-eth/PaGeR) for the full CLI (`inference.py`), evaluation scripts, the Gradio demo (`app.py`), and the point-cloud exporter.
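For readers who want 3D points without the repository's exporter, the following sketch back-projects an ERP depth map along per-pixel viewing rays. It is a standalone illustration, not the exporter shipped in the repository: the name `erp_depth_to_points` is hypothetical, the axis convention (+z forward, +x right, +y up) is an assumption, and depth is assumed to be the Euclidean distance along each ray.

```python
import numpy as np


def erp_depth_to_points(depth, valid=None):
    """Back-project an (H, W) ERP depth map to an (N, 3) point array.

    Assumes depth stores distance along the viewing ray and that
    non-finite values mark pixels to drop (for example, sky).
    """
    h, w = depth.shape
    # Longitude in (-pi, pi) and latitude in (-pi/2, pi/2) at pixel centres.
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    if valid is None:
        valid = np.isfinite(depth)
    # Select valid pixels first so non-finite depths never enter the product.
    return dirs[valid] * depth[valid][:, None]


toy_depth = np.full((4, 8), 2.0)
toy_depth[0, 0] = np.inf  # pretend this pixel is sky
points = erp_depth_to_points(toy_depth)
```

With the `depth_metric` array from the snippet, the call would be `erp_depth_to_points(depth_metric[0])`, provided the array is a NumPy array and sky pixels are either non-finite or passed in through `valid`.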