MetricScenes
A metrically-grounded, in-the-wild dataset. For more details, please visit the project page.
Paper
Title: Honey, I Shrunk the Arc de Triomphe!
Authors: Yuanbo Xiangli, Hanyu Chen, Xueqing Tsang, Noah Snavely
Project page: https://metricscenes.github.io/
Abstract
Metric-scale monocular geometry estimation has seen significant progress through large-scale data aggregation, yet current foundation models suffer from a persistent "scale-collapse" phenomenon: distant landmarks and vast landscapes are metrically underestimated. This performance gap stems from a training data bottleneck, where existing metric-scale datasets are hardware-constrained to unvaried street-level LiDAR or short-range indoor scans, or consist of synthetic data that lacks the semantic complexity of the physical world. To bridge this gap, we curate a new metrically-grounded, in-the-wild dataset that we call MetricScenes, gathered from a variety of sources including Internet photo collections and stereo imagery. We estimate camera poses and initial depth maps for each scene using off-the-shelf methods, and recover absolute scale from geo-tagged metadata as well as known stereo camera baselines. We also improve the quality of depth maps derived from MetricScenes via a new two-stage Poisson completion method. Fine-tuning MoGe-2 on our dataset significantly mitigates scale-collapse and achieves superior metric accuracy in unconstrained, open-domain scenes while maintaining state-of-the-art performance on standard benchmarks.
WildMoGe
WildMoGe is a MoGe-2 ViT-Large-Normal model fine-tuned on our MetricScenes dataset. Please refer to MoGe for model inference and evaluation. We provide two fine-tuned versions: model.pt, the version evaluated in the paper, and model_v2.pt, which offers comparable or slightly better metric performance on in-the-wild scenes.
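As a rough illustration of how one of these checkpoints might be used, the sketch below follows the inference pattern from the MoGe repository. It assumes that the MoGe package, PyTorch, and OpenCV are installed, that the checkpoint has been downloaded locally, and that `MoGeModel.from_pretrained` accepts a local checkpoint path. The helper names `checkpoint_path` and `infer_metric_depth`, and the `"paper"`/`"v2"` variant labels, are hypothetical and not part of MoGe or this repository; consult the MoGe documentation for the authoritative interface.

```python
from pathlib import Path

# Checkpoint file names as listed in this card; the variant keys are our own labels.
CHECKPOINTS = {"paper": "model.pt", "v2": "model_v2.pt"}


def checkpoint_path(variant="paper", root="."):
    """Return the local path of a WildMoGe checkpoint (hypothetical helper)."""
    if variant not in CHECKPOINTS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return Path(root) / CHECKPOINTS[variant]


def infer_metric_depth(image_file, variant="paper", root=".", device="cuda"):
    """Run single-image inference and return (depth, mask).

    Heavy dependencies are imported lazily so that this module can be
    imported without torch, opencv-python, or MoGe installed.
    """
    import cv2
    import torch
    from moge.model.v2 import MoGeModel  # MoGe-2 model class from the MoGe repository

    # Assumption: from_pretrained accepts a local checkpoint path.
    ckpt = checkpoint_path(variant, root)
    model = MoGeModel.from_pretrained(str(ckpt)).to(device).eval()

    bgr = cv2.imread(str(image_file))
    if bgr is None:
        raise FileNotFoundError(f"could not read image: {image_file}")
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

    # MoGe expects a float RGB tensor in [0, 1] with shape (3, H, W).
    image = torch.tensor(rgb / 255.0, dtype=torch.float32, device=device)
    image = image.permute(2, 0, 1)

    output = model.infer(image)
    return output["depth"], output["mask"]
```

A call such as `infer_metric_depth("photo.jpg", variant="v2", root="checkpoints")` would then load `checkpoints/model_v2.pt`; the image and directory names here are placeholders.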