---
tags:
- robotics
- anima
- urd
- uniscale
- 3d-reconstruction
- depth-estimation
- multi-view
- robot-flow-labs
library_name: pytorch
pipeline_tag: depth-estimation
license: apache-2.0
---
# URD — UniScale: Unified Scale-Aware Multi-View 3D Reconstruction

Part of the ANIMA Perception Suite by Robot Flow Labs.
## Paper

**UniScale: Unified Scale-Aware Multi-View 3D Reconstruction** — arXiv: [2602.23224](https://arxiv.org/abs/2602.23224)
## Architecture
UniScale combines camera intrinsics, extrinsics, metric depth, and 3D point cloud generation into a single neural network forward pass. The core design leverages frozen DINOv2 ViT-B/14 foundation model features with lightweight scale-aware pose and depth decoders, enabling metrically consistent multi-view 3D reconstruction without iterative optimization (no RANSAC, no bundle adjustment).
Key components:

- **Foundation Encoder**: Frozen DINOv2 ViT-B/14 (86M params)
- **Scale-Aware Pose Decoder**: Estimates intrinsics + extrinsics + metric scale
- **Metric Depth Generator**: Dense depth maps with confidence estimation
- **Point Cloud Generator**: Direct 3D point maps in metric world coordinates
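Since all four heads share one forward pass, every output is available from a single call. A minimal sketch of the expected output shapes, inferred from the Usage example below (`uniscale_output_shapes` is an illustrative helper, not part of the `anima_urd` API):

```python
def uniscale_output_shapes(batch, views, height, width):
    """Output shapes for one UniScale forward pass.

    Illustrative helper inferred from this card's Usage section;
    not part of the anima_urd package.
    """
    return {
        "depth_maps": (batch, views, height, width),        # metric depth (meters)
        "depth_confidence": (batch, views, height, width),  # per-pixel confidence
        "intrinsics": (batch, 3, 3),                        # pinhole intrinsics
        "scale_factors": (batch,),                          # global metric scale
    }
```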
## Exported Formats

| Format | File | Size | Use Case |
|---|---|---|---|
| PyTorch (.pth) | `pytorch/urd_v1.pth` | ~1.0 GB | Training, fine-tuning, resume |
| SafeTensors | `pytorch/urd_v1.safetensors` | ~347 MB | Fast loading, safe deserialization |
| ONNX | `onnx/urd_v1.onnx` | ~347 MB | Cross-platform inference |
| TensorRT FP16 | `tensorrt/urd_v1_fp16.engine` | ~177 MB | Edge deployment (Jetson/L4) |
| TensorRT FP32 | `tensorrt/urd_v1_fp32.engine` | ~355 MB | Full precision inference |
## Training

- **Dataset**: NYU Depth V2 (654 train / 654 val)
- **Hardware**: NVIDIA L4 (23 GB VRAM)
- **Checkpoint**: `final.pth` (epoch 30/30)
- **Stages**: 2-stage curriculum
  - Stage 1 (epochs 1-5): frozen encoder, batch=64, lr=1e-4
  - Stage 2 (epochs 6-30): unfrozen encoder, batch=16, lr=1e-5, gradient checkpointing
- **Best val_loss**: 0.1175 (epoch 18)
- **Training time**: ~97 minutes
- **Optimizer**: AdamW (weight_decay=0.01)
- **Scheduler**: Cosine annealing with 2-epoch warmup
- **Seed**: 42

See `configs/` for full hyperparameters and `logs/training_history.json` for loss curves.
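The learning-rate schedule (cosine annealing with a 2-epoch warmup) can be sketched in closed form. The real schedule lives in `configs/`, so treat `lr_at_epoch` and its `min_lr` default as assumptions:

```python
import math

def lr_at_epoch(epoch, total_epochs=30, warmup_epochs=2, base_lr=1e-4, min_lr=0.0):
    """Linear warmup for `warmup_epochs`, then cosine decay to `min_lr`.

    Illustrative sketch using this card's stated values; the exact
    implementation is in configs/, and min_lr=0.0 is an assumption.
    """
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine anneal over the remaining epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

With these defaults, the rate peaks at 1e-4 at the end of warmup and decays smoothly toward zero by epoch 30.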
## Usage

```python
import torch
from anima_urd.model import UniScale

# Load from checkpoint
model = UniScale.load("pytorch/urd_v1.pth", device="cuda")
model.eval()

# Inference: 4 multi-view images at 512x512
images = torch.randn(1, 4, 3, 512, 512, device="cuda")
with torch.no_grad():
    output = model(images)

depth = output.depth_maps             # [1, 4, 512, 512] metric depth (meters)
confidence = output.depth_confidence  # [1, 4, 512, 512]
intrinsics = output.intrinsics        # [1, 3, 3]
scale = output.scale_factors          # [1] metric scale
```
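Because depth, intrinsics, and scale come from the same pass, lifting a depth map to metric 3D points is a standard pinhole back-projection. A NumPy sketch (the `backproject` helper is illustrative; the model's Point Cloud Generator produces point maps directly):

```python
import numpy as np

def backproject(depth, K, scale=1.0):
    """Lift an (H, W) metric depth map to an (H, W, 3) point map.

    Standard pinhole back-projection; illustrative helper, not part
    of the anima_urd API.
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth * scale                 # apply predicted metric scale
    x = (u - cx) * z / fx             # pixel column -> camera X
    y = (v - cy) * z / fy             # pixel row -> camera Y
    return np.stack([x, y, z], axis=-1)
```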
## Multi-Robot Deployment

UniScale is designed for multi-robot coordination:

- Each robot runs feed-forward inference locally (no iterative optimization)
- The predicted metric scale lets point clouds from different robots be merged directly in a shared world frame
- Compute scales linearly, O(NM) for N robots with M views each, versus the quadratic O(N^2 M^2) pairwise cost of traditional bundle adjustment
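The scaling claim can be made concrete with simple operation counts (illustrative cost models, not measured numbers):

```python
def feedforward_cost(n_robots, n_views):
    """One forward pass per view per robot: O(N*M)."""
    return n_robots * n_views

def bundle_adjustment_cost(n_robots, n_views):
    """Pairwise view matching across the whole fleet: O(N^2 * M^2)."""
    return (n_robots * n_views) ** 2
```

At 8 robots with 16 views each, the pairwise term is already 128x the feed-forward one, and the gap widens linearly as the fleet grows.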
Citation
@article{UniScale_2026,
title={UniScale Unified Scale-Aware Multi-View 3D Reconstruction},
year={2026},
eprint={2602.23224},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.23224}
}
## License

Apache 2.0 — Robot Flow Labs / AIFLOW LABS LIMITED