# Vortex-Depth-V6-Pretrained (Cornerstone)
A 5.31 × 10⁶ parameter monocular depth + 6-class segmentation student trained with multi-domain pretraining on SUN RGB-D and DIODE Indoor, followed by NYU Depth V2 fine-tuning. The recommended fine-tuning base for additional domain specialists in the Vortex-Depth lineage.
| Property | Value |
|---|---|
| Codename | Cornerstone |
| Lineage version | V6 |
| Architecture | EfficientViT-B1 encoder + dual transposed-convolution decoder |
| Parameters | 5.31 × 10⁶ |
| Input | RGB, 240 × 320, ImageNet-normalized within forward pass |
| Output | depth [B, 1, 240, 320] in meters; segmentation [B, 6, 240, 320] logits |
| Stage 1 corpus | SUN RGB-D + DIODE Indoor + NYU Depth V2 |
| Stage 2 corpus | NYU Depth V2 (1,159 train images) with the V5 augmentation pipeline |
| Teacher | DA3-Metric-Large |
| Inference latency | ~5 ms on Jetson Orin Nano (TensorRT FP16) |
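The I/O contract above can be exercised with a minimal shape sketch. `DummyStudent` below is a stand-in, not the real EfficientViT-B1 architecture; it only reproduces the input/output shapes so downstream code can be written against them.

```python
import torch
import torch.nn as nn

class DummyStudent(nn.Module):
    """Stand-in that mimics the V6 student's I/O shapes (not its architecture)."""

    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.depth_head = nn.Conv2d(3, 1, kernel_size=1)
        self.seg_head = nn.Conv2d(3, num_classes, kernel_size=1)

    def forward(self, x):
        # The real model ImageNet-normalizes internally, so callers pass raw RGB.
        return self.depth_head(x), self.seg_head(x)

model = DummyStudent()
rgb = torch.rand(2, 3, 240, 320)  # batch of raw RGB frames in [0, 1]
depth, seg = model(rgb)
print(depth.shape)  # torch.Size([2, 1, 240, 320]), meters
print(seg.shape)    # torch.Size([2, 6, 240, 320]), 6-class logits
```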
## Use case
Recommended as the fine-tuning base for users developing additional domain-specialist depth models. The multi-domain pretraining stage establishes a richer encoder prior than NYU-only training, which transfers to subsequent domain specialization more effectively.
This is demonstrated empirically in the lineage: a corridor specialist fine-tuned from V6 (vortex-depth-v9-corridor (Lighthouse)) achieves 0.382 m LILocBench corridor RMSE, a 14 % relative improvement over the same fine-tuning protocol applied to the V5-initialized variant (V7: 0.445 m).
V6 itself is the lineage's best NYU result:
- NYU val RMSE: 0.519 m (lowest in lineage)
- NYU val mIoU (6-class): 48.5 %
The mIoU regression relative to vortex-depth-v5-general (Atlas) (63.7 %) is an artifact of mixed-supervision effects during multi-domain pretraining (SUN RGB-D and DIODE Indoor have no segmentation annotations). For deployments where general-purpose segmentation accuracy matters, V5 (Atlas) is the recommended checkpoint.
## Fine-tuning template
To produce a corridor or room specialist from this base:
```python
import torch

from models.student import build_student
from config import Config

cfg = Config()
cfg.LR = 3e-4
cfg.ENCODER_LR_SCALE = 0.1  # encoder LR = 3e-5

# Build the student and initialize it from the V6 checkpoint.
model = build_student(num_classes=cfg.NUM_CLASSES, pretrained=False, backbone=cfg.BACKBONE)
state = torch.load("best_depth_v6.pt", map_location="cpu")
model.load_state_dict(state)

# Continue training on your domain-specific corpus for ~30-50 epochs.
# Refer to train.py in the project codebase for the full training loop.
```
## Training
Two-stage training schedule:
- Stage 1 (multi-domain pretrain): 50 epochs across SUN RGB-D + DIODE Indoor + NYU Depth V2 with the V5 augmentation pipeline. HPC job 3093046, snapshotted at 38 / 50 epochs.
- Stage 2 (NYU fine-tune): 200 epochs on NYU Depth V2 alone with the same augmentation pipeline. HPC job 3098656, snapshotted at 154 / 200 epochs (walltime).
Optimizer: AdamW, encoder LR 3 × 10⁻⁵, decoder LR 3 × 10⁻⁴, cosine annealing, batch size 16, encoder frozen for the first 5 epochs.
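The two-LR AdamW setup can be sketched with per-parameter-group options. This is an illustration, not the project's actual training script (that lives in `train.py`); the `"encoder"`/`"decoder"` module names and the tiny stand-in model are assumptions.

```python
import torch
import torch.nn as nn

# Stand-in model; the real student is built via build_student().
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "decoder": nn.Linear(8, 8),
})

# Separate LRs per parameter group: encoder 3e-5, decoder 3e-4.
optimizer = torch.optim.AdamW([
    {"params": model["encoder"].parameters(), "lr": 3e-5},
    {"params": model["decoder"].parameters(), "lr": 3e-4},
])

# Cosine annealing over the 200-epoch Stage 2 schedule.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

# Encoder frozen for the first 5 epochs...
for p in model["encoder"].parameters():
    p.requires_grad = False
# ...then re-enabled after epoch 5:
# for p in model["encoder"].parameters():
#     p.requires_grad = True
```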
The Stage 1 pretrain required a loss-function guard for the cross-entropy term: SUN RGB-D and DIODE Indoor have no segmentation annotations, so the segmentation supervision is skipped on those batches. Without the guard, nn.CrossEntropyLoss(ignore_index=255) returns NaN on all-ignore batches and propagates through the multi-task loss, crashing the optimizer.
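A minimal sketch of such a guard, assuming the all-ignore convention described above (unlabeled batches carry target 255 at every pixel); `guarded_seg_loss` is an illustrative name, not the project's actual function:

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss(ignore_index=255)

def guarded_seg_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Skip the segmentation term when the batch has no valid labels:
    # mean-reduced CE over zero valid pixels is 0/0 = NaN.
    if (target != 255).any():
        return ce(logits, target)
    return logits.new_zeros(())  # contributes nothing to the multi-task loss

logits = torch.randn(2, 6, 4, 4)
all_ignore = torch.full((2, 4, 4), 255, dtype=torch.long)
print(guarded_seg_loss(logits, all_ignore))  # tensor(0.)
print(torch.isnan(ce(logits, all_ignore)))   # tensor(True) -- the unguarded failure mode
```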
Training was performed on NVIDIA L40S 48 GB hardware (NYU Greene HPC, partition l40s_public).
## Bootstrap perception context
This checkpoint is one component of a three-checkpoint family released as part of the Vortex bootstrap-perception pipeline for indoor robot navigation under hardware depth failure. See vortex-depth-v5-general (Atlas) for the recommended general-purpose deployment checkpoint, and vortex-depth-v9-corridor (Lighthouse) for the production corridor specialist derived from this base.
## Project resources
- Codebase: github.com/Nishant-ZFYII/ml_inference
- Documentation: nishant-zfyii.github.io/ml_inference
- V6 model page: Cornerstone (V6)
## Reference
If you use this model in your work, please reference the project repository:
https://github.com/Nishant-ZFYII/ml_inference
## License
MIT.
## Evaluation results
- NYU val RMSE (m) on NYU Depth V2 (val): 0.519 (self-reported)
- 6-class segmentation mIoU (%) on NYU Depth V2 (val): 48.5 (self-reported)