Vortex-Depth-V6-Pretrained (Cornerstone)

A 5.31 × 10⁶-parameter monocular-depth + 6-class-segmentation student, trained with multi-domain pretraining on SUN RGB-D and DIODE Indoor followed by NYU Depth V2 fine-tuning. This is the recommended fine-tuning base for additional domain specialists in the Vortex-Depth lineage.

| Property | Value |
| --- | --- |
| Codename | Cornerstone |
| Lineage version | V6 |
| Architecture | EfficientViT-B1 encoder + dual transposed-convolution decoder |
| Parameters | 5.31 × 10⁶ |
| Input | RGB, 240 × 320, ImageNet normalization applied inside the forward pass |
| Output | depth [B, 1, 240, 320] in meters; segmentation [B, 6, 240, 320] logits |
| Stage 1 corpus | SUN RGB-D (10 × 10³ frames) + DIODE Indoor (8 × 10³ frames) |
| Stage 2 corpus | NYU Depth V2 (1.159 × 10³ train frames) with the V5 augmentation pipeline |
| Teacher | DA3-Metric-Large |
| Inference latency | ~5 ms on Jetson Orin Nano (TensorRT FP16) |
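
For orientation, here is a minimal shape-check sketch. It assumes the build_student / Config API shown in the fine-tuning template below, and that the forward pass returns a (depth, segmentation) tuple; since normalization happens inside forward, the input is plain RGB.

import torch
from models.student import build_student
from config import Config

# Minimal inference sketch. Assumptions: build_student/Config as in the
# fine-tuning template below; forward returns a (depth, seg) tuple.
cfg = Config()
model = build_student(num_classes=cfg.NUM_CLASSES, pretrained=False, backbone=cfg.BACKBONE)
model.load_state_dict(torch.load("best_depth_v6.pt", map_location="cpu"))
model.eval()

x = torch.rand(1, 3, 240, 320)   # raw RGB; ImageNet normalization is internal
with torch.no_grad():
    depth, seg = model(x)
print(depth.shape)   # torch.Size([1, 1, 240, 320]), meters
print(seg.shape)     # torch.Size([1, 6, 240, 320]), logits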

Use case

Recommended as the fine-tuning base for users developing additional domain-specialist depth models. The multi-domain pretraining stage establishes a richer encoder prior than NYU-only training, and that prior transfers more effectively to subsequent domain specialization.

This is demonstrated empirically in the lineage: a corridor specialist fine-tuned from V6, vortex-depth-v9-corridor (Lighthouse), achieves 0.382 m corridor RMSE on LILocBench, a 14 % relative improvement over the same fine-tuning protocol applied to the V5-initialized variant (V7: 0.445 m).

V6 itself is the lineage's best NYU result:

  • NYU val RMSE: 0.519 m (lowest in lineage)
  • NYU val mIoU (6-class): 48.5 %

The mIoU regression relative to vortex-depth-v5-general (Atlas) (63.7 %) is a mixed-supervision artifact of the multi-domain pretrain: SUN RGB-D and DIODE Indoor carry no segmentation annotations, so segmentation supervision is diluted during Stage 1. For deployments where general-purpose segmentation accuracy matters, V5 (Atlas) remains the recommended checkpoint.

Fine-tuning template

To produce a corridor or room specialist from this base:

import torch
from models.student import build_student
from config import Config

cfg = Config()
cfg.LR = 3e-4                # decoder LR
cfg.ENCODER_LR_SCALE = 0.1   # encoder LR = 3e-5

# pretrained=False: ImageNet weights are irrelevant here, since the V6
# checkpoint loaded below overwrites every parameter.
model = build_student(num_classes=cfg.NUM_CLASSES, pretrained=False, backbone=cfg.BACKBONE)
state = torch.load("best_depth_v6.pt", map_location="cpu")
model.load_state_dict(state)

# Continue training on your domain-specific corpus for ~30-50 epochs.
# See train.py in the project codebase for the full training loop.

Training

Two-stage training schedule:

  • Stage 1 (multi-domain pretrain): 50 epochs across SUN RGB-D + DIODE Indoor + NYU Depth V2 with the V5 augmentation pipeline. HPC job 3093046; checkpoint taken at epoch 38 of 50.
  • Stage 2 (NYU fine-tune): 200 epochs on NYU Depth V2 alone with the same augmentation pipeline. HPC job 3098656; checkpoint taken at epoch 154 of 200 (job hit its walltime limit).

Optimizer: AdamW, encoder LR 3 × 10⁻⁵, decoder LR 3 × 10⁻⁴, cosine annealing, batch size 16, encoder frozen for the first 5 epochs.
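
A minimal sketch of that two-rate setup, continuing from the fine-tuning template above. The grouping heuristic (splitting on parameter names that start with "encoder") is an assumption; the authoritative version lives in train.py.

import torch

# Hypothetical param grouping: assumes encoder parameters are namespaced
# under "encoder" in the student module; inspect named_parameters() first.
encoder_params = [p for n, p in model.named_parameters() if n.startswith("encoder")]
decoder_params = [p for n, p in model.named_parameters() if not n.startswith("encoder")]

optimizer = torch.optim.AdamW([
    {"params": encoder_params, "lr": 3e-5},   # encoder LR
    {"params": decoder_params, "lr": 3e-4},   # decoder LR
])
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

# Encoder freeze for the first 5 epochs:
for p in encoder_params:
    p.requires_grad = False
# ...then re-enable with p.requires_grad = True after epoch 5.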

The Stage 1 pretrain required a guard on the cross-entropy term of the loss: SUN RGB-D and DIODE Indoor have no segmentation annotations, so segmentation supervision is skipped on those batches. Without the guard, nn.CrossEntropyLoss(ignore_index=255) returns NaN on an all-ignore batch (with mean reduction, zero valid targets leaves a 0/0 division), and the NaN propagates through the multi-task loss and derails optimization.
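
A minimal sketch of such a guard; the function name and wiring are illustrative, not the project's actual loss module.

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss(ignore_index=255)

def guarded_seg_loss(seg_logits, seg_labels):
    # With reduction="mean", CE divides by the count of non-ignored targets;
    # on an all-ignore batch that count is zero and the result is NaN.
    if (seg_labels != 255).any():
        return ce(seg_logits, seg_labels)
    # No segmentation supervision on this batch: contribute a finite zero.
    return torch.zeros((), device=seg_logits.device)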

Training was performed on NVIDIA L40S 48 GB hardware (NYU Greene HPC, partition l40s_public).

Bootstrap perception context

This checkpoint is one component of a three-checkpoint family released as part of the Vortex bootstrap-perception pipeline for indoor robot navigation under hardware depth failure. See vortex-depth-v5-general (Atlas) for the recommended general-purpose deployment checkpoint, and vortex-depth-v9-corridor (Lighthouse) for the production corridor specialist derived from this base.

Reference

If you use this model in your work, please reference the project repository:

https://github.com/Nishant-ZFYII/ml_inference

License

MIT.

