Vortex-Depth-V9-Corridor (Lighthouse)

A 5.31 × 10⁶-parameter monocular depth + 6-class segmentation student specialized for corridor-class indoor environments. The production corridor checkpoint of the Vortex-Depth lineage. Closed-loop validated against ground-truth depth in simulation.

| Property | Value |
| --- | --- |
| Codename | Lighthouse |
| Lineage version | V9 |
| Architecture | EfficientViT-B1 encoder + dual transposed-convolution decoder |
| Parameters | 5.31 × 10⁶ |
| Input | RGB, 240 × 320, ImageNet-normalized within forward pass |
| Output | depth [B, 1, 240, 320] in meters; segmentation [B, 6, 240, 320] logits |
| Initialization | vortex-depth-v6-pretrained (Cornerstone) |
| Fine-tuning corpus | LILocBench dynamics_0 (Intel RealSense D455, ~5 × 10³ corridor frames) |
| Teacher | DA3-Metric-Large |
| Inference latency | ~5 ms on Jetson Orin Nano (TensorRT FP16) |
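
Because ImageNet normalization happens inside the forward pass, calling code only needs to supply an RGB tensor at 240 × 320. A minimal preprocessing sketch (torchvision-based; the helper name, interpolation defaults, and batch handling are illustrative assumptions, not part of the released pipeline):

from PIL import Image
from torchvision.transforms import functional as TF

def to_model_input(image_path):
    # Load an RGB image and shape it as the [B, 3, 240, 320] tensor the model expects.
    img = Image.open(image_path).convert("RGB")
    img = TF.resize(img, [240, 320])   # (height, width); interpolation choice is an assumption
    tensor = TF.to_tensor(img)         # float32 in [0, 1]; ImageNet normalization is done inside the model
    return tensor.unsqueeze(0)         # add a batch dimension -> [1, 3, 240, 320]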

Headline result

LILocBench corridor RMSE: 0.382 m — the best corridor depth result in the lineage.
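
For reference, the RMSE figure above is a root-mean-square error in meters over valid depth pixels. A minimal sketch of that metric (the validity range and masking convention are assumptions):

import torch

def depth_rmse(pred, gt, min_depth=0.05, max_depth=10.0):
    # RMSE in meters over pixels with a plausible ground-truth reading.
    valid = (gt > min_depth) & (gt < max_depth)
    return torch.sqrt(torch.mean((pred[valid] - gt[valid]) ** 2))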

Closed-loop navigation parity with ground-truth depth in Gazebo Fortress corridor experiments:

| Depth source | Success rate (10 seeds) | Time-to-goal | Collisions |
| --- | --- | --- | --- |
| Ground-truth depth | 9 / 10 | 17.77 ± 0.90 s | 0 |
| V9 (TensorRT FP16) | 9 / 10 | 17.99 ± 0.45 s | 0 |

The single failure is shared between configurations (identical seed, equivalent final pose error), indicating a planner-level failure mode rather than a depth-quality difference.

Use case

Recommended for fixed-route corridor deployment in indoor robot navigation. The intended deployment context is corridors with the structural characteristics of typical institutional indoor spaces: long parallel walls, mid-field-dominant depth distributions (1–15 m), uniform or fluorescent lighting. The model has been fine-tuned on the LILocBench Bonn corridor dataset and achieves competitive corridor depth metrics on that distribution.

For environments outside the corridor distribution, see the recommendations in the operational limitations table below.

Operational limitations

V9 is specialized for corridor environments. Predictable performance regressions occur under the following conditions:

| Condition | Mechanism | Recommendation |
| --- | --- | --- |
| Open rooms or large unbounded spaces | The model expects depth to compress against a back wall in the 6–15 m range; absent that geometry, predictions become structurally inconsistent | Use V5 (Atlas) or V6 (Cornerstone) |
| Reflective surfaces outside training distribution (granite, polished metal, water, large mirrors) | The model has been exposed to glass via LILocBench but has not generalized to other specular surfaces at comparable scale | Use DA3-Small zero-shot |
| Outdoor scenes | Out of training distribution | Not recommended |
| General room-scene segmentation | NYU val mIoU degrades to 31.6 % from V5's 63.7 % as a consequence of single-domain fine-tuning | Use V5 (Atlas) |

Loading

import torch
from models.student import build_student
from config import Config

cfg = Config()
model = build_student(num_classes=cfg.NUM_CLASSES, pretrained=False, backbone=cfg.BACKBONE)

# Load the released corridor checkpoint.
state = torch.load("best_depth_v9.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Placeholder input; replace with a real RGB frame (normalization happens inside the forward pass).
rgb_tensor = torch.zeros(1, 3, 240, 320)  # [B, 3, 240, 320]

with torch.no_grad():
    depth, seg_logits = model(rgb_tensor)  # depth: [B, 1, 240, 320] in meters; seg_logits: [B, 6, 240, 320]
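
The ~5 ms Jetson Orin Nano latency figure assumes a TensorRT FP16 engine. One possible route, continuing from the snippet above, is to export the loaded model to ONNX and then build the engine with trtexec; the file names, opset version, and output names below are illustrative assumptions, not part of the release:

# Export to ONNX as an intermediate step toward a TensorRT FP16 engine.
dummy = torch.zeros(1, 3, 240, 320)
torch.onnx.export(
    model,
    dummy,
    "vortex_depth_v9.onnx",
    input_names=["rgb"],
    output_names=["depth", "seg_logits"],
    opset_version=17,
)
# On the Jetson: trtexec --onnx=vortex_depth_v9.onnx --fp16 --saveEngine=vortex_depth_v9.plan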

Deployment realization

At runtime, V9 depth is consumed by a confidence-gated fusion node combining ToF and student outputs per pixel:

s = median(d_tof / d_v9)        over pixels where ToF confidence ≥ 0.5

if confidence(i, j) ≥ 0.5 AND 0.05 m < tof_depth(i, j) < 10.0 m:
    fused(i, j) = tof_depth(i, j)            # use hardware reading directly
else:
    fused(i, j) = s · v9_depth(i, j)         # use scaled learned depth

The per-frame median-scale step anchors the V9 prediction to metric units using the surviving ToF pixels. The model is therefore not required to produce absolute scale accuracy — it must produce accurate depth structure, which the runtime then anchors per frame.
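
A minimal per-frame sketch of this fusion rule (NumPy; the array names and the fallback for frames with no valid ToF pixels are assumptions):

import numpy as np

def fuse_depth(tof_depth, tof_conf, v9_depth, conf_thresh=0.5, d_min=0.05, d_max=10.0):
    # Pixels where the hardware ToF reading survives the confidence and range gates.
    valid = (tof_conf >= conf_thresh) & (tof_depth > d_min) & (tof_depth < d_max)
    # Per-frame median scale that anchors the V9 prediction to metric units.
    s = np.median(tof_depth[valid] / np.maximum(v9_depth[valid], 1e-6)) if valid.any() else 1.0
    # Hardware reading where available, scaled learned depth elsewhere.
    return np.where(valid, tof_depth, s * v9_depth)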

Training

Fine-tuned from the V6 multi-domain pretrained checkpoint on LILocBench dynamics_0 for 50 epochs. AdamW optimizer, encoder LR 3 × 10⁻⁵, decoder LR 3 × 10⁻⁴, cosine annealing, batch size 16. Loss formulation unchanged from earlier lineage stages: berHu (depth) + cross-entropy (segmentation) + edge-aware smoothness, Kendall-weighted with log σ² clamped to [-2, 2].
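
A hedged sketch of that loss formulation follows; only the three terms, the learned log σ² weights, and the [-2, 2] clamp come from the description above, while the berHu threshold, the smoothness formulation, and the exact Kendall weighting are assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class KendallWeightedLoss(nn.Module):
    # berHu depth loss + cross-entropy segmentation + edge-aware smoothness,
    # combined with learned homoscedastic-uncertainty (Kendall) weights.
    def __init__(self):
        super().__init__()
        self.log_var = nn.Parameter(torch.zeros(3))  # one log sigma^2 per loss term

    def berhu(self, pred, gt):
        err = (pred - gt).abs()
        c = 0.2 * err.max().detach()                 # threshold choice is an assumption
        l2 = (err ** 2 + c ** 2) / (2 * c + 1e-8)
        return torch.where(err <= c, err, l2).mean()

    def smoothness(self, depth, image):
        # Edge-aware smoothness: penalize depth gradients, downweighted at image edges.
        dx_d = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
        dy_d = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
        dx_i = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
        dy_i = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
        return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

    def forward(self, depth_pred, depth_gt, seg_logits, seg_gt, image):
        terms = torch.stack([
            self.berhu(depth_pred, depth_gt),
            F.cross_entropy(seg_logits, seg_gt),
            self.smoothness(depth_pred, image),
        ])
        log_var = self.log_var.clamp(-2.0, 2.0)      # log sigma^2 clamped to [-2, 2]
        return (torch.exp(-log_var) * terms + log_var).sum()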

The complete training pipeline that produces V9 spans four sequential stages:

  1. ImageNet pretraining (timm default, ~1.3 × 10⁶ classification labels).
  2. Multi-domain pretraining on SUN RGB-D + DIODE Indoor + NYU Depth V2 (V6 pretrain checkpoint).
  3. NYU Depth V2 fine-tuning with augmentation pipeline (V6 final checkpoint).
  4. LILocBench dynamics_0 fine-tuning (V9 final checkpoint).

The first three stages are inherited from V6; only the final corridor-specialization stage is unique to V9.
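
For the final corridor-specialization stage, the optimizer setup described above (AdamW, encoder LR 3 × 10⁻⁵, decoder LR 3 × 10⁻⁴, cosine annealing over 50 epochs) might look like the following sketch; the model.encoder / model.decoder attribute names and the weight-decay value are assumptions:

from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Two parameter groups: a lower LR for the pretrained encoder, a higher one for the decoder.
optimizer = AdamW(
    [
        {"params": model.encoder.parameters(), "lr": 3e-5},
        {"params": model.decoder.parameters(), "lr": 3e-4},
    ],
    weight_decay=1e-2,  # assumption; not stated in the card
)
scheduler = CosineAnnealingLR(optimizer, T_max=50)  # cosine annealing over the 50-epoch schedule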

Bootstrap perception context

This checkpoint is the production corridor model in a three-checkpoint family released as part of the Vortex bootstrap-perception pipeline. The pipeline addresses the operational reality that Time-of-Flight depth sensors lose ~78 % of their pixels on reflective indoor surfaces (polished floors, glass walls); the student fills the dead pixels with consistent learned geometry, and runtime fusion combines surviving sensor pixels with the student output.


Reference

If you use this model in your work, please reference the project repository:

https://github.com/Nishant-ZFYII/ml_inference

License

MIT.


Evaluation results

  • LILocBench Corridor RMSE (m) on LILocBench dynamics_0 (RealSense D455): 0.382 (self-reported)
  • NYU val RMSE (m) on NYU Depth V2 (val): 1.553 (self-reported)