ABYSSOS -- Metric Depth Estimation (ANIMA Module)

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

PromptDA: Prompt-Guided Depth Anything for Metric Depth Estimation (arXiv:2502.11277)

Based on Depth Anything V2 with sparse LiDAR prompt fusion for metric depth.

Model

  • Architecture: PromptDA ViT-S (Vision Transformer Small + DPT Decoder + Gated Cross-Attention Prompt Fusion)
  • Parameters: 25.1M
  • Input: RGB image (518x518) + sparse depth prompt (518x518)
  • Output: Dense metric depth map (518x518)
  • Base model: depth-anything/prompt-depth-anything-vits-hf
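The gated cross-attention fusion named above can be sketched as a toy single-head module: image tokens attend to prompt-depth tokens, and a learned gate (near zero at initialization) scales the fused signal before the residual add. Shapes, the single-head form, and the tanh gate are illustrative assumptions, not the actual PromptDA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(img_tokens, prompt_tokens, Wq, Wk, Wv, gate):
    """img_tokens: (N, d) image features; prompt_tokens: (M, d) sparse-depth features."""
    q = img_tokens @ Wq                               # queries from the image stream
    k = prompt_tokens @ Wk                            # keys/values from the depth prompt
    v = prompt_tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (N, M) attention over prompt tokens
    fused = attn @ v
    return img_tokens + np.tanh(gate) * fused         # gate ~ 0 => pass-through at init

rng = np.random.default_rng(0)
d = 64
img = rng.normal(size=(100, d))
prm = rng.normal(size=(30, d))
W = lambda: rng.normal(scale=d ** -0.5, size=(d, d))
out = gated_cross_attention(img, prm, W(), W(), W(), gate=0.0)
```

With the gate at zero the block is an identity on the image tokens, so prompt fusion can be switched on gradually during fine-tuning.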

Training

| Parameter | Value |
|---|---|
| Hardware | 8x NVIDIA L4 (23 GB each) |
| Framework | PyTorch 2.11 + DDP |
| Dataset | NYU Depth V2 (1304 train, 72 val, 73 test) |
| Batch size | 24/GPU, effective 192 |
| Optimizer | AdamW (head lr=5e-5, backbone lr=5e-6) |
| Schedule | 500-step warmup + cosine decay |
| Loss | L1 + 0.5 * gradient (edge-aware) |
| Precision | bf16 mixed precision |
| Steps | 5000 (best at step 250) |
| Training time | 2.5 hours |
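The objective above (L1 plus a 0.5-weighted edge-aware gradient term) can be sketched as follows. The exact gradient formulation used in training is an assumption here: this version matches forward-difference depth gradients along both axes.

```python
import numpy as np

def l1_grad_loss(pred, gt, grad_weight=0.5):
    """L1 depth loss plus a gradient-matching term (sketch, not the training code)."""
    l1 = np.abs(pred - gt).mean()
    # forward-difference gradients along x and y; matching them sharpens depth edges
    gx = np.abs(np.diff(pred, axis=1) - np.diff(gt, axis=1)).mean()
    gy = np.abs(np.diff(pred, axis=0) - np.diff(gt, axis=0)).mean()
    return l1 + grad_weight * (gx + gy)

gt = np.linspace(0.5, 10.0, 518 * 518).reshape(518, 518)
loss_zero = l1_grad_loss(gt, gt)       # identical maps -> 0.0
loss_bias = l1_grad_loss(gt + 1.0, gt) # constant offset: pure L1, gradients match
```

A constant depth offset only triggers the L1 term, while the gradient term reacts to edge disagreement; this is the usual motivation for mixing the two.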

Results

| Metric | Value | Target |
|---|---|---|
| Test AbsRel (with prompt) | 0.0203 | < 0.040 |
| Test RMSE | 0.1009 | -- |
| Test delta<1.25 | 0.9954 | -- |
| Val AbsRel (best, step 250) | 0.0192 | -- |
| DAv2 baseline AbsRel (no prompt) | 0.073 | -- |

With sparse depth prompts, test AbsRel improves 3.6x over the DAv2 metric baseline (0.073 -> 0.0203).
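For reference, these metrics follow the standard depth-evaluation definitions. A minimal NumPy version (assuming invalid pixels are already masked out):

```python
import numpy as np

def depth_metrics(pred, gt):
    """AbsRel, RMSE, and delta<1.25 accuracy over valid (nonzero-gt) pixels."""
    abs_rel = np.mean(np.abs(pred - gt) / gt)          # relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))          # absolute error in meters
    delta = np.maximum(pred / gt, gt / pred)           # symmetric ratio
    d125 = np.mean(delta < 1.25)                       # fraction within 25%
    return abs_rel, rmse, d125

gt = np.full((518, 518), 2.0)
pred = gt * 1.10  # uniform 10% over-prediction
abs_rel, rmse, d125 = depth_metrics(pred, gt)
# -> abs_rel = 0.10, rmse = 0.20, d125 = 1.0
```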

Exported Formats

| Format | File | Size | Status |
|---|---|---|---|
| SafeTensors | pytorch/model.safetensors | 96 MB | Available |
| ONNX | onnx/abyssos_v1.onnx | 102 MB | Available |
| TensorRT FP16 | -- | -- | Deferred (bilinear upsample TRT limitation) |
| TensorRT FP32 | -- | -- | Deferred |
| Checkpoint (resume) | checkpoints/ | 287 MB | Available (includes optimizer state) |

Usage

from transformers import AutoModelForDepthEstimation, AutoImageProcessor
import torch

model = AutoModelForDepthEstimation.from_pretrained("ilessio-aiflowlab/project_abyssos", subfolder="pytorch")
processor = AutoImageProcessor.from_pretrained("ilessio-aiflowlab/project_abyssos", subfolder="pytorch")
model.eval()

# Inference with a sparse depth prompt
pixel_values = torch.randn(1, 3, 518, 518)  # preprocessed RGB (use `processor` on real images)
prompt_depth = torch.randn(1, 1, 518, 518)  # sparse metric depth from LiDAR
with torch.no_grad():
    output = model(pixel_values=pixel_values, prompt_depth=prompt_depth)
depth_map = output.predicted_depth  # (1, 518, 518) metric depth in meters
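The random `prompt_depth` above is a placeholder; in practice the prompt is built by scattering projected LiDAR returns into an otherwise-zero depth map. A minimal sketch, where the pixel-coordinate layout and the zeros-for-missing convention are assumptions:

```python
import numpy as np

def make_prompt_depth(points_uv, depths, h=518, w=518):
    """Scatter sparse LiDAR depths (meters) into a dense (1, 1, h, w) prompt map.

    points_uv: (N, 2) integer pixel coords (u=col, v=row), already projected
    into the image plane. Zeros in the output mean 'no measurement'.
    """
    prompt = np.zeros((h, w), dtype=np.float32)
    u = np.clip(points_uv[:, 0], 0, w - 1)
    v = np.clip(points_uv[:, 1], 0, h - 1)
    prompt[v, u] = depths  # later points win on collisions
    return prompt[None, None]  # (1, 1, h, w), ready to convert to a tensor

rng = np.random.default_rng(0)
uv = rng.integers(0, 518, size=(500, 2))
z = rng.uniform(0.5, 10.0, size=500).astype(np.float32)
prompt = make_prompt_depth(uv, z)
```

The result can be passed as `prompt_depth` after `torch.from_numpy(prompt)`.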

ONNX Inference

import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("onnx/abyssos_v1.onnx")
depth = sess.run(None, {
    "pixel_values": np.random.randn(1, 3, 518, 518).astype(np.float32),
    "prompt_depth": np.random.randn(1, 1, 518, 518).astype(np.float32),
})[0]  # (1, 518, 518)

Repository Structure

.
├── README.md
├── anima_module.yaml
├── pytorch/
│   ├── model.safetensors
│   ├── config.json
│   └── preprocessor_config.json
├── onnx/
│   └── abyssos_v1.onnx
├── checkpoints/
│   ├── model.safetensors
│   ├── training_state.pt
│   └── eval_metrics.json
├── configs/
│   ├── train_8gpu_ddp.yaml
│   ├── train_8gpu_dp_safe.yaml
│   └── ...
└── logs/
    ├── train_v3_report.json
    └── training_history.jsonl

License

Apache 2.0 -- Robot Flow Labs / AIFLOW LABS LIMITED
