ABYSSOS -- Metric Depth Estimation (ANIMA Module)
Part of the ANIMA Perception Suite by Robot Flow Labs.
Paper
PromptDA: Prompt-Guided Depth Anything for Metric Depth Estimation (arXiv:2502.11277)
Based on Depth Anything V2 with sparse LiDAR prompt fusion for metric depth.
Model
- Architecture: PromptDA ViT-S (Vision Transformer Small + DPT Decoder + Gated Cross-Attention Prompt Fusion)
- Parameters: 25.1M
- Input: RGB image (518x518) + sparse depth prompt (518x518)
- Output: Dense metric depth map (518x518)
- Base model: depth-anything/prompt-depth-anything-vits-hf
Training
| Parameter | Value |
|---|---|
| Hardware | 8x NVIDIA L4 (23GB each) |
| Framework | PyTorch 2.11 + DDP |
| Dataset | NYU Depth V2 (1304 train, 72 val, 73 test) |
| Batch size | 24/GPU, effective 192 |
| Optimizer | AdamW (head lr=5e-5, backbone lr=5e-6) |
| Schedule | 500-step warmup + cosine decay |
| Loss | L1 + 0.5 * gradient (edge-aware) |
| Precision | bf16 mixed precision |
| Steps | 5000 (best at step 250) |
| Training time | 2.5 hours |
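The loss row above (L1 + 0.5 × gradient) can be made concrete. A minimal NumPy sketch of an L1-plus-gradient objective using finite-difference image gradients; the exact edge-aware weighting used in training is not specified here, so treat this as an illustrative reconstruction rather than the training code:

```python
import numpy as np

def depth_loss(pred, gt, grad_weight=0.5):
    """L1 term plus a gradient-matching term (hypothetical reconstruction).

    pred, gt: (H, W) depth maps in meters.
    """
    # pixel-wise L1 on absolute depth
    l1 = np.abs(pred - gt).mean()
    # horizontal / vertical finite differences, matched between pred and gt
    gx = np.abs(np.diff(pred, axis=-1) - np.diff(gt, axis=-1)).mean()
    gy = np.abs(np.diff(pred, axis=-2) - np.diff(gt, axis=-2)).mean()
    return l1 + grad_weight * (gx + gy)
```

The gradient term penalizes mismatched depth discontinuities, which is what makes the objective sharpen edges instead of blurring across them.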
Results
| Metric | Value | Target |
|---|---|---|
| Test AbsRel (with prompt) | 0.0203 | < 0.040 |
| Test RMSE | 0.1009 | -- |
| Test delta<1.25 | 0.9954 | -- |
| Val AbsRel (best, step 250) | 0.0192 | -- |
| DAv2 baseline (no prompt) | 0.073 | -- |
With sparse depth prompts, test AbsRel improves roughly 3.6x over the DAv2 metric baseline (0.073 → 0.0203).
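For reference, the standard monocular-depth metrics reported above (AbsRel, RMSE, δ<1.25) can be computed as follows. This is a generic sketch over arrays of valid pixels, not the project's evaluation script:

```python
import numpy as np

def depth_metrics(pred, gt):
    """pred, gt: flat arrays of valid (gt > 0) depth values in meters."""
    absrel = np.mean(np.abs(pred - gt) / gt)          # mean absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))         # root mean squared error
    ratio = np.maximum(pred / gt, gt / pred)          # symmetric ratio per pixel
    delta1 = np.mean(ratio < 1.25)                    # fraction within threshold
    return absrel, rmse, delta1
```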
Exported Formats
| Format | File | Size | Status |
|---|---|---|---|
| SafeTensors | pytorch/model.safetensors | 96 MB | Available |
| ONNX | onnx/abyssos_v1.onnx | 102 MB | Available |
| TensorRT FP16 | -- | -- | Deferred (bilinear upsample TRT limitation) |
| TensorRT FP32 | -- | -- | Deferred |
| Checkpoint (resume) | checkpoints/ | 287 MB | Available (includes optimizer state) |
Usage

```python
from transformers import AutoModelForDepthEstimation, AutoImageProcessor
import torch

model = AutoModelForDepthEstimation.from_pretrained(
    "ilessio-aiflowlab/project_abyssos", subfolder="pytorch"
)
processor = AutoImageProcessor.from_pretrained(
    "ilessio-aiflowlab/project_abyssos", subfolder="pytorch"
)

# Inference with a sparse depth prompt
pixel_values = torch.randn(1, 3, 518, 518)  # preprocessed RGB
prompt_depth = torch.randn(1, 1, 518, 518)  # sparse depth from LiDAR
output = model(pixel_values=pixel_values, prompt_depth=prompt_depth)
depth_map = output.predicted_depth          # (1, 518, 518) metric depth in meters
```
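The random `prompt_depth` above stands in for a real sparse map. A hypothetical helper that scatters projected LiDAR returns into a 518×518 prompt, assuming zeros mark pixels with no measurement (check the model's config for its actual missing-value convention):

```python
import numpy as np

def make_prompt(points_uv, depths, size=518):
    """Scatter sparse LiDAR returns into a dense prompt map.

    points_uv: (N, 2) int array of (u, v) pixel coords in the 518x518 frame.
    depths:    (N,) metric depths in meters.
    Zeros are assumed to mean "no measurement" at that pixel.
    """
    prompt = np.zeros((1, 1, size, size), dtype=np.float32)
    u, v = points_uv[:, 0], points_uv[:, 1]
    prompt[0, 0, v, u] = depths  # rows are v (y), columns are u (x)
    return prompt
```

The resulting array can be passed to the model after wrapping with `torch.from_numpy`.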
ONNX Inference

```python
import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("onnx/abyssos_v1.onnx")
depth = sess.run(None, {
    "pixel_values": np.random.randn(1, 3, 518, 518).astype(np.float32),
    "prompt_depth": np.random.randn(1, 1, 518, 518).astype(np.float32),
})[0]  # (1, 518, 518)
```
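Both paths expect a preprocessed `(1, 3, 518, 518)` tensor. When `AutoImageProcessor` is unavailable (e.g. a pure-ONNX deployment), a rough NumPy preprocessing sketch; the ImageNet mean/std normalization is an assumption to be confirmed against `pytorch/preprocessor_config.json`, and nearest-neighbor resize stands in for the processor's interpolation:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # assumed

def preprocess(rgb_uint8, size=518):
    """rgb_uint8: (H, W, 3) uint8 image -> (1, 3, size, size) float32."""
    h, w, _ = rgb_uint8.shape
    # nearest-neighbor resize via index sampling (stand-in for bilinear)
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    img = rgb_uint8[ys][:, xs].astype(np.float32) / 255.0
    img = (img - IMAGENET_MEAN) / IMAGENET_STD
    return img.transpose(2, 0, 1)[None]  # (1, 3, size, size)
```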
Repository Structure

```
.
├── README.md
├── anima_module.yaml
├── pytorch/
│   ├── model.safetensors
│   ├── config.json
│   └── preprocessor_config.json
├── onnx/
│   └── abyssos_v1.onnx
├── checkpoints/
│   ├── model.safetensors
│   ├── training_state.pt
│   └── eval_metrics.json
├── configs/
│   ├── train_8gpu_ddp.yaml
│   ├── train_8gpu_dp_safe.yaml
│   └── ...
└── logs/
    ├── train_v3_report.json
    └── training_history.jsonl
```
License
Apache 2.0 -- Robot Flow Labs / AIFLOW LABS LIMITED