# Depth Anything V2 Large — SafeTensors
Depth Anything V2 (Large, ViT-L backbone) converted to SafeTensors format for safe, fast loading in robotic depth estimation pipelines. 335M parameters for high-quality monocular depth maps.
This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform — a modular ROS2-native AI system that brings foundation model intelligence to real robots operating in the real world.
## Why This Model Exists
Monocular depth estimation is fundamental to robotic navigation and manipulation — robots need to know how far away things are from a single camera. Depth Anything V2 produces state-of-the-art relative depth maps from a single image. The original weights are distributed as raw `.pth` files; we converted them to SafeTensors format for safe, zero-copy memory-mapped loading.
## Model Details
| Property | Value |
|---|---|
| Architecture | DPT head + ViT-Large encoder |
| Parameters | 335M |
| Encoder | ViT-L/14 (DINOv2-based) |
| Input Resolution | Flexible (recommended 518×518) |
| Output | Dense relative depth map |
| Training | Synthetic + real depth labels (multi-stage) |
| Original Model | depth-anything/Depth-Anything-V2-Large |
| License | Apache-2.0 |
## Included Files

```
depth-anything-v2-large/
├── model.safetensors   # 1.3 GB — full model weights
└── README.md           # This file
```
## Quick Start
```python
import cv2
import torch
from safetensors.torch import load_file

from depth_anything_v2.dpt import DepthAnythingV2  # from the official repo

# Load SafeTensors weights
state_dict = load_file("model.safetensors")

# Load into the Depth Anything V2 architecture
model = DepthAnythingV2(encoder='vitl', features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(state_dict)
model.to("cuda").eval()

# Predict depth; infer_image expects a BGR image as read by cv2.imread
image = cv2.imread("frame.jpg")
depth = model.infer_image(image)  # relative depth map, shape (H, W)
```
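The returned map is relative depth, so its values are only meaningful up to an ordering. For display or logging, a simple min-max normalization to 8-bit works; the helper below is illustrative and not part of the upstream API:

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Normalize a relative depth map to [0, 255] for visualization."""
    d_min, d_max = float(depth.min()), float(depth.max())
    norm = (depth - d_min) / max(d_max - d_min, 1e-8)
    return (norm * 255.0).astype(np.uint8)

# Toy 2x2 relative depth map
depth = np.array([[0.5, 1.0], [2.0, 4.0]], dtype=np.float32)
vis = depth_to_uint8(depth)
```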
## With Transformers
```python
import torch
from PIL import Image
from transformers import AutoModelForDepthEstimation, AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Large")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Large")
model.to("cuda").eval()

image = Image.open("frame.jpg")
inputs = processor(images=image, return_tensors="pt").to("cuda")
with torch.no_grad():
    depth = model(**inputs).predicted_depth  # (batch, H, W)
```
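`predicted_depth` comes out at the model's working resolution, so a common post-processing step is to resize it back to the original frame with `torch.nn.functional.interpolate`. A sketch with stand-in tensors (the 518×518 input and 480×640 target are illustrative):

```python
import torch
import torch.nn.functional as F

# Stand-in for the model output at its working resolution
predicted_depth = torch.rand(1, 518, 518)

# Resize to the original camera frame (e.g. 480x640)
depth = F.interpolate(
    predicted_depth.unsqueeze(1),   # (B, 1, H, W) as interpolate expects
    size=(480, 640),
    mode="bicubic",
    align_corners=False,
).squeeze(1)                        # back to (B, H, W)
```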
## With FORGE (ANIMA Integration)
```python
from forge.vision import VisionEncoderRegistry

depth_estimator = VisionEncoderRegistry.load("depth-anything-v2-large")
depth_map = depth_estimator(image_tensor)  # relative depth map
```
## Use Cases in ANIMA
Depth estimation is critical across ANIMA modules:
- Obstacle Avoidance — Real-time depth maps for safe navigation
- Grasp Planning — Estimate object distance for manipulation reach calculations
- 3D Reconstruction — Dense depth for point cloud generation from a single camera
- Safety Zones — Distance-based safety boundaries for human-robot collaboration
- Path Planning — Identify traversable spaces and obstacle heights
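For the point-cloud use case above, once the relative map has been calibrated to metric depth, back-projection follows the standard pinhole camera model. A minimal sketch — the intrinsics `fx`, `fy`, `cx`, `cy` and the toy depth map are illustrative placeholders:

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a metric depth map (H, W) to an (H*W, 3) point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy 2x2 depth map at 1 m everywhere, with made-up intrinsics
pts = depth_to_points(np.ones((2, 2), dtype=np.float32),
                      fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```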
## Depth Anything V2 Family
| Model | Params | Size | Best For |
|---|---|---|---|
| depth-anything-v2-large | 335M | 1.3 GB | Highest quality depth |
| depth-anything-v2-small | 24.8M | 95 MB | Real-time edge deployment |
## Intended Use
### Designed For
- Monocular depth estimation for robotic navigation
- Dense depth maps for manipulation planning
- Point cloud generation from RGB cameras
- Obstacle detection and distance estimation
### Limitations
- Produces relative (not metric) depth — requires calibration for absolute distances
- Performance degrades on reflective, transparent, or textureless surfaces
- Single-frame estimation — no temporal consistency for video
- Inherits biases from the training data distribution
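One common way to work around the relative-depth limitation is to fit a per-frame scale and shift against a handful of known metric distances (e.g. from sparse LiDAR returns or a stereo pair). A least-squares sketch with synthetic values:

```python
import numpy as np

def align_depth(relative: np.ndarray, metric: np.ndarray):
    """Fit metric ~ s * relative + t by least squares over sparse samples."""
    A = np.stack([relative, np.ones_like(relative)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, metric, rcond=None)
    return s, t

# Synthetic sparse samples with known ground truth s=0.5, t=0.1
rel = np.array([1.0, 2.0, 3.0])
met = 0.5 * rel + 0.1
s, t = align_depth(rel, met)
```

The fitted `(s, t)` can then be applied to the full dense map; note that for some relative-depth models the linear fit is better done in inverse-depth (disparity) space.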
### Out of Scope
- Safety-critical autonomous driving without additional validation
- Medical depth estimation
- Surveillance applications
## Attribution

- Original Model: depth-anything/Depth-Anything-V2-Large by HKU & TikTok
- License: Apache-2.0
- Paper: Depth Anything V2 — Yang et al., 2024
- Converted by: RobotFlowLabs using FORGE
## Citation

```bibtex
@article{yang2024depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}
```
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.