# Depth Anything V2 Small (SafeTensors)

Depth Anything V2 (Small, ViT-S backbone) converted to SafeTensors for real-time robotic depth estimation. At just 95 MB, it is one of the lightest production-quality monocular depth models available, small enough for edge devices like the Jetson Nano.
This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform.
## Why This Model Exists
Depth estimation needs to run alongside segmentation, feature, and action models, all on the same edge GPU. At 95 MB, Depth Anything V2 Small is small enough to fit in any perception stack while still producing high-quality relative depth maps. The weights were converted from the raw `.pth` checkpoint to SafeTensors for safe, zero-copy loading.
## Model Details
| Property | Value |
|---|---|
| Architecture | DPT head + ViT-Small encoder |
| Parameters | 24.8M |
| Encoder | ViT-S/14 (DINOv2-based) |
| Input Resolution | Flexible (recommended 518×518) |
| Output | Dense relative depth map |
| Original Model | depth-anything/Depth-Anything-V2-Small |
| License | Apache-2.0 |
## Quick Start

```python
import cv2
from safetensors.torch import load_file
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-S configuration matching this checkpoint
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(load_file("model.safetensors"))
model.to("cuda").eval()

image = cv2.imread("example.jpg")  # BGR image, as infer_image expects
depth = model.infer_image(image)   # HxW relative depth map
```
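The model returns the relative depth map as a float array; for display or logging it is common to normalize it to an 8-bit image. A minimal sketch (the helper name is ours, not part of the library):

```python
import numpy as np

def depth_to_uint8(depth):
    """Normalize a relative depth map to 0-255 for visualization."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-6)
    return (d * 255.0).astype(np.uint8)
```

The result can be written out with `cv2.imwrite` or colorized with a colormap for inspection.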
## Use Cases in ANIMA

- Real-Time Obstacle Avoidance – fast depth estimation for navigation at camera framerate
- Grasp Distance – quick depth estimates for reach planning
- Mobile Robots – fits on Jetson Nano-class devices alongside other models
- Multi-Camera Setups – small enough to run one instance per camera
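For obstacle avoidance, one simple policy is to flag a stop when too many pixels in the central image region look near. Depth Anything outputs relative, disparity-like values where larger typically means closer; the threshold and region below are tuning assumptions, not values from this model card:

```python
import numpy as np

def obstacle_in_path(depth, near_thresh=0.8, max_near_frac=0.05):
    """Return True if the central region contains too many near-looking pixels.

    Assumes larger depth values mean closer (disparity-like output).
    """
    h, w = depth.shape
    center = depth[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    # Output is relative, so normalize per frame before thresholding
    span = center.max() - center.min()
    norm = (center - center.min()) / (span + 1e-6)
    return bool((norm > near_thresh).mean() > max_near_frac)
```

A real navigation stack would smooth this over several frames and fuse it with other sensors, but the per-frame check captures the basic idea.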
## Depth Anything V2 Family
| Model | Params | Size | Best For |
|---|---|---|---|
| depth-anything-v2-large | 335M | 1.3 GB | Highest quality depth |
| depth-anything-v2-small | 24.8M | 95 MB | Real-time edge deployment |
## Limitations

- Relative depth only, not metric (needs calibration for absolute distances)
- Lower accuracy than the Large variant on complex scenes
- Single-frame estimation, no temporal consistency
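The relative output can be mapped toward metric depth when a few ground-truth distances are available (e.g. from a sparse rangefinder or objects of known size). A common approach is a least-squares scale-and-shift fit; a sketch under the simplifying assumption that the mapping is affine:

```python
import numpy as np

def fit_scale_shift(rel, metric):
    """Fit metric ~ a * rel + b by least squares from sparse known points."""
    A = np.stack([rel, np.ones_like(rel)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, metric, rcond=None)
    return a, b

# Toy check: points generated from metric = 2.0 * rel + 0.1
rel = np.array([0.2, 0.4, 0.6, 0.8])
a, b = fit_scale_shift(rel, 2.0 * rel + 0.1)
```

Because the model's output is disparity-like, fitting in inverse-depth space is often more accurate than fitting depth directly; the affine form above is the simplest starting point.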
## Attribution

- Original Model: depth-anything/Depth-Anything-V2-Small by HKU & TikTok
- License: Apache-2.0
- Paper: Depth Anything V2 (Yang et al., 2024)
- Converted by: RobotFlowLabs using FORGE
## Citation

```bibtex
@article{yang2024depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}
```
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.