# Depth Anything V2 Small (SafeTensors)

Depth Anything V2 (Small, ViT-S backbone) converted to SafeTensors for real-time robotic depth estimation. At just 95 MB, it is one of the lightest production-quality monocular depth models available, small enough for edge devices like the Jetson Nano.
This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform.
## Why This Model Exists
Depth estimation needs to run alongside segmentation, feature, and action models, all on the same edge GPU. At 95 MB, Depth Anything V2 Small is small enough to fit in any perception stack while still producing high-quality relative depth maps. The weights were converted from the raw `.pth` checkpoint to SafeTensors for safe, zero-copy loading.
## Model Details
| Property | Value |
|---|---|
| Architecture | DPT head + ViT-Small encoder |
| Parameters | 24.8M |
| Encoder | ViT-S/14 (DINOv2-based) |
| Input Resolution | Flexible (recommended 518×518) |
| Output | Dense relative depth map |
| Original Model | depth-anything/Depth-Anything-V2-Small |
| License | Apache-2.0 |
## Quick Start

```python
import cv2
from safetensors.torch import load_file
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-S configuration matching this checkpoint
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(load_file("model.safetensors"))
model.to("cuda").eval()

image = cv2.imread("example.jpg")  # BGR image, as infer_image expects
depth = model.infer_image(image)   # HxW relative depth map
```
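The model returns the relative depth map as a float array; for display or logging it is common to normalize it to an 8-bit image. A minimal sketch (the helper name is ours, not part of the library):

```python
import numpy as np

def depth_to_uint8(depth):
    """Normalize a relative depth map to 0-255 for visualization."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-6)
    return (d * 255.0).astype(np.uint8)
```

The result can be written out with `cv2.imwrite` or colorized with a colormap for inspection.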
## Use Cases in ANIMA

- Real-Time Obstacle Avoidance – fast depth estimation for navigation at camera framerate
- Grasp Distance – quick depth estimates for reach planning
- Mobile Robots – fits on Jetson Nano-class devices alongside other models
- Multi-Camera Setups – small enough to run one instance per camera
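For obstacle avoidance, one simple policy is to flag a stop when too many pixels in the central image region look near. Depth Anything outputs relative, disparity-like values where larger typically means closer; the threshold and region below are tuning assumptions, not values from this model card:

```python
import numpy as np

def obstacle_in_path(depth, near_thresh=0.8, max_near_frac=0.05):
    """Return True if the central region contains too many near-looking pixels.

    Assumes larger depth values mean closer (disparity-like output).
    """
    h, w = depth.shape
    center = depth[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    # Output is relative, so normalize per frame before thresholding
    span = center.max() - center.min()
    norm = (center - center.min()) / (span + 1e-6)
    return bool((norm > near_thresh).mean() > max_near_frac)
```

A real navigation stack would smooth this over several frames and fuse it with other sensors, but the per-frame check captures the basic idea.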
## Depth Anything V2 Family
| Model | Params | Size | Best For |
|---|---|---|---|
| depth-anything-v2-large | 335M | 1.3 GB | Highest quality depth |
| depth-anything-v2-small | 24.8M | 95 MB | Real-time edge deployment |
## Limitations

- Relative depth only, not metric (needs calibration for absolute distances)
- Lower accuracy than the Large variant on complex scenes
- Single-frame estimation, no temporal consistency
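The relative output can be mapped toward metric depth when a few ground-truth distances are available (e.g. from a sparse rangefinder or objects of known size). A common approach is a least-squares scale-and-shift fit; a sketch under the simplifying assumption that the mapping is affine:

```python
import numpy as np

def fit_scale_shift(rel, metric):
    """Fit metric ~ a * rel + b by least squares from sparse known points."""
    A = np.stack([rel, np.ones_like(rel)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, metric, rcond=None)
    return a, b

# Toy check: points generated from metric = 2.0 * rel + 0.1
rel = np.array([0.2, 0.4, 0.6, 0.8])
a, b = fit_scale_shift(rel, 2.0 * rel + 0.1)
```

Because the model's output is disparity-like, fitting in inverse-depth space is often more accurate than fitting depth directly; the affine form above is the simplest starting point.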
## Attribution

- Original Model: depth-anything/Depth-Anything-V2-Small by HKU & TikTok
- License: Apache-2.0
- Paper: Depth Anything V2 (Yang et al., 2024)
- Converted by: RobotFlowLabs using FORGE
## Citation

```bibtex
@article{yang2024depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}
```
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.