| ---
|
| license: cc-by-nc-sa-4.0
|
| language:
|
| - en
|
| pipeline_tag: depth-estimation
|
| tags:
|
| - depth-estimation
|
| - metric-depth-estimation
|
| - monocular-depth-estimation
|
| - aerial
|
| - UAV
|
| - drone
|
| ---
|
|
|
| # OccuFly's Aerial DepthAnythingV2
|
|
|
| ## Introduction
|
|
|
|
|
Following the acceptance of our paper as a [CVPR 2026 Oral](https://cvpr.thecvf.com/virtual/2026/oral/40308), we release our [DepthAnythingV2](https://depth-anything-v2.github.io/) model fine-tuned for aerial imagery. It was trained on the [OccuFly dataset](https://markus-42.github.io/publications/2026/occufly/), the first large-scale, real-world benchmark for aerial Metric Monocular Depth Estimation and Semantic Scene Completion.
|
|
|
This model is the depth estimation component of our [OccuFly project](https://markus-42.github.io/publications/2026/occufly/), in which we fine-tuned `DepthAnythingV2-ViT-S` to infer accurate metric depth (in meters) from a single aerial image.
|
|
|
| ### Key Features
|
| - **Aerial-specialized**: Fine-tuned on diverse aerial imagery from urban, industrial, and rural environments.
|
- **Multi-altitude performance**: Trained on data captured at 30 m, 40 m, and 50 m flight altitudes.

- **Seasonal robustness**: Trained on data captured across all four seasons for improved generalization.
|
| - **Lightweight**: Uses the ViT-S backbone for efficient inference.
|
|
|
| ## Installation
|
|
|
```bash
git clone https://huggingface.co/spaces/depth-anything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt
```
|
|
|
| ## Quickstart
|
|
|
Download the [model checkpoint](https://huggingface.co/markus-42/OccuFly-DepthAnythingV2/resolve/main/OccuFly-DepthAnything2.pth) and place it in your working directory (or adjust the checkpoint path in the code):
|
|
|
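```python
import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

# Load the fine-tuned aerial model (ViT-S encoder configuration)
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(torch.load('OccuFly-DepthAnything2.pth', map_location='cpu'))
model = model.to(DEVICE).eval()  # infer_image moves the input to DEVICE, so the model must live there too

# Inference: infer_image takes a BGR image as returned by cv2.imread
raw_img = cv2.imread('example.jpg')
if raw_img is None:
    raise FileNotFoundError('Could not read example.jpg')

with torch.no_grad():
    depth = model.infer_image(raw_img)  # HxW metric depth map (NumPy array, meters)
```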
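Because the predicted depth is metric (in meters), it can be lifted into a 3D point cloud in the camera frame once the camera intrinsics are known. The following is a minimal NumPy sketch assuming a standard pinhole camera and assuming the map holds z-depth along the optical axis (not ray distance). The helper name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical placeholders, not values taken from OccuFly.

```python
import numpy as np


def depth_to_points(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project an HxW metric depth map into an (N, 3) camera-frame point cloud.

    Assumes a pinhole camera and that `depth` holds z-depth in meters
    (hypothetical helper, for illustration only).
    Pixels with non-positive or non-finite depth are dropped.
    """
    h, w = depth.shape
    # Pixel grid of shape (h, w): u is the column index (x), v the row index (y)
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    z = depth.astype(np.float64)
    valid = np.isfinite(z) & (z > 0)
    # Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)


# Hypothetical intrinsics for illustration only; use your camera's calibration.
depth = np.full((4, 6), 40.0)  # stand-in for the output of model.infer_image(...)
points = depth_to_points(depth, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
print(points.shape)  # (24, 3)
```

Since `infer_image` returns a map at the input image's resolution, the intrinsics of the original image apply directly; invalid pixels are masked out rather than mapped to the origin.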
|
|
|
| ## OccuFly Dataset
|
|
|
| The model is fine-tuned on [OccuFly](https://huggingface.co/datasets/markus-42/OccuFly), which includes:
|
|
|
| - **20,000+ aerial RGB images** with corresponding depth maps
|
| - **Multiple altitudes**: 30m, 40m, 50m flight altitudes
|
| - **Seasonal diversity**: Spring, Summer, Fall, Winter
|
| - **Multiple environments**: Urban, industrial, rural
|
| - **21 semantic classes** with dense voxel grid annotations
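Because OccuFly serves as a benchmark for aerial metric monocular depth estimation, predictions are typically compared against ground-truth depth using standard error metrics. The sketch below computes three common ones (AbsRel, RMSE, and δ < 1.25 threshold accuracy) over pixels with valid ground truth. It is a generic illustration, not the official OccuFly evaluation protocol; the function name `depth_metrics` and the optional `max_depth` cutoff are assumptions.

```python
from typing import Optional

import numpy as np


def depth_metrics(pred: np.ndarray, gt: np.ndarray, max_depth: Optional[float] = None) -> dict:
    """Common metric-depth errors over pixels with valid ground truth (meters).

    Hypothetical helper: ground-truth pixels that are non-finite, <= 0, or
    beyond `max_depth` (if given) are ignored.
    """
    valid = np.isfinite(gt) & (gt > 0)
    if max_depth is not None:
        valid &= gt <= max_depth
    p = pred[valid].astype(np.float64)
    g = gt[valid].astype(np.float64)
    if g.size == 0:
        raise ValueError("no valid ground-truth pixels")

    abs_rel = float(np.mean(np.abs(p - g) / g))
    rmse = float(np.sqrt(np.mean((p - g) ** 2)))
    # Threshold accuracy: share of pixels where max(p/g, g/p) < 1.25;
    # a zero prediction yields an infinite ratio and counts as a miss.
    with np.errstate(divide="ignore"):
        ratio = np.maximum(p / g, g / p)
    delta1 = float(np.mean(ratio < 1.25))
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


gt = np.array([[10.0, 20.0], [40.0, 0.0]])  # 0 marks missing ground truth
pred = np.array([[11.0, 20.0], [30.0, 5.0]])
print(depth_metrics(pred, gt))
```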
|
|
|
| ## Citation
|
|
|
If you find our work helpful, please consider citing our paper and the original DepthAnythingV2 work, or giving the repository a like ❤️
|
|
|
```bibtex
@inproceedings{gross2026occufly,
  title={{OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective}},
  author={Markus Gross and Sai B. Matha and Aya Fahmy and Rui Song and Daniel Cremers and Henri Meess},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}
```
|
|
|
| ## Related Resources
|
|
|
- [OccuFly Project Page](https://markus-42.github.io/publications/2026/occufly/)
- [OccuFly Dataset on HuggingFace](https://huggingface.co/datasets/markus-42/OccuFly)
- [OccuFly Paper](https://arxiv.org/abs/2512.20770)
- [Original DepthAnythingV2](https://github.com/DepthAnything/Depth-Anything-V2)
|
|
|
| ## License
|
|
|
| This work is licensed under the [CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/). See the LICENSE file for the full legal terms.
|
|
|