nielsr (HF Staff) committed
Commit 623fb5a · verified · 1 parent: b6fb620

Improve model card with comprehensive details and add library name tag

This PR significantly improves the model card for DepthAnything-AC by incorporating more detailed information from the paper and its associated GitHub repository.

Key improvements include:
- **Metadata**: Added `library_name: transformers` tag for better discoverability and integration.
- **Paper Abstract**: Included the full abstract to provide a clear overview of the research.
- **Project Page & Code**: Added prominent links to the project page (`https://ghost233lism.github.io/depthanything-AC-page/`) and the GitHub repository (`https://github.com/HVision-NKU/DepthAnythingAC`).
- **Comprehensive Introduction**: Expanded the introduction section with details from the GitHub README.
- **Model Architecture**: Added a section to describe the model's architecture.
- **Detailed Usage**: Enhanced the usage section with instructions on model download, quick inference, training, and evaluation, mirroring the GitHub guide.
- **Quantitative Results**: Included the quantitative performance tables from the paper's GitHub README for various benchmarks.
- **License, Contact, and Acknowledgements**: Added explicit sections for the license details, contact information, and acknowledgements as found in the GitHub README, ensuring all relevant information is present on the Hugging Face Hub.

These updates aim to provide researchers with a more complete and accessible understanding of the model, its capabilities, and how to use it.

Files changed (1)
  1. README.md +153 -10
README.md CHANGED
@@ -1,39 +1,182 @@
  ---
- license: cc-by-nc-4.0
- language:
- - en
  base_model:
  - depth-anything/Depth-Anything-V2-Small
  tags:
  - depth
  - relative depth
  - depth anything
- pipeline_tag: depth-estimation
  ---

- # DepthAnything-AC

  ## Introduction
- [DepthAnything-AC](https://arxiv.org/abs/2507.01634) is a robust monocular depth estimation model fine-tuned from Depth-Anything-V2-Small, designed to enable zero-shot relative depth estimation under all-weather conditions.

  ## Installation
- ```
  git clone https://github.com/HVision-NKU/DepthAnythingAC.git
  cd DepthAnythingAC
  conda create -n depth_anything_ac python=3.9
  conda activate depth_anything_ac
  pip install -r requirements.txt
  ```
  ## Usage
- You may refer to our [github](https://github.com/HVision-NKU/DepthAnythingAC) repo for detailed inference scripts.

  ## Citation
  If you find this work useful, please consider citing:
- ```
  @article{sun2025depth,
  title={Depth Anything at Any Condition},
  author={Sun, Boyuan and Modi Jin and Bowen Yin and Hou, Qibin},
  journal={arXiv preprint arXiv:2507.01634},
  year={2025}
  }
- ```
  ---
  base_model:
  - depth-anything/Depth-Anything-V2-Small
+ language:
+ - en
+ license: cc-by-nc-4.0
+ pipeline_tag: depth-estimation
+ library_name: transformers
  tags:
  - depth
  - relative depth
  - depth anything
  ---

+ # Depth Anything at Any Condition
+
+ ## Paper, Project Page, and Code
+
+ The model was presented in the paper [Depth Anything at Any Condition](https://huggingface.co/papers/2507.01634).
+ Project page: [https://ghost233lism.github.io/depthanything-AC-page/](https://ghost233lism.github.io/depthanything-AC-page/)
+ Code: [https://github.com/HVision-NKU/DepthAnythingAC](https://github.com/HVision-NKU/DepthAnythingAC)
+
+ ## Abstract
+
+ We present Depth Anything at Any Condition (DepthAnything-AC), a foundation monocular depth estimation (MDE) model capable of handling diverse environmental conditions. Previous foundation MDE models achieve impressive performance across general scenes but do not perform well in complex open-world environments that involve challenging conditions, such as illumination variations, adverse weather, and sensor-induced distortions. To overcome the challenges of data scarcity and the inability to generate high-quality pseudo-labels from corrupted images, we propose an unsupervised consistency regularization finetuning paradigm that requires only a relatively small amount of unlabeled data. Furthermore, we propose the Spatial Distance Constraint to explicitly enforce the model to learn patch-level relative relationships, resulting in clearer semantic boundaries and more accurate details. Experimental results demonstrate the zero-shot capabilities of DepthAnything-AC across diverse benchmarks, including real-world adverse weather benchmarks, synthetic corruption benchmarks, and general benchmarks.

  ## Introduction
+
+ [DepthAnything-AC](https://arxiv.org/abs/2507.01634) is a robust monocular depth estimation (MDE) model fine-tuned from [DepthAnything-V2](https://github.com/DepthAnything/Depth-Anything-V2), designed for **zero-shot depth estimation under diverse and challenging environmental conditions**, including low light, adverse weather, and sensor distortions.
+
+ To address the lack of high-quality annotations in corrupted scenes, we introduce a lightweight **unsupervised consistency regularization** framework that enables training on unlabeled data. Additionally, our proposed **Spatial Distance Constraint** helps the model learn patch-level geometric relationships, enhancing semantic boundaries and fine details.
+
+ ![teaser](https://github.com/HVision-NKU/DepthAnythingAC/blob/main/assets/teaser.png?raw=true)
+
+ <div align="center">
+ <img src="https://github.com/HVision-NKU/DepthAnythingAC/blob/main/assets/depthanything-AC-video.gif?raw=true" alt="video" width="100%">
+ </div>
+
+ ## Model Architecture
+
+ The architecture of DepthAnything-AC is illustrated below:
+
+ ![architecture](https://github.com/HVision-NKU/DepthAnythingAC/blob/main/assets/architecture.png?raw=true)

  ## Installation
+
+ ### Requirements
+
+ - Python>=3.9
+ - torch==2.3.0
+ - torchvision==0.18.0
+ - torchaudio==2.3.0
+ - cuda==12.1
+
+ ### Setup
+
+ ```bash
  git clone https://github.com/HVision-NKU/DepthAnythingAC.git
  cd DepthAnythingAC
  conda create -n depth_anything_ac python=3.9
  conda activate depth_anything_ac
  pip install -r requirements.txt
  ```
+
  ## Usage
+
+ ### Get Depth-Anything-AC Model
+ Download the pre-trained checkpoints from Hugging Face:
+ ```bash
+ mkdir checkpoints
+ cd checkpoints
+
+ # (Optional) Use a Hugging Face mirror
+ export HF_ENDPOINT=https://hf-mirror.com
+
+ # Download the DepthAnything-AC model from Hugging Face
+ huggingface-cli download --resume-download ghost233lism/DepthAnything-AC --local-dir ghost233lism/DepthAnything-AC
+ ```
+
+ ### Quick Inference
+
+ We provide quick inference scripts for single/batch image input in `tools/`. Please refer to the [infer README](https://github.com/HVision-NKU/DepthAnythingAC/blob/main/tools/README.md) for detailed information.
+
+ ### Training
+ We provide the full training process of DepthAnything-AC, including consistency regularization, spatial distance extraction/constraint, and the widely used affine-invariant loss function.
+
+ Prepare your configuration file in `configs/` and run:
+
+ ```bash
+ bash tools/train.sh <num_gpu> <port>
+ ```
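
The Training section names an affine-invariant loss without spelling it out. The sketch below shows one common formulation, assumed here rather than taken from the repository: each depth map is shifted by its median and scaled by its mean absolute deviation, and the loss is the mean absolute difference between the two normalized maps. The function names `normalize_affine` and `affine_invariant_loss` are illustrative and may not match the actual training code.

```python
from statistics import median


def normalize_affine(depth):
    """Shift a flat list of depth values by its median and scale it by the
    mean absolute deviation from that median (assumed normalization)."""
    t = median(depth)
    s = sum(abs(d - t) for d in depth) / len(depth)
    if s == 0:
        raise ValueError("a constant depth map has no scale to normalize by")
    return [(d - t) / s for d in depth]


def affine_invariant_loss(pred, target):
    """Mean absolute difference between the two normalized depth maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    p = normalize_affine(pred)
    g = normalize_affine(target)
    return sum(abs(a - b) for a, b in zip(p, g)) / len(p)


# Toy example: the target is the prediction with a different scale and shift,
# so the loss should be (numerically) zero.
pred = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [2.0 * d + 3.0 for d in pred]
loss = affine_invariant_loss(pred, target)
print(loss)
```

Because both maps are normalized before comparison, any positive rescaling or constant offset of the prediction leaves the loss unchanged, which is what makes the loss suitable for relative depth.
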
+
+ ### Evaluation
+ We provide direct evaluation on the DA-2K, enhanced DA-2K, KITTI, NYU-D, Sintel, ETH3D, DIODE, NuScenes-Night, RobotCar-night, DS-rain/cloud/fog, and KITTI-C benchmarks. Refer to `configs/` for more details.
+
+ ```bash
+ bash tools/val.sh <num_gpu> <port> <dataset>
+ ```
+
+ ## Results
+
+ ### Quantitative Results
+
+ #### DA-2K Multi-Condition Robustness Results
+
+ Quantitative results on the enhanced multi-condition DA-2K benchmark, including complex lighting and weather conditions. The evaluation metric is **Accuracy** ↑.
+
+ | Method | Encoder | **DA-2K** | **DA-2K dark** | **DA-2K fog** | **DA-2K snow** | **DA-2K blur** |
+ |:-------|:-------:|:---------:|:---------------:|:--------------:|:---------------:|:---------------:|
+ | DynaDepth | ResNet | 0.655 | 0.652 | 0.613 | 0.605 | 0.633 |
+ | EC-Depth | ViT-S | 0.753 | 0.732 | 0.724 | 0.713 | 0.701 |
+ | STEPS | ResNet | 0.577 | 0.587 | 0.581 | 0.561 | 0.577 |
+ | RobustDepth | ViT-S | 0.724 | 0.716 | 0.686 | 0.668 | 0.680 |
+ | Weather-Depth | ViT-S | 0.745 | 0.724 | 0.716 | 0.697 | 0.666 |
+ | DepthPro | ViT-S | 0.947 | 0.872 | 0.902 | 0.793 | 0.772 |
+ | DepthAnything V1 | ViT-S | 0.884 | 0.859 | 0.836 | 0.880 | 0.821 |
+ | DepthAnything V2 | ViT-S | 0.952 | 0.910 | 0.922 | 0.880 | 0.862 |
+ | **Depth Anything AC** | ViT-S | **0.953** | **0.923** | **0.929** | **0.892** | **0.880** |
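
The Accuracy column above can be read as a pairwise ordering score. The sketch below assumes that accuracy means the fraction of annotated point pairs for which the model correctly picks the closer point, and that larger predicted values mean closer points (an inverse-depth convention). The data layout and the `pairwise_accuracy` helper are hypothetical and are not the repository's evaluation code.

```python
def pairwise_accuracy(depth, pairs, larger_is_closer=True):
    """Fraction of point pairs whose closer point is predicted correctly.

    depth: 2D list indexed as depth[row][col].
    pairs: list of ((r0, c0), (r1, c1), closer), where closer is 0 or 1 and
           names which of the two points is annotated as closer.
    """
    if not pairs:
        raise ValueError("at least one annotated pair is required")
    correct = 0
    for (r0, c0), (r1, c1), closer in pairs:
        d0, d1 = depth[r0][c0], depth[r1][c1]
        if larger_is_closer:
            predicted = 0 if d0 > d1 else 1
        else:
            predicted = 0 if d0 < d1 else 1
        if predicted == closer:
            correct += 1
    return correct / len(pairs)


# Toy 2x2 prediction with three annotated pairs; the third pair is wrong.
depth = [[0.9, 0.2],
         [0.5, 0.1]]
pairs = [
    ((0, 0), (0, 1), 0),
    ((1, 0), (1, 1), 0),
    ((0, 1), (1, 0), 0),
]
acc = pairwise_accuracy(depth, pairs)
print(acc)
```

A pairwise score of this kind only depends on the ordering of predicted values, so it needs no scale or shift alignment against metric ground truth.
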
+
+ #### Zero-shot Relative Depth Estimation on Real Complex Benchmarks
+
+ Zero-shot evaluation results on challenging real-world scenarios including night scenes, adverse weather conditions, and complex environmental factors. The encoder used by each method is listed in the Encoder column.
+
+ | Method | Encoder | **NuScenes-night** | | **RobotCar-night** | | **DS-rain** | | **DS-cloud** | | **DS-fog** | |
+ |:-------|:-------:|:----------------:|:---:|:----------------:|:---:|:---------:|:---:|:----------:|:---:|:--------:|:---:|
+ | | | AbsRel ↓ | δ₁ ↑ | AbsRel ↓ | δ₁ ↑ | AbsRel ↓ | δ₁ ↑ | AbsRel ↓ | δ₁ ↑ | AbsRel ↓ | δ₁ ↑ |
+ | DynaDepth | ResNet | 0.381 | 0.394 | 0.512 | 0.294 | 0.239 | 0.606 | 0.172 | 0.608 | 0.144 | 0.901 |
+ | EC-Depth | ViT-S | 0.243 | 0.623 | 0.228 | 0.552 | 0.155 | 0.766 | 0.158 | 0.767 | 0.109 | 0.861 |
+ | STEPS | ResNet | 0.252 | 0.588 | 0.350 | 0.367 | 0.301 | 0.480 | 0.252 | 0.588 | 0.216 | 0.641 |
+ | RobustDepth | ViT-S | 0.260 | 0.597 | 0.311 | 0.521 | 0.167 | 0.755 | 0.168 | 0.775 | 0.105 | 0.882 |
+ | Weather-Depth | ViT-S | - | - | - | - | 0.158 | 0.764 | 0.160 | 0.767 | 0.105 | 0.879 |
+ | Syn2Real | ViT-S | - | - | - | - | 0.171 | 0.729 | - | - | 0.128 | 0.845 |
+ | DepthPro | ViT-S | 0.218 | 0.669 | 0.237 | 0.534 | **0.124** | **0.841** | 0.158 | 0.779 | **0.102** | **0.892** |
+ | DepthAnything V1 | ViT-S | 0.232 | 0.679 | 0.239 | 0.518 | 0.133 | 0.819 | 0.150 | **0.801** | 0.098 | 0.891 |
+ | DepthAnything V2 | ViT-S | 0.200 | 0.725 | 0.239 | 0.518 | 0.125 | 0.840 | 0.151 | 0.798 | 0.103 | 0.890 |
+ | **Depth Anything AC** | ViT-S | **0.198** | **0.727** | **0.227** | **0.555** | 0.125 | 0.840 | **0.149** | **0.801** | 0.103 | 0.889 |
+
+ *Bold: Best performance, Underlined: Second best performance. NuScenes-night and RobotCar-night represent nighttime driving scenarios. DS-rain, DS-cloud, and DS-fog are DrivingStereo weather variation datasets.*
+
+ #### Zero-shot Relative Depth Estimation on Synthetic KITTI-C Benchmarks
+
+ Zero-shot evaluation results on synthetic KITTI-C corruption benchmarks, testing robustness against various image degradations and corruptions.
+
+ | Method | Encoder | **Dark** | | **Snow** | | **Motion** | | **Gaussian** | |
+ |:-------|:-------:|:--------:|:---:|:--------:|:---:|:----------:|:---:|:------------:|:---:|
+ | | | AbsRel ↓ | δ₁ ↑ | AbsRel ↓ | δ₁ ↑ | AbsRel ↓ | δ₁ ↑ | AbsRel ↓ | δ₁ ↑ |
+ | DynaDepth | ResNet | 0.163 | 0.752 | 0.338 | 0.393 | 0.234 | 0.609 | 0.274 | 0.501 |
+ | STEPS | ResNet | 0.230 | 0.631 | 0.242 | 0.622 | 0.291 | 0.508 | 0.204 | 0.692 |
+ | DepthPro | ViT-S | 0.145 | 0.793 | 0.197 | 0.685 | 0.170 | 0.746 | 0.170 | 0.745 |
+ | DepthAnything V2 | ViT-S | **0.130** | 0.832 | 0.115 | 0.872 | 0.127 | 0.840 | 0.157 | 0.785 |
+ | **Depth Anything AC** | ViT-S | **0.130** | **0.834** | **0.114** | **0.873** | **0.126** | **0.841** | **0.153** | **0.793** |
+
+ *KITTI-C includes synthetic corruptions: Dark (low-light conditions), Snow (weather simulation), Motion (motion blur), and Gaussian (noise corruption).*
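
The AbsRel and δ₁ columns in the two tables above follow the standard depth-evaluation definitions: AbsRel is the mean of |pred - gt| / gt, and δ₁ is the fraction of pixels with max(pred / gt, gt / pred) below 1.25. The sketch below computes both over flat lists of values. It assumes the prediction has already been aligned to the ground truth (relative depth models are typically aligned in scale and shift before scoring, a step not shown here) and that pixels with non-positive ground truth are invalid. The helper names are illustrative.

```python
def valid_pairs(pred, gt):
    """Keep only pixels with positive ground truth and positive prediction."""
    return [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]


def abs_rel(pred, gt):
    """Mean absolute relative error (lower is better)."""
    pairs = valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of pixels whose ratio to ground truth is within the threshold
    (higher is better)."""
    pairs = valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Toy example; the last ground-truth value is 0 and is treated as invalid.
gt = [2.0, 4.0, 10.0, 20.0, 0.0]
pred = [2.2, 3.0, 10.0, 30.0, 5.0]
print(abs_rel(pred, gt), delta1(pred, gt))
```
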

  ## Citation
+
  If you find this work useful, please consider citing:
+
+ ```bibtex
  @article{sun2025depth,
  title={Depth Anything at Any Condition},
  author={Sun, Boyuan and Modi Jin and Bowen Yin and Hou, Qibin},
  journal={arXiv preprint arXiv:2507.01634},
  year={2025}
  }
+ ```
+
+ ## License
+
+ This code is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International](https://creativecommons.org/licenses/by-nc/4.0/) license for non-commercial use only.
+ Any commercial use of this code requires formal permission prior to use.
+
+ ## Contact
+
+ For technical questions, please contact
+ sbysbysby123[AT]gmail.com or jin_modi[AT]mail.nankai.edu.cn.
+
+ For commercial licensing, please contact andrewhoux[AT]gmail.com.
+
+ ## Acknowledgements
+
+ We thank the authors of [DepthAnything](https://github.com/LiheYoung/Depth-Anything) and [DepthAnything V2](https://github.com/DepthAnything/Depth-Anything-V2) for their foundational work. We also acknowledge [DINOv2](https://github.com/facebookresearch/dinov2) for the robust visual encoder, [CorrMatch](https://github.com/BBBBchan/CorrMatch) for their codebase, and [RoboDepth](https://github.com/ldkong1205/RoboDepth) for their contributions.