Add model card for DVD
#1
by nielsr HF Staff - opened
README.md
ADDED
@@ -0,0 +1,63 @@
---
license: cc-by-nc-4.0
pipeline_tag: depth-estimation
tags:
- video-depth-estimation
- generative-priors
- computer-vision
---

# DVD: Deterministic Video Depth Estimation with Generative Priors

This repository contains the official pre-trained weights for **DVD**, a framework designed for robust, zero-shot relative video depth estimation.

[**Project Page**](https://dvd-project.github.io/) | [**Paper (arXiv)**](https://huggingface.co/papers/2603.12250) | [**GitHub Repository**](https://github.com/EnVision-Research/DVD) | [**Gradio Demo**](https://huggingface.co/spaces/haodongli/DVD)

## Introduction

**DVD** (Deterministic Video Depth) is the first framework that deterministically adapts pre-trained Video Diffusion Models (such as WanV2.1) into single-pass depth regressors. By stripping away generative stochasticity, DVD unites the profound semantic priors of generative models with the structural stability of discriminative regressors, breaking the trade-off between stochastic hallucinations and semantic ambiguity.
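
The practical difference from a diffusion-based estimator is easiest to see side by side. The sketch below is purely illustrative: `backbone` and `depth_head` are toy placeholder modules, not DVD's actual components or API. A diffusion-style estimator starts from sampled noise and iterates many denoising/ODE steps, so its output varies with the seed; a deterministic regressor maps the same video to depth in one forward pass.

```python
import torch

# Toy stand-ins for a video backbone and a depth head; placeholder names,
# not DVD's real modules.
backbone = torch.nn.Conv3d(3, 8, kernel_size=3, padding=1)
depth_head = torch.nn.Conv3d(8, 1, kernel_size=3, padding=1)

video = torch.rand(1, 3, 16, 64, 64)  # (batch, channels, frames, height, width)

def diffusion_style_depth(video, steps=25):
    """Iterative, stochastic estimation: the result depends on the initial noise."""
    x = torch.randn(1, 1, 16, 64, 64)          # random initialization
    feats = backbone(video)
    for _ in range(steps):                     # many denoising/ODE steps
        x = x + 0.1 * (depth_head(feats) - x)  # toy update toward a prediction
    return x

def single_pass_depth(video):
    """Deterministic regression: one forward pass, no sampled noise."""
    return depth_head(backbone(video))

with torch.no_grad():
    d1, d2 = single_pass_depth(video), single_pass_depth(video)
    assert torch.equal(d1, d2)  # identical on every run
```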

### ✨ Key Highlights

* **Extreme Data Efficiency:** DVD effectively unlocks profound generative priors using only **367K frames**, which is **163× less** task-specific training data than leading discriminative baselines.
* ⏱️ **Deterministic & Fast:** Bypasses iterative ODE integration. Inference is performed in a single forward pass, ensuring absolute temporal stability without generative hallucinations.
* **Unparalleled Structural Fidelity:** Powered by Latent Manifold Rectification (LMR), DVD achieves state-of-the-art high-frequency boundary precision compared to overly smoothed baselines.
* 🎥 **Long-Video Inference:** Equipped with a training-free *Global Affine Coherence* module, DVD seamlessly stitches sliding windows to support long-video rollouts with negligible scale drift (see the alignment sketch below).
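
To give an intuition for the last point: relative depth predicted per sliding window is only defined up to an affine (scale and shift) transform, so consecutive windows must be brought into a common frame before they can be stitched. The sketch below shows one generic way to do that, by fitting a least-squares scale and shift on the overlapping frames; it illustrates the general idea only and is not DVD's actual Global Affine Coherence implementation.

```python
import numpy as np

def align_windows(prev, curr, overlap):
    """Fit scale/shift so `curr`'s first `overlap` frames match `prev`'s last ones.

    prev, curr: (frames, H, W) relative depth maps from two sliding windows.
    Returns `curr` mapped into `prev`'s affine frame.
    """
    x = curr[:overlap].ravel()                      # overlapping frames, new window
    y = prev[-overlap:].ravel()                     # same frames, previous window
    A = np.stack([x, np.ones_like(x)], axis=1)      # design matrix for y ≈ s*x + t
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares scale and shift
    return s * curr + t

# Toy example: two 8-frame windows sharing 4 frames, off by scale 2 and shift 0.5.
rng = np.random.default_rng(0)
scene = rng.random((12, 32, 32))
w1 = scene[:8]
w2 = 2.0 * scene[4:] + 0.5
w2_aligned = align_windows(w1, w2, overlap=4)
print(np.abs(w2_aligned[:4] - w1[4:]).max())  # ~0: the windows now agree on the overlap
```

Chaining this window by window yields a scale-consistent long-video depth sequence without any retraining, which matches the role described for the Global Affine Coherence module above.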

## 🛠️ Installation

To use DVD, clone the repository and install the dependencies:

```bash
git clone https://github.com/EnVision-Research/DVD.git
cd DVD
conda create -n dvd python=3.10 -y
conda activate dvd
pip install -e .
```
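
Video diffusion backbones of this size generally need a CUDA GPU. Assuming the installed dependencies include PyTorch (an assumption; this card does not list the exact requirements), a quick post-install sanity check is:

```python
# Post-install sanity check. Assumes PyTorch is among the dependencies
# pulled in by `pip install -e .` (not confirmed by this card).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```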

## 🕹️ Inference

For a quick start on the demo videos, run the provided script:

```bash
bash infer_bash/openworld.sh
```
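
The script's output location and format are defined in the repository. Assuming it saves per-frame relative depth as NumPy arrays (a placeholder assumption; adapt the path and loader to whatever the script actually writes), a predicted frame can be colorized for inspection like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder path: point this at a depth map the inference script produced.
depth = np.load("outputs/demo/depth_000.npy")            # (H, W) relative depth
norm = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
plt.imsave("depth_000_vis.png", norm, cmap="inferno")    # colorized depth image
```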

## Citation

If you find this work useful in your research, please consider citing:

```bibtex
@article{zhang2026dvd,
  title={DVD: Deterministic Video Depth Estimation with Generative Priors},
  author={Zhang, Hongfei and Chen, Harold Haodong and Liao, Chenfei and He, Jing and Zhang, Zixin and Li, Haodong and Liang, Yihao and Chen, Kanghao and Ren, Bin and Zheng, Xu and Yang, Shuai and Zhou, Kun and Li, Yinchuan and Sebe, Nicu and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2603.12250},
  year={2026}
}
```

## License

- **Code:** Released under the [Apache 2.0 License](https://github.com/EnVision-Research/DVD/blob/main/LICENSE).
- **Model Weights:** Released under the [CC BY-NC 4.0 License](https://creativecommons.org/licenses/by-nc/4.0/), which strictly limits usage to non-commercial, academic, and research purposes.