Improve model card: add pipeline tag, paper link, and documentation (#1)
Commit 1a60261bed2f0c1c392997b39a8e0bf96a6f17cf
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md
CHANGED
---
license: apache-2.0
pipeline_tag: depth-estimation
---

# Diffusion Knows Transparency (DKT)

DKT is a foundation model for **transparent-object**, **in-the-wild**, and **arbitrary-length** video depth and normal estimation. It repurposes video diffusion priors to achieve robust and temporally coherent perception for challenging real-world scenarios involving glass, plastic, and metal materials.

[**Paper**](https://huggingface.co/papers/2512.23705) | [**Project Page**](https://daniellli.github.io/projects/DKT/) | [**GitHub**](https://github.com/Daniellli/DKT)

## Introduction

Transparent objects pose a significant challenge for traditional perception systems due to refraction and reflection. DKT leverages modern video diffusion models that have internalized optical rules, using lightweight LoRA adapters to predict depth and normals. It yields temporally consistent predictions for arbitrary-length input videos and achieves state-of-the-art results on benchmarks such as ClearPose and DREDS.
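
Arbitrary-length inputs are typically handled by splitting a video into overlapping windows and stitching the per-window predictions. This card does not specify DKT's exact scheme, so the following is only a generic sketch of overlapping-window chunking; the `window` and `stride` values are illustrative, not DKT's actual configuration:

```python
def chunk_indices(num_frames: int, window: int = 16, stride: int = 8):
    """Return (start, end) frame index pairs for overlapping windows.

    Consecutive windows overlap by `window - stride` frames, which gives
    a stitching region for blending predictions into one coherent video.
    """
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window, stride))
    # Align the final window to the end so every frame is covered.
    starts.append(num_frames - window)
    return [(s, s + window) for s in starts]

# Example: a 32-frame clip split into three overlapping 16-frame windows.
windows = chunk_indices(32, window=16, stride=8)
```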

## Usage

To use this model, please follow the installation instructions in the [official GitHub repository](https://github.com/Daniellli/DKT).

```python
import os

from dkt.pipelines.pipelines import DKTPipeline
from tools.common_utils import save_video

# Initialize the pipeline
pipe = DKTPipeline()

# Define input path
demo_path = 'examples/1.mp4'

# Run inference
prediction = pipe(demo_path)

# Save result
save_dir = 'logs'
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, 'demo.mp4')
save_video(prediction['colored_depth_map'], output_path, fps=25)
```
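
The `colored_depth_map` output above is already visualization-ready. If you instead work with raw float depth arrays (the card does not document the pipeline's other output keys, so this is a generic post-processing sketch rather than DKT's API), a minimal normalization step converts a depth frame into an 8-bit image for inspection:

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Normalize a float depth map to an 8-bit grayscale image.

    Near pixels map to bright values and far pixels to dark, a common
    convention for depth visualization.
    """
    d = depth.astype(np.float64)
    d_min, d_max = d.min(), d.max()
    if d_max - d_min < 1e-8:  # flat map: avoid division by zero
        return np.zeros(d.shape, dtype=np.uint8)
    norm = (d - d_min) / (d_max - d_min)
    return ((1.0 - norm) * 255.0).round().astype(np.uint8)

# Example: a synthetic 2x2 depth frame (in meters)
frame = np.array([[0.5, 1.0], [1.5, 2.0]])
img = depth_to_uint8(frame)
```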

## Citation

```bibtex
@article{dkt2025,
  title   = {Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation},
  author  = {Shaocong Xu and Songlin Wei and Qizhe Wei and Zheng Geng and Hong Li and Licheng Shen and Qianpu Sun and Shu Han and Bin Ma and Bohan Li and Chongjie Ye and Yuhang Zheng and Nan Wang and Saining Zhang and Hao Zhao},
  journal = {arXiv preprint arXiv:2512.23705},
  year    = {2025}
}
```