Update model card with pipeline tag, links and citation
Hi! I'm Niels, part of the community science team at Hugging Face.
This PR improves the model card for DKT by:
- Adding the `image-to-image` pipeline tag to the metadata for better discoverability.
- Adding links to the [research paper](https://huggingface.co/papers/2512.23705), [project page](https://daniellli.github.io/projects/DKT/), and [GitHub repository](https://github.com/Daniellli/DKT).
- Including a code snippet for usage as found in the official repository.
- Adding a BibTeX citation for the paper.
These changes help users understand and use your model more effectively on the Hugging Face Hub.
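For reference, the discoverability change amounts to one `pipeline_tag` line in the card's YAML front matter (the resulting header):

```yaml
---
license: apache-2.0
pipeline_tag: image-to-image
---
```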
README.md CHANGED

````diff
@@ -1,27 +1,46 @@
----
-license: apache-2.0
----
+---
+license: apache-2.0
+pipeline_tag: image-to-image
+---
+
+# Diffusion Knows Transparency (DKT)
+
+This repository contains the weights for **DKT** (Diffusion Knows Transparency), a foundation model for transparent-object, in-the-wild, and arbitrary-length video depth and normal estimation.
+
+[**Project Page**](https://daniellli.github.io/projects/DKT/) | [**GitHub**](https://github.com/Daniellli/DKT) | [**Paper**](https://huggingface.co/papers/2512.23705)
+
+## Introduction
+DKT repurposes the generative video priors of large-scale diffusion models for robust, temporally coherent perception. By learning a video-to-video translator for depth and normals via lightweight LoRA adapters, it achieves zero-shot state-of-the-art results on transparency benchmarks such as ClearPose and DREDS.
+
+## Usage
+
+To use this model, clone the [GitHub repository](https://github.com/Daniellli/DKT) and follow its installation instructions.
+
+```python
+import os
+from dkt.pipelines.pipelines import DKTPipeline
+from tools.common_utils import save_video
+
+# Initialize the pipeline
+pipe = DKTPipeline()
+
+# Run inference on an input video
+demo_path = 'examples/1.mp4'
+prediction = pipe(demo_path)
+
+# Save the colored depth prediction
+save_dir = 'logs'
+os.makedirs(save_dir, exist_ok=True)
+output_path = os.path.join(save_dir, 'demo.mp4')
+save_video(prediction['colored_depth_map'], output_path, fps=25)
+```
+
+## Citation
+```bibtex
+@article{dkt2025,
+  title   = {Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation},
+  author  = {Shaocong Xu and Songlin Wei and Qizhe Wei and Zheng Geng and Hong Li and Licheng Shen and Qianpu Sun and Shu Han and Bin Ma and Bohan Li and Chongjie Ye and Yuhang Zheng and Nan Wang and Saining Zhang and Hao Zhao},
+  journal = {arXiv preprint arXiv:2512.23705},
+  year    = {2025}
+}
+```
````