Add pipeline tag, library name, and explicit links to model card
This PR enhances the model card for StereoPilot by adding crucial metadata and explicit links, improving its discoverability and utility on the Hugging Face Hub.
The updates include:
- **Metadata**: Added `pipeline_tag: image-to-video` to accurately categorize the model's functionality, and `library_name: diffusers` (the model is built on diffusion transformers and ships `.safetensors` checkpoints), which enables automated usage snippets on the Hub.
- **Content**: Inserted explicit links to the project page, the arXiv paper, and the GitHub repository below the author block, so core project resources are easy to reach. This brings the model card closer to the structure of the original GitHub README and follows model-card best practices, while adhering to the instruction not to replace the arXiv link.
The existing comprehensive content, including installation and inference instructions, remains unchanged.
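For reference, after this change the model card's YAML front matter reads:

```yaml
---
license: mit
pipeline_tag: image-to-video
library_name: diffusers
---
```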
```diff
@@ -1,6 +1,9 @@
 ---
 license: mit
+pipeline_tag: image-to-video
+library_name: diffusers
 ---
+
 # StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors
 
 <!-- <div align="center" style="margin-top: 0px; margin-bottom: 0px;">
@@ -17,6 +20,8 @@ _**[Guibao Shen](https://a-bigbao.github.io)<sup>1,3*†</sup>, [Yihua Du](https
 
 </div>
 
+### [[Project Page]](https://hit-perfect.github.io/StereoPilot/) [[arXiv]](https://arxiv.org/abs/2512.16915) [[Code]](https://github.com/KlingTeam/StereoPilot) [Dataset]
+
 ## 📖 Introduction
 
 **TL;DR:** We propose **StereoPilot**, an efficient feed-forward architecture that leverages pretrained video diffusion transformers to directly synthesize novel views, overcoming the limitations of *Depth-Warp-Inpaint* methods without iterative denoising. With a domain switcher and cycle consistency loss, it enables robust multi-format stereo conversion. We also introduce **UniStereo**, the first large-scale unified dataset featuring both parallel and converged stereo formats.
```