Add pipeline tag, library name, and explicit links to model card
#1 · opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,6 +1,9 @@
 ---
 license: mit
+pipeline_tag: image-to-video
+library_name: diffusers
 ---
+
 # StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors
 
 <!-- <div align="center" style="margin-top: 0px; margin-bottom: 0px;">
@@ -17,6 +20,8 @@ _**[Guibao Shen](https://a-bigbao.github.io)<sup>1,3*†</sup>, [Yihua Du](https
 
 </div>
 
+### [[Project Page]](https://hit-perfect.github.io/StereoPilot/) [[arXiv]](https://arxiv.org/abs/2512.16915) [[Code]](https://github.com/KlingTeam/StereoPilot) [Dataset]
+
 ## Introduction
 
 **TL;DR:** We propose **StereoPilot**, an efficient feed-forward architecture that leverages pretrained video diffusion transformers to directly synthesize novel views, overcoming the limitations of *Depth-Warp-Inpaint* methods without iterative denoising. With a domain switcher and cycle consistency loss, it enables robust multi-format stereo conversion. We also introduce **UniStereo**, the first large-scale unified dataset featuring both parallel and converged stereo formats.