| license: apache-2.0 | |
| tags: | |
| - Art | |
| - Image Generation | |
| - Image Editing | |
| - Video Generation | |
| - Vision Translation | |
| - Bridge Model | |
| pipeline_tag: any-to-any | |
| # 🎥 ViBT: Vision Bridge Transformer at Scale | |
| <div style="text-align: center; display: flex; justify-content: left; gap: 5px;"> | |
| <a href="https://yuanshi9815.github.io/ViBT_homepage"><img src="https://img.shields.io/badge/Web-Project Page-1d72b8.svg" alt="Project Page"></a> | |
| <a href="https://arxiv.org/abs/2511.23199"><img src="https://img.shields.io/badge/arXiv-Paper-A42C25.svg" alt="arXiv"></a> | |
| <a href="https://huggingface.co/spaces/Yuanshi/ViBT"><img src="https://img.shields.io/badge/🤗Huggingface-Space-ffbd45.svg" alt="HuggingFace"></a> | |
| <a href="https://github.com/Yuanshi9815/ViBT"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&" alt="GitHub"></a> | |
| </div> | |
| This repository introduces **Vision Bridge Transformer (ViBT)**, a large-scale instantiation of Brownian Bridge Models designed for efficient conditional generation. ViBT directly models the trajectory between inputs and outputs, which yields an efficient data-to-data translation paradigm. The models are effective across a range of image and video translation tasks, including instruction-based image editing and complex video translation. |
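
To make the "trajectory between inputs and outputs" idea concrete, here is a minimal NumPy sketch of sampling an intermediate state from a standard Brownian bridge pinned at a source `x0` and a target `x1`. It is an illustration under stated assumptions, not ViBT's training or inference code: the function name `brownian_bridge_sample`, the `sigma` parameter, and the toy 4x4 arrays are hypothetical, and the noise schedule ViBT actually uses may differ.

```python
import numpy as np

def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Sample x_t from a standard Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Hypothetical helper for illustration only; not part of the ViBT codebase.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=np.float64)
    x1 = np.asarray(x1, dtype=np.float64)
    # The mean moves linearly from the source to the target.
    mean = (1.0 - t) * x0 + t * x1
    # The noise scale vanishes at both endpoints and peaks at t = 0.5.
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)

# Toy stand-ins for a source image and its translated target (assumed data).
source = np.zeros((4, 4))
target = np.ones((4, 4))

rng = np.random.default_rng(0)
for t in (0.0, 0.5, 1.0):
    x_t = brownian_bridge_sample(source, target, t, rng=rng)
    print(f"t={t:.1f} mean={x_t.mean():.3f}")
```

At `t=0` the sample equals the source exactly and at `t=1` it equals the target, so both ends of the path are data rather than pure noise. That is the property that separates a bridge formulation from noise-to-data generation.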