---
license: apache-2.0
tags:
- Art
- Image Generation
- Image Editing
- Video Generation
- Vision Translation
- Bridge Model
pipeline_tag: any-to-any
library_name: diffusion-single-file
---

# 🎥 ViBT: Vision Bridge Transformer at Scale

<div style="text-align: center; display: flex; justify-content: left; gap: 5px;">
<a href="https://yuanshi9815.github.io/ViBT_homepage"><img src="https://img.shields.io/badge/Web-Project Page-1d72b8.svg" alt="Project Page"></a>
<a href="https://arxiv.org/abs/2511.23199"><img src="https://img.shields.io/badge/ariXv-Paper-A42C25.svg" alt="arXiv"></a>
<a href="https://huggingface.co/spaces/Yuanshi/ViBT"><img src="https://img.shields.io/badge/🤗Huggingface-Space-ffbd45.svg" alt="HuggingFace"></a>
<a href="https://github.com/Yuanshi9815/ViBT"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&" alt="GitHub"></a>
</div>

This repository introduces **Vision Bridge Transformer (ViBT)**, a large-scale instantiation of Brownian Bridge Models designed for efficient conditional generation. ViBT directly models the trajectory between inputs and outputs, creating an efficient data-to-data translation paradigm. The models demonstrate effectiveness for various image and video translation tasks, including instruction-based image editing and complex video translation.
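
To give a rough intuition for what "modeling the trajectory between inputs and outputs" means, the sketch below samples an intermediate state from a textbook Brownian bridge pinned at a source value at `t = 0` and a target value at `t = 1`. It is only an illustration of the general idea, not ViBT's actual formulation or API: the function name `brownian_bridge_sample`, the `sigma` parameter, and the toy vectors are all assumptions made for this example.

```python
import math
import random


def brownian_bridge_sample(x0, x1, t, rng, sigma=1.0):
    """Sample x_t from a standard Brownian bridge between x0 and x1.

    Hypothetical helper for illustration only (not part of ViBT).
    The bridge has mean (1 - t) * x0 + t * x1 and standard deviation
    sigma * sqrt(t * (1 - t)), so the noise vanishes at both endpoints.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, x1)
    ]


rng = random.Random(0)       # seeded for reproducibility
source = [0.0, 0.0, 0.0]     # stands in for an input (e.g. source image latents)
target = [1.0, 2.0, 3.0]     # stands in for the desired output

start = brownian_bridge_sample(source, target, 0.0, rng)   # equals source
end = brownian_bridge_sample(source, target, 1.0, rng)     # equals target
middle = brownian_bridge_sample(source, target, 0.5, rng)  # noisy midpoint
```

Because both endpoints are data rather than pure noise, a model trained on such intermediate states learns a direct data-to-data mapping, which is the paradigm the paragraph above describes.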