---
license: apache-2.0
tags:
- Art
- Image Generation
- Image Editing
- Video Generation
- Vision Translation
- Bridge Model
pipeline_tag: any-to-any
library_name: diffusion-single-file
---

# 🎥 ViBT: Vision Bridge Transformer at Scale  
<div style="text-align: center; display: flex; justify-content: left; gap: 5px;">
<a href="https://yuanshi9815.github.io/ViBT_homepage"><img src="https://img.shields.io/badge/Web-Project Page-1d72b8.svg" alt="Project Page"></a>
<a href="https://arxiv.org/abs/2511.23199"><img src="https://img.shields.io/badge/arXiv-Paper-A42C25.svg" alt="arXiv"></a>
<a href="https://huggingface.co/spaces/Yuanshi/ViBT"><img src="https://img.shields.io/badge/🤗Huggingface-Space-ffbd45.svg" alt="HuggingFace"></a>
<a href="https://github.com/Yuanshi9815/ViBT"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&" alt="GitHub"></a>
</div>

This repository introduces **Vision Bridge Transformer (ViBT)**, a large-scale instantiation of Brownian Bridge Models designed for efficient conditional generation. Rather than starting from noise, ViBT directly models the trajectory between inputs and outputs, which yields an efficient data-to-data translation paradigm. The models are effective across a range of image and video translation tasks, including instruction-based image editing and complex video translation.
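
To make the "trajectory between inputs and outputs" idea concrete, here is a minimal sketch of the textbook Brownian bridge marginal, in which an intermediate state at time `t` is a linear blend of the source and target plus noise that vanishes at both endpoints. This illustrates the general bridge construction only; it is not taken from the ViBT code, and the function name `brownian_bridge_sample`, the `sigma` parameter, and the toy vectors are assumptions made for illustration.

```python
import math
import random


def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Textbook marginal: x_t ~ N((1 - t) * x0 + t * x1, sigma^2 * t * (1 - t)).
    Hypothetical helper for illustration, not the ViBT implementation.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(x0) != len(x1):
        raise ValueError("x0 and x1 must have the same length")
    if rng is None:
        rng = random.Random()
    # Noise scale is zero at both endpoints and largest at t = 0.5.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, x1)
    ]


# Toy stand-ins for a flattened source and target latent (hypothetical data).
source = [0.0, 1.0, 2.0]
target = [1.0, 1.0, 0.0]

rng = random.Random(0)
start = brownian_bridge_sample(source, target, 0.0, rng=rng)  # equals source
end = brownian_bridge_sample(source, target, 1.0, rng=rng)    # equals target
mid = brownian_bridge_sample(source, target, 0.5, rng=rng)    # noisy midpoint
```

Because both endpoints are data rather than pure noise, a model trained on such intermediate states learns to move from the source toward the target, which is the sense in which a bridge model performs data-to-data translation.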