iitolstykh committed on
Commit
0808787
·
verified ·
1 Parent(s): 53cc5b2

Update README.md

Files changed (1)
  1. README.md +52 -3
README.md CHANGED
@@ -1,3 +1,52 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: image-to-image
6
+ tags:
7
+ - image-editing
8
+ - text-guided-editing
9
+ - diffusion
10
+ - sana
11
+ - qwen-vl
12
+ - multimodal
13
+ base_model:
14
+ - Efficient-Large-Model/SANA1.5_1.6B_1024px
15
+ - Qwen/Qwen3-VL-2B-Instruct
16
+ library_name: diffusers
17
+ ---
18
+
19
+ # VIBE: Visual Instruction Based Editor
20
+
21
+ **VIBE** is an open-source framework for text-guided image editing. It pairs the efficient [Sana1.5-1.6B](https://github.com/NVlabs/Sana) diffusion model with the visual understanding of [Qwen3-VL-2B-Instruct](https://github.com/QwenLM/Qwen3-VL) to provide **fast**, high-quality, instruction-based image editing.
22
+
23
+ ## Model Details
24
+
25
+ - **Name:** VIBE
26
+ - **Task:** Text-Guided Image Editing
27
+ - **Architecture:**
28
+   - **Diffusion Backbone:** Sana1.5 (1.6B parameters) with linear attention.
28
+   - **Condition Encoder:** Qwen3-VL (2B parameters) for multimodal understanding.
30
+ - **Framework:** Built on `diffusers` and `transformers`.
31
+
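As a rough illustration of what the parameter counts above imply, the following sketch estimates weight memory for the two components. The parameter counts are the nominal figures from the model card; the bytes-per-parameter values are standard dtype sizes. It is an assumption-laden back-of-the-envelope estimate that ignores any other components (such as an image autoencoder), activations, and framework overhead, so it is not a measured requirement.

```python
# Back-of-the-envelope weight-memory estimate (not a measured figure).
# Parameter counts are the nominal values stated in the model card.
PARAMS = {
    "sana1.5_diffusion_backbone": 1.6e9,
    "qwen3_vl_condition_encoder": 2.0e9,
}

# Standard storage sizes per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(params, dtype):
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    total_params = sum(params.values())
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB")
# float32: ~14.4 GB
# bfloat16: ~7.2 GB
```

Under these assumptions the combined 3.6B parameters occupy roughly 7.2 GB in bfloat16, which is the sense in which the footprint can be called lightweight.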
32
+ ## Features
33
+
34
+ - **Text-Guided Editing:** Edit images using natural language instructions (e.g., "Add a cat on the sofa").
35
+ - **Compact & Efficient:** Combines a 1.6B parameter diffusion model with a 2B parameter encoder for a lightweight footprint.
36
+ - **High-Speed Inference:** Utilizes Sana1.5's linear attention mechanism for rapid generation.
37
+ - **Multimodal Understanding:** Qwen3-VL ensures strong alignment between visual content and text instructions.
38
+
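To make the linear attention point concrete, here is a minimal single-head NumPy sketch of ReLU-kernel linear attention, the general technique described in the Sana paper. It is not VIBE's or Sana's actual implementation: the function names, the `eps` value, and the toy shapes are illustrative assumptions. The key idea is that the `(d, d_v)` summary `k.T @ v` is computed once, so cost grows linearly with the number of tokens `N` instead of forming an `(N, N)` score matrix.

```python
import numpy as np


def relu_linear_attention(q, k, v, eps=1e-6):
    """ReLU-kernel linear attention, O(N * d * d_v) in the token count N."""
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)  # ReLU feature map
    kv = k.T @ v                # (d, d_v): shared summary of keys and values
    k_sum = k.sum(axis=0)       # (d,): used for per-query normalization
    numerator = q @ kv          # (N, d_v)
    denominator = q @ k_sum + eps  # (N,)
    return numerator / denominator[:, None]


def quadratic_reference(q, k, v, eps=1e-6):
    """Same result computed the O(N^2) way, by building the (N, N) matrix."""
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)
    scores = q @ k.T            # (N, N)
    return (scores @ v) / (scores.sum(axis=-1, keepdims=True) + eps)


# Toy shapes: 64 tokens, 8-dimensional heads (hypothetical values).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 8)) for _ in range(3))
out = relu_linear_attention(q, k, v)
print(out.shape, np.allclose(out, quadratic_reference(q, k, v)))
```

Both functions compute the same output; the linear form only reorders the matrix products, which is what makes it attractive at high image resolutions where `N` is large.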
39
+ ## Citation
40
+
41
+ If you use this model in your research or applications, please acknowledge the original projects:
42
+
43
+ - [Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer](https://github.com/NVlabs/Sana)
44
+ - [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL)
45
+
46
+ ```bibtex
47
+ @misc{vibe2025,
48
+ author = {Grigorii Alekseenko and Aleksandr Gordeev and Irina Tolstykh and Bulat Suleimanov and Vladimir Dokholyan and Georgii Fedorov and Sergey Yakubson and Aleksandra Tsybina and Mikhail Chernyshov and Maksim Kuprashevich},
49
+ title = {VIBE: Visual Instruction Based Editor},
50
+ year = {2025},
51
+ }
52
+ ```