iitolstykh/VIBE-Image-Edit

VIBESanaEditingPipeline · text-guided-editing


iitolstykh committed 17 days ago · Commit 0808787 (verified) · 1 parent: 53cc5b2

Update README.md

Files changed (1)

README.md +52 -3


@@ -1,3 +1,52 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- en
+pipeline_tag: image-to-image
+tags:
+- image-editing
+- text-guided-editing
+- diffusion
+- sana
+- qwen-vl
+- multimodal
+base_model:
+- Efficient-Large-Model/SANA1.5_1.6B_1024px
+- Qwen/Qwen3-VL-2B-Instruct
+library_name: diffusers
+---
+# VIBE: Visual Instruction Based Editor
+**VIBE** is an open-source framework for text-guided image editing. It combines the efficiency of the [Sana1.5-1.6B](https://github.com/NVlabs/Sana) diffusion model with the visual understanding of [Qwen3-VL-2B-Instruct](https://github.com/QwenLM/Qwen3-VL) to provide **exceptionally fast**, high-quality, instruction-based image manipulation.
+## Model Details
+- **Name:** VIBE
+- **Task:** Text-Guided Image Editing
+- **Architecture:**
+  - **Diffusion Backbone:** Sana1.5 (1.6B parameters) with linear attention.
+  - **Condition Encoder:** Qwen3-VL (2B parameters) for multimodal understanding.
+- **Framework:** Built on `diffusers` and `transformers`.
+## Features
+- **Text-Guided Editing:** Edit images using natural language instructions (e.g., "Add a cat on the sofa").
+- **Compact & Efficient:** Combines a 1.6B-parameter diffusion model with a 2B-parameter encoder (about 3.6B parameters in total) for a lightweight footprint.
+- **High-Speed Inference:** Uses Sana1.5's linear attention, whose cost grows linearly rather than quadratically with the number of image tokens, for rapid generation.
+- **Multimodal Understanding:** Qwen3-VL helps align the visual content with the text instructions.
+## Citation
+If you use this model in your research or applications, please cite VIBE with the BibTeX entry below and acknowledge the original projects it builds on:
+- [Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer](https://github.com/NVlabs/Sana)
+- [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL)
+```bibtex
+@misc{vibe2025,
+  author = {Grigorii Alekseenko and Aleksandr Gordeev and Irina Tolstykh and Bulat Suleimanov and Vladimir Dokholyan and Georgii Fedorov and Sergey Yakubson and Aleksandra Tsybina and Mikhail Chernyshov and Maksim Kuprashevich},
+  title = {VIBE: Visual Instruction Based Editor},
+  year = {2025},
+}
+```
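The README's **Compact & Efficient** bullet pairs a 1.6B-parameter diffusion backbone with a 2B-parameter encoder. The sketch below turns those nominal counts into a rough weight-memory estimate per precision. It is back-of-the-envelope arithmetic under stated assumptions, not a measured footprint: it ignores any autoencoder, activations, and framework overhead, and the dictionary keys are only descriptive labels.

```python
# Rough weight-memory estimate for the two VIBE components named in the README.
# Assumptions: nominal parameter counts (1.6B and 2B); any autoencoder,
# activations, and framework overhead are NOT included.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}

COMPONENTS = {
    "Sana1.5 diffusion backbone": 1.6e9,
    "Qwen3-VL condition encoder": 2.0e9,
}


def weight_memory_gb(dtype: str) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes) for the given dtype."""
    total_params = sum(COMPONENTS.values())
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "bf16", "int8"):
    print(f"{dtype}: {weight_memory_gb(dtype):.1f} GB")
# fp32: 14.4 GB
# bf16: 7.2 GB
# int8: 3.6 GB
```

Real memory use at inference time will be higher than these figures, since activations and the remaining pipeline components also need to be loaded.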
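The **High-Speed Inference** bullet credits Sana1.5's linear attention. To illustrate why that matters, here is a generic linear-attention sketch in NumPy, assuming a ReLU feature map. It is not Sana's implementation (the real model's multi-head layout, normalization details, and surrounding blocks are not reproduced); `relu_linear_attention` and `quadratic_reference` are hypothetical helpers that only show how reassociating the matrix product avoids building an N x N attention matrix.

```python
import numpy as np


def relu_linear_attention(q, k, v, eps=1e-6):
    """ReLU-kernel linear attention: O(N * d * d_v) instead of O(N^2 * d).

    q, k: (N, d) queries and keys; v: (N, d_v) values.
    """
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)  # feature map phi(x) = ReLU(x)
    kv = k.T @ v                # (d, d_v): summarize all keys/values once
    k_sum = k.sum(axis=0)       # (d,): shared normalizer term
    numer = q @ kv              # (N, d_v)
    denom = q @ k_sum + eps     # (N,)
    return numer / denom[:, None]


def quadratic_reference(q, k, v, eps=1e-6):
    """Same kernel computed with an explicit N x N weight matrix, for checking."""
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)
    w = q @ k.T                 # (N, N) unnormalized attention weights
    return (w @ v) / (w.sum(axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)
n, d = 256, 32
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = relu_linear_attention(q, k, v)
print(out.shape)                                        # (256, 32)
print(np.allclose(out, quadratic_reference(q, k, v)))  # True
```

Because `k.T @ v` is computed once and reused for every query, cost grows linearly with the token count N rather than quadratically; `quadratic_reference` exists only to confirm that both orderings produce the same output.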