---
language:
- en
pipeline_tag: image-to-image
tags:
- image-editing
- text-guided-editing
- diffusion
- sana
- qwen-vl
- multimodal
- distilled
- cfg-distillation
base_model:
- iitolstykh/VIBE-Image-Edit
library_name: diffusers
---

# VIBE: Visual Instruction Based Editor

**VIBE-DistilledCFG** is a CFG-distilled version of the original [VIBE-Image-Edit](https://huggingface.co/iitolstykh/VIBE-Image-Edit) model. It can be run without classifier-free guidance (CFG), which substantially reduces image generation time while maintaining high-quality outputs.

## Performance Comparison

The table below compares total inference time for the original VIBE model (with CFG) and this DistilledCFG model (without CFG). Distillation yields an approximately **1.8x to 2x speedup**.

| Resolution | Original Model (with CFG) | DistilledCFG Model (No CFG) |
| :--- | :--- | :--- |
| **1024x1024** | 1.1453s | **0.6389s** |
| **2048x2048** | 4.0837s | **1.9687s** |

## Model Details

- **Name:** VIBE-DistilledCFG
- **Parent Model:** [iitolstykh/VIBE-Image-Edit](https://huggingface.co/iitolstykh/VIBE-Image-Edit)
- **Task:** Text-Guided Image Editing
- **Architecture:**
  - **Diffusion Backbone:** Sana1.5 (1.6B parameters) with Linear Attention.
  - **Condition Encoder:** Qwen3-VL (2B parameters).
- **Technique:** Classifier-Free Guidance (CFG) Distillation.
- **Model precision:** torch.bfloat16 (BF16)
- **Model resolution:** Optimized for up to 2048px images.

## Features

- **Blazing Fast Inference:** Runs approximately 2x faster than the original model by skipping the guidance pass.
- **Text-Guided Editing:** Edit images using natural language instructions.
- **Compact & Efficient:** Retains the lightweight footprint of the original 1.6B/2B architecture.
- **Multimodal Understanding:** Powered by Qwen3-VL for precise instruction following.
- **Text-to-Image** support.

# Inference Requirements

- `vibe` library
```bash
pip install git+https://github.com/ai-forever/VIBE
```
- requirements for `vibe` library:
```bash
pip install transformers==4.57.1 torchvision==0.21.0 torch==2.6.0 diffusers==0.33.1 loguru==0.7.3
```

# Quick start

**Note:** When using this distilled model, set both `image_guidance_scale` and `guidance_scale` to 0.0 to disable CFG.
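
Disabling CFG is what produces the speedup reported in the Performance Comparison table. As a quick sanity check, the ratios can be recomputed from those timings. This is only an illustrative sketch: the `timings` dictionary and the `speedup` helper are hypothetical names and are not part of the `vibe` library.

```python
# Timings in seconds, copied from the Performance Comparison table.
# The dictionary layout is an assumption made for this illustration.
timings = {
    "1024x1024": {"with_cfg": 1.1453, "distilled": 0.6389},
    "2048x2048": {"with_cfg": 4.0837, "distilled": 1.9687},
}


def speedup(with_cfg: float, distilled: float) -> float:
    """Return how many times faster the distilled run is than the CFG run."""
    return with_cfg / distilled


for resolution, t in timings.items():
    ratio = speedup(t["with_cfg"], t["distilled"])
    print(f"{resolution}: {ratio:.2f}x")
```

With the reported timings, the ratio comes out to about 1.79x at 1024x1024 and 2.07x at 2048x2048. The full editing example follows.
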
```python
from io import BytesIO

import requests
from huggingface_hub import snapshot_download
from PIL import Image

from vibe.editor import ImageEditor

# Download the model weights from the Hugging Face Hub
model_path = snapshot_download(
    repo_id="iitolstykh/VIBE-Image-Edit-DistilledCFG",
    repo_type="model",
)

# Load the model
# Note: both guidance scales are set to 0.0 to disable CFG for the distilled version
editor = ImageEditor(
    checkpoint_path=model_path,
    num_inference_steps=20,
    image_guidance_scale=0.0,
    guidance_scale=0.0,
    device="cuda:0",
)

# Download a test image
resp = requests.get('https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/3f58a82a-b4b4-40c3-a318-43f9350fcd02/original=true,quality=90/115610275.jpeg')
resp.raise_for_status()  # fail early if the download did not succeed
image = Image.open(BytesIO(resp.content))

# Generate the edited image
edited_image = editor.generate_edited_image(
    instruction="let this case swim in the river",
    conditioning_image=image,
    num_images_per_prompt=1,
)[0]

edited_image.save("edited_image.jpg", quality=100)
```

## License

This project is built upon SANA. Please refer to the original SANA license for usage terms: [SANA License](https://huggingface.co/Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers/blob/main/LICENSE.txt)

## Citation

If you use this model in your research or applications, please acknowledge the original projects:

- [SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer](https://github.com/NVlabs/Sana)
- [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL)

```bibtex
@misc{vibe2026,
  Author = {Grigorii Alekseenko and Aleksandr Gordeev and Irina Tolstykh and Bulat Suleimanov and Vladimir Dokholyan and Georgii Fedorov and Sergey Yakubson and Aleksandra Tsybina and Mikhail Chernyshov and Maksim Kuprashevich},
  Title = {VIBE: Visual Instruction Based Editor},
  Year = {2026},
  Eprint = {arXiv:2601.02242},
}
```