VIBE: Visual Instruction Based Editor

🌐 Project Page | 📜 Paper on arXiv | GitHub | 🤗 Space | 🤗 VIBE-Image-Edit

VIBE-DistilledCFG is a specialized version of the original VIBE-Image-Edit model.

This model runs without classifier-free guidance (CFG), which substantially reduces image generation time while maintaining high-quality outputs.
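To illustrate why dropping CFG saves time, here is a minimal sketch of the standard single-scale CFG formulation, using a toy denoiser that counts forward passes. `CountingDenoiser`, `cfg_step`, and `distilled_step` are hypothetical names for illustration only. This is not the VIBE implementation, which may batch the passes or add a separate image-guidance term.

```python
class CountingDenoiser:
    """Toy stand-in for a diffusion backbone; counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, latents, condition=None):
        self.calls += 1
        # Hypothetical toy rule: the condition shifts the prediction.
        shift = 0.0 if condition is None else condition
        return [0.5 * x + shift for x in latents]


def cfg_step(model, latents, condition, scale):
    """Standard CFG: two forward passes, then extrapolate."""
    eps_uncond = model(latents, None)
    eps_cond = model(latents, condition)
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def distilled_step(model, latents, condition):
    """Guidance-distilled model: a single conditional forward pass."""
    return model(latents, condition)


latents = [1.0, 2.0]
guided, distilled = CountingDenoiser(), CountingDenoiser()
cfg_step(guided, latents, condition=1.0, scale=4.5)
distilled_step(distilled, latents, condition=1.0)
print(guided.calls, distilled.calls)  # 2 1
```

With a scale of 1, `cfg_step` reduces to the plain conditional prediction. Guidance distillation typically trains the model so that one conditional pass approximates the guided result, which is where the saved pass comes from.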

Performance Comparison

Below is a comparison of total inference time between the original VIBE model (with CFG) and this DistilledCFG model (without CFG). Distillation yields a speedup of approximately 1.8x at 1024x1024 and 2.1x at 2048x2048.

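| Resolution | Original Model (with CFG) | DistilledCFG Model (no CFG) |
|------------|---------------------------|-----------------------------|
| 1024x1024  | 1.1453 s                  | 0.6389 s                    |
| 2048x2048  | 4.0837 s                  | 1.9687 s                    |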
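The speedup figures follow directly from the timings in the table. A small sketch of the arithmetic is below; the `timings` dictionary and the `speedup` helper are illustrative names, and the numbers are copied from the table.

```python
# (original with CFG, distilled without CFG) total inference time in seconds
timings = {
    "1024x1024": (1.1453, 0.6389),
    "2048x2048": (4.0837, 1.9687),
}


def speedup(original_s, distilled_s):
    """Ratio of original to distilled inference time."""
    return original_s / distilled_s


for resolution, (original_s, distilled_s) in timings.items():
    print(f"{resolution}: {speedup(original_s, distilled_s):.2f}x")
# 1024x1024: 1.79x
# 2048x2048: 2.07x
```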

Model Details

  • Name: VIBE-DistilledCFG
  • Parent Model: iitolstykh/VIBE-Image-Edit
  • Task: Text-Guided Image Editing
  • Architecture:
    • Diffusion Backbone: Sana1.5 (1.6B parameters) with Linear Attention.
    • Condition Encoder: Qwen3-VL (2B parameters).
  • Technique: Classifier-Free Guidance (CFG) Distillation.
  • Model precision: torch.bfloat16 (BF16)
  • Model resolution: Optimized for up to 2048px images.
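Since the model is described as optimized for images up to 2048px, it may be useful to downscale larger inputs before editing. The sketch below computes a target size that preserves aspect ratio. `fit_within` is a hypothetical helper, and it is an assumption that pre-resizing is needed at all: the vibe library may already resize inputs internally.

```python
def fit_within(width, height, max_side=2048):
    """Return (width, height) scaled down so the longer side is <= max_side.

    Images that already fit are returned unchanged; aspect ratio is kept
    up to rounding.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4096, 2048))  # (2048, 1024)
print(fit_within(800, 600))    # (800, 600)
```

The result can be passed to `PIL.Image.Image.resize` before handing the image to the editor.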

Features

  • Blazing Fast Inference: Runs approximately 2x faster than the original model by skipping the guidance pass.
  • Text-Guided Editing: Edit images using natural language instructions.
  • Compact & Efficient: Retains the lightweight footprint of the original 1.6B/2B architecture.
  • Multimodal Understanding: Powered by Qwen3-VL for precise instruction following.
  • Text-to-Image support.

Inference Requirements

  • vibe library:
    pip install git+https://github.com/ai-forever/VIBE
  • Requirements for the vibe library:
    pip install transformers==4.57.1 torchvision==0.21.0 torch==2.6.0 diffusers==0.33.1 loguru==0.7.3
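Because the requirements are pinned to exact versions, a quick check of the active environment can catch mismatches before loading the model. This is a minimal sketch using the standard-library `importlib.metadata`; `PINNED` and `check_pins` are illustrative names, and the pins are copied from the command above.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = {
    "transformers": "4.57.1",
    "torchvision": "0.21.0",
    "torch": "2.6.0",
    "diffusers": "0.33.1",
    "loguru": "0.7.3",
}


def check_pins(pins, get_version=version):
    """Return {package: found_version_or_None} for every mismatched pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu124" on torch wheels.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in check_pins(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```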

Quick Start

Note: When using this distilled model, please set image_guidance_scale and guidance_scale to 0.0 to disable CFG.

from PIL import Image
import requests
from io import BytesIO
from huggingface_hub import snapshot_download

from vibe.editor import ImageEditor

# Download model
model_path = snapshot_download(
    repo_id="iitolstykh/VIBE-Image-Edit-DistilledCFG",
    repo_type="model",
)

# Load model
# Note: Both guidance scales are set to 0.0 to disable CFG for the distilled version
editor = ImageEditor(
    checkpoint_path=model_path,
    num_inference_steps=20,
    image_guidance_scale=0.0,
    guidance_scale=0.0,
    device="cuda:0",
)

# Download test image
resp = requests.get(
    'https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/3f58a82a-b4b4-40c3-a318-43f9350fcd02/original=true,quality=90/115610275.jpeg',
    timeout=30,
)
resp.raise_for_status()  # Fail early on a bad HTTP response
image = Image.open(BytesIO(resp.content)).convert("RGB")

# Generate edited image
edited_image = editor.generate_edited_image(
    instruction="let this case swim in the river",
    conditioning_image=image,
    num_images_per_prompt=1,
)[0]

edited_image.save("edited_image.jpg", quality=100)
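To compare inference time on your own hardware, a simple timing helper along the following lines may help. `average_seconds` is a hypothetical helper, not part of the vibe library, and this is not necessarily how the timings in this card were measured. The optional `sync` callback is there because CUDA work is asynchronous, so the GPU should be synchronized before reading the clock.

```python
import time


def average_seconds(fn, warmup=1, repeats=5, sync=None):
    """Average wall-clock time of fn() over `repeats` runs after `warmup` runs."""
    for _ in range(warmup):
        fn()  # Warm-up runs absorb one-time costs such as kernel compilation
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # Wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / repeats


# Example with the editor from the quick start (requires torch and a GPU):
#
# import torch
# seconds = average_seconds(
#     lambda: editor.generate_edited_image(
#         instruction="let this case swim in the river",
#         conditioning_image=image,
#         num_images_per_prompt=1,
#     ),
#     sync=torch.cuda.synchronize,
# )
# print(f"{seconds:.4f}s per edit")
```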

License

This project is built upon SANA. Please refer to the original SANA license for usage terms: SANA License

Citation

If you use this model in your research or applications, please acknowledge the original projects:

@misc{vibe2026,
  Author = {Grigorii Alekseenko and Aleksandr Gordeev and Irina Tolstykh and Bulat Suleimanov and Vladimir Dokholyan and Georgii Fedorov and Sergey Yakubson and Aleksandra Tsybina and Mikhail Chernyshov and Maksim Kuprashevich},
  Title = {VIBE: Visual Instruction Based Editor},
  Year = {2026},
  Eprint = {arXiv:2601.02242},
}