# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Ovis-Image is a 7-billion parameter text-to-image generation model optimized for high-quality text rendering. It's built upon Ovis-U1 and designed for consumer-grade GPU deployment. This is an inference-only repository.
## Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run web UI (uses diffusers pipeline)
python app.py

# Standalone inference test
python ovis_image/test.py \
    --model_path <path/to/ovis_image.safetensors> \
    --ovis_path AIDC-AI/Ovis2.5-2B \
    --vae_path <path/to/ae.safetensors> \
    --prompt "your text prompt"
```
Models auto-download from the HuggingFace Hub. Set the `HF_TOKEN` environment variable if needed.
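For gated or private checkpoints, the token can be exported before launching; a minimal sketch (the token value is a placeholder):

```shell
# Only needed if the checkpoints are gated or private on the Hub
export HF_TOKEN=hf_your_token_here
python app.py
```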
## Architecture
**Pipeline Flow:** Prompt → `OvisTokenizer` → `OvisEmbedder` (Ovis2.5-2B LLM) → `OvisImageModel` (denoising) → `AutoEncoder` (VAE decode) → Image
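The flow above can be sketched as a simple composition; the function and parameter names here are illustrative placeholders, not the repository's actual API (the real loop lives in `ovis_image/sampling.py`):

```python
def run_pipeline(prompt, tokenize, embed, denoise, decode):
    """Illustrative composition of the pipeline stages (placeholder names)."""
    tokens = tokenize(prompt)    # OvisTokenizer
    cond = embed(tokens)         # OvisEmbedder (Ovis2.5-2B) text conditioning
    latents = denoise(cond)      # OvisImageModel denoising
    return decode(latents)       # AutoEncoder VAE decode to pixels
```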
**Key Components:**
- `ovis_image/sampling.py` - Core inference pipeline with the `generate_image()` function
- `ovis_image/model/model.py` - `OvisImageModel`: 6 DoubleStreamBlock + 27 SingleStreamBlock layers
- `ovis_image/model/autoencoder.py` - VAE for latent-to-pixel decoding
- `ovis_image/model/hf_embedder.py` - `OvisEmbedder` wrapping Ovis2.5-2B as the text encoder
- `ovis_image/model/layers.py` - Transformer blocks, attention, embeddings
- `ovis_image/model/ops.py` - Attention backends (Flash3, SDPA, eager)
**Entry Points:**
- `app.py` - Uses `OvisImagePipeline.from_pretrained()` for simple, clean loading
- `app_old.py` - Backup of the modular implementation that loads individual components (kept for reference)
## Configuration

Model hyperparameters are defined in `ovis_image/__init__.py` via `ovis_image_configs["ovis-image-7b"]`:
- Hidden size: 3072, heads: 24
- Double blocks: 6, single blocks: 27
- Uses RoPE positional embeddings and classifier-free guidance
The `OvisImageModelArgs` dataclass in `ovis_image/model/args.py` contains all model parameters.
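A minimal sketch of how such a config might be structured — the class and field names below are assumptions for illustration, not the exact schema in `ovis_image/model/args.py`:

```python
from dataclasses import dataclass

@dataclass
class ImageModelArgs:       # hypothetical stand-in for OvisImageModelArgs
    hidden_size: int = 3072
    num_heads: int = 24
    depth: int = 6          # DoubleStreamBlock layers
    depth_single: int = 27  # SingleStreamBlock layers

configs = {"ovis-image-7b": ImageModelArgs()}
args = configs["ovis-image-7b"]
head_dim = args.hidden_size // args.num_heads  # 3072 / 24 = 128 channels per head
```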
## Hardware Requirements
- CUDA GPU required (~14GB VRAM minimum for bfloat16)
- Automatic attention backend selection:
  - PyTorch 2.7+: SDPA with cuDNN
  - H100: Flash Attention 3 (optional)
  - Fallback: eager attention
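The selection order above could look roughly like this — a hypothetical helper, not the repository's actual code (the real logic lives in `ovis_image/model/ops.py`):

```python
def pick_attention_backend(torch_version: str, gpu_name: str, flash3_installed: bool) -> str:
    """Mirror the priority above: Flash3 on H100 if available, then SDPA, then eager."""
    if flash3_installed and "H100" in gpu_name:
        return "flash3"
    # Version strings like "2.7.1+cu121" still parse: take the first two components.
    major, minor = (int(v) for v in torch_version.split(".")[:2])
    if (major, minor) >= (2, 7):
        return "sdpa"   # cuDNN-backed scaled_dot_product_attention
    return "eager"      # portable pure-PyTorch fallback
```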
## Dependencies
Key packages: `torch`, `transformers >= 4.53.0`, `einops`, `safetensors`, `gradio`
Requires custom diffusers fork: https://github.com/DoctorKey/diffusers.git@ovis-image
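If the fork isn't already pinned in requirements.txt, it can be installed directly with pip's standard VCS syntax:

```shell
pip install "git+https://github.com/DoctorKey/diffusers.git@ovis-image"
```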
## UI Standards for Image Generation Apps
Use Gradio 6.0.2+ with Apple-style CSS for all image generation UIs:
```python
# Define theme separately
custom_theme = gr.themes.Soft(...).set(...)

# Minimal gr.Blocks
with gr.Blocks(title="App Name", fill_height=False) as demo:
    # UI components...
    demo.load(None, None, None, js=js_code)  # JS via load()

# Theme and CSS in launch()
demo.launch(theme=custom_theme, css=apple_css)
```
Key points:
- `theme` and `css` go in `demo.launch()`, NOT `gr.Blocks()`
- JS loaded via `demo.load()` for proper timing
- No `head` parameter needed - Gradio 6 handles dark theme properly
- Set `sdk_version: 6.0.2` in README.md for HuggingFace Spaces