CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Ovis-Image is a 7-billion-parameter text-to-image generation model optimized for high-quality text rendering. It is built on Ovis-U1 and designed to run on consumer-grade GPUs. This repository is inference-only.

Running the Application

# Install dependencies
pip install -r requirements.txt

# Run web UI (uses diffusers pipeline)
python app.py

# Standalone inference test
python ovis_image/test.py \
  --model_path <path/to/ovis_image.safetensors> \
  --ovis_path AIDC-AI/Ovis2.5-2B \
  --vae_path <path/to/ae.safetensors> \
  --prompt "your text prompt"

Models download automatically from the Hugging Face Hub. Set the HF_TOKEN environment variable if a download requires authentication.
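A minimal sketch of reading the token and passing it to a Hub download, assuming the huggingface_hub package is installed. The helper names hub_token and download_text_encoder are hypothetical and not part of this repository. Recent huggingface_hub releases also read HF_TOKEN on their own, so passing it explicitly mainly makes the dependency visible.

```python
import os
from typing import Optional


def hub_token() -> Optional[str]:
    """Return the Hugging Face token from the environment, or None if unset or blank."""
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


def download_text_encoder(repo_id: str = "AIDC-AI/Ovis2.5-2B") -> str:
    """Download a Hub repository and return the local snapshot directory.

    The default repo id is the text encoder named in the test command above.
    """
    # Lazy import so hub_token() stays usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # token=None lets the library fall back to any cached login.
    return snapshot_download(repo_id, token=hub_token())
```

Calling download_text_encoder() needs network access; hub_token() alone can be used to check that the variable is visible to the process.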

Architecture

Pipeline Flow: Prompt → OvisTokenizer → OvisEmbedder (Ovis2.5-2B LLM) → OvisImageModel (denoising) → AutoEncoder (VAE decode) → Image

Key Components:

  • ovis_image/sampling.py - Core inference pipeline with generate_image() function
  • ovis_image/model/model.py - OvisImageModel: 6 DoubleStreamBlock + 27 SingleStreamBlock layers
  • ovis_image/model/autoencoder.py - VAE for latent-to-pixel decoding
  • ovis_image/model/hf_embedder.py - OvisEmbedder wrapping Ovis2.5-2B as text encoder
  • ovis_image/model/layers.py - Transformer blocks, attention, embeddings
  • ovis_image/model/ops.py - Attention backends (Flash3, SDPA, eager)

Entry Point:

  • app.py - Uses OvisImagePipeline.from_pretrained() for simple, clean loading
  • app_old.py - Backup of modular implementation that loads individual components (for reference)
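The single-call loading used by app.py might look roughly like the sketch below. Several details are assumptions rather than facts taken from this repository: the import path from diffusers (provided by the fork listed under Dependencies), the model id AIDC-AI/Ovis-Image-7B, the helper names, and the call signature, which follows the usual diffusers convention of returning an object with an images list.

```python
def load_pipeline(model_id: str = "AIDC-AI/Ovis-Image-7B"):
    """Load the full pipeline in bfloat16 and move it to the GPU.

    The model id is an assumption; substitute the one app.py actually uses.
    """
    # Lazy imports: both packages are heavy and need the custom diffusers fork.
    import torch
    from diffusers import OvisImagePipeline

    pipe = OvisImagePipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return pipe.to("cuda")


def generate(pipe, prompt: str, seed: int = 0):
    """Run one generation with a fixed seed and return the first image."""
    import torch

    generator = torch.Generator("cuda").manual_seed(seed)
    # Assumes the standard diffusers output object with an `images` list.
    return pipe(prompt, generator=generator).images[0]
```

Keeping loading and generation in separate functions lets a web UI load the model once at startup and reuse it across requests.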

Configuration

Model hyperparameters are defined in ovis_image/__init__.py via ovis_image_configs["ovis-image-7b"]:

  • Hidden size: 3072, heads: 24
  • Double blocks: 6, single blocks: 27
  • Uses RoPE positional embeddings and classifier-free guidance

OvisImageModelArgs dataclass in ovis_image/model/args.py contains all model parameters.
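As an illustration of what these numbers imply, here is a small stand-in dataclass. The class name and field names are hypothetical and do not mirror OvisImageModelArgs; only the four values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArgsSketch:
    """Illustrative stand-in for the real args class; field names are assumptions."""

    hidden_size: int = 3072
    num_heads: int = 24
    double_blocks: int = 6
    single_blocks: int = 27

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden size.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    @property
    def total_blocks(self) -> int:
        # Double-stream and single-stream blocks together.
        return self.double_blocks + self.single_blocks


args = ModelArgsSketch()
print(args.head_dim, args.total_blocks)  # 128 33
```

With 3072 hidden units split across 24 heads, each head works on 128 dimensions, and the model has 33 transformer blocks in total.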

Hardware Requirements

  • CUDA GPU required (~14GB VRAM minimum for bfloat16)
  • Automatic attention backend selection:
    • PyTorch 2.7+: SDPA with cuDNN
    • H100: Flash Attention 3 (optional)
    • Fallback: Eager attention
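The selection rules above can be sketched as a pure function. This is not the logic in ovis_image/model/ops.py: the function names, the returned labels, and the precedence (Flash Attention 3 first when it is installed on an H100-class GPU, then SDPA, then eager) are assumptions made for illustration.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.7.1+cu126'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def choose_attention_backend(
    torch_version: str,
    compute_capability: Tuple[int, int],
    flash3_installed: bool,
) -> str:
    """Pick a backend label from the PyTorch version and the GPU capability."""
    # H100 (Hopper) GPUs report CUDA compute capability (9, 0).
    if flash3_installed and compute_capability == (9, 0):
        return "flash3"
    # SDPA path for PyTorch 2.7 and newer.
    if parse_major_minor(torch_version) >= (2, 7):
        return "sdpa"
    # Plain attention as the last resort.
    return "eager"
```

In a real process the inputs would come from torch.__version__ and torch.cuda.get_device_capability(); taking them as arguments keeps the rule testable on a machine without a GPU.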

Dependencies

Key packages: torch, transformers >= 4.53.0, einops, safetensors, gradio

Requires custom diffusers fork: https://github.com/DoctorKey/diffusers.git@ovis-image
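A quick way to confirm the transformers minimum before launching the app is sketched below, using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simple: it compares the leading numeric components and ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Turn '4.53.0' or '4.54.0.dev0' into a tuple of leading integers, padded to 3."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that 4.100.0 counts as newer than 4.53.0."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_transformers(minimum: str = "4.53.0") -> Optional[bool]:
    """Return True or False if transformers is installed, None if it is missing."""
    try:
        return meets_minimum(version("transformers"), minimum)
    except PackageNotFoundError:
        return None
```

Comparing integer tuples instead of raw strings avoids the classic mistake where "4.100.0" sorts before "4.53.0".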

UI Standards for Image Generation Apps

Use Gradio 6.0.2+ with Apple-style CSS for all image generation UIs:

import gradio as gr

# Define theme separately
custom_theme = gr.themes.Soft(...).set(...)

# Minimal gr.Blocks
with gr.Blocks(title="App Name", fill_height=False) as demo:
    # UI components...
    demo.load(None, None, None, js=js_code)  # JS via load()

# Theme and CSS in launch()
demo.launch(theme=custom_theme, css=apple_css)

Key points:

  • theme and css go in demo.launch(), NOT gr.Blocks()
  • JS loaded via demo.load() for proper timing
  • No head parameter needed - Gradio 6 handles dark theme properly
  • Set sdk_version: 6.0.2 in README.md for HuggingFace Spaces