Ovis-Image

tchung1970

Consolidate to single app.py entry point

d41a998 5 days ago

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Ovis-Image is a 7-billion-parameter text-to-image generation model optimized for high-quality text rendering. It is built on Ovis-U1 and designed to run on consumer-grade GPUs. This repository is inference-only.

Running the Application

# Install dependencies
pip install -r requirements.txt

# Run web UI (uses diffusers pipeline)
python app.py

# Standalone inference test
python ovis_image/test.py \
  --model_path <path/to/ovis_image.safetensors> \
  --ovis_path AIDC-AI/Ovis2.5-2B \
  --vae_path <path/to/ae.safetensors> \
  --prompt "your text prompt"

Models are downloaded automatically from the Hugging Face Hub. Set the HF_TOKEN environment variable if a download requires authentication.
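A minimal sketch of how a script might pick up the token before loading models. The helper name `resolve_hf_token` is hypothetical and does not come from this repository; it only reads the HF_TOKEN variable named above and treats a blank value as unset.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Hub token from HF_TOKEN, or None if it is unset or blank."""
    # Allow a mapping to be injected so the helper is easy to test.
    source = os.environ if env is None else env
    token = source.get("HF_TOKEN", "").strip()
    return token or None
```

The returned value could be passed as the `token` argument of a `from_pretrained`-style loader; when it is None, the loader falls back to whatever credentials are already cached locally.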

Architecture

Pipeline Flow: Prompt → OvisTokenizer → OvisEmbedder (Ovis2.5-2B LLM) → OvisImageModel (denoising) → AutoEncoder (VAE decode) → Image
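The ordering of the stages can be illustrated with a small sketch. Everything below is a placeholder: the `Stage` class, `run_pipeline`, and the lambda bodies are hypothetical stand-ins that only tag their input, not the real classes in ovis_image/. Only the stage names and their order come from the flow above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Stage:
    name: str
    run: Callable[[object], object]


def run_pipeline(stages: Sequence[Stage], prompt: str):
    """Feed the prompt through each stage in order and record the path taken."""
    value, trace = prompt, []
    for stage in stages:
        value = stage.run(value)
        trace.append(stage.name)
    return value, trace


# Placeholder stages: each stub only relabels its input so the data flow is visible.
STAGES = [
    Stage("OvisTokenizer", lambda text: {"tokens": text.split()}),
    Stage("OvisEmbedder", lambda d: {"embeddings": len(d["tokens"])}),
    Stage("OvisImageModel", lambda d: {"latents": d["embeddings"]}),
    Stage("AutoEncoder", lambda d: {"image": d["latents"]}),
]
```

Calling `run_pipeline(STAGES, "a red fox")` returns the final stub output together with the list of stage names in the order they ran.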

Key Components:

ovis_image/sampling.py - Core inference pipeline with generate_image() function
ovis_image/model/model.py - OvisImageModel: 6 DoubleStreamBlock + 27 SingleStreamBlock layers
ovis_image/model/autoencoder.py - VAE for latent-to-pixel decoding
ovis_image/model/hf_embedder.py - OvisEmbedder wrapping Ovis2.5-2B as text encoder
ovis_image/model/layers.py - Transformer blocks, attention, embeddings
ovis_image/model/ops.py - Attention backends (Flash3, SDPA, eager)

Entry Point:

app.py - Uses OvisImagePipeline.from_pretrained() for simple, clean loading
app_old.py - Backup of modular implementation that loads individual components (for reference)

Configuration

Model hyperparameters are defined in ovis_image/__init__.py via ovis_image_configs["ovis-image-7b"]:

Hidden size: 3072, heads: 24
Double blocks: 6, single blocks: 27
Uses RoPE positional embeddings and classifier-free guidance

OvisImageModelArgs dataclass in ovis_image/model/args.py contains all model parameters.
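As a rough illustration of how these hyperparameters relate, here is a sketch of a config dataclass. The class name `ModelArgsSketch` and its field names are assumptions for illustration; the real OvisImageModelArgs may name and group its fields differently. Only the four numeric values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArgsSketch:
    # Values from the configuration list; field names are illustrative guesses.
    hidden_size: int = 3072
    num_heads: int = 24
    depth_double_blocks: int = 6
    depth_single_blocks: int = 27

    @property
    def head_dim(self) -> int:
        """Per-head width, which must divide the hidden size evenly."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    @property
    def total_blocks(self) -> int:
        return self.depth_double_blocks + self.depth_single_blocks


# Mirrors the idea of a name-to-config registry keyed by model name.
configs_sketch = {"ovis-image-7b": ModelArgsSketch()}
```

With these values, each attention head is 3072 / 24 = 128 wide, and the model stacks 6 + 27 = 33 transformer blocks.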

Hardware Requirements

CUDA GPU required (~14 GB VRAM minimum for bfloat16)
Automatic attention backend selection:
- PyTorch 2.7+: SDPA with cuDNN
- H100: Flash Attention 3 (optional)
- Fallback: Eager attention
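The selection rules above could be expressed roughly as follows. This is a sketch, not the logic in ovis_image/model/ops.py: the function name, the returned labels, and the priority order (Flash Attention 3 first when it is installed on an H100, then SDPA, then eager) are assumptions.

```python
def choose_attention_backend(torch_version: str, device_name: str, flash3_installed: bool) -> str:
    """Pick "flash3", "sdpa", or "eager" from version and device information."""
    # Keep only major.minor from strings such as "2.7.1+cu126".
    major, minor = (int(part) for part in torch_version.split("+")[0].split(".")[:2])

    # Assumed priority: Flash Attention 3 is optional, so use it only when present.
    if flash3_installed and "H100" in device_name:
        return "flash3"
    if (major, minor) >= (2, 7):
        return "sdpa"
    return "eager"
```

In a real script the inputs would typically come from `torch.__version__` and `torch.cuda.get_device_name(0)`; they are passed in as plain strings here so the rule can be checked without a GPU.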

Dependencies

Key packages: torch, transformers >= 4.53.0, einops, safetensors, gradio

Requires a custom diffusers fork: https://github.com/DoctorKey/diffusers.git@ovis-image
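A version floor such as transformers >= 4.53.0 can be checked at startup with the standard library. The helpers below are a hypothetical sketch, not part of this repository, and the comparison is deliberately simple: it compares only the leading numeric components, so a pre-release such as 4.53.0rc1 is treated like 4.53.0.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '4.53.0' or '4.54.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that "4.53" and "4.53.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> bool:
    """Return True if the package is installed at or above the minimum version."""
    try:
        return meets_minimum(metadata.version(name), minimum)
    except metadata.PackageNotFoundError:
        return False
```

For example, `check_package("transformers", "4.53.0")` returns False both when the package is missing and when it is too old, which is enough for a fail-fast check before loading models.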

UI Standards for Image Generation Apps

Use Gradio 6.0.2+ with Apple-style CSS for all image generation UIs:

import gradio as gr

# Define theme separately
custom_theme = gr.themes.Soft(...).set(...)

# Minimal gr.Blocks
with gr.Blocks(title="App Name", fill_height=False) as demo:
    # UI components...
    demo.load(None, None, None, js=js_code)  # JS via load()

# Theme and CSS in launch()
demo.launch(theme=custom_theme, css=apple_css)

Key points:

theme and css go in demo.launch(), NOT gr.Blocks()
JS loaded via demo.load() for proper timing
No head parameter needed - Gradio 6 handles dark theme properly
Set sdk_version: 6.0.2 in README.md for HuggingFace Spaces
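To confirm that a Space's README.md pins the intended SDK version, the YAML front matter can be inspected with a few lines of Python. The function `read_sdk_version` is a hypothetical helper; it assumes the front matter sits between two `---` lines at the top of the file, as Spaces READMEs conventionally do, and it uses a regular expression rather than a full YAML parser.

```python
import re
from typing import Optional


def read_sdk_version(readme_text: str) -> Optional[str]:
    """Return the sdk_version value from a README's YAML front matter, if any."""
    # The front matter is the block between the first two '---' lines.
    front = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", readme_text, flags=re.DOTALL)
    if front is None:
        return None
    field = re.search(
        r"^sdk_version:\s*[\"']?([^\"'\s#]+)", front.group(1), flags=re.MULTILINE
    )
    return field.group(1) if field else None
```

The function returns None when there is no front matter or no `sdk_version` key, so a caller can distinguish "not pinned" from a pinned value such as "6.0.2".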