# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Ovis-Image is a 7-billion-parameter text-to-image generation model optimized for high-quality text rendering. It is built on Ovis-U1 and designed for deployment on consumer-grade GPUs. This is an **inference-only** repository.
## Running the Application
```bash
# Install dependencies
pip install -r requirements.txt
# Run web UI (uses diffusers pipeline)
python app.py
# Standalone inference test
python ovis_image/test.py \
--model_path <path/to/ovis_image.safetensors> \
--ovis_path AIDC-AI/Ovis2.5-2B \
--vae_path <path/to/ae.safetensors> \
--prompt "your text prompt"
```
Models auto-download from the HuggingFace Hub. Set the `HF_TOKEN` environment variable if authenticated access is needed.
## Architecture
**Pipeline Flow:** Prompt → OvisTokenizer → OvisEmbedder (Ovis2.5-2B LLM) → OvisImageModel (denoising) → AutoEncoder (VAE decode) → Image
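The flow above is plain stage-by-stage composition. As an illustration only, it can be sketched with stub stages; the names mirror the components listed below, but the signatures are hypothetical, not the repository's actual APIs:

```python
# Illustrative sketch of the inference data flow; stage callables
# are stubs standing in for the real tokenizer/embedder/model/VAE.

def run_pipeline(prompt, stages, trace):
    """Thread a value through the pipeline stages in order."""
    x = prompt
    for name, fn in stages:
        x = fn(x)
        trace.append(name)
    return x

trace = []
stages = [
    ("OvisTokenizer",  lambda p: f"tokens({p})"),
    ("OvisEmbedder",   lambda t: f"embeds({t})"),
    ("OvisImageModel", lambda e: f"latents({e})"),
    ("AutoEncoder",    lambda l: f"image({l})"),
]

result = run_pipeline("a red bicycle", stages, trace)
print(trace)   # stages in execution order
print(result)  # image(latents(embeds(tokens(a red bicycle))))
```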
**Key Components:**
- `ovis_image/sampling.py` - Core inference pipeline with `generate_image()` function
- `ovis_image/model/model.py` - OvisImageModel: 6 DoubleStreamBlock + 27 SingleStreamBlock layers
- `ovis_image/model/autoencoder.py` - VAE for latent-to-pixel decoding
- `ovis_image/model/hf_embedder.py` - OvisEmbedder wrapping Ovis2.5-2B as text encoder
- `ovis_image/model/layers.py` - Transformer blocks, attention, embeddings
- `ovis_image/model/ops.py` - Attention backends (Flash3, SDPA, eager)
**Entry Point:**
- `app.py` - Uses `OvisImagePipeline.from_pretrained()` for simple, clean loading
- `app_old.py` - Backup of the modular implementation that loads components individually (kept for reference)
## Configuration
Model hyperparameters are defined in `ovis_image/__init__.py` via `ovis_image_configs["ovis-image-7b"]`:
- Hidden size: 3072, heads: 24
- Double blocks: 6, single blocks: 27
- Uses RoPE positional embeddings and classifier-free guidance
`OvisImageModelArgs` dataclass in `ovis_image/model/args.py` contains all model parameters.
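As a sanity check on the numbers above, here is a minimal sketch of what such a config dataclass might look like. The field names are assumptions for illustration, not the actual `OvisImageModelArgs` definition; the values come from the list above:

```python
from dataclasses import dataclass

# Hypothetical mirror of ovis_image_configs["ovis-image-7b"];
# field names are illustrative, values are from the config summary.
@dataclass
class OvisImageArgsSketch:
    hidden_size: int = 3072
    num_heads: int = 24
    double_blocks: int = 6
    single_blocks: int = 27
    use_rope: bool = True          # RoPE positional embeddings
    guidance_embed: bool = True    # classifier-free guidance

args = OvisImageArgsSketch()

# The hidden size must split evenly across attention heads.
assert args.hidden_size % args.num_heads == 0
head_dim = args.hidden_size // args.num_heads  # 3072 / 24 = 128
print(head_dim)
```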
## Hardware Requirements
- CUDA GPU required (~14GB VRAM minimum for bfloat16)
- Automatic attention backend selection:
  - PyTorch 2.7+: SDPA with CUDNN
  - H100: Flash Attention 3 (optional)
  - Fallback: eager attention
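The selection priority can be approximated by a pure function like the one below. This is a sketch of the decision order only; the repository's actual check lives in `ovis_image/model/ops.py` and may differ in detail:

```python
def select_attention_backend(torch_version, device_name, flash3_available):
    """Pick an attention backend, roughly mirroring the priority above.

    torch_version:     (major, minor) tuple, e.g. (2, 7)
    device_name:       CUDA device name string, e.g. "NVIDIA H100"
    flash3_available:  whether Flash Attention 3 is importable
    """
    # Prefer Flash Attention 3 on H100 when it is installed.
    if "H100" in device_name and flash3_available:
        return "flash3"
    # PyTorch 2.7+ gets SDPA (CUDNN backend).
    if torch_version >= (2, 7):
        return "sdpa"
    # Otherwise fall back to eager attention.
    return "eager"

print(select_attention_backend((2, 7), "NVIDIA A100", False))  # sdpa
print(select_attention_backend((2, 4), "NVIDIA T4", False))    # eager
print(select_attention_backend((2, 7), "NVIDIA H100", True))   # flash3
```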
## Dependencies
Key packages: `torch`, `transformers >= 4.53.0`, `einops`, `safetensors`, `gradio`
Requires a custom diffusers fork: `https://github.com/DoctorKey/diffusers.git@ovis-image`
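If it is not already pinned in `requirements.txt`, the fork can be installed directly via pip's git support (URL and branch taken verbatim from above):

```shell
# Install the custom diffusers fork from its ovis-image branch
pip install "git+https://github.com/DoctorKey/diffusers.git@ovis-image"
```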
## UI Standards for Image Generation Apps
Use **Gradio 6.0.2+** with Apple-style CSS for all image generation UIs:
```python
# Define theme separately
custom_theme = gr.themes.Soft(...).set(...)

# Minimal gr.Blocks
with gr.Blocks(title="App Name", fill_height=False) as demo:
    # UI components...
    demo.load(None, None, None, js=js_code)  # JS via load()

# Theme and CSS in launch()
demo.launch(theme=custom_theme, css=apple_css)
```
**Key points:**
- `theme` and `css` go in `demo.launch()`, NOT `gr.Blocks()`
- JS loaded via `demo.load()` for proper timing
- No `head` parameter needed - Gradio 6 handles dark theme properly
- Set `sdk_version: 6.0.2` in README.md for HuggingFace Spaces