# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Ovis-Image is a 7-billion-parameter text-to-image generation model optimized for high-quality text rendering. It is built upon Ovis-U1 and designed for consumer-grade GPU deployment. This is an **inference-only** repository.

## Running the Application
```bash
# Install dependencies
pip install -r requirements.txt

# Run web UI (uses diffusers pipeline)
python app.py

# Standalone inference test
python ovis_image/test.py \
    --model_path <path/to/ovis_image.safetensors> \
    --ovis_path AIDC-AI/Ovis2.5-2B \
    --vae_path <path/to/ae.safetensors> \
    --prompt "your text prompt"
```
Models auto-download from the HuggingFace Hub. Set the `HF_TOKEN` environment variable if the download requires authentication.
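For example, the variable can also be set from Python before any download is triggered (the token value below is a placeholder, not a real credential):

```python
import os

# Placeholder value for illustration only; substitute your actual token.
# huggingface_hub, which transformers/diffusers use for downloads, reads
# HF_TOKEN from the environment automatically.
os.environ.setdefault("HF_TOKEN", "hf_your_token_here")
```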
## Architecture

**Pipeline Flow:** Prompt → OvisTokenizer → OvisEmbedder (Ovis2.5-2B LLM) → OvisImageModel (denoising) → AutoEncoder (VAE decode) → Image
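As an illustrative sketch of the stage order only (these stubs stand in for the real modules; all function bodies here are hypothetical), the flow chains four stages:

```python
# Stub stages mirroring the pipeline flow above -- not the real implementations.

def tokenize(prompt: str) -> list[str]:
    """Stub for OvisTokenizer: prompt -> tokens."""
    return prompt.split()

def embed(tokens: list[str]) -> list[float]:
    """Stub for OvisEmbedder (Ovis2.5-2B): tokens -> text embeddings."""
    return [float(len(t)) for t in tokens]

def denoise(embeddings: list[float]) -> list[float]:
    """Stub for OvisImageModel: embeddings condition the denoising loop."""
    return [e * 0.5 for e in embeddings]

def vae_decode(latents: list[float]) -> str:
    """Stub for AutoEncoder: latents -> pixel image."""
    return f"image({len(latents)} latents)"

def generate(prompt: str) -> str:
    # Each stage's output feeds the next, in the order shown above.
    return vae_decode(denoise(embed(tokenize(prompt))))
```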
**Key Components:**

- `ovis_image/sampling.py` - Core inference pipeline with the `generate_image()` function
- `ovis_image/model/model.py` - OvisImageModel: 6 DoubleStreamBlock + 27 SingleStreamBlock layers
- `ovis_image/model/autoencoder.py` - VAE for latent-to-pixel decoding
- `ovis_image/model/hf_embedder.py` - OvisEmbedder wrapping Ovis2.5-2B as the text encoder
- `ovis_image/model/layers.py` - Transformer blocks, attention, embeddings
- `ovis_image/model/ops.py` - Attention backends (Flash3, SDPA, eager)

**Entry Points:**

- `app.py` - Uses `OvisImagePipeline.from_pretrained()` for simple, clean loading
- `app_old.py` - Backup of the modular implementation that loads individual components (kept for reference)
## Configuration

Model hyperparameters are defined in `ovis_image/__init__.py` via `ovis_image_configs["ovis-image-7b"]`:

- Hidden size: 3072, attention heads: 24
- Double-stream blocks: 6, single-stream blocks: 27
- Uses RoPE positional embeddings and classifier-free guidance

The `OvisImageModelArgs` dataclass in `ovis_image/model/args.py` contains all model parameters.
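A minimal sketch of what that config shape looks like (field names are assumptions, not the actual `OvisImageModelArgs` fields; values are the ones listed above for `ovis-image-7b`):

```python
from dataclasses import dataclass

@dataclass
class OvisImageConfigSketch:
    """Hypothetical stand-in for OvisImageModelArgs; field names assumed."""
    hidden_size: int = 3072
    num_heads: int = 24
    depth_double_blocks: int = 6
    depth_single_blocks: int = 27

cfg = OvisImageConfigSketch()
# Hidden size must divide evenly across heads for attention: 3072 / 24 = 128.
head_dim = cfg.hidden_size // cfg.num_heads
```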
## Hardware Requirements

- CUDA GPU required (~14 GB VRAM minimum for bfloat16)
- Automatic attention backend selection:
  - PyTorch 2.7+: SDPA with the CUDNN backend
  - H100: Flash Attention 3 (optional)
  - Fallback: eager attention
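A minimal sketch of how that selection could be expressed (the real logic lives in `ovis_image/model/ops.py`; this function, its argument names, and the exact priority order are illustrative assumptions):

```python
def pick_attention_backend(torch_version: tuple[int, int],
                           is_hopper_gpu: bool,
                           flash3_installed: bool) -> str:
    """Return the preferred attention backend name, checked in priority order."""
    if is_hopper_gpu and flash3_installed:
        return "flash3"  # Flash Attention 3 on H100-class GPUs (optional)
    if torch_version >= (2, 7):
        return "sdpa"    # PyTorch SDPA with the CUDNN backend
    return "eager"       # portable fallback
```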
## Dependencies

Key packages: `torch`, `transformers >= 4.53.0`, `einops`, `safetensors`, `gradio`

Requires a custom diffusers fork: `https://github.com/DoctorKey/diffusers.git@ovis-image`
## UI Standards for Image Generation Apps

Use **Gradio 6.0.2+** with Apple-style CSS for all image generation UIs:

```python
# Define the theme separately
custom_theme = gr.themes.Soft(...).set(...)

# Keep gr.Blocks minimal
with gr.Blocks(title="App Name", fill_height=False) as demo:
    # UI components...
    demo.load(None, None, None, js=js_code)  # JS via load()

# Theme and CSS go in launch()
demo.launch(theme=custom_theme, css=apple_css)
```
**Key points:**

- `theme` and `css` go in `demo.launch()`, NOT `gr.Blocks()`
- JS is loaded via `demo.load()` for proper timing
- No `head` parameter is needed - Gradio 6 handles dark theme properly
- Set `sdk_version: 6.0.2` in README.md for HuggingFace Spaces