tchung1970 Claude Opus 4.5 committed on
Commit c9d3110 · 1 Parent(s): 8cd75bf

Document UI standards for image generation apps

Add Gradio 6.0.2+ with apple-css pattern to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (1)
  1. CLAUDE.md +91 -0
CLAUDE.md ADDED
@@ -0,0 +1,91 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ Ovis-Image is a 7-billion-parameter text-to-image generation model optimized for high-quality text rendering. It is built on Ovis-U1 and designed for consumer-grade GPU deployment. This is an **inference-only** repository.
+
+ ## Running the Application
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run web UI (custom implementation)
+ python app.py
+
+ # Run web UI (diffusers-based, simpler)
+ python app_diffusers.py
+
+ # Standalone inference test
+ python ovis_image/test.py \
+     --model_path <path/to/ovis_image.safetensors> \
+     --ovis_path AIDC-AI/Ovis2.5-2B \
+     --vae_path <path/to/ae.safetensors> \
+     --prompt "your text prompt"
+ ```
+
+ Models auto-download from the Hugging Face Hub. Set the `HF_TOKEN` environment variable if a model requires authentication.
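
Reviewer sketch, not part of the committed file: a small pre-flight check for the `HF_TOKEN` note above. The helper name `hub_token_from_env` is hypothetical, and the sketch only reads the environment variable; it does not contact the Hub or validate the token.

```python
import os


def hub_token_from_env(env=None):
    """Return the HF_TOKEN value, or None when it is unset or blank.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if hub_token_from_env() is None:
        # Public models may still download; this is only a warning.
        print("HF_TOKEN is not set; downloads that need authentication may fail.")
```

Running a check like this before `python app.py` surfaces a missing token early instead of partway through a download.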
+
+ ## Architecture
+
+ **Pipeline Flow:** Prompt → OvisTokenizer → OvisEmbedder (Ovis2.5-2B LLM) → OvisImageModel (denoising) → AutoEncoder (VAE decode) → Image
+
+ **Key Components:**
+ - `ovis_image/sampling.py` - Core inference pipeline with `generate_image()` function
+ - `ovis_image/model/model.py` - OvisImageModel: 6 DoubleStreamBlock + 27 SingleStreamBlock layers
+ - `ovis_image/model/autoencoder.py` - VAE for latent-to-pixel decoding
+ - `ovis_image/model/hf_embedder.py` - OvisEmbedder wrapping Ovis2.5-2B as text encoder
+ - `ovis_image/model/layers.py` - Transformer blocks, attention, embeddings
+ - `ovis_image/model/ops.py` - Attention backends (Flash3, SDPA, eager)
+
+ **Two Entry Points:**
+ - `app.py` - Loads individual components (modular, educational)
+ - `app_diffusers.py` - Uses `OvisImagePipeline.from_pretrained()` (simpler)
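
Reviewer sketch, not part of the committed file: the pipeline flow above is a chain of stages in which each stage consumes the previous stage's output. The toy below illustrates only that ordering. Every stage is a hypothetical stand-in lambda, not the repository's `OvisTokenizer`, `OvisEmbedder`, `OvisImageModel`, or `AutoEncoder`, and the real `generate_image()` signature is not shown in this commit.

```python
from typing import Any, Callable, List, Sequence, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(prompt: str, stages: Sequence[Stage]) -> Tuple[Any, List[str]]:
    """Feed the prompt through each stage in order and record the stage names."""
    value: Any = prompt
    trace: List[str] = []
    for name, fn in stages:
        value = fn(value)
        trace.append(name)
    return value, trace


# Placeholder stages (assumptions for illustration only).
STAGES: List[Stage] = [
    ("tokenize", lambda text: text.split()),                      # prompt -> tokens
    ("embed", lambda tokens: [float(len(t)) for t in tokens]),    # tokens -> embeddings
    ("denoise", lambda emb: [x / 2 for x in emb]),                # embeddings -> latents
    ("vae_decode", lambda latents: [round(x) for x in latents]),  # latents -> "pixels"
]

if __name__ == "__main__":
    image, trace = run_pipeline("a cat holding a sign", STAGES)
    print(" -> ".join(trace))
```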
+
+ ## Configuration
+
+ Model hyperparameters are defined in `ovis_image/__init__.py` via `ovis_image_configs["ovis-image-7b"]`:
+ - Hidden size: 3072, heads: 24
+ - Double blocks: 6, single blocks: 27
+ - Uses RoPE positional embeddings and classifier-free guidance
+
+ The `OvisImageModelArgs` dataclass in `ovis_image/model/args.py` contains all model parameters.
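
Reviewer sketch, not part of the committed file: the hyperparameters listed above can be pictured as a frozen dataclass stored in a config dictionary. The class name `ModelArgsSketch` and all field names are assumptions, since the real `OvisImageModelArgs` fields are not shown in this commit. The only derived fact is arithmetic: 3072 / 24 gives a per-head dimension of 128.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArgsSketch:
    """Illustrative stand-in for OvisImageModelArgs (field names are assumed)."""

    hidden_size: int = 3072
    num_heads: int = 24
    double_blocks: int = 6
    single_blocks: int = 27

    @property
    def head_dim(self) -> int:
        # The hidden size must split evenly across the attention heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    @property
    def total_blocks(self) -> int:
        return self.double_blocks + self.single_blocks


# Mirrors the lookup style described above: configs keyed by model name.
configs_sketch = {"ovis-image-7b": ModelArgsSketch()}

if __name__ == "__main__":
    args = configs_sketch["ovis-image-7b"]
    print(args.head_dim, args.total_blocks)
```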
+
+ ## Hardware Requirements
+
+ - CUDA GPU required (~14GB VRAM minimum for bfloat16)
+ - Automatic attention backend selection:
+   - PyTorch 2.7+: SDPA with cuDNN
+   - H100: Flash Attention 3 (optional)
+   - Fallback: Eager attention
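
Reviewer sketch, not part of the committed file: two small calculations behind the requirements above. The first is plain arithmetic: 7 billion parameters at 2 bytes per bfloat16 value is 14 GB of weights, which is in line with the ~14GB figure but ignores activations and the other pipeline components. The second is a selection function whose priority order (Flash Attention 3 first when explicitly available, then SDPA on PyTorch 2.7+, then eager) is an assumption. The commit does not state the order or show the logic in `ovis_image/model/ops.py`, and all names below are hypothetical.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits


def bf16_weight_bytes(num_params: int) -> int:
    """Bytes needed to hold `num_params` weights in bfloat16 (weights only)."""
    return num_params * BYTES_PER_BF16


def parse_major_minor(version: str) -> tuple:
    """Turn a version string such as '2.7.1+cu126' into (2, 7)."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def choose_attention_backend(torch_version: str,
                             is_h100: bool = False,
                             flash3_installed: bool = False) -> str:
    """Pick a backend name; the priority order here is an assumption."""
    if is_h100 and flash3_installed:
        return "flash3"
    if parse_major_minor(torch_version) >= (2, 7):
        return "sdpa_cudnn"
    return "eager"


if __name__ == "__main__":
    print(bf16_weight_bytes(7_000_000_000) / 1e9, "GB of weights")
    print(choose_attention_backend("2.7.1+cu126"))
```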
+
+ ## Dependencies
+
+ Key packages: `torch`, `transformers >= 4.53.0`, `einops`, `safetensors`, `gradio`
+
+ Requires custom diffusers fork: `https://github.com/DoctorKey/diffusers.git@ovis-image`
+
+ ## UI Standards for Image Generation Apps
+
+ Use **Gradio 6.0.2+** with Apple-style CSS for all image generation UIs:
+
+ ```python
+ # Define theme separately (assumes `import gradio as gr`)
+ custom_theme = gr.themes.Soft(...).set(...)
+
+ # Minimal gr.Blocks
+ with gr.Blocks(title="App Name", fill_height=False) as demo:
+     # UI components...
+     demo.load(None, None, None, js=js_code)  # JS via load()
+
+ # Theme and CSS in launch()
+ demo.launch(theme=custom_theme, css=apple_css)
+ ```
+
+ **Key points:**
+ - `theme` and `css` go in `demo.launch()`, NOT `gr.Blocks()`
+ - JS loaded via `demo.load()` for proper timing
+ - No `head` parameter needed - Gradio 6 handles dark theme properly
+ - Set `sdk_version: 6.0.2` in README.md for HuggingFace Spaces
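
Reviewer sketch, not part of the committed file: the last key point can be checked mechanically by reading the front matter at the top of a Space's README.md. The helper below is a deliberately minimal line scanner rather than a YAML parser, and both the function name `read_sdk_version` and the sample README text are hypothetical.

```python
def read_sdk_version(readme_text: str):
    """Return the sdk_version from a README's leading front matter, or None."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if sep and key.strip() == "sdk_version":
            return value.strip().strip("'\"")
    return None


# Hypothetical README content for illustration.
SAMPLE_README = """---
title: Demo
sdk: gradio
sdk_version: 6.0.2
---
# Demo
"""

if __name__ == "__main__":
    print(read_sdk_version(SAMPLE_README))
```

A check like this could run in CI so that the pinned Space version and the Gradio version the UI pattern expects do not drift apart.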