| # CLAUDE.md | |
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | |
| ## Project Overview | |
| This is a Gradio application for image generation using the Qwen-Image model with Lightning LoRA acceleration. It's designed to run on Hugging Face Spaces with GPU support, providing fast 8-step image generation with advanced text rendering capabilities. | |
| ## Commands | |
| ### Run the application locally | |
| ```bash | |
| python app.py | |
| ``` | |
| ### Install dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ## Architecture | |
| ### Core Components | |
| 1. **Model Pipeline** (`app.py:130-164`) | |
| - Uses the `Qwen/Qwen-Image` diffusion model with a custom `FlowMatchEulerDiscreteScheduler` | |
| - Loads Lightning LoRA weights for 8-step acceleration | |
| - Configured for bfloat16 precision on CUDA | |
| 2. **Prompt Enhancement System** (`app.py:41-125`) | |
| - `polish_prompt()`: Uses Hugging Face InferenceClient with Cerebras provider to enhance prompts | |
| - `get_caption_language()`: Detects Chinese vs English prompts | |
| - `rewrite()`: Language-specific prompt enhancement with different system prompts for Chinese/English | |
| - Requires `HF_TOKEN` environment variable for API access | |
| 3. **Style Presets System** (`app.py:16-87`) | |
| - `load_style_presets()`: Loads style presets from `style_presets.yaml` | |
| - `apply_style_preset()`: Applies selected style to prompts | |
| - Supports custom styles and random style selection | |
| - Each preset includes prefix, suffix, and negative prompt components | |
| 4. **Page Layouts System** (`app.py:89-145`) | |
| - `load_page_layouts()`: Loads multi-image layouts from `page_layouts.yaml` | |
| - `get_layout_choices()`: Returns available layouts for a given number of images | |
| - `get_layout_metadata()`: Extracts panel metadata (type, focus, composition) for each position | |
| - Supports 1-8 images per page with 5-6 layout variations each | |
| - Dynamic layout selection based on number of images | |
| - **Panel Metadata System**: Each panel position includes metadata that describes: | |
| - `panel_type`: establishing/action/closeup/dialogue/reaction/transition/detail/splash | |
| - `focus`: environment/character/characters/action/emotion/object/event | |
| - `composition`: wide/tall/square/portrait/landscape | |
| - Metadata is used to guide the LLM in generating appropriate scene descriptions | |
| 5. **Story Generation System** (`app.py:147-265`) | |
| - `generate_story_scenes()`: Uses Hugging Face InferenceClient with Qwen3-235B to generate scene descriptions | |
| - Takes panel metadata as input to generate contextually appropriate content | |
| - Adapts descriptions based on panel type, focus, and composition | |
| - Returns structured scene data with captions and dialogue | |
| - `parse_yaml_scenes()`: Parses LLM output into structured scene data | |
| 6. **Image Size Calculation** (`app.py:267-330`) | |
| - `get_image_size_for_position()`: Calculates precise image dimensions based on layout aspect ratio | |
| - Uses 8px rounding for model compatibility while maintaining aspect ratio accuracy | |
| - Ensures images fill their layout containers without floating | |
| - `get_layout_position_for_image()`: Retrieves position data for a specific panel | |
| 7. **PDF Generation** (`app.py:450-540`) | |
| - `create_single_page_pdf()`: Creates PDF page with images arranged per layout | |
| - `create_multi_page_pdf()`: Combines multiple pages into a single document | |
| - Uses ReportLab for high-quality PDF generation | |
| - Embeds images as JPEG at quality 95 to preserve image quality | |
| - A4 page size with flexible positioning system | |
| - Smart filling: an image fills its panel completely when the image and panel aspect ratios differ by less than 2% | |
| 8. **Multi-Image Generation** (`app.py:545-650`) | |
| - `infer_page()`: Main generation orchestrator | |
| - Generates multiple images and combines into PDF | |
| - Progressive generation with status updates | |
| - Seed management for reproducibility across multiple images | |
| - Returns PDF file, preview image, and seed information | |
| 9. **Gradio Interface** (`app.py:750-900+`) | |
| - Slider for selecting 1-8 images per page | |
| - Dynamic layout dropdown that updates based on image count | |
| - Style preset dropdown with custom style text option | |
| - PDF download and image preview outputs | |
| - Advanced settings for all generation parameters | |
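The "smart filling" rule listed under PDF Generation can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `app.py`: the function name `place_image`, the use of a relative aspect-ratio difference for the 2% test, and the fit-and-centre fallback are all hypothetical. Coordinates are plain numbers (for example PDF points), and no ReportLab call is made.

```python
from typing import Tuple


def place_image(
    img_w: float,
    img_h: float,
    box_x: float,
    box_y: float,
    box_w: float,
    box_h: float,
    tolerance: float = 0.02,
) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) for drawing an image inside a layout box."""
    img_ratio = img_w / img_h
    box_ratio = box_w / box_h

    # Assumption: "aspect ratios match" means a relative difference under 2%.
    if abs(img_ratio - box_ratio) / box_ratio < tolerance:
        # Shapes are nearly identical, so fill the whole box.
        return box_x, box_y, box_w, box_h

    # Assumed fallback: scale uniformly to fit inside the box, then centre.
    scale = min(box_w / img_w, box_h / img_h)
    draw_w = img_w * scale
    draw_h = img_h * scale
    x = box_x + (box_w - draw_w) / 2
    y = box_y + (box_h - draw_h) / 2
    return x, y, draw_w, draw_h
```

With a rule like this, a 1024x1016 image in a square panel is stretched by less than 1% to cover the panel, while a 1024x512 image keeps its shape and is centred.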
| ## Key Configuration | |
| - **Scheduler Config** (`app.py:133-148`): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting | |
| - **Aspect Ratios** (`app.py:170-188`): Predefined aspect ratios optimized for 1024 base resolution | |
| - **Style Presets** (`style_presets.yaml`): Configurable style presets with prompt modifiers and negative prompts | |
| - **Page Layouts** (`page_layouts.yaml`): Flexible layout system for 1-8 images per page | |
| - **Default Settings**: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page | |
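To illustrate how an aspect ratio might be turned into model-friendly dimensions at the 1024 base resolution with 8px rounding, here is a small sketch. The helper names `round_to_multiple` and `size_for_aspect` are hypothetical, and pinning the longer side to the base resolution is an assumption; the actual logic in `get_image_size_for_position()` may differ.

```python
from typing import Tuple


def round_to_multiple(value: float, multiple: int = 8) -> int:
    """Round to the nearest multiple, never returning less than one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def size_for_aspect(aspect_ratio: float, base: int = 1024) -> Tuple[int, int]:
    """Return (width, height) for a panel with the given width/height ratio.

    Assumption: the longer side is pinned to `base`; the shorter side is
    derived from the ratio and snapped to a multiple of 8.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    if aspect_ratio >= 1:
        width, height = float(base), base / aspect_ratio
    else:
        width, height = base * aspect_ratio, float(base)
    return round_to_multiple(width), round_to_multiple(height)
```

Snapping to 8 pixels changes the realised ratio slightly, which is one reason a small tolerance is useful when the images are later fitted into layout panels.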
| ## Environment Variables | |
| - `HF_TOKEN`: Required for prompt enhancement via Hugging Face InferenceClient | |
| - Used to access the Cerebras provider, which serves the Qwen3-235B model | |
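Since prompt enhancement depends on `HF_TOKEN` and on the prompt language, the decision can be sketched as follows. This is a hedged illustration only: `detect_language` and `enhancement_plan` are hypothetical names, and testing for CJK Unified Ideographs is an assumption about how `get_caption_language()` might tell Chinese from English.

```python
import os
from typing import Dict, Mapping, Optional, Union


def detect_language(prompt: str) -> str:
    """Return "zh" if the prompt contains a CJK Unified Ideograph, else "en"."""
    for ch in prompt:
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


def enhancement_plan(
    prompt: str, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Union[bool, str]]:
    """Decide whether to enhance the prompt and which language to use.

    Assumption: enhancement is skipped when HF_TOKEN is missing or empty.
    """
    env = os.environ if env is None else env
    return {
        "enhance": bool(env.get("HF_TOKEN")),
        "language": detect_language(prompt),
    }
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.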
| ## Key Features | |
| - **Session-based storage**: Each user session gets a unique temporary directory that persists for 24 hours | |
| - **Multi-page PDF generation**: Users can generate up to 128 pages in a single document | |
| - **Dynamic page addition**: Click "Generate page N" to add the next page to the PDF | |
| - **Flexible layouts**: Different layout options for 1-8 images per page | |
| - **Style presets**: 20+ predefined artistic styles | |
| - **Automatic cleanup**: Session directories older than 24 hours are deleted automatically | |
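The 24-hour session cleanup could look roughly like the sketch below. It assumes each session is a subdirectory of a common root and that age is judged by the directory's modification time; the name `cleanup_old_sessions` and both of those choices are hypothetical rather than taken from `app.py`.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours


def cleanup_old_sessions(
    root: Path,
    now: Optional[float] = None,
    ttl: float = SESSION_TTL_SECONDS,
) -> List[str]:
    """Delete session directories under `root` older than `ttl` seconds.

    Returns the names of the directories that were removed.
    """
    now = time.time() if now is None else now
    removed: List[str] = []
    if not root.is_dir():
        return removed
    for entry in sorted(root.iterdir()):
        # Assumption: a session's age is its directory modification time.
        if entry.is_dir() and now - entry.stat().st_mtime > ttl:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return removed
```

A function like this could run at startup or before each new session is created, so no background scheduler is needed.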
| ## Model Dependencies | |
| - Main model: `Qwen/Qwen-Image` | |
| - LoRA weights: `lightx2v/Qwen-Image-Lightning` (V1.1 safetensors) | |
| - Prompt enhancement model: `Qwen/Qwen3-235B-A22B-Instruct-2507` via Cerebras |