AiComicFactory2

AiComicFactory2 / CLAUDE.md

Julian Bilcke

improve everything using AI

355629c 2 months ago

	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Project Overview

	This is a Gradio application for image generation using the Qwen-Image model with Lightning LoRA acceleration. It's designed to run on Hugging Face Spaces with GPU support, providing fast 8-step image generation with advanced text rendering capabilities.

	## Commands

	### Run the application locally
	```bash
	python app.py
	```

	### Install dependencies
	```bash
	pip install -r requirements.txt
	```

	## Architecture

	### Core Components

	1. Model Pipeline (`app.py:130-164`)
	- Uses the `Qwen/Qwen-Image` diffusion model with a custom `FlowMatchEulerDiscreteScheduler`
	- Loads Lightning LoRA weights for 8-step acceleration
	- Configured for bfloat16 precision on CUDA

	2. Prompt Enhancement System (`app.py:41-125`)
	- `polish_prompt()`: Uses Hugging Face InferenceClient with Cerebras provider to enhance prompts
	- `get_caption_language()`: Detects Chinese vs English prompts
	- `rewrite()`: Language-specific prompt enhancement with different system prompts for Chinese/English
	- Requires `HF_TOKEN` environment variable for API access
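The language check that routes a prompt to the Chinese or English system prompt can be sketched as below. This is a minimal sketch, not the code in `app.py`: it assumes the check is a scan for CJK ideographs, and the return values `"zh"` and `"en"` are illustrative.

```python
def get_caption_language(prompt: str) -> str:
    """Return "zh" if the prompt contains a CJK ideograph, else "en".

    Assumption: a single Chinese character is enough to treat the
    prompt as Chinese. The real heuristic in app.py may differ.
    """
    for ch in prompt:
        # CJK Unified Ideographs block: U+4E00 to U+9FFF
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"
```

A caller would then pick the system prompt by the returned language code before sending the request to the LLM.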

	3. Style Presets System (`app.py:16-87`)
	- `load_style_presets()`: Loads style presets from `style_presets.yaml`
	- `apply_style_preset()`: Applies selected style to prompts
	- Supports custom styles and random style selection
	- Each preset includes prefix, suffix, and negative prompt components
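How a preset's prefix, suffix, and negative prompt combine with the user prompt might look like the following. The preset names and contents, the `"custom"` and `"random"` keys, and the function signature are all assumptions for illustration; the real presets live in `style_presets.yaml`.

```python
import random

# Hypothetical presets standing in for the contents of style_presets.yaml.
STYLE_PRESETS = {
    "manga": {
        "prefix": "black and white manga panel, ",
        "suffix": ", screentone shading",
        "negative_prompt": "color, photo",
    },
    "watercolor": {
        "prefix": "watercolor illustration, ",
        "suffix": ", soft washes",
        "negative_prompt": "3d render",
    },
}


def apply_style_preset(prompt, style, custom_style="", rng=None):
    """Return (styled_prompt, negative_prompt) for the chosen style."""
    if style == "custom":
        # Custom style text is appended to the prompt as-is.
        text = custom_style.strip()
        return (f"{prompt}, {text}" if text else prompt), ""
    if style == "random":
        # Pick any known preset; rng allows deterministic tests.
        style = (rng or random).choice(sorted(STYLE_PRESETS))
    preset = STYLE_PRESETS.get(style)
    if preset is None:
        # Unknown style: leave the prompt untouched.
        return prompt, ""
    styled = preset.get("prefix", "") + prompt + preset.get("suffix", "")
    return styled, preset.get("negative_prompt", "")
```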

	4. Page Layouts System (`app.py:89-145`)
	- `load_page_layouts()`: Loads multi-image layouts from `page_layouts.yaml`
	- `get_layout_choices()`: Returns available layouts for a given number of images
	- `get_layout_metadata()`: Extracts panel metadata (type, focus, composition) for each position
	- Supports 1-8 images per page with 5-6 layout variations each
	- Dynamic layout selection based on number of images
	- Panel Metadata System: each panel position includes metadata that describes:
	  - `panel_type`: establishing/action/closeup/dialogue/reaction/transition/detail/splash
	  - `focus`: environment/character/characters/action/emotion/object/event
	  - `composition`: wide/tall/square/portrait/landscape
	- Metadata is used to guide the LLM in generating appropriate scene descriptions
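The layout lookup and metadata extraction can be illustrated with an in-memory stand-in for `page_layouts.yaml`. The dictionary schema, layout keys, and labels below are assumptions; only the function names and the three metadata fields come from this document.

```python
# Hypothetical in-memory equivalent of page_layouts.yaml; the real schema may differ.
PAGE_LAYOUTS = {
    2: {
        "side_by_side": {
            "label": "Side by side",
            "positions": [
                {"panel_type": "establishing", "focus": "environment", "composition": "tall"},
                {"panel_type": "closeup", "focus": "character", "composition": "tall"},
            ],
        },
        "stacked": {
            "label": "Stacked",
            "positions": [
                {"panel_type": "action", "focus": "action", "composition": "wide"},
                {"panel_type": "reaction", "focus": "emotion", "composition": "wide"},
            ],
        },
    },
}


def get_layout_choices(num_images):
    """Return (label, key) pairs for the layouts defined for this image count."""
    layouts = PAGE_LAYOUTS.get(num_images, {})
    return [(layout["label"], key) for key, layout in layouts.items()]


def get_layout_metadata(num_images, layout_key):
    """Return one metadata dict per panel position, in reading order."""
    layout = PAGE_LAYOUTS.get(num_images, {}).get(layout_key)
    if layout is None:
        return []
    fields = ("panel_type", "focus", "composition")
    return [{f: pos.get(f) for f in fields} for pos in layout["positions"]]
```

Returning `(label, key)` pairs matches the shape a dropdown typically expects, so the choices can be refreshed whenever the image count changes.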

	5. Story Generation System (`app.py:147-265`)
	- `generate_story_scenes()`: Uses Hugging Face InferenceClient with Qwen3-235B to generate scene descriptions
	  - Takes panel metadata as input to generate contextually appropriate content
	  - Adapts descriptions based on panel type, focus, and composition
	  - Returns structured scene data with captions and dialogue
	- `parse_yaml_scenes()`: Parses LLM output into structured scene data
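One way panel metadata could be folded into the LLM request is to list each panel's type, focus, and composition in the user message. The function name `build_scene_prompt` and the exact wording are hypothetical; the prompt actually sent by `generate_story_scenes()` is not shown in this file.

```python
def build_scene_prompt(story, panels):
    """Build a user message asking for one scene description per panel.

    `panels` is a list of dicts with panel_type, focus, and composition keys.
    """
    lines = [
        f"Story: {story}",
        f"Write {len(panels)} scene descriptions, one per panel, as YAML.",
    ]
    for index, panel in enumerate(panels, start=1):
        lines.append(
            f"- Panel {index}: type={panel['panel_type']}, "
            f"focus={panel['focus']}, composition={panel['composition']}"
        )
    return "\n".join(lines)
```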

	6. Image Size Calculation (`app.py:267-330`)
	- `get_image_size_for_position()`: Calculates precise image dimensions based on layout aspect ratio
	  - Uses 8px rounding for model compatibility while maintaining aspect ratio accuracy
	  - Ensures images fill their layout containers without floating
	- `get_layout_position_for_image()`: Retrieves position data for a specific panel
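The 8px rounding can be worked through as follows. This sketch assumes the target is roughly a 1024 x 1024 pixel budget at the panel's aspect ratio; the helper names are hypothetical and the real function takes a layout position rather than a bare ratio.

```python
from typing import Tuple


def round_to_multiple(value: float, multiple: int = 8) -> int:
    """Round to the nearest multiple, never going below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def get_image_size_for_aspect(aspect_ratio: float, base: int = 1024) -> Tuple[int, int]:
    """Return (width, height), both multiples of 8, close to the given ratio.

    The pixel area is kept near base * base so cost stays roughly constant.
    """
    area = base * base
    width = (area * aspect_ratio) ** 0.5
    height = width / aspect_ratio
    return round_to_multiple(width), round_to_multiple(height)
```

For a 16:9 panel this gives 1368 x 768, a ratio of about 1.781 against the requested 1.778, so the rounding error stays well under 1%.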

	7. PDF Generation (`app.py:450-540`)
	- `create_single_page_pdf()`: Creates PDF page with images arranged per layout
	- `create_multi_page_pdf()`: Combines multiple pages into a single document
	- Uses ReportLab for high-quality PDF generation
	- Embeds images as JPEG at quality 95 to preserve image quality
	- A4 page size with flexible positioning system
	- Smart filling: an image fills its panel completely when the image and panel aspect ratios differ by less than 2%
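The smart-filling rule can be sketched as a placement function: stretch to the full panel when the ratios are within 2%, otherwise scale to fit and center. The function name, argument order, and the letterboxing fallback are assumptions; the actual drawing is done with ReportLab in `app.py`.

```python
def fit_image_in_panel(img_w, img_h, box_x, box_y, box_w, box_h, tolerance=0.02):
    """Return (x, y, w, h) at which to draw the image inside the panel box."""
    img_ratio = img_w / img_h
    box_ratio = box_w / box_h
    # Near-identical ratios: fill the box exactly, accepting a tiny distortion.
    if abs(img_ratio - box_ratio) / box_ratio < tolerance:
        return (box_x, box_y, box_w, box_h)
    # Otherwise scale uniformly to fit inside the box and center the result.
    scale = min(box_w / img_w, box_h / img_h)
    w, h = img_w * scale, img_h * scale
    return (box_x + (box_w - w) / 2, box_y + (box_h - h) / 2, w, h)
```

Allowing a small tolerance avoids thin white margins that would otherwise appear when 8px rounding nudges an image's ratio away from its panel's.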

	8. Multi-Image Generation (`app.py:545-650`)
	- `infer_page()`: Main generation orchestrator
	- Generates multiple images and combines them into a PDF
	- Progressive generation with status updates
	- Seed management for reproducibility across multiple images
	- Returns the PDF file, a preview image, and seed information
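Reproducible seeding across several images is often done by deriving each image's seed from one base seed. The scheme below (base seed plus panel index, wrapped at an assumed 32-bit maximum) is an illustration only; `resolve_seeds` and `MAX_SEED` are hypothetical names and the scheme used by `infer_page()` may differ.

```python
import random

MAX_SEED = 2**32 - 1  # assumed upper bound for seeds


def resolve_seeds(base_seed, num_images, randomize=False):
    """Return one seed per image, derived deterministically from a base seed."""
    if randomize:
        # A fresh base seed; report it back so the page can be regenerated.
        base_seed = random.randint(0, MAX_SEED)
    # Offset by panel index and wrap so every seed stays within range.
    return [(base_seed + i) % (MAX_SEED + 1) for i in range(num_images)]
```

Because every seed follows from the base seed, storing that single number is enough to regenerate the whole page.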

	9. Gradio Interface (`app.py:750-900+`)
	- Slider for selecting 1-8 images per page
	- Dynamic layout dropdown that updates based on image count
	- Style preset dropdown with custom style text option
	- PDF download and image preview outputs
	- Advanced settings for all generation parameters

	## Key Configuration

	- Scheduler Config (`app.py:133-148`): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting
	- Aspect Ratios (`app.py:170-188`): Predefined aspect ratios optimized for 1024 base resolution
	- Style Presets (`style_presets.yaml`): Configurable style presets with prompt modifiers and negative prompts
	- Page Layouts (`page_layouts.yaml`): Flexible layout system for 1-8 images per page
	- Default Settings: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page

	## Environment Variables

	- `HF_TOKEN`: Required for prompt enhancement via Hugging Face InferenceClient; it authenticates requests to the Cerebras provider serving the Qwen3-235B model

	## Key Features

	- Session-based storage: Each user session gets a unique temporary directory that persists for 24 hours
	- Multi-page PDF generation: Users can generate up to 128 pages in a single document
	- Dynamic page addition: Click "Generate page N" to add the next page to the PDF
	- Flexible layouts: Different layout options for 1-8 images per page
	- Style presets: 20+ predefined artistic styles
	- Automatic cleanup: Old sessions are automatically cleaned after 24 hours
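Session storage with a 24-hour lifetime could be organized as below. This is a sketch under assumptions: the `session_` directory prefix, the root location, and both function names are hypothetical, and expiry is judged by each directory's modification time.

```python
import shutil
import tempfile
import time
import uuid
from pathlib import Path

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours
DEFAULT_ROOT = Path(tempfile.gettempdir()) / "comic_sessions"  # assumed location


def create_session_dir(root: Path = DEFAULT_ROOT) -> Path:
    """Create and return a unique directory for one user session."""
    session_dir = root / f"session_{uuid.uuid4().hex}"
    session_dir.mkdir(parents=True)
    return session_dir


def cleanup_old_sessions(root: Path = DEFAULT_ROOT, now=None, ttl=SESSION_TTL_SECONDS) -> int:
    """Delete session directories older than ttl seconds; return the count removed."""
    now = time.time() if now is None else now
    removed = 0
    for path in root.glob("session_*"):
        if path.is_dir() and now - path.stat().st_mtime > ttl:
            shutil.rmtree(path, ignore_errors=True)
            removed += 1
    return removed
```

Running the cleanup at startup and before each new session is created keeps disk usage bounded without needing a background scheduler.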

	## Model Dependencies

	- Main model: `Qwen/Qwen-Image`
	- LoRA weights: `lightx2v/Qwen-Image-Lightning` (V1.1 safetensors)
	- Prompt enhancement model: `Qwen/Qwen3-235B-A22B-Instruct-2507` via Cerebras