AiComicFactory2

Running on Zero

App Files Files Community

AiComicFactory2 / CLAUDE.md

Julian Bilcke

rethinking this project to be Gradio+API+PDF based

5f4445f 3 months ago

preview code

raw

history blame

4.07 kB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Gradio application for image generation using the Qwen-Image model with Lightning LoRA acceleration. It's designed to run on Hugging Face Spaces with GPU support, providing fast 8-step image generation with advanced text rendering capabilities.

Commands

Run the application locally

python app.py

Install dependencies

pip install -r requirements.txt

Architecture

Core Components

Model Pipeline (app.py:130-164)
- Uses Qwen/Qwen-Image diffusion model with custom FlowMatchEulerDiscreteScheduler
- Loads Lightning LoRA weights for 8-step acceleration
- Configured for bfloat16 precision on CUDA
Prompt Enhancement System (app.py:41-125)
- polish_prompt(): Uses Hugging Face InferenceClient with Cerebras provider to enhance prompts
- get_caption_language(): Detects Chinese vs English prompts
- rewrite(): Language-specific prompt enhancement with different system prompts for Chinese/English
- Requires HF_TOKEN environment variable for API access
Style Presets System (app.py:16-87)
- load_style_presets(): Loads style presets from style_presets.yaml
- apply_style_preset(): Applies selected style to prompts
- Supports custom styles and random style selection
- Each preset includes prefix, suffix, and negative prompt components
Page Layouts System (app.py:89-111)
- load_page_layouts(): Loads multi-image layouts from page_layouts.yaml
- Supports 1-4 images per page with various layout configurations
- Dynamic layout selection based on number of images
PDF Generation (app.py:166-223)
- create_pdf_with_layout(): Creates PDF with multiple images in selected layout
- Uses ReportLab for high-quality PDF generation
- Preserves image quality at 95% JPEG compression
- A4 page size with flexible positioning system
Multi-Image Generation (app.py:225-307)
- infer_multiple(): Generates multiple images and combines into PDF
- Progressive generation with status updates
- Seed management for reproducibility across multiple images
- Returns PDF file, preview image, and seed information
Gradio Interface (app.py:380-500+)
- Slider for selecting 1-4 images per page
- Dynamic layout dropdown that updates based on image count
- Style preset dropdown with custom style text option
- PDF download and image preview outputs
- Advanced settings for all generation parameters

Key Configuration

Scheduler Config (app.py:133-148): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting
Aspect Ratios (app.py:170-188): Predefined aspect ratios optimized for 1024 base resolution
Style Presets (style_presets.yaml): Configurable style presets with prompt modifiers and negative prompts
Page Layouts (page_layouts.yaml): Flexible layout system for 1-4 images per page
Default Settings: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page

Environment Variables

HF_TOKEN: Required for prompt enhancement via Hugging Face InferenceClient
Used for accessing Cerebras provider for Qwen3-235B model

Key Features

Session-based storage: Each user session gets a unique temporary directory that persists for 24 hours
Multi-page PDF generation: Users can generate up to 128 pages in a single document
Dynamic page addition: Click "Generate page N" to add the next page to the PDF
Flexible layouts: Different layout options for 1-4 images per page
Style presets: 20+ predefined artistic styles
Automatic cleanup: Old sessions are automatically cleaned after 24 hours

Model Dependencies

Main model: Qwen/Qwen-Image
LoRA weights: lightx2v/Qwen-Image-Lightning (V1.1 safetensors)
Prompt enhancement model: Qwen/Qwen3-235B-A22B-Instruct-2507 via Cerebras