---
title: AI Comic Studio
emoji: 🎬
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: "6.18.0"
app_file: app.py
pinned: true
license: apache-2.0
short_description: Full comic book generated live. Gemma writes, FLUX draws.
tags:
  - thousand-token-wood
  - off-brand
  - best-agent
  - best-demo
  - modal
  - track:thousand-token-wood
  - sponsor:modal
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes
models:
  - google/gemma-4-26B-A4B-it
  - black-forest-labs/FLUX.2-klein-9B
datasets: []
---

# AI Comic Studio

You type one sentence. You get a 25-page, 50-panel comic book. Title, cast, story arc, consistent character art across every panel. Generated live in about 60-90 seconds once the GPUs are warm.

AI Comic Studio chains two small models through a five-stage pipeline. Gemma 4 26B writes the entire comic (safety gate, story bible, 50 panel scripts). FLUX.2 Klein 9B draws every panel. A Gradio reader presents the result with page navigation. No human intervention between idea and finished book.

## Demo

[Watch the full demo on YouTube](https://youtu.be/ma_SNY1qats)

## Social Post

[View on X](https://x.com/Asura1612913/status/2066635430314095013)

## How It Works

**Stage 1: Safety Gate**
Gemma reviews the idea. Fictional adventure, action, mystery, horror, and romance all pass. Only genuinely harmful requests are refused.

**Stage 2: Story Bible**
Gemma produces: title, logline, a fixed cast of 1-4 characters (each with a 30-word visual description reused verbatim in every image prompt), global art style, color palette, and a 25-page synopsis with a full narrative arc.

**Stage 3: Panel Script**
Gemma writes 50 panels across 5 batches of 5 pages (10 panels per batch). Each batch receives a recap of all prior panels for story continuity.

**Stage 4: Image Render**
FLUX.2 renders every panel at 832x576. Character appearance text from the bible is injected into each prompt. Deterministic seeds per panel keep the art consistent from page 1 to page 25.
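Stage 4's prompt assembly and seeding can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the app's actual code: `Character`, `panel_seed`, `build_panel_prompt`, and `batched` are hypothetical names, the prompt layout and sample character are invented for illustration, and the README does not say how per-panel seeds are derived, so the SHA-256 scheme is an assumption. It shows the two ideas the stage relies on: each character's bible description is pasted into the prompt unchanged, and the same story and panel number always map to the same seed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """One cast member from the story bible (hypothetical structure)."""
    name: str
    appearance: str  # ~30-word visual description, reused verbatim in every prompt


def panel_seed(story_title: str, panel_number: int) -> int:
    """Map (story, panel) to a stable 32-bit seed.

    Assumption: the README only says seeds are deterministic per panel;
    hashing the title plus the panel number is one illustrative way to do that.
    """
    key = f"{story_title}:{panel_number}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def build_panel_prompt(style: str, palette: str, scene: str,
                       cast: list[Character]) -> str:
    """Assemble one FLUX prompt, injecting each appearance text unchanged."""
    parts = [style, f"Color palette: {palette}.", scene]
    for character in cast:
        parts.append(f"{character.name}: {character.appearance}")
    return " ".join(parts)


def batched(items: list, size: int = 4) -> list[list]:
    """Group panels for rendering, one group per GPU pass (4 per pass in the README)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    mira = Character("Mira", "wiry teenage fox courier, rust-orange fur, green scarf, "
                             "brass goggles, oversized satchel")
    prompt = build_panel_prompt("Inked comic art, halftone shading.", "rust, teal, cream",
                                "Mira leaps between rooftops at dusk.", [mira])
    print(prompt)
    print(panel_seed("The Last Delivery", 7))
    print(len(batched(list(range(1, 51)))))  # 50 panels -> 13 GPU passes
```

In a Diffusers pipeline, a seed like this would typically be passed as `generator=torch.Generator().manual_seed(seed)`. Hashing is used here instead of Python's built-in `hash()`, because string hashing is randomized per process and would give different seeds after every restart.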
**Stage 5: Reader**
The Gradio UI streams panels live as they render, then presents the finished comic: two image+caption panels per page, left/right navigation, and a generation timer that freezes at the final time.

## Pipeline

```
"Idea"
  │
  ├─ Gemma 4 26B-A4B (Writer)
  │   ├─ Safety gate
  │   ├─ Story bible (title, cast, style, 25-page synopsis)
  │   └─ 50 panels in 5 batches (continuity recap per batch)
  │
  ├─ FLUX.2 Klein 9B (Artist)
  │   ├─ Character injection per prompt
  │   ├─ Deterministic seeds
  │   └─ Batched renders (4 panels per GPU pass)
  │
  └─ Gradio Reader
      ├─ Live panel streaming
      ├─ Page navigation
      └─ Generation timer
```

## Performance

| Metric | Value |
|--------|-------|
| Warm generation (both GPUs hot) | ~60-90 s |
| Cold start (first call) | +1-3 min |
| Panels per FLUX batch (one GPU pass) | 4 |
| Total panels | 50 |
| Total pages | 25 |
| Image resolution | 832x576 |
| FLUX inference steps | 4 |

## Models

| Model | Parameters | Role | Runtime |
|-------|-----------|------|---------|
| `google/gemma-4-26B-A4B-it` | 26B (4B active, MoE) | All text: safety, bible, panels | vLLM, Modal H100 |
| `black-forest-labs/FLUX.2-klein-9B` | 9B | All images: panel renders | Diffusers, Modal H100 |

Both models stay under the hackathon's 32B-parameter cap. Gemma never draws. FLUX never writes.

## Character Consistency

The core problem in multi-panel comics: FLUX renders each panel independently. AI Comic Studio solves this with verbatim appearance injection.

Every character gets a detailed visual description in the story bible (species, build, age, hair, face, clothing, colors, props). That exact text is injected into every FLUX prompt where the character appears. Combined with deterministic seeds, this keeps characters recognizable across 50 panels.

## Custom UI

No stock Gradio components in the main flow.
Everything is `gr.HTML` with handwritten CSS: halftone dot background, Anton typography, comic sticker buttons with 3D press effects, a live stopwatch ticking at 100 ms intervals, and panel-by-panel streaming during generation.

## Run Locally

```bash
pip install -r requirements.txt
COMIC_BACKEND=mock python app.py    # offline, full UI, no GPU
COMIC_BACKEND=modal python app.py   # live models on Modal
```

## Qualification

| Criteria | Status |
|----------|--------|
| Under 32B params | 26B + 9B, both under cap |
| Gradio Space | Deployed on HF Spaces |
| Demo video | [YouTube](https://youtu.be/ma_SNY1qats) |
| Social post | [X/Twitter](https://x.com/Asura1612913/status/2066635430314095013) |
| README tagged | Done |
| Modal used | vLLM + Diffusers on Modal H100s |

## Badges

| Badge | Qualifies |
|-------|-----------|
| Off Brand | Yes. Custom UI, no default Gradio chrome. |
| Best Agent | Yes. 5-stage pipeline with continuity tracking across 50 panels. |
| Best Demo | Yes. Live timer, panel streaming, full comic reader. |
| Modal | Yes. Both backends on Modal H100s with scale-to-zero. |
| Sharing is Caring | Yes. [Social post](https://x.com/Asura1612913/status/2066635430314095013) published. |
| Field Notes | Pending. Build write-up to be published. |

---

**Build Small Hackathon 2026 · Thousand Token Wood**