title: AI Comic Studio
emoji: 🎬
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.18.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Full comic book generated live. Gemma writes, FLUX draws.
tags:
- thousand-token-wood
- off-brand
- best-agent
- best-demo
- modal
- track:thousand-token-wood
- sponsor:modal
- achievement:offbrand
- achievement:sharing
- achievement:fieldnotes
models:
- google/gemma-4-26B-A4B-it
- black-forest-labs/FLUX.2-klein-9B
datasets: []
AI Comic Studio
You type one sentence. You get a 25-page, 50-panel comic book with a title, cast, story arc, and consistent character art across every panel. Generated live in roughly 60-90 seconds once both GPUs are warm; a cold start adds 1-3 minutes.
AI Comic Studio chains two small models through a five-stage pipeline. Gemma 4 26B writes the entire comic (safety gate, story bible, 50 panel scripts). FLUX.2 Klein 9B draws every panel. A Gradio reader presents the result with page navigation. No human intervention between idea and finished book.
Demo
Watch the full demo on YouTube
Social Post
How It Works
Stage 1: Safety Gate. Gemma reviews the idea. Fictional adventure, action, mystery, horror, and romance all pass. Only genuinely harmful requests are refused.
Stage 2: Story Bible. Gemma produces a title, a logline, a fixed cast of 1-4 characters (each with a 30-word visual description reused verbatim in every image prompt), a global art style, a color palette, and a 25-page synopsis with a full narrative arc.
Stage 3: Panel Script. Gemma writes 50 panels in 5 batches of 5 pages (10 panels per batch). Each batch receives a recap of all prior panels for story continuity.
Stage 4: Image Render. FLUX.2 renders every panel at 832x576. Character appearance text from the bible is injected into each prompt. Deterministic per-panel seeds keep the art consistent from page 1 to page 25.
Stage 5: Reader. The Gradio UI streams panels live as they render, then presents the finished comic: two image+caption panels per page, left/right navigation, and a frozen generation timer.
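The batching in Stage 3 can be sketched in plain Python. This is a minimal illustration, not the app's actual code: `Panel`, `plan_batches`, `build_recap`, `write_script`, and `fake_writer` are hypothetical names, and the recap format is an assumption. The `write_batch` callable stands in for the Gemma call.

```python
from dataclasses import dataclass

PAGES = 25
PANELS_PER_PAGE = 2
PAGES_PER_BATCH = 5


@dataclass
class Panel:
    page: int     # 1-based page number
    slot: int     # position on the page: 1 or 2
    caption: str


def plan_batches(pages=PAGES, pages_per_batch=PAGES_PER_BATCH):
    """Split page numbers 1..pages into consecutive batches."""
    return [
        list(range(start, min(start + pages_per_batch, pages + 1)))
        for start in range(1, pages + 1, pages_per_batch)
    ]


def build_recap(panels):
    """Summarize every panel written so far, one line per panel."""
    return "\n".join(f"p{p.page}.{p.slot}: {p.caption}" for p in panels)


def write_script(write_batch):
    """Run all batches, handing each one a recap of the prior panels.

    `write_batch(pages, recap)` is a hypothetical stand-in for the model
    call and must return a list of Panel objects.
    """
    panels = []
    for pages in plan_batches():
        panels.extend(write_batch(pages, build_recap(panels)))
    return panels


def fake_writer(pages, recap):
    # Offline stand-in for the model: two placeholder panels per page.
    return [
        Panel(page, slot, f"placeholder caption {page}-{slot}")
        for page in pages
        for slot in range(1, PANELS_PER_PAGE + 1)
    ]


script = write_script(fake_writer)
print(len(plan_batches()), len(script))
```

Because the recap is rebuilt from the accumulated panel list before each call, the first batch sees an empty recap and the fifth sees all 40 earlier panels.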
Pipeline
"Idea"
│
├─ Gemma 4 26B-A4B (Writer)
│  ├─ Safety gate
│  ├─ Story bible (title, cast, style, 25-page synopsis)
│  └─ 50 panels in 5 batches (continuity recap per batch)
│
├─ FLUX.2 Klein 9B (Artist)
│  ├─ Character injection per prompt
│  ├─ Deterministic seeds
│  └─ Batched renders (4 panels per GPU pass)
│
└─ Gradio Reader
   ├─ Live panel streaming
   ├─ Page navigation
   └─ Generation timer
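The reader's page navigation reduces to grouping panels into pages and moving an index. The sketch below assumes two panels per page (as described for the reader) and assumes navigation clamps at the first and last page rather than wrapping; `paginate` and `turn_page` are hypothetical helper names.

```python
def paginate(panels, per_page=2):
    """Group a flat panel list into pages of `per_page` panels."""
    return [panels[i:i + per_page] for i in range(0, len(panels), per_page)]


def turn_page(current, step, total_pages):
    """Move by `step` pages, clamped to the range 0..total_pages-1."""
    return max(0, min(total_pages - 1, current + step))


pages = paginate(list(range(1, 51)))   # 50 panel ids -> 25 pages
page = turn_page(0, -1, len(pages))    # "left" on the first page stays put
print(len(pages), page)
```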
Performance
| Metric | Value |
|---|---|
| Warm generation (both GPUs hot) | ~60-90s |
| Cold start (first call) | +1-3 min |
| Panels per render batch (one GPU pass) | 4 |
| Total panels | 50 |
| Total pages | 25 |
| Image resolution | 832x576 |
| FLUX inference steps | 4 |
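The render-batch figures above imply a fixed number of GPU passes. A quick check of that arithmetic, assuming panels are grouped in order and the final pass simply carries the remainder (the `chunk` helper is illustrative, not taken from the app):

```python
import math

TOTAL_PANELS = 50
PANELS_PER_PASS = 4


def chunk(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


panel_ids = list(range(1, TOTAL_PANELS + 1))
passes = list(chunk(panel_ids, PANELS_PER_PASS))

# 50 panels at 4 per pass: 12 full passes plus one partial pass.
assert len(passes) == math.ceil(TOTAL_PANELS / PANELS_PER_PASS)
print(len(passes), len(passes[-1]))
```

With these numbers the comic needs 13 passes, and the last pass renders only 2 panels.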
Models
| Model | Parameters | Role | Runtime |
|---|---|---|---|
| google/gemma-4-26B-A4B-it | 26B (4B active MoE) | All text: safety, bible, panels | vLLM, Modal H100 |
| black-forest-labs/FLUX.2-klein-9B | 9B | All images: panel renders | Diffusers, Modal H100 |
Both models stay under the 32B cap. Gemma never draws. FLUX never writes.
Character Consistency
The core problem in multi-panel comics: FLUX renders each panel independently. AI Comic Studio solves this with verbatim appearance injection. Every character gets a detailed visual description in the story bible (species, build, age, hair, face, clothing, colors, props). That exact text is injected into every FLUX prompt where the character appears. Combined with deterministic seeds, this keeps characters recognizable across 50 panels.
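A minimal sketch of the two ideas, under stated assumptions: the prompt layout, the `Character` type, and the hash-based seed derivation are illustrative choices, not the app's confirmed implementation. The source only says that appearance text is injected verbatim and that seeds are deterministic.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    appearance: str  # visual description from the story bible


def build_prompt(action, cast, present, style):
    """Compose an image prompt, pasting each present character's
    appearance text unchanged (assumed prompt layout)."""
    parts = [style]
    for name in present:
        parts.append(f"{name}: {cast[name].appearance}")
    parts.append(action)
    return ". ".join(parts)


def panel_seed(comic_id, panel_index):
    """Derive a stable 32-bit seed from the comic id and panel index.

    Hashing is an assumption; any deterministic mapping would do.
    """
    digest = hashlib.sha256(f"{comic_id}:{panel_index}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


cast = {"Mira": Character("Mira", "a tall red fox in a yellow raincoat")}
prompt = build_prompt("Mira climbs the lighthouse stairs", cast, ["Mira"],
                      "bold ink comic art")
print(prompt)
print(panel_seed("demo-comic", 7) == panel_seed("demo-comic", 7))
```

In a Diffusers pipeline, a seed like this would typically be applied through a `torch.Generator` passed as the `generator` argument, so re-rendering the same panel reproduces the same image.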
Custom UI
No stock Gradio components in the main flow. Everything is gr.HTML with handwritten CSS: halftone dot background, Anton typography, comic sticker buttons with 3D press effects, live ticking stopwatch at 100ms intervals, panel-by-panel streaming during generation.
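As a rough illustration of driving a custom UI from Python, the helpers below build the kind of HTML string that could be passed as the value of a `gr.HTML` component. The class name, markup, and timer format are assumptions for the sketch; the app's real markup and its 100 ms ticking logic are not shown in this README.

```python
from html import escape


def format_elapsed(ms):
    """Format elapsed milliseconds as M:SS.t, truncated to 100 ms steps."""
    tenths = ms // 100
    minutes, rem = divmod(tenths, 600)
    seconds, tenth = divmod(rem, 10)
    return f"{minutes}:{seconds:02d}.{tenth}"


def panel_card(image_url, caption):
    """Return an HTML fragment for one image+caption panel.

    Caption text is escaped so model output cannot inject markup.
    """
    safe_caption = escape(caption, quote=True)
    return (
        '<figure class="panel">'
        f'<img src="{escape(image_url, quote=True)}" alt="{safe_caption}">'
        f"<figcaption>{safe_caption}</figcaption>"
        "</figure>"
    )


print(format_elapsed(83450))
print(panel_card("panel_01.png", "Mira whispers <run>"))
```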
Run Locally
pip install -r requirements.txt
COMIC_BACKEND=mock python app.py # offline, full UI, no GPU
COMIC_BACKEND=modal python app.py # live models on Modal
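One way an app could act on the `COMIC_BACKEND` variable is a small lookup with an explicit error for unknown values. This is a hedged sketch: the class names are hypothetical, and defaulting to `mock` when the variable is unset is an assumption rather than documented behavior.

```python
import os


class MockBackend:
    """Offline stand-in that needs no GPU (hypothetical class)."""
    name = "mock"


class ModalBackend:
    """Placeholder for a client that calls the live models on Modal."""
    name = "modal"


BACKENDS = {"mock": MockBackend, "modal": ModalBackend}


def select_backend(env=None):
    """Pick a backend from COMIC_BACKEND, defaulting to "mock" (assumed)."""
    env = os.environ if env is None else env
    choice = env.get("COMIC_BACKEND", "mock").strip().lower()
    if choice not in BACKENDS:
        raise ValueError(
            f"COMIC_BACKEND must be one of {sorted(BACKENDS)}, got {choice!r}"
        )
    return BACKENDS[choice]()


backend = select_backend({"COMIC_BACKEND": "mock"})
print(backend.name)
```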
Qualification
| Criterion | Status |
|---|---|
| Under 32B params | 26B + 9B, both under cap |
| Gradio Space | Deployed on HF Spaces |
| Demo video | YouTube |
| Social post | X/Twitter |
| README tagged | Done |
| Modal used | vLLM + Diffusers on Modal H100s |
Badges
| Badge | Qualifies |
|---|---|
| Off Brand | Yes. Custom UI, no default Gradio chrome. |
| Best Agent | Yes. 5-stage pipeline with continuity tracking across 50 panels. |
| Best Demo | Yes. Live timer, panel streaming, full comic reader. |
| Modal | Yes. Both backends on Modal H100s with scale-to-zero. |
| Sharing is Caring | Yes. Social post published. |
| Field Notes | Pending. Build write-up to be published. |
Build Small Hackathon 2026 · Thousand Token Wood