metadata
title: AI Comic Studio
emoji: 🎬
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.18.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Full comic book generated live - Gemma writes, FLUX draws.
tags:
  - thousand-token-wood
  - off-brand
  - best-agent
  - best-demo
  - modal
  - track:thousand-token-wood
  - sponsor:modal
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes
models:
  - google/gemma-4-26B-A4B-it
  - black-forest-labs/FLUX.2-klein-9B
datasets: []

AI Comic Studio

You type one sentence. You get a 25-page, 50-panel comic book. Title, cast, story arc, consistent character art across every panel. Generated live in 60-90 seconds once both GPUs are warm.

AI Comic Studio chains two small models through a five-stage pipeline. Gemma 4 26B writes the entire comic (safety gate, story bible, 50 panel scripts). FLUX.2 Klein 9B draws every panel. A Gradio reader presents the result with page navigation. No human intervention between idea and finished book.

Demo

Watch the full demo on YouTube

Social Post

View on X

How It Works

Stage 1: Safety Gate
Gemma reviews the idea. Fictional adventure, action, mystery, horror, romance all pass. Only genuinely harmful requests get refused.

Stage 2: Story Bible
Gemma produces a title, a logline, a fixed cast of 1-4 characters (each with a 30-word visual description reused verbatim in every image prompt that features the character), a global art style, a color palette, and a 25-page synopsis with a full narrative arc.
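The bible fields listed above could be modeled as a small data structure. This is a minimal sketch, not the project's actual schema: the class names `Character` and `StoryBible`, the field names, the one-synopsis-entry-per-page layout, and the `validate` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Character:
    name: str
    appearance: str  # roughly 30 words, reused verbatim in image prompts


@dataclass
class StoryBible:
    title: str
    logline: str
    cast: list[Character]
    art_style: str
    palette: str
    # Assumption: the synopsis is stored as one entry per page.
    synopsis: list[str] = field(default_factory=list)

    def validate(self, pages: int = 25) -> None:
        """Raise ValueError if the bible breaks the limits stated in this README."""
        if not 1 <= len(self.cast) <= 4:
            raise ValueError("cast must contain 1-4 characters")
        if len(self.synopsis) != pages:
            raise ValueError(f"synopsis must cover {pages} pages")
```

Freezing `Character` is one way to guarantee the appearance text cannot drift between panels after the bible is written.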

Stage 3: Panel Script
Gemma writes 50 panels across 5 batches of 5 pages (10 panels per batch). Each batch receives a recap of all prior panels for story continuity.
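The batching arithmetic can be sketched as a simple loop. This is an illustration under assumptions, not the project's code: `write_batch` is a hypothetical stand-in for the Gemma call, and the recap here is just the prior panel scripts joined together, whereas the real recap format is not described in this README.

```python
from typing import Callable

PAGES = 25
PANELS_PER_PAGE = 2
PAGES_PER_BATCH = 5


def write_script(write_batch: Callable[[list[int], str], list[str]]) -> list[str]:
    """Collect panel scripts batch by batch, passing a recap of prior panels.

    `write_batch(pages, recap)` is hypothetical; it must return one script
    string per panel on the given pages.
    """
    panels: list[str] = []
    for start in range(1, PAGES + 1, PAGES_PER_BATCH):
        pages = list(range(start, start + PAGES_PER_BATCH))
        recap = "\n".join(panels)  # everything written so far (empty for batch 1)
        batch = write_batch(pages, recap)
        if len(batch) != len(pages) * PANELS_PER_PAGE:
            raise ValueError("batch returned the wrong number of panels")
        panels.extend(batch)
    return panels
```

With these constants the loop makes 5 calls and returns 50 panel scripts.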

Stage 4: Image Render
FLUX.2 renders every panel at 832x576. Character appearance text from the bible is injected into each prompt. Deterministic seeds per panel keep the art consistent from page 1 to page 25.
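This README does not say how the per-panel seeds are derived. One plausible approach, shown here purely as an assumption, is to hash a comic identifier together with the panel index. The function name `panel_seed` and the `comic_id` parameter are hypothetical.

```python
import hashlib


def panel_seed(comic_id: str, panel_index: int) -> int:
    """Derive a stable 32-bit seed from a comic id and a panel number."""
    digest = hashlib.sha256(f"{comic_id}:{panel_index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")
```

The sketch uses `hashlib` rather than the built-in `hash()` because Python randomizes string hashes per process by default, which would change the seeds on every restart.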

Stage 5: Reader
The Gradio UI streams panels live as they render, then presents the finished comic: two image+caption panels per page, left/right navigation, and a generation timer frozen at the final time.
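The page layout described above (two panels per page, left/right navigation) reduces to a little index arithmetic. The helper names below are hypothetical, and clamping at the first and last page (rather than wrapping around) is an assumption.

```python
def page_panels(panels: list, page: int, per_page: int = 2) -> list:
    """Return the panels shown on a 1-indexed page."""
    start = (page - 1) * per_page
    return panels[start:start + per_page]


def turn(page: int, step: int, total_pages: int = 25) -> int:
    """Move left (-1) or right (+1), clamped to the first and last page."""
    return max(1, min(total_pages, page + step))
```

For a 50-panel comic, page 1 holds panels 1-2 and page 25 holds panels 49-50.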

Pipeline

"Idea"
  │
  ├─ Gemma 4 26B-A4B (Writer)
  │   ├─ Safety gate
  │   ├─ Story bible (title, cast, style, 25-page synopsis)
  │   └─ 50 panels in 5 batches (continuity recap per batch)
  │
  ├─ FLUX.2 Klein 9B (Artist)
  │   ├─ Character injection per prompt
  │   ├─ Deterministic seeds
  │   └─ Batched renders (4 panels per GPU pass)
  │
  └─ Gradio Reader
      ├─ Live panel streaming
      ├─ Page navigation
      └─ Generation timer

Performance

| Metric | Value |
| --- | --- |
| Warm generation (both GPUs hot) | ~60-90 s |
| Cold start (first call) | +1-3 min |
| Panels per render batch (one GPU pass) | 4 |
| Total panels | 50 |
| Total pages | 25 |
| Image resolution | 832x576 |
| FLUX inference steps | 4 |
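These figures imply 13 GPU passes per comic: 50 panels at 4 per pass is 12 full passes plus a final pass of 2. A minimal sketch of that chunking follows; the function name `render_batches` is hypothetical and the actual scheduling on Modal may differ.

```python
def render_batches(panel_ids: list[int], batch_size: int = 4) -> list[list[int]]:
    """Split panel ids into consecutive groups of at most `batch_size`."""
    return [
        panel_ids[i:i + batch_size]
        for i in range(0, len(panel_ids), batch_size)
    ]


batches = render_batches(list(range(1, 51)))
print(len(batches), batches[-1])
```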

Models

| Model | Parameters | Role | Runtime |
| --- | --- | --- | --- |
| google/gemma-4-26B-A4B-it | 26B (4B active MoE) | All text: safety, bible, panels | vLLM, Modal H100 |
| black-forest-labs/FLUX.2-klein-9B | 9B | All images: panel renders | Diffusers, Modal H100 |

Both models stay under the 32B cap. Gemma never draws. FLUX never writes.

Character Consistency

The core problem in multi-panel comics: FLUX renders each panel independently. AI Comic Studio solves this with verbatim appearance injection. Every character gets a detailed visual description in the story bible (species, build, age, hair, face, clothing, colors, props). That exact text is injected into every FLUX prompt where the character appears. Combined with deterministic seeds, this keeps characters recognizable across 50 panels.
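Verbatim appearance injection can be sketched as plain string assembly. The function `build_prompt`, its parameters, and the prompt layout are assumptions for illustration; the project's real prompt template is not shown in this README.

```python
def build_prompt(
    scene: str,
    cast_in_panel: list[str],
    appearances: dict[str, str],
    art_style: str,
) -> str:
    """Compose an image prompt, copying each present character's description verbatim."""
    lines = [art_style, scene]
    for name in cast_in_panel:
        # The description is never paraphrased or truncated, only copied.
        lines.append(f"{name}: {appearances[name]}")
    return "\n".join(lines)


appearances = {
    "Ada": "a tall red fox in a green trench coat",
    "Bo": "a small grey owl with brass goggles",
}
prompt = build_prompt(
    "Ada climbs the clock tower at dusk",
    ["Ada"],
    appearances,
    "bold ink lines, halftone shading",
)
```

Only characters present in the panel are included, so the prompt for this scene carries Ada's description but not Bo's.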

Custom UI

No stock Gradio widgets in the main flow. Everything is rendered through gr.HTML with handwritten CSS: a halftone dot background, Anton typography, comic sticker buttons with 3D press effects, a live stopwatch that ticks every 100 ms, and panel-by-panel streaming during generation.

Run Locally

pip install -r requirements.txt
COMIC_BACKEND=mock python app.py     # offline, full UI, no GPU
COMIC_BACKEND=modal python app.py    # live models on Modal
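The `COMIC_BACKEND` switch suggests a small dispatch on an environment variable. The sketch below is an assumption about how such a switch could work, not the contents of `app.py`: the function names, the placeholder backends, and the `mock` default are all hypothetical.

```python
import os


def mock_backend(idea: str) -> str:
    """Placeholder for the offline path: no GPU, canned output."""
    return f"[mock] comic for: {idea}"


def modal_backend(idea: str) -> str:
    """Placeholder for the live path that would call the models on Modal."""
    raise NotImplementedError("remote call omitted in this sketch")


BACKENDS = {"mock": mock_backend, "modal": modal_backend}


def select_backend(env=None):
    """Pick a backend from COMIC_BACKEND; the 'mock' default is an assumption."""
    env = os.environ if env is None else env
    name = env.get("COMIC_BACKEND", "mock")
    if name not in BACKENDS:
        raise ValueError(f"unknown COMIC_BACKEND: {name!r}")
    return BACKENDS[name]
```

Failing loudly on an unknown value avoids silently falling back to the mock path when the variable is misspelled.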

Qualification

| Criteria | Status |
| --- | --- |
| Under 32B params | 26B + 9B, both under cap |
| Gradio Space | Deployed on HF Spaces |
| Demo video | YouTube |
| Social post | X/Twitter |
| README tagged | Done |
| Modal used | vLLM + Diffusers on Modal H100s |

Badges

| Badge | Qualifies |
| --- | --- |
| Off Brand | Yes. Custom UI, no default Gradio chrome. |
| Best Agent | Yes. 5-stage pipeline with continuity tracking across 50 panels. |
| Best Demo | Yes. Live timer, panel streaming, full comic reader. |
| Modal | Yes. Both backends on Modal H100s with scale-to-zero. |
| Sharing is Caring | Yes. Social post published. |
| Field Notes | Pending. Build write-up to be published. |

Build Small Hackathon 2026 · Thousand Token Wood