Spaces:
Running on Zero
Fix ZeroGPU build: real FLUX+VoxCPM, 3-model cleanup, remove Tiny Mode
#1
by sush0401 - opened
- AGENT_HANDOFF.md +0 -105
- B2_doodlebook_prompt.md +0 -406
- DEPLOY.md +0 -102
- EXECUTION_PLAN.md +0 -414
- README.md +292 -289
- app.py +697 -726
- app_zerogpu.py +0 -152
- config.py +6 -84
- docs/blog.md +0 -121
- docs/superpowers/specs/2026-06-14-coloring-book-loader-pdf-cover-design.md +0 -174
- docs/superpowers/specs/2026-06-14-flux-lineart-coloring-design.md +0 -93
- lora_finetune/dataset_prep.py +0 -39
- lora_finetune/train_lora.py +0 -287
- modal_workers/__init__.py +0 -0
- modal_workers/modal_image_gen.py +0 -332
- modal_workers/modal_story_gen.py +0 -397
- modal_workers/modal_tts.py +0 -235
- requirements.txt +26 -23
- run.py +0 -34
- run_modal.py +0 -287
- services/story.py +0 -71
- services/trace.py +0 -113
- services/tts.py +0 -50
- start_app.bat +0 -33
- ui/layout.py +1 -6
AGENT_HANDOFF.md
DELETED
@@ -1,105 +0,0 @@
-# DoodleBook — Coding-Agent Handoff Prompt
-Paste everything in the fenced block below into Codex / OpenCode / Cursor Agent / Claude Code as the build task.
-It is self-contained and bakes in the 5 critical corrections (C1–C5) from `EXECUTION_PLAN.md`.
-
----
-
-```
-ROLE
-You are the lead engineer building "DoodleBook" for the Build Small Hackathon 2026
-(Adventure in Thousand Token Wood track). Build a Gradio app deployed to Hugging Face
-Spaces that turns a child's crayon drawing into a consistent, narrated, illustrated
-6-page storybook. Work in: D:\Project\Hugging_face_app\doodlebook
-
-CONTEXT FILES (read first, in this order)
-1. B2_doodlebook_prompt.md — original concept, exact code sketches, frontmatter, badges
-2. EXECUTION_PLAN.md — architecture, phases, and the 5 corrections you MUST honor
-
-NON-NEGOTIABLE CORRECTIONS (override the original prompt where they conflict)
-C1 Do NOT train a LoRA per child at runtime (infeasible). Train ONE crayon-style LoRA
-OFFLINE. Achieve per-character consistency at inference via: (a) locked seed S, page i
-uses seed S+i; (b) reuse the identical character_description on every page; (c) feed the
-uploaded doodle as an IMAGE PROMPT (IP-Adapter / FLUX Redux / img2img strength ~0.3-0.5).
-The app MUST run on base FLUX with NO LoRA (degrade gracefully, show "LoRA coming").
-C2 MiniCPM5-1B is unreliable at JSON. Use a few-shot prompt with ONE full exemplar, greedy
-decode, and a 3-layer parser: (1) regex extract {...}; (2) json-repair/json5; (3) a
-deterministic TEMPLATE fallback that always yields a valid 6-page book. App must NEVER
-crash on bad model output.
-C3 Modal cold-starts of a 12B diffusion model take minutes. Cache weights on a Modal Volume,
-expose a keep_warm option, generate all 6 pages in ONE warm container call, and ALWAYS
-ship a pre-generated sample book in assets/sample_book/ that loads instantly with zero compute.
-C4 This is a SMALL-MODELS hackathon. Frame the 1B story + 2B voice as the "brain" and FLUX as
-the "renderer" in the README (Tiny Titan argument). Implement a real "Tiny Mode" toggle that
-swaps FLUX for an SD-Turbo/SDXL-Turbo + style-LoRA path (1-4 steps) runnable on a T4/edge GPU.
-C5 Treat ALL model IDs as UNVERIFIED. FIRST TASK: verify each on the HF Hub and put the resolved
-IDs + fallbacks in config.py. Fallbacks: FLUX.2-klein -> FLUX.1-schnell; MiniCPM5-1B ->
-MiniCPM3-4B; VoxCPM2 -> Kokoro or MeloTTS. Nothing else imports a raw model string.
-
-TECH STACK
-Gradio 5.x (gr.Blocks, custom storybook CSS) on HF Spaces (CPU) · Modal for all GPU compute
-(FLUX on A100, MiniCPM on T4, TTS on T4/A10G) · diffusers · peft · transformers · fpdf2 ·
-Python 3.11. Space is a thin orchestrator; every heavy call is modal.Function.remote().
-
-DELIVERABLES (build in this dependency order; commit after each phase)
-PHASE 1 Foundation
-- Verify model IDs on HF Hub; create config.py = single source of truth (IDs, fallbacks,
-seeds, step counts, dimensions, lora repo, dataset repo).
-- Scaffold the directory structure exactly as in B2_doodlebook_prompt.md plus:
-config.py, services/, ui/, modal/.
-- requirements.txt, .env.example (HF_TOKEN, MODAL endpoint/token).
-- Bare Gradio shell that launches and displays the static sample book.
-PHASE 2 Core text pipeline
-- modal_story_gen.py: MiniCPM story -> JSON {title, character_description, pages:[{page,text,scene}]}
-with the C2 3-layer parser + template fallback.
-- book_builder.py: pages -> storybook HTML; PDF export via fpdf2 (gr.DownloadButton).
-- assets/custom.css storybook styling. Wire text-only book end to end.
-PHASE 3 AI integration
-- modal_image_gen.py: FLUX pipeline; generate_book_pages() makes all 6 pages in one warm
-container; C1 consistency stack (seed-lock + char_desc reuse + doodle image-prompt);
-optional LoRA fuse (~0.85) with graceful base-model fallback; Modal Volume cache; keep_warm.
-- modal_tts.py: VoxCPM2 narration of title+page texts -> wav, with fallback voice (C5).
-- Full pipeline: doodle -> story -> 6 images -> narration -> assembled book.
-PHASE 4 UX
-- Convert create_book to a GENERATOR (yield) so status + pages stream in page-by-page
-("Illustrating page N of 6..."). Add "Behind the magic" gr.Accordion (prompts/seeds/LoRA).
-- Tiny Mode toggle (C4). Mobile-responsive CSS. Accessibility: alt text = page text on every
-image, AA contrast, prefers-reduced-motion. Examples auto-load the sample on launch.
-PHASE 5 Optimization & badges
-- lora_finetune/: train_lora.py (DreamBooth-style, FLUX, rank16 alpha16, crayon style,
-trigger [DOODLECHAR]), dataset_prep.py, README.md (reproduce steps). Publish LoRA to HF
-(Well-Tuned). bf16/turbo settings; CPU-offload fallback for smaller GPUs.
-- services/trace.py: log prompts/seeds/lora-version to HF dataset build-small-hackathon/
-doodlebook-traces (Open Trace).
-- Pre-generate and COMMIT the 6-page sample book to assets/sample_book/ (C3 non-negotiable).
-PHASE 6 Submission
-- README.md with the EXACT frontmatter from B2_doodlebook_prompt.md + Tiny Titan argument +
-model table + architecture diagram + install/usage + screenshots + demo + reproducibility +
-badges + license (Apache-2.0).
-- Field Notes blog draft (docs/blog.md) on FLUX+LoRA character consistency.
-- Deploy to HF Spaces; smoke test the cold-open judge path (sample loads with no compute);
-verify live generation; run the §9 submission checklist from EXECUTION_PLAN.md.
-
-INTERNAL CONTRACTS (keep stable)
-generate_story(hero_name, theme, age=5) -> {title, character_description, pages:[{page,text,scene}]}
-generate_book_pages(character_desc, story_beats, doodle=None, art_style, seed=42, tiny=False) -> list[bytes png]
-speak_book(text, voice="warm") -> bytes wav
-build_book_html(images, texts, title) -> html ; export_pdf(images, texts, title) -> path
-log_trace(payload) -> dataset_url
-
-ENGINEERING RULES
-- Every remote/model call wrapped in try/except with a user-friendly fallback; the app must
-never show a stack trace to a judge.
-- config.py is the ONLY place model IDs/params live.
-- Keep the sample-book path 100% independent of live compute so the demo always works.
-- Match the storybook visual identity (paper #FEF9E7, page #FFFDE7, ink #3E2723, CTA #FF7043,
-serif body). No default Gradio look (Off-Brand badge).
-- Prefer small, readable modules over clever code. Comment the consistency logic and the parser.
-
-DEFINITION OF DONE
-Public HF Space loads the sample book instantly; a live "Make my book!" produces a consistent,
-narrated 6-page book in <2 min warm; Tiny Mode works on a cheap GPU; LoRA repo + trace dataset
-are public; README + frontmatter + blog published; submission checklist fully ticked.
-
-START NOW with Phase 1, Task 0: verify the model IDs on the HF Hub and write config.py. Report
-what you find before proceeding.
-```
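Correction C2 in the deleted handoff prompt asks for a three-layer parser: regex extraction, a repair pass, and a deterministic template fallback. The sketch below shows one way that shape could look, using only the standard library. Every name here (`parse_story`, `light_repair`, `template_story`, `is_valid_story`) is hypothetical and does not come from this PR. `light_repair` is a deliberately small stand-in for the json-repair/json5 layer the prompt names: it only normalizes curly quotes and strips trailing commas, and its regex could in principle alter a comma inside a string value.

```python
import json
import re

PAGE_COUNT = 6  # the handoff prompt specifies a 6-page book


def template_story(hero_name: str, theme: str) -> dict:
    """Layer 3: deterministic fallback that always yields a valid book."""
    pages = [
        {
            "page": i,
            "text": f"{hero_name} goes on a story about {theme}.",
            "scene": f"{hero_name}, scene {i}",
        }
        for i in range(1, PAGE_COUNT + 1)
    ]
    return {
        "title": f"{hero_name}'s Story",
        "character_description": hero_name,
        "pages": pages,
    }


def is_valid_story(obj) -> bool:
    """Check the contract {title, character_description, pages:[{text, scene}]}."""
    if not isinstance(obj, dict):
        return False
    if not all(isinstance(obj.get(k), str) for k in ("title", "character_description")):
        return False
    pages = obj.get("pages")
    if not isinstance(pages, list) or len(pages) != PAGE_COUNT:
        return False
    return all(
        isinstance(p, dict)
        and isinstance(p.get("text"), str)
        and isinstance(p.get("scene"), str)
        for p in pages
    )


def light_repair(candidate: str) -> str:
    """Layer 2 stand-in: straighten curly quotes and drop trailing commas."""
    candidate = candidate.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r",\s*([}\]])", r"\1", candidate)


def parse_story(raw: str, hero_name: str, theme: str) -> dict:
    """Never raises: returns parsed model output or the template book."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # layer 1: outermost {...}
    if match:
        for candidate in (match.group(), light_repair(match.group())):
            try:
                story = json.loads(candidate)
            except json.JSONDecodeError:
                continue
            if is_valid_story(story):
                return story
    return template_story(hero_name, theme)
```

Validating the parsed object before returning it means output that is well-formed JSON but has the wrong shape (for example, only four pages) also falls through to the template instead of failing later in the pipeline.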
B2_doodlebook_prompt.md
DELETED
@@ -1,406 +0,0 @@
-# B2 — "DoodleBook" | Claude Code Build Prompt
-## Build Small Hackathon 2026 | Thousand Token Wood Track
-
----
-
-## Mission
-A child draws a crayon character. You photograph it. The app turns it into a consistent illustrated 6-page storybook — same character, same art style, across every page — narrated by MiniCPM5-1B and illustrated by a fine-tuned FLUX.2-klein LoRA. Nobody else in this competition is using FLUX.2-klein with a custom LoRA. That's the entire competitive moat.
-
----
-
-## Models (ONLY sponsor models)
-
-| Role | Model ID | Params | Sponsor |
-|---|---|---|---|
-| Image generation | `black-forest-labs/FLUX.2-klein` | ~12B | Black Forest Labs |
-| LoRA fine-tune (character consistency) | Custom LoRA on FLUX.2-klein | — | Black Forest Labs |
-| Story generation | `openbmb/MiniCPM5-1B` | 1B | OpenBMB |
-| Voice narration | `openbmb/VoxCPM2` | 2B | OpenBMB |
-
-Total: ~15B — under 32B cap.
-
-**FLUX.2-klein model ID note:** Verify the exact HF model card ID at `https://huggingface.co/black-forest-labs`. At time of writing, check for `black-forest-labs/FLUX.2-klein` or `black-forest-labs/FLUX.1-schnell` as fallback. The hackathon docs specifically name FLUX.2-klein.
-
-**Tiny Titan note:** Story model (MiniCPM5-1B) is 1B. If you argue the "primary AI" is the story generator (not the image model), you could claim Tiny Titan. Make this argument in the README.
-
----
-
-## Badge stack
-
-| Badge | How |
-|---|---|
-| ✅ Well-Tuned | Fine-tune LoRA on FLUX.2-klein for character consistency; publish to HF |
-| ✅ Off-Brand | Custom storybook UI — not remotely like default Gradio |
-| ✅ Field Notes | Blog post about FLUX.2-klein + LoRA character consistency approach |
-| ✅ Open Trace | Publish generation traces (prompts, seeds, LoRA weights used) |
-
----
-
-## Tech stack
-- **Gradio 5.x** with `gr.Server` for storybook-style custom UI
-- **Modal** for FLUX.2-klein image generation (A100 recommended — diffusion is memory-heavy)
-- **Modal** for MiniCPM5-1B story generation + VoxCPM2 TTS
-- **diffusers** library for FLUX pipeline
-- **peft** for LoRA loading
-- **Python 3.11**
-
----
-
-## Directory structure
-```
-doodlebook/
-├── app.py                  # Gradio entry point
-├── modal_image_gen.py      # FLUX.2-klein + LoRA generation on Modal
-├── modal_story_gen.py      # MiniCPM5-1B story generation on Modal
-├── modal_tts.py            # VoxCPM2 TTS on Modal
-├── book_builder.py         # Assembles pages into storybook HTML
-├── lora_finetune/
-│   ├── train_lora.py       # FLUX LoRA training script (run locally)
-│   ├── dataset_prep.py     # Prepare character images for training
-│   └── README.md           # How to reproduce the fine-tune
-├── requirements.txt
-├── .env.example            # MODAL_ENDPOINT_URL, HF_TOKEN
-├── README.md
-└── assets/
-    ├── custom.css          # Storybook CSS (yellowed pages, serif font)
-    ├── page_template.html  # Single page HTML template
-    ├── sample_doodle.jpg   # Example child's drawing
-    └── sample_book/        # Pre-generated example book (6 pages)
-        ├── page_1.png
-        └── ...
-```
-
----
-
-## README.md — EXACT frontmatter
-```yaml
----
-title: DoodleBook
-emoji: 📚
-colorFrom: yellow
-colorTo: orange
-sdk: gradio
-sdk_version: "5.0"
-app_file: app.py
-pinned: false
-tags:
-- hackathon
-- build-small
-- adventure-in-thousand-token-wood
-- black-forest-labs/FLUX.2-klein
-- openbmb/MiniCPM5-1B
-- openbmb/VoxCPM2
-- fine-tuned
-- lora
-- character-consistency
-- storybook
-- off-brand
----
-```
-
----
-
-## LoRA fine-tuning plan (run BEFORE submission)
-
-### Goal
-Train a LoRA that makes FLUX.2-klein reproduce the visual style of a child's crayon drawing and maintain character consistency across 6 different scene prompts.
-
-### Training data strategy
-1. Take a child's crayon drawing (or generate 10-15 "crayon-style" reference images)
-2. Create variations: same character in different poses/scenes, keeping style consistent
-3. Use DreamBooth-style fine-tuning with a trigger token: `[DOODLECHAR]`
-
-### Train script sketch (lora_finetune/train_lora.py)
-```python
-from diffusers import FluxPipeline
-from peft import LoraConfig, get_peft_model
-# Use diffusers DreamBooth LoRA training
-# Follow: https://github.com/huggingface/diffusers/tree/main/examples/dreambooth
-# Target: FLUX.2-klein with rank=16, alpha=16
-# Training images: 10-15 images of the character
-# Instance prompt: "photo of [DOODLECHAR] character, crayon drawing style"
-# Epochs: 200-400 steps (fast with FLUX)
-```
-
-After training:
-```bash
-huggingface-cli upload build-small-hackathon/doodlebook-flux-lora ./lora-weights
-```
-
----
-
-## modal_image_gen.py
-```python
-import modal
-app = modal.App("doodlebook-image-gen")
-
-flux_env = modal.Image.debian_slim().pip_install(
-    "diffusers>=0.28", "torch", "accelerate", "transformers",
-    "peft", "pillow", "sentencepiece"
-)
-
-@app.function(gpu="A100", image=flux_env, timeout=300, memory=32768)
-def generate_page(
-    prompt: str,
-    lora_repo: str = "build-small-hackathon/doodlebook-flux-lora",
-    seed: int = 42,
-    width: int = 768,
-    height: int = 512
-) -> bytes:
-    from diffusers import FluxPipeline
-    import torch, io
-    from PIL import Image
-
-    pipe = FluxPipeline.from_pretrained(
-        "black-forest-labs/FLUX.2-klein",  # verify ID on HF Hub
-        torch_dtype=torch.bfloat16
-    ).to("cuda")
-
-    # Load character LoRA for consistency
-    pipe.load_lora_weights(lora_repo)
-    pipe.fuse_lora(lora_scale=0.85)
-
-    generator = torch.Generator("cuda").manual_seed(seed)
-    image = pipe(
-        prompt=prompt,
-        num_inference_steps=20,  # FLUX.2-klein is fast
-        guidance_scale=3.5,
-        width=width,
-        height=height,
-        generator=generator
-    ).images[0]
-
-    buf = io.BytesIO()
-    image.save(buf, format="PNG")
-    return buf.getvalue()
-
-@app.function(gpu="A100", image=flux_env, timeout=300, memory=32768)
-def generate_book_pages(
-    character_desc: str,
-    story_beats: list[str],
-    art_style: str = "crayon drawing, children's book, colorful, simple shapes",
-    seed: int = 42
-) -> list[bytes]:
-    """Generate all 6 pages in one function call to reuse the loaded model."""
-    from diffusers import FluxPipeline
-    import torch, io
-
-    pipe = FluxPipeline.from_pretrained(
-        "black-forest-labs/FLUX.2-klein",
-        torch_dtype=torch.bfloat16
-    ).to("cuda")
-    pipe.load_lora_weights("build-small-hackathon/doodlebook-flux-lora")
-    pipe.fuse_lora(lora_scale=0.85)
-
-    pages = []
-    for i, beat in enumerate(story_beats):
-        prompt = (
-            f"[DOODLECHAR] {character_desc}, {beat}, "
-            f"{art_style}, page {i+1} of children's book, "
-            f"white background, simple illustration"
-        )
-        gen = torch.Generator("cuda").manual_seed(seed + i)  # deterministic per page
-        image = pipe(
-            prompt=prompt,
-            num_inference_steps=20,
-            guidance_scale=3.5,
-            width=768, height=512,
-            generator=gen
-        ).images[0]
-        buf = io.BytesIO()
-        image.save(buf, format="PNG")
-        pages.append(buf.getvalue())
-
-    return pages
-```
-
----
-
-## modal_story_gen.py
-```python
-import modal
-app = modal.App("doodlebook-story")
-
-story_env = modal.Image.debian_slim().pip_install(
-    "transformers>=4.40", "torch", "accelerate", "sentencepiece"
-)
-
-@app.function(gpu="T4", image=story_env, timeout=120)
-def generate_story(character_name: str, theme: str, age: int = 5) -> dict:
-    from transformers import AutoTokenizer, AutoModelForCausalLM
-    import torch, json
-
-    tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM5-1B")
-    model = AutoModelForCausalLM.from_pretrained(
-        "openbmb/MiniCPM5-1B", torch_dtype=torch.float16
-    ).cuda().eval()
-
-    prompt = f"""Write a 6-page children's storybook for age {age} about {character_name} with theme: {theme}.
-
-Return ONLY valid JSON:
-{{
-  "title": "Book title",
-  "character_description": "Visual description of {character_name} for illustration",
-  "pages": [
-    {{"page": 1, "text": "1-2 sentence page text (age {age})", "scene": "visual scene description for illustrator"}},
-    {{"page": 2, ...}},
-    {{"page": 3, ...}},
-    {{"page": 4, ...}},
-    {{"page": 5, ...}},
-    {{"page": 6, "text": "Gentle ending. Goodnight.", "scene": "closing scene"}}
-  ]
-}}"""
-
-    inputs = tok(prompt, return_tensors="pt").to("cuda")
-    with torch.no_grad():
-        out = model.generate(**inputs, max_new_tokens=800, do_sample=False)
-    response = tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
-
-    import re
-    match = re.search(r'\{.*\}', response, re.DOTALL)
-    if match:
-        return json.loads(match.group())
-    return {"error": response}
-```
-
----
-
-## book_builder.py — storybook HTML assembler
-```python
-import base64
-
-PAGE_HTML = """
-<div class="book-page" style="page-break-after: always;">
-  <img src="data:image/png;base64,{img_b64}" style="width:100%; border-radius:8px;"/>
-  <p class="page-text">{text}</p>
-  <span class="page-num">{page_num}</span>
-</div>
-"""
-
-def build_book_html(pages_images: list[bytes], pages_texts: list[str], title: str) -> str:
-    pages_html = ""
-    for i, (img_bytes, text) in enumerate(zip(pages_images, pages_texts)):
-        b64 = base64.b64encode(img_bytes).decode()
-        pages_html += PAGE_HTML.format(img_b64=b64, text=text, page_num=i+1)
-
-    return f"""<div class="book-container">
-  <h1 class="book-title">{title}</h1>
-  {pages_html}
-</div>"""
-```
-
----
-
-## app.py — full Gradio storybook UI
-```python
-import gradio as gr
-from modal_story_gen import generate_story
-from modal_image_gen import generate_book_pages
-from modal_tts import speak_book
-from book_builder import build_book_html
-import json
-
-THEMES = ["brave adventure", "making a new friend", "overcoming a fear",
-          "helping someone", "lost and found", "learning something new"]
-
-CSS = """
-body { background: #fef9e7; font-family: 'Georgia', serif; }
-.book-container { max-width: 800px; margin: 0 auto; }
-.book-title { font-size: 32px; text-align: center; color: #5d4037; }
-.book-page { margin: 24px 0; padding: 20px; background: #fffde7;
-             border-radius: 12px; box-shadow: 3px 3px 12px rgba(0,0,0,0.15); }
-.page-text { font-size: 22px; line-height: 1.9; color: #3e2723; text-align: center; }
-.page-num { color: #bcaaa4; font-size: 14px; }
-.gr-button-primary { background: #ff7043 !important; font-size: 20px; }
-"""
-
-def create_book(doodle_image, character_name, theme, hero_name):
-    if not character_name.strip():
-        character_name = "Little Hero"
-    if not hero_name.strip():
-        hero_name = character_name
-
-    # Step 1: Generate story
-    story = generate_story.remote(hero_name, theme, age=5)
-    if "error" in story:
-        return None, f"Story generation failed: {story['error']}", None
-
-    pages = story["pages"]
-    char_desc = story["character_description"]
-    title = story["title"]
-
-    scene_beats = [p["scene"] for p in pages]
-    page_texts = [p["text"] for p in pages]
-
-    # Step 2: Generate all 6 images (one Modal call, model loaded once)
-    img_bytes_list = generate_book_pages.remote(char_desc, scene_beats)
-
-    # Step 3: Assemble HTML book
-    book_html = build_book_html(img_bytes_list, page_texts, title)
-
-    # Step 4: TTS narration of full book
-    full_text = f"{title}. " + " ".join(page_texts)
-    audio_bytes = speak_book.remote(full_text)
-    audio_path = save_wav(audio_bytes)
-
-    return book_html, f"✅ '{title}' — 6 pages generated!", audio_path
-
-with gr.Blocks(css=CSS, title="📚 DoodleBook") as demo:
-    gr.Markdown("# 📚 DoodleBook\n*Draw a character. Get a storybook.*")
-
-    with gr.Row():
-        with gr.Column(scale=1):
-            doodle = gr.Image(sources=["webcam","upload"], label="📸 Photo of your doodle", type="numpy")
-            char_name = gr.Textbox(label="Character name", placeholder="Ziggy the robot")
-            hero_name = gr.Textbox(label="Hero name in the story", placeholder="Ziggy")
-            theme = gr.Dropdown(choices=THEMES, value=THEMES[0], label="Story theme")
-            make_btn = gr.Button("✨ Make my book!", variant="primary")
-            gr.Examples(
-                examples=[["assets/sample_doodle.jpg", "Ziggy", "Ziggy", "brave adventure"]],
-                inputs=[doodle, char_name, hero_name, theme]
-            )
-            status = gr.Textbox(label="Status", interactive=False)
-
-        with gr.Column(scale=2):
-            book_display = gr.HTML(label="Your storybook")
-            audio_narration = gr.Audio(label="🎙️ Listen to your book", autoplay=False)
-
-    make_btn.click(
-        create_book,
-        inputs=[doodle, char_name, theme, hero_name],
-        outputs=[book_display, status, audio_narration]
-    )
-
-demo.launch()
-```
-
----
-
-## TODO 1 — Doodle style extraction for LoRA prompt conditioning
-After core pipeline works: use MiniCPM-V to *describe* the uploaded doodle in visual terms ("thick black outlines, bright primary colors, stick figure proportions, sun in top corner"). Prepend this extracted style description to every FLUX prompt so the generated images actually *match* the child's drawing style, not just the character concept. This is what makes the output genuinely feel like "their" character.
-
-## TODO 2 — PDF export + shareable link
-Assemble the 6 PNG pages into a downloadable PDF using `fpdf2` or `reportlab`. Add a "Download your book as PDF" button (gr.DownloadButton). Also export the full book as a shareable HF dataset entry (with the prompts, seeds, and LoRA version used) — this earns the Open Trace badge and means families can re-generate the same book later.
-
----
-
-## Sponsor + badge alignment
-
-| Award | Why |
-|---|---|
-| Thousand Token Wood podium | Unique concept — nobody combines child doodle + FLUX LoRA + story |
-| Black Forest Labs ($3k pool) | FLUX.2-klein + custom LoRA — near-empty sponsor field |
-| OpenBMB award (Wood track) | MiniCPM5-1B (story) + VoxCPM2 (narration) |
-| Well-Tuned ($badge) | Published LoRA on HF |
-| Off-Brand ($1,500) | Storybook CSS with yellowed pages, serif font — zero Gradio defaults |
-| Best Demo ($1,000) | Child hearing their drawing narrated as a book = perfect 60-sec video |
-| Community Choice | Shareable, emotional — parents will post this |
-
----
-
-## Non-negotiables
-- Pre-generate and include a complete sample book (all 6 pages) in `assets/sample_book/` so judges can see what it looks like without waiting for generation
-- FLUX on A100 (not A10G) — FLUX.2-klein may need 24GB+ VRAM; check Modal memory settings
-- If LoRA not yet trained, the app must still run with base FLUX.2-klein (no LoRA) — degrade gracefully, note "LoRA coming" in UI
-- Generation time: expect 60-90 seconds for 6 images. Show progress: "Illustrating page 1 of 6..."
-- Verify exact FLUX.2-klein model ID on HF Hub before writing any import statements
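The deleted `generate_book_pages` sketch keeps the character consistent by reusing one character description on every page and deriving each page's seed as `seed + i`. That planning step can be separated from the GPU call and checked on its own. The following is a minimal sketch under that assumption: `PagePlan` and `plan_pages` are hypothetical names that do not appear in this PR, and the prompt wording is copied from the deleted file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagePlan:
    page: int    # 1-based page number
    seed: int    # base seed + zero-based page index
    prompt: str  # full text prompt for the image model


def plan_pages(
    character_desc: str,
    story_beats: list[str],
    art_style: str = "crayon drawing, children's book, colorful, simple shapes",
    seed: int = 42,
    trigger: str = "[DOODLECHAR]",
) -> list[PagePlan]:
    """Build one deterministic (seed, prompt) pair per story beat."""
    plans = []
    for i, beat in enumerate(story_beats):
        prompt = (
            f"{trigger} {character_desc}, {beat}, "
            f"{art_style}, page {i + 1} of children's book, "
            f"white background, simple illustration"
        )
        plans.append(PagePlan(page=i + 1, seed=seed + i, prompt=prompt))
    return plans


if __name__ == "__main__":
    for plan in plan_pages("a small blue robot", ["waking up", "finding a map"]):
        print(plan.page, plan.seed, plan.prompt)
```

Because the plan is plain data, the same list could also be written to a trace log so a book can be regenerated later from its prompts and seeds.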
DEPLOY.md
DELETED
|
@@ -1,102 +0,0 @@
|
|
| 1 |
-
# DoodleBook — Hugging Face Deployment Guide
|
| 2 |
-
|
| 3 |
-
## Option 1: HF Spaces + Modal (Recommended)
|
| 4 |
-
|
| 5 |
-
### Setup Steps
|
| 6 |
-
|
| 7 |
-
1. **Create HF Space**
|
| 8 |
-
```bash
|
| 9 |
-
# Install HF CLI
|
| 10 |
-
pip install huggingface_hub
|
| 11 |
-
|
| 12 |
-
# Login
|
| 13 |
-
huggingface-cli login
|
| 14 |
-
|
| 15 |
-
# Create space
|
| 16 |
-
huggingface-cli space create your-username/doodlebook --sdk gradio
|
| 17 |
-
```
|
| 18 |
-
|
| 19 |
-
2. **Set Modal Secrets**
|
| 20 |
-
```bash
|
| 21 |
-
# In HF Space Settings → Secrets:
|
| 22 |
-
MODAL_TOKEN_ID=your_modal_token
|
| 23 |
-
HF_TOKEN=your_hf_token
|
| 24 |
-
```
|
| 25 |
-
|
| 26 |
-
3. **Upload Code**
|
| 27 |
-
```bash
|
| 28 |
-
cd doodlebook
|
| 29 |
-
git init
|
| 30 |
-
git add .
|
| 31 |
-
git commit -m "Initial commit"
|
| 32 |
-
git remote add origin https://huggingface.co/spaces/your-username/doodlebook
|
| 33 |
-
git push -u origin main
|
| 34 |
-
```
|
| 35 |
-
|
| 36 |
-
### Inference Times
|
| 37 |
-
|
| 38 |
-
| Scenario | Cold Start | Warm |
|
| 39 |
-
|----------|-----------|------|
|
| 40 |
-
| Sample Book | 0s | 0s |
|
| 41 |
-
| First Generation | 2-3 min | 30-60s |
|
| 42 |
-
| Subsequent | 30-60s | 20-40s |
|
| 43 |
-
|
| 44 |
-
---
|
| 45 |
-
|
| 46 |
-
## Option 2: HF ZeroGPU (Free, No Modal)
|
| 47 |
-
|
| 48 |
-
### Changes Needed
|
| 49 |
-
|
| 50 |
-
Replace Modal calls with direct inference on ZeroGPU:
|
| 51 |
-
|
| 52 |
-
```python
|
| 53 |
-
# In modal_workers/modal_image_gen.py
|
| 54 |
-
# Remove Modal, use direct torch inference
|
| 55 |
-
```
|
| 56 |
-
|
| 57 |
-
### Inference Times
|
| 58 |
-
|
| 59 |
-
| Scenario | Cold Start | Warm |
|
| 60 |
-
|----------|-----------|------|
|
| 61 |
-
| Sample Book | 0s | 0s |
|
| 62 |
-
| First Generation | 3-5 min | 1-2 min |
|
| 63 |
-
| Subsequent | 1-2 min | 45-90s |
|
| 64 |
-
|
| 65 |
-
---
|
| 66 |
-
|
| 67 |
-
## Option 3: HF Inference API
|
| 68 |
-
|
| 69 |
-
### Setup
|
| 70 |
-
|
| 71 |
-
```python
|
| 72 |
-
import requests
|
| 73 |
-
|
| 74 |
-
API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.2-klein-4B"
|
| 75 |
-
headers = {"Authorization": "Bearer your_hf_token"}
|
| 76 |
-
|
| 77 |
-
def query(payload):
|
| 78 |
-
    response = requests.post(API_URL, headers=headers, json=payload)
|
| 79 |
-
    return response.content
|
| 80 |
-
```
|
| 81 |
-
|
| 82 |
-
### Inference Times
|
| 83 |
-
|
| 84 |
-
| Scenario | Cold Start | Warm |
|
| 85 |
-
|----------|-----------|------|
|
| 86 |
-
| Single Image | 10-30s | 5-15s |
|
| 87 |
-
|
| 88 |
-
---
|
| 89 |
-
|
| 90 |
-
## Recommendation for Hackathon
|
| 91 |
-
|
| 92 |
-
**Use Option 1 (HF Spaces + Modal)** because:
|
| 93 |
-
- ✅ Sample book loads instantly (no compute)
|
| 94 |
-
- ✅ Warm generation ~30s (fast demo)
|
| 95 |
-
- ✅ Modal keeps model warm during judging
|
| 96 |
-
- ✅ Free HF Space (CPU only)
|
| 97 |
-
- ✅ Modal only charges during generation
|
| 98 |
-
|
| 99 |
-
### Cost Estimate
|
| 100 |
-
- HF Space: Free
|
| 101 |
-
- Modal: ~$0.50 per demo generation
|
| 102 |
-
- Total for hackathon: ~$5-10
EXECUTION_PLAN.md
DELETED
|
@@ -1,414 +0,0 @@
|
|
| 1 |
-
# DoodleBook — Master Execution Plan
|
| 2 |
-
**Build Small Hackathon 2026 · "Adventure in Thousand Token Wood" Track**
|
| 3 |
-
_Senior architect / strategist / tech-lead review of `B2_doodlebook_prompt.md`_
|
| 4 |
-
|
| 5 |
-
> Source-of-truth concept: A child draws a crayon character → photograph it → the app produces a
|
| 6 |
-
> **consistent, illustrated 6-page storybook** (same character + same art style on every page),
|
| 7 |
-
> written by **MiniCPM5-1B**, narrated by **VoxCPM2**, illustrated by **FLUX.2-klein + a crayon-style LoRA**.
|
| 8 |
-
|
| 9 |
-
---
|
| 10 |
-
|
| 11 |
-
## 0. Critical engineering corrections (read first)
|
| 12 |
-
|
| 13 |
-
The original prompt is excellent on vision and badge strategy but has **5 load-bearing technical risks**. Fix these before writing code or you will fail the live demo.
|
| 14 |
-
|
| 15 |
-
| # | Risk in original | Reality | Fix (this plan adopts) |
|
| 16 |
-
|---|---|---|---|
|
| 17 |
-
| **C1** | Implies per-child LoRA so each kid's character is reproduced | LoRA training = minutes–hours on A100. You **cannot** train per user at demo time. | LoRA trains **ONE crayon art-style** offline. Per-character consistency comes from **(a) locked seed, (b) the child's doodle used as an image prompt** via IP-Adapter / FLUX Redux / img2img, and **(c) a fixed character description** reused on every page. Never claim live per-child training. |
|
| 18 |
-
| **C2** | 1B model must emit strict JSON | 1B models break JSON constantly (trailing commas, prose, truncation). | Constrained decoding + a **3-layer parser**: (1) regex extract, (2) `json5`/repair, (3) deterministic template fallback so the app NEVER crashes. Few-shot prompt with one full exemplar. |
|
| 19 |
-
| **C3** | "60–90s for 6 images" | True for warm GPU; **Modal cold start of a 12B diffusion model = 2–4 min**. | Modal `@app.cls` with `keep_warm=1` during demo window + model weights on a **Modal Volume** (no re-download). Pre-warm button. Always ship a pre-generated sample book. |
|
| 20 |
-
| **C4** | "Small Models" hackathon, but hero model is 12B | Judges may discount the "small" claim. | **Reframe narrative:** the *brain* (story + voice) is a **3B total small-model stack** (MiniCPM5-1B + VoxCPM2); FLUX is the *renderer/printer*. Add a real **"Tiny Mode"** (SDXL-Turbo or SD-Turbo + style LoRA, runnable on T4/edge) to make the small-model claim defensible and unlock the edge-device track. |
|
| 21 |
-
| **C5** | Assumes model IDs are correct | Prompt itself flags FLUX.2-klein / MiniCPM5-1B / VoxCPM2 as unverified. | **Phase 1, Task 0:** verify every model-card ID on HF Hub; wire fallbacks (`FLUX.1-schnell`, `MiniCPM3-4B`, `MeloTTS`/`Kokoro`) behind one config block. |
|
| 22 |
-
|
| 23 |
-
Everything below assumes these corrections.
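To make correction C2 concrete, here is a minimal sketch of the 3-layer parser using only the standard library. It is an illustration under assumptions, not the project's actual parser: the names `parse_story` and `template_story` are hypothetical, the repair layer handles only trailing commas (the plan suggests `json5` or a repair library for that layer), and the template text is a placeholder.

```python
import json
import re


def template_story(hero_name: str, theme: str) -> dict:
    """Layer 3: deterministic fallback so the caller always gets a valid book."""
    return {
        "title": f"{hero_name} and the {theme} Adventure",
        "character_description": f"{hero_name}, a friendly crayon-drawn hero",
        "pages": [
            {
                "page": i,
                "text": f"{hero_name} takes step {i} of the adventure.",
                "scene": f"{hero_name} in a {theme} setting, moment {i}",
            }
            for i in range(1, 7)
        ],
    }


def parse_story(raw: str, hero_name: str, theme: str) -> dict:
    """Return a 6-page story dict from raw model output, never raising."""
    # Layer 1: pull the outermost {...} span out of any surrounding prose.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match:
        candidate = match.group(0)
        # Layer 2: light repair. Only trailing commas before } or ] are
        # removed here; a comma inside a string value could be hit too.
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        for text in (candidate, repaired):
            try:
                story = json.loads(text)
            except json.JSONDecodeError:
                continue
            pages = story.get("pages") if isinstance(story, dict) else None
            if isinstance(pages, list) and len(pages) == 6:
                return story
    # Layer 3: nothing usable was recovered.
    return template_story(hero_name, theme)
```

The function tries the raw extraction before the repaired one, so well-formed output is never modified, and any failure path ends in the template rather than an exception.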
|
| 24 |
-
|
| 25 |
-
---
|
| 26 |
-
|
| 27 |
-
## 1. Project Analysis
|
| 28 |
-
|
| 29 |
-
### Executive summary
|
| 30 |
-
DoodleBook converts a child's hand-drawn crayon character (captured by photo) into a complete, narrated, visually-consistent 6-page picture book in ~2 minutes. It fuses three sponsor models into a single emotional, demo-perfect artifact: a 1B story writer, a 2B voice, and a 12B image model steered by a custom crayon-style LoRA. The moat is the **combination** — nobody else pairs a child's real drawing with a fine-tuned FLUX LoRA and on-model narration.
|
| 31 |
-
|
| 32 |
-
### Problem statement
|
| 33 |
-
Kids create characters constantly but those drawings die on the fridge. Parents can't turn them into the stories children imagine. Existing "AI storybook" apps (a) ignore the child's actual art, (b) produce inconsistent characters page-to-page, and (c) feel like generic AI slop, not *the child's* creation. There is no tool that preserves the child's own visual style across a coherent narrated book.
|
| 34 |
-
|
| 35 |
-
### Target users
|
| 36 |
-
- **Primary:** Parents of children 3–8 (gift/keepsake, bedtime, screen-time-with-purpose).
|
| 37 |
-
- **Secondary:** Early-years teachers (creative writing prompts), pediatric/occupational therapists (expressive activities), grandparents (remote bonding).
|
| 38 |
-
- **Demo persona:** Judge watching a 60-sec video of a kid hearing their own drawing read aloud as a book.
|
| 39 |
-
|
| 40 |
-
### Market need
|
| 41 |
-
- AI-storytime apps are a growing category but commoditized and character-inconsistent.
|
| 42 |
-
- Differentiator the market lacks: **"your child's actual drawing becomes the book's art."** Emotional keepsake > generic generation. High shareability (parents post their kids).
|
| 43 |
-
|
| 44 |
-
### Competitive advantage / moat
|
| 45 |
-
1. **Underused sponsor field** — FLUX.2-klein + custom LoRA is near-empty in the competition (per original recon).
|
| 46 |
-
2. **Style-faithful character consistency** via seed-lock + image-prompt conditioning + LoRA (multi-signal, robust).
|
| 47 |
-
3. **Full small-model stack** end-to-end on sponsor models (story + voice + image).
|
| 48 |
-
4. **Off-brand storybook UI** — zero Gradio defaults.
|
| 49 |
-
5. **Emotional demo** — strongest 60-second-video category in the whole event.
|
| 50 |
-
|
| 51 |
-
### Innovation score (self-assessed)
|
| 52 |
-
| Axis | Score /10 | Note |
|
| 53 |
-
|---|---|---|
|
| 54 |
-
| Concept originality | 9 | Doodle→consistent book is genuinely novel |
|
| 55 |
-
| Technical depth | 8 | LoRA + multi-model orchestration + consistency engineering |
|
| 56 |
-
| Feasibility in hackathon window | 7 | Achievable IF corrections C1–C5 applied |
|
| 57 |
-
| Demo/emotional impact | 10 | Best-in-show potential |
|
| 58 |
-
| Track coverage | 9 | 5+ awards reachable |
|
| 59 |
-
| **Composite** | **8.6** | Strong winner profile |
|
| 60 |
-
|
| 61 |
-
### Hackathon-winning potential
|
| 62 |
-
**High.** Realistic path to: Thousand Token Wood podium + Black Forest Labs sponsor award + OpenBMB award + Off-Brand + Best Demo + Community Choice. Five+ simultaneous award surfaces is the strategy (see §7).
|
| 63 |
-
|
| 64 |
-
---
|
| 65 |
-
|
| 66 |
-
## 2. Product Vision
|
| 67 |
-
|
| 68 |
-
### Long-term vision
|
| 69 |
-
The default way a family turns a child's imagination into a keepsake — "Instagram for the things your kid invents." Drawing → narrated book → printed photo-book → series with recurring characters.
|
| 70 |
-
|
| 71 |
-
### Future roadmap
|
| 72 |
-
- **v1 (hackathon):** Doodle → 6-page narrated book, PDF export, Open-Trace share.
|
| 73 |
-
- **v2:** Character library (recurring heroes across books), multi-character scenes, child-voice cloning (with guardian consent), print-on-demand.
|
| 74 |
-
- **v3:** Collaborative books (siblings co-create), classroom mode, multilingual narration, animation (short clips per page).
|
| 75 |
-
- **v4:** On-device "Tiny Mode" mobile app for offline bedtime generation.
|
| 76 |
-
|
| 77 |
-
### Scalability opportunities
|
| 78 |
-
- Stateless generation workers (Modal autoscale) behind a thin Gradio/HF front door.
|
| 79 |
-
- Cache by (doodle-hash + theme + seed) to dedupe regenerations.
|
| 80 |
-
- Batch the 6 pages in one warm container (already in the original `generate_book_pages`).
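The cache key mentioned above could be derived roughly as follows. This is a sketch under assumptions: the function name `book_cache_key`, the choice of SHA-256, and the theme normalization are illustrative and do not come from the original plan.

```python
import hashlib


def book_cache_key(doodle: bytes, theme: str, seed: int) -> str:
    """Build a stable key from (doodle-hash + theme + seed)."""
    # Hash the raw image bytes so identical uploads map to the same key.
    doodle_hash = hashlib.sha256(doodle).hexdigest()
    # Normalize the theme so " Space " and "space" dedupe to one entry.
    payload = f"{doodle_hash}|{theme.strip().lower()}|{seed}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two requests with the same drawing, theme, and seed then resolve to the same key, so a regeneration can be served from the cache instead of a new GPU run.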
|
| 81 |
-
|
| 82 |
-
### Edge-device deployment possibilities
|
| 83 |
-
- **Story:** MiniCPM5-1B quantized (GGUF/llama.cpp, int4) runs on a laptop/phone NPU.
|
| 84 |
-
- **Voice:** VoxCPM2 or a Kokoro/MeloTTS fallback runs on CPU.
|
| 85 |
-
- **Image (Tiny Mode):** SD-Turbo / SDXL-Turbo + tiny style LoRA at 1–4 steps on a single consumer GPU or Apple Silicon; sub-second-to-a-few-seconds pages.
|
| 86 |
-
- Ship a documented "edge profile" config to claim the edge/small-model narrative credibly.
|
| 87 |
-
|
| 88 |
-
### Small-model optimization strategy
|
| 89 |
-
- 4-bit (NF4/bitsandbytes) for MiniCPM; `torch.compile` + bf16 for FLUX; sequential CPU offload to fit smaller GPUs.
|
| 90 |
-
- FLUX turbo settings: 4–20 steps; Tiny Mode = 1–4 steps with turbo image model.
|
| 91 |
-
- Modal Volume model cache; `keep_warm` only during judging.
|
| 92 |
-
- KV-cache reuse + greedy decode for deterministic, fast story gen.
|
| 93 |
-
|
| 94 |
-
---
|
| 95 |
-
|
| 96 |
-
## 3. Technical Architecture
|
| 97 |
-
|
| 98 |
-
### System architecture (text diagram)
|
| 99 |
-
```
|
| 100 |
-
┌──────────────────────────────────────────────┐
|
| 101 |
-
│ HF Space (Gradio 5.x, custom storybook UI) │
|
| 102 |
-
│ app.py · custom.css · book_builder.py │
|
| 103 |
-
└───────────────┬──────────────────────────────┘
|
| 104 |
-
│ orchestration (sync calls)
|
| 105 |
-
┌────────────────────────────────┼─────────────────────────────────┐
|
| 106 |
-
▼ ▼ ▼
|
| 107 |
-
┌───────────────┐ ┌───────────────────┐ ┌──────────────────┐
|
| 108 |
-
│ modal_story │ │ modal_image_gen │ │ modal_tts │
|
| 109 |
-
│ MiniCPM5-1B │ │ FLUX.2-klein + │ │ VoxCPM2 │
|
| 110 |
-
│ (T4) JSON │ char_desc │ crayon LoRA (A100)│ pages.png │ (T4/A10G) wav │
|
| 111 |
-
│ story+scenes │ ───────────► │ + IP-Adapter/img2 │ ──────────► │ narration │
|
| 112 |
-
└───────┬───────┘ scenes │ img from doodle │ └────────┬─────────┘
|
| 113 |
-
│ └─────────┬─────────┘ │
|
| 114 |
-
│ pages[text,scene] │ 6 page images │ audio
|
| 115 |
-
└───────────────┬────────────────┴────────────────────────────────┘
|
| 116 |
-
▼
|
| 117 |
-
┌────────────────────┐
|
| 118 |
-
│ book_builder.py │ → storybook HTML (gr.HTML) + PDF (fpdf2)
|
| 119 |
-
│ + open_trace.py │ → HF dataset trace (prompts/seeds/lora)
|
| 120 |
-
└────────────────────┘
|
| 121 |
-
```
|
| 122 |
-
|
| 123 |
-
### Frontend architecture
|
| 124 |
-
- **Gradio 5.x `gr.Blocks`** single-page, two-column: input panel (left), live book viewer (right).
|
| 125 |
-
- Custom **storybook CSS** (yellowed paper, serif, drop shadows, page-flip feel) → Off-Brand badge.
|
| 126 |
-
- Progressive reveal: pages stream in as generated ("Illustrating page 3 of 6…").
|
| 127 |
-
- `gr.HTML` book canvas, `gr.Audio` narration, `gr.DownloadButton` PDF, `gr.Gallery` fallback.
|
| 128 |
-
|
| 129 |
-
### Backend architecture
|
| 130 |
-
- **Modal** for all heavy compute, 3 apps (story/image/tts), each a warm-able class.
|
| 131 |
-
- **Stateless** functions; book assembly + trace logging in the Space process.
|
| 132 |
-
- Config module (`config.py`) holds every model ID + fallback + generation params (single source of truth → fixes C5).
|
| 133 |
-
|
| 134 |
-
### AI model architecture
|
| 135 |
-
- **Story (MiniCPM5-1B, T4):** few-shot, greedy, `max_new_tokens≈800`, constrained JSON + 3-layer parser (C2). Outputs `title`, `character_description`, `pages[{page,text,scene}]`.
|
| 136 |
-
- **Image (FLUX.2-klein + LoRA, A100):**
|
| 137 |
-
- Crayon **style LoRA** (offline-trained, rank 16) fused at scale ~0.8.
|
| 138 |
-
- **Consistency stack:** locked base seed `S`; page `i` uses `S+i`; reuse identical `character_description` token block; **doodle image fed as image prompt** (IP-Adapter / Redux / img2img strength ~0.3–0.5) so output resembles the child's drawing (this realizes original TODO 1).
|
| 139 |
-
- 20 steps (Standard) / 4 steps (Tiny Mode), guidance ~3.5, 768×512.
|
| 140 |
-
- **Voice (VoxCPM2, T4/A10G):** narrate `title + page texts`; return wav.
|
| 141 |
-
- **Doodle understanding (MiniCPM-V, optional):** caption the drawing → style tokens prepended to FLUX prompt (original TODO 1).
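The seed-lock part of the consistency stack can be sketched as a small planning step that runs before any image call. The names `PageJob` and `plan_pages` are hypothetical, and the sketch assumes pages are numbered from 1, so page `i` receives seed `S+i`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageJob:
    page: int
    seed: int
    prompt: str


def plan_pages(character_description: str, scenes: list[str],
               base_seed: int = 42) -> list[PageJob]:
    """Pair every scene with a locked seed and the shared character block."""
    jobs = []
    for page, scene in enumerate(scenes, start=1):
        jobs.append(PageJob(
            page=page,
            seed=base_seed + page,  # page i uses S + i
            # The identical description prefix is reused on every page.
            prompt=f"{character_description}. {scene}",
        ))
    return jobs
```

Because the jobs depend only on the description, the scenes, and the base seed, rerunning a book with the same inputs yields the same seeds and prompts, which is also what a trace would need to record.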
|
| 142 |
-
|
| 143 |
-
### Data flow
|
| 144 |
-
1. User uploads/photographs doodle + name + theme.
|
| 145 |
-
2. Story worker → JSON (title, char desc, 6×{text,scene}).
|
| 146 |
-
3. (Opt) Doodle captioner → style tokens.
|
| 147 |
-
4. Image worker → 6 PNGs (seed-locked, LoRA + doodle-conditioned).
|
| 148 |
-
5. TTS worker → narration wav.
|
| 149 |
-
6. `book_builder` → HTML book + PDF; `open_trace` → HF dataset row.
|
| 150 |
-
|
| 151 |
-
### API structure (internal contracts)
|
| 152 |
-
```
|
| 153 |
-
generate_story(hero_name:str, theme:str, age:int=5) -> {title, character_description, pages:[{page,text,scene}]}
|
| 154 |
-
generate_book_pages(character_desc:str, story_beats:list[str], doodle:bytes|None,
|
| 155 |
-
art_style:str, seed:int=42, tiny:bool=False) -> list[bytes]
|
| 156 |
-
speak_book(text:str, voice:str="warm") -> bytes(wav)
|
| 157 |
-
build_book_html(images:list[bytes], texts:list[str], title:str) -> str
|
| 158 |
-
export_pdf(images, texts, title) -> path
|
| 159 |
-
log_trace(payload) -> dataset_url
|
| 160 |
-
```
|
| 161 |
-
|
| 162 |
-
### Storage strategy
|
| 163 |
-
- Model weights → **Modal Volume** (cache, no re-download → fixes C3).
|
| 164 |
-
- Generated assets → ephemeral `/tmp` in Space; user downloads PDF.
|
| 165 |
-
- Traces → **HF Dataset** `build-small-hackathon/doodlebook-traces` (Open Trace badge).
|
| 166 |
-
- LoRA weights → **HF model repo** `build-small-hackathon/doodlebook-flux-lora` (Well-Tuned badge).
|
| 167 |
-
|
| 168 |
-
### Deployment strategy
|
| 169 |
-
- Front end on **HF Spaces** (Gradio SDK 5.0, `app.py`).
|
| 170 |
-
- Compute on **Modal** (secrets: `HF_TOKEN`, endpoint URLs via `.env`).
|
| 171 |
-
- `keep_warm=1` on image app only during judging window; scale to 0 after.
|
| 172 |
-
|
| 173 |
-
### Performance optimization plan
|
| 174 |
-
- One warm container generates all 6 pages (already designed).
|
| 175 |
-
- bf16 + optional `torch.compile`; turbo step counts; sequential CPU offload fallback.
|
| 176 |
-
- Stream page-by-page UI updates (perceived speed).
|
| 177 |
-
- Pre-generated sample book for instant judge view (non-negotiable).
|
| 178 |
-
- Tiny Mode for sub-10s full books on cheap GPU.
|
| 179 |
-
|
| 180 |
-
---
|
| 181 |
-
|
| 182 |
-
## 4. UI/UX Design Plan
|
| 183 |
-
|
| 184 |
-
### Design philosophy
|
| 185 |
-
"**A warm digital picture book, not a dashboard.**" Tactile, nostalgic, magical — looks hand-made, hides all ML. Every interaction should feel like turning a page, not running a model.
|
| 186 |
-
|
| 187 |
-
### User journeys
|
| 188 |
-
1. **First-time parent (happy path):** land → see sample book glowing → upload kid's drawing → name + theme → "✨ Make my book!" → progress storybook fills page-by-page → narration auto-ready → download PDF / share. <2 min, zero jargon.
|
| 189 |
-
2. **Judge (cold, impatient):** lands on a finished sample book immediately (no generation wait) → clicks "Hear it" → reads the story → optionally generates one live → sees Open-Trace link. Wow in <15s.
|
| 190 |
-
3. **Returning user (v2 vision):** pick a saved character → new adventure → consistent hero.
|
| 191 |
-
|
| 192 |
-
### Wireframe descriptions
|
| 193 |
-
- **Header:** centered title "📚 DoodleBook", subtitle "Draw a character. Get a storybook.", soft paper texture.
|
| 194 |
-
- **Left input card (scale 1):** webcam/upload doodle, character name, hero name, theme dropdown, big orange "Make my book!" CTA, Examples row (loads sample), status line, "⚡ Tiny Mode" toggle.
|
| 195 |
-
- **Right book viewer (scale 2):** large `gr.HTML` book — title page then 6 illustrated text pages with page numbers, yellowed background, serif body; narration audio bar pinned above; "⬇ Download PDF" + "🔗 Share trace" buttons below.
|
| 196 |
-
- **Progress state:** skeleton page slots fill one-by-one with "Illustrating page N of 6…".
|
| 197 |
-
|
| 198 |
-
### Dashboard / layout
|
| 199 |
-
- Single page, two columns desktop; stacked on mobile (input → book).
|
| 200 |
-
- Optional collapsible "🔬 Behind the magic" panel showing prompts/seeds/LoRA (judge candy + Open Trace).
|
| 201 |
-
|
| 202 |
-
### Color palette
|
| 203 |
-
| Token | Hex | Use |
|
| 204 |
-
|---|---|---|
|
| 205 |
-
| Paper | `#FEF9E7` | app background |
|
| 206 |
-
| Page | `#FFFDE7` | book pages |
|
| 207 |
-
| Ink | `#3E2723` | body text |
|
| 208 |
-
| Title brown | `#5D4037` | headings |
|
| 209 |
-
| Crayon orange | `#FF7043` | primary CTA |
|
| 210 |
-
| Sky accent | `#4FC3F7` | secondary/links |
|
| 211 |
-
| Muted | `#BCAAA4` | page numbers/meta |
|
| 212 |
-
|
| 213 |
-
### Typography
|
| 214 |
-
- Display/title: **Georgia / "Fredoka" / "Baloo 2"** (rounded, child-friendly).
|
| 215 |
-
- Body: **Georgia serif** 20–22px, line-height 1.9 (read-aloud comfortable).
|
| 216 |
-
- Avoid system sans defaults — they read as "Gradio".
|
| 217 |
-
|
| 218 |
-
### Accessibility
|
| 219 |
-
- WCAG AA contrast (ink on page passes); 18px+ body.
|
| 220 |
-
- Audio narration = built-in alt for non-readers; captions = page text.
|
| 221 |
-
- All controls keyboard reachable; alt text on every generated image (use page `text`).
|
| 222 |
-
- Respect `prefers-reduced-motion` (disable page-flip animation).
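The contrast claim can be checked numerically with the WCAG 2.x relative-luminance formula. The sketch below is one way to do that. The helper names are illustrative, and the hex values are the Ink and Page tokens from the color palette table.

```python
def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel as defined by WCAG 2.x."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Return (L_lighter + 0.05) / (L_darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Ink (#3E2723) on Page (#FFFDE7); AA requires at least 4.5:1 for body text.
ink_on_page_passes_aa = contrast_ratio("#3E2723", "#FFFDE7") >= 4.5
```

The same function can be run over the other palette pairs, for example the Muted token used for page numbers, before treating the accessibility item as done.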
|
| 223 |
-
|
| 224 |
-
### Mobile responsiveness
|
| 225 |
-
- Columns collapse to stack; CTA full-width sticky; webcam capture works on phones (parents photograph the drawing in-app).
|
| 226 |
-
|
| 227 |
-
### Demo-friendly interactions
|
| 228 |
-
- Auto-load sample book on launch (no empty state).
|
| 229 |
-
- Page-by-page streaming reveal (visible progress = perceived magic).
|
| 230 |
-
- One-tap "Play narration".
|
| 231 |
-
- "Tiny Mode" toggle to show edge story live without long waits.
|
| 232 |
-
|
| 233 |
-
---
|
| 234 |
-
|
| 235 |
-
## 5. Gradio Implementation Plan
|
| 236 |
-
|
| 237 |
-
### App structure
|
| 238 |
-
```
|
| 239 |
-
app.py
|
| 240 |
-
├─ config.py # model IDs + fallbacks + params (single source of truth)
|
| 241 |
-
├─ ui/
|
| 242 |
-
│ ├─ layout.py # gr.Blocks layout
|
| 243 |
-
│ └─ custom.css # storybook styling
|
| 244 |
-
├─ services/
|
| 245 |
-
│ ├─ story.py # calls modal_story_gen
|
| 246 |
-
│ ├─ images.py # calls modal_image_gen
|
| 247 |
-
│ ├─ tts.py # calls modal_tts
|
| 248 |
-
│ ├─ book_builder.py # HTML + PDF
|
| 249 |
-
│ └─ trace.py # Open Trace dataset logging
|
| 250 |
-
└─ modal/
|
| 251 |
-
├─ modal_story_gen.py
|
| 252 |
-
├─ modal_image_gen.py
|
| 253 |
-
└─ modal_tts.py
|
| 254 |
-
```
|
| 255 |
-
|
| 256 |
-
### Component hierarchy
|
| 257 |
-
```
|
| 258 |
-
gr.Blocks(css, theme)
|
| 259 |
-
├─ Header (gr.Markdown)
|
| 260 |
-
├─ gr.Row
|
| 261 |
-
│ ├─ gr.Column(scale=1) # inputs
|
| 262 |
-
│ │ ├─ gr.Image(sources=[webcam,upload])
|
| 263 |
-
│ │ ├─ gr.Textbox char_name / hero_name
|
| 264 |
-
│ │ ├─ gr.Dropdown theme
|
| 265 |
-
│ │ ├─ gr.Checkbox tiny_mode
|
| 266 |
-
│ │ ├─ gr.Button "Make my book!" (primary)
|
| 267 |
-
│ │ ├─ gr.Examples (sample)
|
| 268 |
-
│ │ └─ gr.Textbox status (interactive=False)
|
| 269 |
-
│ └─ gr.Column(scale=2) # output
|
| 270 |
-
│ ├─ gr.Audio narration
|
| 271 |
-
│ ├─ gr.HTML book_display
|
| 272 |
-
│ ├─ gr.DownloadButton PDF
|
| 273 |
-
│ └─ gr.Accordion "Behind the magic" (prompts/seeds)
|
| 274 |
-
```
|
| 275 |
-
|
| 276 |
-
### Pages & navigation
|
| 277 |
-
Single page (hackathon-optimal). "Pages" = sections of the book inside the HTML canvas. No router needed.
|
| 278 |
-
|
| 279 |
-
### User interaction flow
|
| 280 |
-
`make_btn.click(create_book, inputs=[...], outputs=[book_html, status, audio, pdf])` — use a **generator function** (`yield`) so status + pages stream in, not one blocking return.
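A minimal sketch of that generator pattern, kept free of Gradio imports so it runs on its own. Gradio event handlers accept generator functions and refresh the outputs on each `yield`. The two-value output here (book HTML and status) is a simplification of the four outputs above, and `render_page` is a hypothetical stand-in for the image worker plus HTML assembly.

```python
from typing import Callable, Iterator


def create_book(scenes: list[str],
                render_page: Callable[[str], str]) -> Iterator[tuple[str, str]]:
    """Yield (book_html, status) after each step so the UI updates live."""
    pages: list[str] = []
    total = len(scenes)
    for i, scene in enumerate(scenes, start=1):
        # Show progress first, with the pages finished so far.
        yield "".join(pages), f"Illustrating page {i} of {total}..."
        pages.append(render_page(scene))
    # Final update: the complete book.
    yield "".join(pages), "Your book is ready!"
```

Passed as the handler to `make_btn.click`, each yielded tuple would map onto the corresponding output components in order, so the book fills in page by page instead of appearing after one blocking return.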
|
| 281 |
-
|
| 282 |
-
### Model integration approach
|
| 283 |
-
- Space process is thin orchestrator; all GPU work via `modal.Function.remote()`.
|
| 284 |
-
- Defensive: every remote call wrapped in try/except → graceful UI error + fallback (base FLUX if no LoRA, template story if JSON fails).
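The defensive wrapper described above might look like this. It is a sketch, not the project's code: `call_with_fallback` is a hypothetical helper, and the callables stand in for a remote Modal call and its local substitute (base FLUX, template story).

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("doodlebook")


def call_with_fallback(primary: Callable[[], T],
                       fallback: Callable[[], T],
                       label: str) -> tuple[T, bool]:
    """Run primary(); on any exception log it and return fallback() instead.

    The boolean tells the caller whether the fallback was used, so the UI
    can show a gentle notice rather than a stack trace.
    """
    try:
        return primary(), False
    except Exception as exc:  # broad on purpose: the UI must never crash
        log.warning("%s failed (%s); using fallback", label, exc)
        return fallback(), True
```

A call such as `call_with_fallback(remote_story, local_template, "story")` keeps the orchestration code flat while still recording which steps degraded.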
|
| 285 |
-
|
| 286 |
-
### Performance considerations
|
| 287 |
-
- Single warm Modal container per book (6 images batched).
|
| 288 |
-
- `gr.Progress()` for the progress bar; `yield` partial books.
|
| 289 |
-
- Cache sample book in memory at startup.
|
| 290 |
-
|
| 291 |
-
### Deployment on HF Spaces
|
| 292 |
-
- `sdk: gradio`, `sdk_version: "5.0"`, `app_file: app.py` (frontmatter already specified).
|
| 293 |
-
- Secrets: `HF_TOKEN`, `MODAL_ENDPOINT_URL` (or Modal token) in Space settings.
|
| 294 |
-
- Keep Space CPU-only (compute offloaded to Modal) → cheap, always-on.
|
| 295 |
-
|
| 296 |
-
---
|
| 297 |
-
|
| 298 |
-
## 6. Development Roadmap
|
| 299 |
-
|
| 300 |
-
> Effort assumes a single builder + coding agent. Sequence is dependency-ordered.
|
| 301 |
-
|
| 302 |
-
### Phase 1 — Foundation
|
| 303 |
-
- **Tasks:** Verify all model IDs on HF Hub (Task 0, fixes C5); scaffold repo per directory structure; `config.py` with IDs + fallbacks + params; `requirements.txt`; `.env.example`; Modal account + secrets; bare Gradio shell that loads and shows static sample book.
|
| 304 |
-
- **Dependencies:** HF + Modal accounts, tokens.
|
| 305 |
-
- **Effort:** ~0.5 day.
|
| 306 |
-
- **Risks:** Model IDs differ from prompt → fallback wiring matters.
|
| 307 |
-
- **Success:** `app.py` launches locally, shows sample book, `config.py` resolves real model IDs.
|
| 308 |
-
|
| 309 |
-
### Phase 2 — Core Features
|
| 310 |
-
- **Tasks:** `modal_story_gen.py` with 3-layer JSON parser + template fallback (C2); `book_builder.py` HTML; PDF export (`fpdf2`); storybook CSS; wire story→book (text only, placeholder images).
|
| 311 |
-
- **Dependencies:** Phase 1.
|
| 312 |
-
- **Effort:** ~1 day.
|
| 313 |
-
- **Risks:** 1B JSON instability → mitigated by parser + fallback.
|
| 314 |
-
- **Success:** Enter name+theme → get a valid 6-page text book + PDF, no crashes even on bad model output.
|
| 315 |
-
|
| 316 |
-
### Phase 3 — AI Integration
|
| 317 |
-
- **Tasks:** `modal_image_gen.py` FLUX pipeline; Modal Volume model cache + `keep_warm` (C3); seed-lock + doodle image-prompt consistency stack (C1); graceful base-FLUX fallback if no LoRA; `modal_tts.py` VoxCPM2 (+ Kokoro/MeloTTS fallback); full pipeline story→images→audio.
|
| 318 |
-
- **Dependencies:** Phase 2; LoRA may still be training (degrade gracefully).
|
| 319 |
-
- **Effort:** ~1.5 days.
|
| 320 |
-
- **Risks:** Cold starts (C3), VRAM (use A100, CPU offload fallback), model-ID drift.
|
| 321 |
-
- **Success:** End-to-end live book in <2 min warm; consistent character across pages; narration plays.
|
| 322 |
-
|
| 323 |
-
### Phase 4 — UI/UX Enhancement
|
| 324 |
-
- **Tasks:** Streaming page-by-page reveal (`yield`); progress text; "Behind the magic" accordion; Tiny Mode toggle (C4); mobile responsive CSS; Examples auto-load; accessibility pass (alt text, contrast, reduced-motion).
|
| 325 |
-
- **Dependencies:** Phase 3.
|
| 326 |
-
- **Effort:** ~1 day.
|
| 327 |
-
- **Risks:** Gradio streaming quirks; CSS scope leaks.
|
| 328 |
-
- **Success:** Off-Brand-worthy UI; live progress; works on phone; Tiny Mode produces a book fast.
|
| 329 |
-
|
| 330 |
-
### Phase 5 — Optimization
|
| 331 |
-
- **Tasks:** Train + publish crayon style LoRA (Well-Tuned); quantization/turbo settings; Tiny Mode SD-Turbo path; trace logging to HF dataset (Open Trace); pre-generate + commit sample book (6 pages); error hardening.
|
| 332 |
-
- **Dependencies:** Phases 3–4.
|
| 333 |
-
- **Effort:** ~1 day (+ LoRA train time in background).
|
| 334 |
-
- **Risks:** LoRA quality/time → app must run on base model meanwhile (non-negotiable from original).
|
| 335 |
-
- **Success:** LoRA on HF, traces logged, sample book committed, Tiny Mode real, no unhandled errors.
|
| 336 |
-
|
| 337 |
-
### Phase 6 — Submission Preparation
|
| 338 |
-
- **Tasks:** README with exact frontmatter + Tiny Titan argument; record 60-sec demo video (child hearing book); blog post (Field Notes) on FLUX+LoRA consistency; screenshots/GIFs; deploy + smoke test on Spaces; final checklist (§9).
|
| 339 |
-
- **Dependencies:** All prior.
|
| 340 |
-
- **Effort:** ~0.5–1 day.
|
| 341 |
-
- **Risks:** Last-minute deploy breakage → smoke test early, keep sample-book path independent of live compute.
|
| 342 |
-
- **Success:** Public Space loads sample instantly, live gen works, all badges' artifacts published, video submitted.
|
| 343 |
-
|
| 344 |
-
**Total:** ~5–6 focused days. Critical path: Phase 1 (model IDs) → Phase 3 (FLUX+consistency) → Phase 6 (deploy/video).
|
| 345 |
-
|
| 346 |
-
---
|
| 347 |
-
|
| 348 |
-
## 7. Hackathon Strategy
|
| 349 |
-
|
| 350 |
-
### Tracks / awards to target (stack as many as possible)
|
| 351 |
-
| Award | Lever |
|
| 352 |
-
|---|---|
|
| 353 |
-
| Thousand Token Wood podium | Unique doodle→consistent-book concept |
|
| 354 |
-
| Black Forest Labs ($3k) | FLUX.2-klein + published custom LoRA (sparse field) |
|
| 355 |
-
| OpenBMB award | MiniCPM5-1B story + VoxCPM2 narration |
|
| 356 |
-
| Well-Tuned | Published LoRA on HF |
|
| 357 |
-
| Off-Brand ($1,500) | Storybook UI, zero Gradio defaults |
|
| 358 |
-
| Best Demo ($1,000) | Child hearing their drawing narrated |
|
| 359 |
-
| Community Choice | Shareable, emotional, parents repost |
|
| 360 |
-
| Tiny Titan (claimable) | Argue story generator (1B) is the primary AI + real Tiny Mode |
|
| 361 |
-
|
| 362 |
-
### How to maximize scoring
|
| 363 |
-
- One artifact, many badges: every badge needs a concrete published thing (LoRA repo, trace dataset, blog post, off-brand UI) — produce all four.
|
| 364 |
-
- Lead with emotion + sponsor-model usage in README's first 5 lines.
|
| 365 |
-
- Show, don't tell: pre-generated sample book + 60-sec video carry the score even if live gen is slow.
|
| 366 |
-
|
| 367 |
-
### What judges look for
|
| 368 |
-
Working demo, clear sponsor-model use, originality, polish, reproducibility (traces + LoRA), and a story that makes them feel something. DoodleBook is built to hit all six.
|
| 369 |
-
|
| 370 |
-
### Demo strategy
|
| 371 |
-
- Open on the finished sample book (instant wow, no wait).
|
| 372 |
-
- Play narration immediately.
|
| 373 |
-
- Then generate one live (or Tiny Mode) to prove it's real.
|
| 374 |
-
- End on the Open-Trace link + "made by a 1B + 2B small-model brain."
|
| 375 |
-
|
| 376 |
-
### Presentation / storytelling
|
| 377 |
-
Narrative arc: "Kids invent characters every day and we throw them away. Watch what happens when a 1B model gives one a story and FLUX gives it a book — in the child's own art." Personal, concrete, sponsor-forward.
|
| 378 |
-
|
| 379 |
-
### Key differentiators
|
| 380 |
-
Child's real art preserved · cross-page character consistency engineering · full small-model stack · off-brand keepsake UX · reproducible traces + LoRA.
|
| 381 |
-
|
| 382 |
-
---
|
| 383 |
-
|
| 384 |
-
## 8. README Plan
|
| 385 |
-
```
|
| 386 |
-
# 📚 DoodleBook (+ exact HF frontmatter block from original prompt)
|
| 387 |
-
> Elevator pitch: Draw a character → get a narrated, illustrated 6-page storybook in your child's own art.
|
| 388 |
-
1. ✨ Features (consistency, narration, doodle-faithful art, PDF, Tiny Mode, traces)
|
| 389 |
-
2. 🧠 Models & why (table: FLUX.2-klein+LoRA / MiniCPM5-1B / VoxCPM2) + Tiny Titan argument
|
| 390 |
-
3. 🏗️ Architecture (diagram + data flow)
|
| 391 |
-
4. ⚙️ Installation (clone, requirements, Modal setup, HF token, .env)
|
| 392 |
-
5. ▶️ Usage (run app.py, upload doodle, make book)
|
| 393 |
-
6. 🖼️ Screenshots (sample book pages, UI)
|
| 394 |
-
7. 🎬 Demo (60-sec video link + live Space link)
|
| 395 |
-
8. 🔬 Reproducibility (LoRA repo, Open-Trace dataset, seeds)
|
| 396 |
-
9. 🛣️ Future work (character library, voice cloning, print-on-demand, edge app)
|
| 397 |
-
10. 🏅 Hackathon badges (Well-Tuned, Off-Brand, Field Notes, Open Trace)
|
| 398 |
-
11. 📄 License (Apache-2.0 / MIT) · 👥 Contributors
|
| 399 |
-
```
|
| 400 |
-
|
| 401 |
-
## 9. Submission Checklist
|
| 402 |
-
**Code** ☐ app launches clean ☐ all Modal fns callable ☐ graceful fallbacks (no-LoRA, bad-JSON, remote error) ☐ config has verified model IDs + fallbacks
|
| 403 |
-
**Docs** ☐ README + exact frontmatter ☐ install/usage ☐ LoRA reproduce README ☐ Field Notes blog published
|
| 404 |
-
**UI polish** ☐ storybook CSS, no Gradio defaults ☐ mobile ok ☐ accessibility (alt/contrast/reduced-motion) ☐ Examples auto-load
|
| 405 |
-
**Performance** ☐ warm gen <2 min ☐ keep_warm during judging ☐ Tiny Mode works ☐ sample book loads instantly
|
| 406 |
-
**Model optimization** ☐ LoRA trained + published ☐ bf16/turbo steps ☐ Tiny Mode SD-Turbo path
|
| 407 |
-
**HF deployment** ☐ Space live ☐ secrets set ☐ smoke test ☐ trace dataset public ☐ LoRA repo public
|
| 408 |
-
**Demo** ☐ 60-sec video (child hearing book) ☐ live Space link ☐ screenshots/GIFs
|
| 409 |
-
**Presentation** ☐ pitch deck/blurb ☐ storytelling script ☐ badge artifacts linked
|
| 410 |
-
**Final validation** ☐ fresh-clone run ☐ cold-open judge path tested ☐ all badge claims have a published URL
|
| 411 |
-
|
| 412 |
-
## 10. Agent Execution Prompt
|
| 413 |
-
See `AGENT_HANDOFF.md` — a self-contained master prompt for Codex / OpenCode / Cursor / Claude Code to build the whole project with the C1–C5 corrections baked in.
|
| 414 |
-
```
README.md
CHANGED
|
@@ -1,289 +1,292 @@
|
|
| 1 |
-
---
|
| 2 |
-
title: DoodleBook
|
| 3 |
-
emoji: 📚
|
| 4 |
-
colorFrom: yellow
|
| 5 |
-
colorTo: red
|
| 6 |
-
sdk: gradio
|
| 7 |
-
sdk_version: "5.50.0"
|
| 8 |
-
app_file: app.py
|
| 9 |
-
pinned: false
|
| 10 |
-
tags:
|
| 11 |
-
- hackathon
|
| 12 |
-
- build-small
|
| 13 |
-
- adventure-in-thousand-token-wood
|
| 14 |
-
- gradio
|
| 15 |
-
-
|
| 16 |
-
-
|
| 17 |
-
-
|
| 18 |
-
-
|
| 19 |
-
-
|
| 20 |
-
|
| 21 |
-
---
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
-
|
| 35 |
-
-
|
| 36 |
-
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
-
|
| 50 |
-
-
|
| 51 |
-
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
- the
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
|
| 89 |
-
|
|
| 90 |
-
|
|
| 91 |
-
|
|
| 92 |
-
|
|
| 93 |
-
|
|
| 94 |
-
|
|
| 95 |
-
|
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
-
|
| 103 |
-
-
|
| 104 |
-
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
- The
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
-
|
| 124 |
-
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
- The
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
-
|
| 148 |
-
|
| 149 |
-
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
-
|
| 162 |
-
-
|
| 163 |
-
-
|
| 164 |
-
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
- `
|
| 174 |
-
- `
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
-
|
| 182 |
-
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
-
|
| 190 |
-
- Fixed
|
| 191 |
-
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
```
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
-
|
| 243 |
-
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
|
| 250 |
-
|
| 251 |
-
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
-
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
|
| 262 |
-
|
| 263 |
-
|
| 264 |
-
-
|
| 265 |
-
|
| 266 |
-
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
|
| 276 |
-
|
| 277 |
-
|
| 278 |
-
|
| 279 |
-
-
|
| 280 |
-
-
|
| 281 |
-
|
| 282 |
-
|
| 283 |
-
|
| 284 |
-
|
| 285 |
-
|
| 286 |
-
|
| 287 |
-
|
| 288 |
-
|
| 289 |
-
|
| 1 |
+
---
|
| 2 |
+
title: DoodleBook
|
| 3 |
+
emoji: 📚
|
| 4 |
+
colorFrom: yellow
|
| 5 |
+
colorTo: red
|
| 6 |
+
sdk: gradio
|
| 7 |
+
sdk_version: "5.50.0"
|
| 8 |
+
app_file: app.py
|
| 9 |
+
pinned: false
|
| 10 |
+
tags:
|
| 11 |
+
- hackathon
|
| 12 |
+
- build-small
|
| 13 |
+
- adventure-in-thousand-token-wood
|
| 14 |
+
- gradio
|
| 15 |
+
- flux
|
| 16 |
+
- minicpm
|
| 17 |
+
- voxcpm
|
| 18 |
+
- storybook
|
| 19 |
+
- coloring-book
|
| 20 |
+
models:
|
| 21 |
+
- black-forest-labs/FLUX.2-klein-4B
|
| 22 |
+
- openbmb/MiniCPM5-1B
|
| 23 |
+
- openbmb/VoxCPM2
|
| 24 |
+
---
|
| 25 |
+
|
| 26 |
+
# DoodleBook
|
| 27 |
+
|
| 28 |
+
Draw a character, upload it, and DoodleBook turns it into a narrated six-page picture book plus a matching printable coloring book.
|
| 29 |
+
|
| 30 |
+
The project was built for the Build Small Hackathon 2026. The core idea is to keep the reasoning stack small, use a strong image renderer only where it matters, and make the whole flow feel like a child-facing product instead of a model demo.
|
| 31 |
+
|
| 32 |
+
## What it does
|
| 33 |
+
|
| 34 |
+
- Takes a doodle photo from upload or webcam.
|
| 35 |
+
- Generates a six-page children's story with a consistent hero.
|
| 36 |
+
- Renders six full-color story pages with FLUX.
|
| 37 |
+
- Generates narration audio for the whole book.
|
| 38 |
+
- Exports a story PDF.
|
| 39 |
+
- Generates a matching black-and-white coloring book as a second output.
|
| 40 |
+
|
| 41 |
+
## Current architecture
|
| 42 |
+
|
| 43 |
+
There are two runtime modes in this repo.
|
| 44 |
+
|
| 45 |
+
### 1. Local Modal-backed app
|
| 46 |
+
|
| 47 |
+
Use [run_modal.py](run_modal.py) for the real end-to-end flow during development.
|
| 48 |
+
|
| 49 |
+
- UI: Gradio 5 custom Blocks layout
|
| 50 |
+
- Story: local generator by default, optional Modal MiniCPM route
|
| 51 |
+
- Images: Modal FLUX pipeline
|
| 52 |
+
- TTS: Modal VoxCPM pipeline
|
| 53 |
+
- PDFs: local export
|
| 54 |
+
- Coloring book: direct FLUX line-art render, with traced fallback
|
| 55 |
+
|
| 56 |
+
Start it with:
|
| 57 |
+
|
| 58 |
+
```bash
|
| 59 |
+
python run_modal.py
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
The default local URL is:
|
| 63 |
+
|
| 64 |
+
```text
|
| 65 |
+
http://127.0.0.1:7880
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
### 2. HF Spaces / ZeroGPU-oriented app
|
| 69 |
+
|
| 70 |
+
Use [app.py](app.py) for the official Hugging Face Gradio Space target.
|
| 71 |
+
|
| 72 |
+
- `app.py` is the Space entrypoint declared in the repo metadata.
|
| 73 |
+
- `app_zerogpu.py` is the alternate experimental path kept for local ZeroGPU-focused iteration.
|
| 74 |
+
|
| 75 |
+
## Stack used in the hackathon
|
| 76 |
+
|
| 77 |
+
This project deliberately mixes a small-model reasoning stack, a stronger dedicated image renderer, a custom Gradio presentation layer, and remote inference infrastructure that is cheap enough to demo but strong enough to feel like a real product.
|
| 78 |
+
|
| 79 |
+
The important distinction is:
|
| 80 |
+
|
| 81 |
+
- the app "brain" is small
|
| 82 |
+
- the renderer is specialized
|
| 83 |
+
- the UX is product-shaped, not notebook-shaped
|
| 84 |
+
- the deployment path is built around a Gradio Space front-end
|
| 85 |
+
|
| 86 |
+
### Full stack at a glance
|
| 87 |
+
|
| 88 |
+
| Layer | Stack | Role in the product |
|
| 89 |
+
|---|---|---|
|
| 90 |
+
| Product UI | Gradio 5 Blocks + custom CSS/HTML/JS | Child-facing scrapbook interface, status streaming, downloads |
|
| 91 |
+
| Story engine | MiniCPM5-1B + local structured fallback | Writes the six-page narrative and scene plan |
|
| 92 |
+
| Image engine | FLUX.2-klein-4B on Modal | Draws consistent full-color pages and dedicated coloring pages |
|
| 93 |
+
| Voice engine | VoxCPM2 on Modal | Narrates the full storybook |
|
| 94 |
+
| Coloring engine | Direct FLUX line-art pass + cleanup fallback | Produces printable black-and-white pages |
|
| 95 |
+
| Export layer | Pillow + FPDF | Builds printable story and coloring PDFs |
|
| 96 |
+
| Hosting target | Hugging Face Spaces | Gradio app shell and user entrypoint |
|
| 97 |
+
| Remote compute | Modal | GPU execution for heavy image and TTS work |
|
| 98 |
+
| Observability | Heartbeat streaming + stage timing in trace panel | Keeps long runs visible and debuggable |
|
| 99 |
+
|
| 100 |
+
### Frontend and product shell
|
| 101 |
+
|
| 102 |
+
- Gradio 5
|
| 103 |
+
- Custom scrapbook-style UI in [ui/layout.py](ui/layout.py)
|
| 104 |
+
- HTML-based book rendering in [book_builder.py](book_builder.py)
|
| 105 |
+
- Fixed-position PDF downloads under the status panel
|
| 106 |
+
- Streaming progress heartbeats to keep long jobs alive in the browser
|
| 107 |
+
- File-backed page rendering instead of giant inline base64 payloads
|
| 108 |
+
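As a rough illustration of the file-backed rendering point: base64 turns every 3 bytes into 4 characters, so inlining page images adds about a third on top of their raw size to the HTML payload, while a file reference only adds a short path. This is a minimal sketch, not the code in `book_builder.py`; `inline_img_tag` and `file_backed_img_tag` are hypothetical helper names, and the `/file=` URL prefix is an assumed convention for how the host serves temp files.

```python
import base64
import os
import tempfile


def inline_img_tag(png_bytes: bytes) -> str:
    """Embed the image directly in the HTML as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}">'


def file_backed_img_tag(png_bytes: bytes, out_dir: str, url_prefix: str = "/file=") -> str:
    """Write the image to disk and reference it by path instead.

    url_prefix is an assumption about how the web server exposes files.
    """
    fd, path = tempfile.mkstemp(suffix=".png", dir=out_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(png_bytes)
    return f'<img src="{url_prefix}{path}">'


fake_page = bytes(300_000)  # stand-in for one rendered page of about 300 KB
with tempfile.TemporaryDirectory() as tmp:
    inline_html = inline_img_tag(fake_page)
    linked_html = file_backed_img_tag(fake_page, tmp)
    print(len(inline_html), len(linked_html))
```

With six pages per book the inline variant grows linearly with image size, whereas the file-backed variant stays at a few hundred characters of HTML regardless of resolution.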
|
| 109 |
+
### Story generation stack
|
| 110 |
+
|
| 111 |
+
- `openbmb/MiniCPM5-1B`
|
| 112 |
+
- Local fast fallback story generator in [services/story.py](services/story.py)
|
| 113 |
+
- Optional Modal story worker in [modal_workers/modal_story_gen.py](modal_workers/modal_story_gen.py)
|
| 114 |
+
|
| 115 |
+
Why it matters:
|
| 116 |
+
- The story model is the small-model "brain" of the app.
|
| 117 |
+
- It keeps the narrative stack small and hackathon-aligned.
|
| 118 |
+
- The story system outputs both prose and scene prompts, so downstream image generation stays structured.
|
| 119 |
+
- The local structured fallback means the Space can still produce a valid book if the remote story path is unavailable.
|
| 120 |
+
|
| 121 |
+
### Image generation stack
|
| 122 |
+
|
| 123 |
+
- `black-forest-labs/FLUX.2-klein-4B`
|
| 124 |
+
- Modal deployment for image generation in [modal_workers/modal_image_gen.py](modal_workers/modal_image_gen.py)
|
| 125 |
+
- Parallel canonical-character plus per-page render flow in [services/images.py](services/images.py)
|
| 126 |
+
- One canonical character render from the child doodle, then scene-specific page renders
|
| 127 |
+
- Separate direct line-art render path for the coloring book
|
| 128 |
+
|
| 129 |
+
Why it matters:
|
| 130 |
+
- The app needs high visual quality and character consistency.
|
| 131 |
+
- FLUX is used as the renderer, not as the reasoning engine.
|
| 132 |
+
- The character consistency pipeline is what makes the book feel authored rather than randomly reimagined on every page.
|
| 133 |
+
- The line-art renderer is separate because tracing finished crayon pages produced bad coloring results.
|
| 134 |
+
|
| 135 |
+
### TTS stack
|
| 136 |
+
|
| 137 |
+
- `openbmb/VoxCPM2`
|
| 138 |
+
- Modal TTS worker in [modal_workers/modal_tts.py](modal_workers/modal_tts.py)
|
| 139 |
+
- Service wrapper in [services/tts.py](services/tts.py)
|
| 140 |
+
- Parallelized with image generation in the real Modal-backed app
|
| 141 |
+
|
| 142 |
+
Why it matters:
|
| 143 |
+
- Narration is part of the child-facing experience, not a side feature.
|
| 144 |
+
- TTS runs in parallel with image generation in the real local pipeline.
|
| 145 |
+
- Overlapping TTS with illustration time reduces total wait without degrading output quality.
|
| 146 |
+
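The overlap described above can be sketched with a thread pool. This is a minimal illustration under stated assumptions, not the project's orchestration code: `render_pages` and `narrate` are hypothetical stand-ins that sleep instead of calling the remote image and TTS workers, and the sleep durations are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_pages(scenes):
    """Stand-in for the remote image stage; sleeps to simulate GPU time."""
    time.sleep(0.3)
    return [f"page:{scene}" for scene in scenes]


def narrate(text):
    """Stand-in for the remote TTS stage."""
    time.sleep(0.2)
    return f"audio:{len(text)} chars"


def build_book(scenes, text):
    """Submit both stages at once, then wait for both results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        pages_future = pool.submit(render_pages, scenes)
        audio_future = pool.submit(narrate, text)
        pages, audio = pages_future.result(), audio_future.result()
    return pages, audio, time.perf_counter() - start


pages, audio, elapsed = build_book(["forest", "river"], "Once upon a time")
print(pages, audio, f"{elapsed:.2f}s")
```

Because both stand-ins spend their time waiting, the wall-clock time is close to the slower stage rather than the sum of the two. The same holds for remote calls, which are I/O-bound from the caller's side.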
|
| 147 |
+
### Coloring-book stack
|
| 148 |
+
|
| 149 |
+
- Direct FLUX line-art rendering for the same scenes
|
| 150 |
+
- Cleanup and fallback pipeline in [services/coloring.py](services/coloring.py)
|
| 151 |
+
- Modal `render_coloring_page` for dedicated line-art scene generation
|
| 152 |
+
- Local cleanup for thresholding, despeckling, and printable black-on-white output
|
| 153 |
+
|
| 154 |
+
Why it matters:
|
| 155 |
+
- The main bug fixed in this version was that the coloring book used to trace finished crayon-textured images.
|
| 156 |
+
- The improved pipeline renders dedicated line-art pages instead of trying to strip color out after the fact.
|
| 157 |
+
- This is the main quality improvement that separates the current version from the earlier broken coloring-book output.
|
| 158 |
+
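The local cleanup step (thresholding and despeckling) can be sketched in plain Python on a tiny grayscale grid. This is only an illustration of the idea; the real pipeline in `services/coloring.py` is not shown here and presumably relies on Pillow or OpenCV. The function names, the cutoff of 128, and the 8-neighbour rule are all assumptions.

```python
def threshold(gray, cutoff=128):
    """Map a 2D grid of 0-255 grayscale values to pure black (0) or white (255)."""
    return [[0 if px < cutoff else 255 for px in row] for row in gray]


def despeckle(binary, min_neighbors=1):
    """Whiten black pixels that have fewer than min_neighbors black 8-neighbours."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 0:
                continue
            black = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and binary[ny][nx] == 0:
                        black += 1
            if black < min_neighbors:
                cleaned[y][x] = 255
    return cleaned


# One isolated dark speck at (1, 1) and a short dark stroke on the bottom row.
gray = [
    [250, 250, 250, 250],
    [250,  40, 250, 250],
    [250, 250, 250, 250],
    [ 10,  20,  30, 250],
]
page = despeckle(threshold(gray))
for row in page:
    print(row)
```

The isolated speck is removed while the connected stroke survives, which is the behaviour a printable black-on-white page needs.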
|
| 159 |
+
### Infrastructure stack
|
| 160 |
+
|
| 161 |
+
- Modal for remote GPU inference
|
| 162 |
+
- Hugging Face Spaces as the target host
|
| 163 |
+
- Python 3.11 / 3.13 local development
|
| 164 |
+
- `diffusers`, `transformers`, `torch`, `accelerate`
|
| 165 |
+
- `Pillow`, `OpenCV`, `FPDF`
|
| 166 |
+
- Gradio client-compatible API surface for testing and debugging
|
| 167 |
+
- Hugging Face org deployment target: `build-small-hackathon`
|
| 168 |
+
|
| 169 |
+
### Sponsor and hackathon alignment
|
| 170 |
+
|
| 171 |
+
This app directly reflects the hackathon sponsor/tool stack:
|
| 172 |
+
|
| 173 |
+
- `OpenBMB`: MiniCPM5-1B and VoxCPM2
|
| 174 |
+
- `Black Forest Labs`: FLUX.2-klein-4B
|
| 175 |
+
- `Modal`: remote GPU inference
|
| 176 |
+
- `OpenAI Codex`: debugging, architecture fixes, deployment preparation, README/release work
|
| 177 |
+
- `Hugging Face Spaces`: final Gradio app surface
|
| 178 |
+
|
| 179 |
+
For hackathon judging, the main narrative is:
|
| 180 |
+
|
| 181 |
+
- Tiny Titan reasoning stack
|
| 182 |
+
- Off-brand custom UI
|
| 183 |
+
- Real multimodal product loop
|
| 184 |
+
- Remote GPU orchestration with a Gradio user experience
|
| 185 |
+
- Child-usable output artifacts: storybook PDF, audio, coloring book PDF
|
| 186 |
+
|
| 187 |
+
## Key engineering fixes in this version
|
| 188 |
+
|
| 189 |
+
- Added direct Modal coloring-page rendering with `render_coloring_page`.
|
| 190 |
+
- Fixed the live app to keep the Gradio stream alive during long coloring generation.
|
| 191 |
+
- Added stage timing so story, image, PDF, TTS, and coloring costs are visible.
|
| 192 |
+
- Reduced final-page payload size by replacing giant inline base64 book HTML with file-backed image URLs.
|
| 193 |
+
- Fixed download serving through Gradio temp-file paths.
|
| 194 |
+
- Removed port confusion between the local test app and the real Modal-backed app.
|
| 195 |
+
|
| 196 |
+
## Measured performance
|
| 197 |
+
|
| 198 |
+
Measured against the real local Modal-backed app flow:
|
| 199 |
+
|
| 200 |
+
- Story-only stage: about `0.3s`
|
| 201 |
+
- Full-color book, warm: about `75s to 80s`
|
| 202 |
+
- Full-color book + coloring book, warm: about `200s`
|
| 203 |
+
- Slowest stage: coloring-book generation
|
| 204 |
+
|
| 205 |
+
The current bottleneck is still the coloring-book path, even after the direct line-art fix.
|
| 206 |
+
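A back-of-the-envelope reading of the numbers above, assuming the coloring book runs after the full-color book rather than overlapping it, so that the difference between the two warm totals is attributable to the coloring stage:

```python
# Midpoint of the quoted 75s to 80s range for the full-color book (warm).
full_color_sec = (75 + 80) / 2
# Quoted total when the coloring book is generated as well (warm).
both_books_sec = 200

# Time attributable to the coloring stage under the sequential assumption.
coloring_sec = both_books_sec - full_color_sec
coloring_share = coloring_sec / both_books_sec

print(f"coloring ~ {coloring_sec:.1f}s, {coloring_share:.0%} of the run")
```

Under that assumption the coloring stage takes roughly 120 seconds, or about 60% of a combined run.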
|
| 207 |
+
## Repository layout
|
| 208 |
+
|
| 209 |
+
```text
|
| 210 |
+
app.py Main Gradio app variant
|
| 211 |
+
app_zerogpu.py ZeroGPU-oriented app variant
|
| 212 |
+
run_modal.py Real local Modal-backed app
|
| 213 |
+
book_builder.py HTML and PDF assembly
|
| 214 |
+
services/ Orchestration and fallbacks
|
| 215 |
+
modal_workers/ Modal remote workers
|
| 216 |
+
ui/ Custom Gradio layout
|
| 217 |
+
assets/ Sample doodles and sample book pages
|
| 218 |
+
docs/ Specs and notes
|
| 219 |
+
```
|
| 220 |
+
|
| 221 |
+
## Local setup
|
| 222 |
+
|
| 223 |
+
```bash
|
| 224 |
+
pip install -r requirements.txt
|
| 225 |
+
python run_modal.py
|
| 226 |
+
```
|
| 227 |
+
|
| 228 |
+
If you want the real Modal-backed app, use `run_modal.py`, not `app.py`.
|
| 229 |
+
|
| 230 |
+
## Hugging Face Space deployment target
|
| 231 |
+
|
| 232 |
+
The intended hosted version is a Gradio Space in the `build-small-hackathon` org.
|
| 233 |
+
|
| 234 |
+
Target format:
|
| 235 |
+
|
| 236 |
+
```text
|
| 237 |
+
build-small-hackathon/DoodleBook
|
| 238 |
+
```
|
| 239 |
+
|
| 240 |
+
Official target configuration:
|
| 241 |
+
|
| 242 |
+
- Hugging Face Space SDK: `gradio`
|
| 243 |
+
- Space entrypoint: `app.py`
|
| 244 |
+
- Hardware target: `ZeroGPU`
|
| 245 |
+
- Space frontend and API live on Hugging Face
|
| 246 |
+
- Local or Spaces-managed inference path should be preferred for the official org deployment
|
| 247 |
+
|
| 248 |
+
Important distinction:
|
| 249 |
+
|
| 250 |
+
- `run_modal.py` is the best local development and debugging path.
|
| 251 |
+
- `app.py` is the correct Hugging Face Space entrypoint.
|
| 252 |
+
- Do not point the Space metadata at `run_modal.py`, because that is the Modal-backed dev runtime rather than the official hosted Gradio runtime.
|
| 253 |
+
|
| 254 |
+
If you choose the Modal-backed hosted variant later, that becomes a different deployment shape and requires secrets.
|
| 255 |
+
|
| 256 |
+
Required secrets only for the Modal-backed hosted variant:
|
| 257 |
+
|
| 258 |
+
- `MODAL_TOKEN_ID`
|
| 259 |
+
- `MODAL_TOKEN_SECRET`
|
| 260 |
+
- any Hugging Face token needed by Modal workers for model pulls
|
| 261 |
+
|
| 262 |
+
Why the Gradio Space + ZeroGPU shape is preferred for the hackathon org:
|
| 263 |
+
|
| 264 |
+
- keeps the user-facing app as a normal Gradio Space
|
| 265 |
+
- matches the official hackathon org publishing model
|
| 266 |
+
- keeps the demo easy to judge, share, and run from the org page
|
| 267 |
+
- avoids depending on a separate private frontend host
|
| 268 |
+
|
| 269 |
+
Tradeoff:
|
| 270 |
+
|
| 271 |
+
- The pure ZeroGPU path is easier to host in the official org.
|
| 272 |
+
- The Modal-backed path currently gives stronger image and TTS quality.
|
| 273 |
+
- The repo keeps both because local quality validation and official hosting have different constraints.
|
| 274 |
+
|
| 275 |
+
## Hackathon fit
|
| 276 |
+
|
| 277 |
+
This project targets the hackathon stack in a deliberate way:
|
| 278 |
+
|
| 279 |
+
- Small-model reasoning for story generation
|
| 280 |
+
- Strong but scoped rendering model for visuals
|
| 281 |
+
- Distinct multimodal outputs: story, illustrations, narration, coloring book
|
| 282 |
+
- Real product UX instead of a bare prompt box
|
| 283 |
+
- Clear deployment story for Hugging Face Spaces plus Modal GPU workers
|
| 284 |
+
|
| 285 |
+
## Contributors
|
| 286 |
+
|
| 287 |
+
- Sushruth S.
|
| 288 |
+
- OpenAI Codex: debugging, architecture fixes, rendering pipeline fixes, README and release preparation
|
| 289 |
+
|
| 290 |
+
## License
|
| 291 |
+
|
| 292 |
+
Apache-2.0. See [LICENSE](LICENSE).
|
app.py
CHANGED
|
@@ -1,726 +1,697 @@
|
|
| 1 |
-
"""
|
| 2 |
-
DoodleBook — HF ZeroGPU Version
|
| 3 |
-
|
| 4 |
-
Free T4 GPU on Hugging Face Spaces!
|
| 5 |
-
No Modal needed.
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
import gradio as gr
|
| 9 |
-
import os
|
| 10 |
-
import sys
|
| 11 |
-
import torch
|
| 12 |
-
try:
|
| 13 |
-
import spaces
|
| 14 |
-
except ModuleNotFoundError:
|
| 15 |
-
# `spaces` only exists on HF ZeroGPU. Off-HF (local/dev) provide a no-op so
|
| 16 |
-
# the app still runs; generation then uses whatever local GPU/CPU exists.
|
| 17 |
-
class _SpacesShim:
|
| 18 |
-
@staticmethod
|
| 19 |
-
def GPU(*args, **kwargs):
|
| 20 |
-
if args and callable(args[0]): # bare @spaces.GPU
|
| 21 |
-
return args[0]
|
| 22 |
-
def deco(fn): # @spaces.GPU(duration=...)
|
| 23 |
-
return fn
|
| 24 |
-
return deco
|
| 25 |
-
spaces = _SpacesShim()
|
| 26 |
-
import json
|
| 27 |
-
import time
|
| 28 |
-
import tempfile
|
| 29 |
-
import logging
|
| 30 |
-
import struct
|
| 31 |
-
import re
|
| 32 |
-
|
| 33 |
-
sys.path.insert(0, os.path.dirname(__file__))
|
| 34 |
-
|
| 35 |
-
from config import (
|
| 36 |
-
FLUX_MODEL, STORY_MODEL, TTS_MODEL,
|
| 37 |
-
GENERATION_PARAMS, SAMPLE_BOOK_PATH, BASE_SEED, page_seed,
|
| 38 |
-
DEFAULT_VOICE, voice_design,
|
| 39 |
-
)
|
| 40 |
-
from book_builder import (
|
| 41 |
-
build_book_html, export_pdf, magic_loader_html,
|
| 42 |
-
build_coloring_html, export_coloring_pdf,
|
| 43 |
-
)
|
| 44 |
-
from ui.layout import create_layout
|
| 45 |
-
|
| 46 |
-
logging.basicConfig(level=logging.INFO)
|
| 47 |
-
logger = logging.getLogger(__name__)
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
"
|
| 64 |
-
)
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
)
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
("
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
("
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
""
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
|
| 250 |
-
|
| 251 |
-
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
|
| 262 |
-
|
| 263 |
-
|
| 264 |
-
|
| 265 |
-
|
| 266 |
-
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
|
| 276 |
-
|
| 277 |
-
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
|
| 281 |
-
|
| 282 |
-
|
| 283 |
-
|
| 284 |
-
|
| 285 |
-
|
| 286 |
-
|
| 287 |
-
|
| 288 |
-
|
| 289 |
-
|
| 290 |
-
|
| 291 |
-
|
| 292 |
-
|
| 293 |
-
|
| 294 |
-
|
| 295 |
-
|
| 296 |
-
|
| 297 |
-
|
| 298 |
-
|
| 299 |
-
|
| 300 |
-
|
| 301 |
-
|
| 302 |
-
|
| 303 |
-
|
| 304 |
-
|
| 305 |
-
|
| 306 |
-
|
| 307 |
-
|
| 308 |
-
|
| 309 |
-
|
| 310 |
-
|
| 311 |
-
|
| 312 |
-
|
| 313 |
-
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
|
| 317 |
-
|
| 318 |
-
|
| 319 |
-
|
| 320 |
-
|
| 321 |
-
|
| 322 |
-
|
| 323 |
-
)
|
| 324 |
-
|
| 325 |
-
|
| 326 |
-
|
| 327 |
-
|
| 328 |
-
|
| 329 |
-
|
| 330 |
-
|
| 331 |
-
|
| 332 |
-
|
| 333 |
-
|
| 334 |
-
|
| 335 |
-
|
| 336 |
-
|
| 337 |
-
|
| 338 |
-
|
| 339 |
-
|
| 340 |
-
|
| 341 |
-
|
| 342 |
-
|
| 343 |
-
|
| 344 |
-
|
| 345 |
-
|
| 346 |
-
|
| 347 |
-
|
| 348 |
-
|
| 349 |
-
|
| 350 |
-
|
| 351 |
-
|
| 352 |
-
|
| 353 |
-
|
| 354 |
-
|
| 355 |
-
|
| 356 |
-
|
| 357 |
-
|
| 358 |
-
|
| 359 |
-
|
| 360 |
-
|
| 361 |
-
|
| 362 |
-
|
| 363 |
-
|
| 364 |
-
|
| 365 |
-
|
| 366 |
-
|
| 367 |
-
|
| 368 |
-
|
| 369 |
-
|
| 370 |
-
|
| 371 |
-
|
| 372 |
-
|
| 373 |
-
|
| 374 |
-
|
| 375 |
-
|
| 376 |
-
|
| 377 |
-
|
| 378 |
-
images.append(
|
| 379 |
-
logger.info(f"Generated page {i+1}/
|
| 380 |
-
|
| 381 |
-
|
| 382 |
-
|
| 383 |
-
|
| 384 |
-
|
| 385 |
-
|
| 386 |
-
|
| 387 |
-
|
| 388 |
-
|
| 389 |
-
|
| 390 |
-
|
| 391 |
-
|
| 392 |
-
|
| 393 |
-
|
| 394 |
-
|
| 395 |
-
|
| 396 |
-
|
| 397 |
-
|
| 398 |
-
|
| 399 |
-
|
| 400 |
-
|
| 401 |
-
|
| 402 |
-
|
| 403 |
-
|
| 404 |
-
|
| 405 |
-
|
| 406 |
-
|
| 407 |
-
|
| 408 |
-
|
| 409 |
-
|
| 410 |
-
|
| 411 |
-
|
| 412 |
-
|
| 413 |
-
|
| 414 |
-
|
| 415 |
-
|
| 416 |
-
|
| 417 |
-
|
| 418 |
-
|
| 419 |
-
|
| 420 |
-
|
| 421 |
-
|
| 422 |
-
|
| 423 |
-
|
| 424 |
-
|
| 425 |
-
|
| 426 |
-
|
| 427 |
-
|
| 428 |
-
|
| 429 |
-
|
| 430 |
-
|
| 431 |
-
|
| 432 |
-
|
| 433 |
-
|
| 434 |
-
|
| 435 |
-
|
| 436 |
-
|
| 437 |
-
|
| 438 |
-
|
| 439 |
-
|
| 440 |
-
|
| 441 |
-
|
| 442 |
-
|
| 443 |
-
|
| 444 |
-
|
| 445 |
-
|
| 446 |
-
|
| 447 |
-
|
| 448 |
-
|
| 449 |
-
|
| 450 |
-
|
| 451 |
-
|
| 452 |
-
|
| 453 |
-
|
| 454 |
-
|
| 455 |
-
|
| 456 |
-
|
| 457 |
-
|
| 458 |
-
|
| 459 |
-
|
| 460 |
-
|
| 461 |
-
|
| 462 |
-
|
| 463 |
-
|
| 464 |
-
|
| 465 |
-
|
| 466 |
-
|
| 467 |
-
|
| 468 |
-
|
| 469 |
-
|
| 470 |
-
|
| 471 |
-
|
| 472 |
-
|
| 473 |
-
|
| 474 |
-
|
| 475 |
-
|
| 476 |
-
|
| 477 |
-
|
| 478 |
-
|
| 479 |
-
|
| 480 |
-
|
| 481 |
-
|
| 482 |
-
|
| 483 |
-
|
| 484 |
-
|
| 485 |
-
|
| 486 |
-
|
| 487 |
-
|
| 488 |
-
|
| 489 |
-
|
| 490 |
-
|
| 491 |
-
|
| 492 |
-
|
| 493 |
-
|
| 494 |
-
|
| 495 |
-
|
| 496 |
-
|
| 497 |
-
|
| 498 |
-
|
| 499 |
-
|
| 500 |
-
|
| 501 |
-
|
| 502 |
-
|
| 503 |
-
|
| 504 |
-
|
| 505 |
-
|
| 506 |
-
|
| 507 |
-
|
| 508 |
-
|
| 509 |
-
|
| 510 |
-
|
| 511 |
-
|
| 512 |
-
"
|
| 513 |
-
|
| 514 |
-
|
| 515 |
-
|
| 516 |
-
|
| 517 |
-
|
| 518 |
-
|
| 519 |
-
|
| 520 |
-
|
| 521 |
-
|
| 522 |
-
|
| 523 |
-
|
| 524 |
-
|
| 525 |
-
|
| 526 |
-
|
| 527 |
-
|
| 528 |
-
|
| 529 |
-
|
| 530 |
-
|
| 531 |
-
|
| 532 |
-
|
| 533 |
-
|
| 534 |
-
|
| 535 |
-
|
| 536 |
-
|
| 537 |
-
|
| 538 |
-
|
| 539 |
-
|
| 540 |
-
|
| 541 |
-
|
| 542 |
-
|
| 543 |
-
|
| 544 |
-
|
| 545 |
-
|
| 546 |
-
|
| 547 |
-
|
| 548 |
-
|
| 549 |
-
|
| 550 |
-
|
| 551 |
-
|
| 552 |
-
|
| 553 |
-
|
| 554 |
-
|
| 555 |
-
|
| 556 |
-
|
| 557 |
-
|
| 558 |
-
|
| 559 |
-
|
| 560 |
-
|
| 561 |
-
|
| 562 |
-
|
| 563 |
-
|
| 564 |
-
|
| 565 |
-
|
| 566 |
-
|
| 567 |
-
|
| 568 |
-
|
| 569 |
-
|
| 570 |
-
|
| 571 |
-
|
| 572 |
-
|
| 573 |
-
|
| 574 |
-
|
| 575 |
-
|
| 576 |
-
|
| 577 |
-
|
| 578 |
-
|
| 579 |
-
|
| 580 |
-
|
| 581 |
-
|
| 582 |
-
|
| 583 |
-
|
| 584 |
-
|
| 585 |
-
|
| 586 |
-
|
| 587 |
-
|
| 588 |
-
|
| 589 |
-
|
| 590 |
-
|
| 591 |
-
|
| 592 |
-
|
| 593 |
-
|
| 594 |
-
|
| 595 |
-
|
| 596 |
-
|
| 597 |
-
|
| 598 |
-
|
| 599 |
-
|
| 600 |
-
|
| 601 |
-
|
| 602 |
-
|
| 603 |
-
|
| 604 |
-
|
| 605 |
-
|
| 606 |
-
|
| 607 |
-
|
| 608 |
-
|
| 609 |
-
|
| 610 |
-
|
| 611 |
-
|
| 612 |
-
|
| 613 |
-
|
| 614 |
-
|
| 615 |
-
|
| 616 |
-
|
| 617 |
-
|
| 618 |
-
|
| 619 |
-
|
| 620 |
-
|
| 621 |
-
|
| 622 |
-
|
| 623 |
-
|
| 624 |
-
|
| 625 |
-
|
| 626 |
-
|
| 627 |
-
|
| 628 |
-
|
| 629 |
-
|
| 630 |
-
|
| 631 |
-
|
| 632 |
-
|
| 633 |
-
|
| 634 |
-
|
| 635 |
-
|
| 636 |
-
|
| 637 |
-
|
| 638 |
-
|
| 639 |
-
|
| 640 |
-
|
| 641 |
-
|
| 642 |
-
|
| 643 |
-
|
| 644 |
-
|
| 645 |
-
|
| 646 |
-
|
| 647 |
-
|
| 648 |
-
|
| 649 |
-
|
| 650 |
-
|
| 651 |
-
|
| 652 |
-
|
| 653 |
-
|
| 654 |
-
|
| 655 |
-
|
| 656 |
-
|
| 657 |
-
|
| 658 |
-
|
| 659 |
-
|
| 660 |
-
|
| 661 |
-
|
| 662 |
-
|
| 663 |
-
|
| 664 |
-
|
| 665 |
-
|
| 666 |
-
|
| 667 |
-
|
| 668 |
-
|
| 669 |
-
|
| 670 |
-
|
| 671 |
-
|
| 672 |
-
|
| 673 |
-
|
| 674 |
-
|
| 675 |
-
|
| 676 |
-
|
| 677 |
-
|
| 678 |
-
|
| 679 |
-
|
| 680 |
-
|
| 681 |
-
|
| 682 |
-
|
| 683 |
-
|
| 684 |
-
|
| 685 |
-
|
| 686 |
-
|
| 687 |
-
|
| 688 |
-
|
| 689 |
-
|
| 690 |
-
|
| 691 |
-
|
| 692 |
-
|
| 693 |
-
|
| 694 |
-
|
| 695 |
-
|
| 696 |
-
|
| 697 |
-
|
| 698 |
-
pdf_update = gr.update(value=pdf_path) if pdf_path else _keep
|
| 699 |
-
coloring_pdf_update = gr.update(value=coloring_pdf_path) if coloring_pdf_path else _keep
|
| 700 |
-
coloring_display_update = (gr.update(visible=True, value=coloring_html) if coloring_html
|
| 701 |
-
else _no)
|
| 702 |
-
|
| 703 |
-
yield (
|
| 704 |
-
book_html,
|
| 705 |
-
f"Complete: {title} — {len(img_bytes)} pages · {'FLUX (ZeroGPU)' if engine == 'flux' else 'local sketch fallback'} · voice: {voice} · total {trace_data['total_sec']}s",
|
| 706 |
-
audio_path,
|
| 707 |
-
pdf_update,
|
| 708 |
-
story,
|
| 709 |
-
f"Pages: {len(img_bytes)} | Seed: {BASE_SEED} | Mode: {'Tiny' if tiny_mode else 'Standard'} | Engine: {engine} | Story {trace_data.get('story_sec', 0)}s | Images {trace_data.get('images_sec', 0)}s | PDF {trace_data.get('pdf_sec', 0)}s | Coloring {trace_data.get('coloring_sec', 0)}s",
|
| 710 |
-
json.dumps(trace_data, indent=2),
|
| 711 |
-
coloring_display_update,
|
| 712 |
-
coloring_pdf_update,
|
| 713 |
-
)
|
| 714 |
-
|
| 715 |
-
|
| 716 |
-
# ============================================================================
|
| 717 |
-
# MAIN
|
| 718 |
-
# ============================================================================
|
| 719 |
-
|
| 720 |
-
if __name__ == "__main__":
|
| 721 |
-
demo = create_layout(
|
| 722 |
-
load_sample_fn=load_sample_book,
|
| 723 |
-
create_book_fn=create_book,
|
| 724 |
-
)
|
| 725 |
-
demo.queue(default_concurrency_limit=2, max_size=8)
|
| 726 |
-
demo.launch(share=False, allowed_paths=[tempfile.gettempdir()])
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
DoodleBook — HF ZeroGPU Version
|
| 3 |
+
|
| 4 |
+
Free T4 GPU on Hugging Face Spaces!
|
| 5 |
+
No Modal needed.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import gradio as gr
|
| 9 |
+
import os
|
| 10 |
+
import sys
|
| 11 |
+
import torch
|
| 12 |
+
try:
|
| 13 |
+
import spaces
|
| 14 |
+
except ModuleNotFoundError:
|
| 15 |
+
# `spaces` only exists on HF ZeroGPU. Off-HF (local/dev) provide a no-op so
|
| 16 |
+
# the app still runs; generation then uses whatever local GPU/CPU exists.
|
| 17 |
+
class _SpacesShim:
|
| 18 |
+
@staticmethod
|
| 19 |
+
def GPU(*args, **kwargs):
|
| 20 |
+
if args and callable(args[0]): # bare @spaces.GPU
|
| 21 |
+
return args[0]
|
| 22 |
+
def deco(fn): # @spaces.GPU(duration=...)
|
| 23 |
+
return fn
|
| 24 |
+
return deco
|
| 25 |
+
spaces = _SpacesShim()
|
| 26 |
+
import json
|
| 27 |
+
import time
|
| 28 |
+
import tempfile
|
| 29 |
+
import logging
|
| 30 |
+
import struct
|
| 31 |
+
import re
|
| 32 |
+
|
| 33 |
+
sys.path.insert(0, os.path.dirname(__file__))
|
| 34 |
+
|
| 35 |
+
from config import (
|
| 36 |
+
FLUX_MODEL, STORY_MODEL, TTS_MODEL,
|
| 37 |
+
GENERATION_PARAMS, SAMPLE_BOOK_PATH, BASE_SEED, page_seed,
|
| 38 |
+
DEFAULT_VOICE, voice_design,
|
| 39 |
+
)
|
| 40 |
+
from book_builder import (
|
| 41 |
+
build_book_html, export_pdf, magic_loader_html,
|
| 42 |
+
build_coloring_html, export_coloring_pdf,
|
| 43 |
+
)
|
| 44 |
+
from ui.layout import create_layout
|
| 45 |
+
|
| 46 |
+
logging.basicConfig(level=logging.INFO)
|
| 47 |
+
logger = logging.getLogger(__name__)
|
| 48 |
+
|
| 49 |
+
# ZeroGPU sets SPACES_ZERO_GPU. On the Space we load models on cuda at IMPORT
|
| 50 |
+
# (a CUDA-emulation layer makes that work without a real GPU); lazy-loading
|
| 51 |
+
# inside @spaces.GPU is explicitly discouraged and was why FLUX kept failing
|
| 52 |
+
# → sketch. Guarded so a local/dev import doesn't try to pull ~20GB of weights.
|
| 53 |
+
ON_ZEROGPU = bool(os.environ.get("SPACES_ZERO_GPU"))
|
| 54 |
+
|
| 55 |
+
_FLUX_PIPE = None
|
| 56 |
+
_STORY_MODEL = None
|
| 57 |
+
_STORY_TOKENIZER = None
|
| 58 |
+
_TTS_MODEL = None
|
| 59 |
+
_LOAD_ERRORS = {}
|
| 60 |
+
|
| 61 |
+
|
| 62 |
+
def load_flux():
|
| 63 |
+
"""FLUX image pipeline placed on cuda at module scope (the ZeroGPU pattern).
|
| 64 |
+
No enable_model_cpu_offload() — that fights ZeroGPU's device management."""
|
| 65 |
+
global _FLUX_PIPE
|
| 66 |
+
if _FLUX_PIPE is None:
|
| 67 |
+
from diffusers import Flux2KleinPipeline
|
| 68 |
+
logger.info(f"Loading image model: {FLUX_MODEL.hub_id}")
|
| 69 |
+
pipe = Flux2KleinPipeline.from_pretrained(
|
| 70 |
+
FLUX_MODEL.hub_id, torch_dtype=torch.bfloat16,
|
| 71 |
+
)
|
| 72 |
+
pipe.to("cuda")
|
| 73 |
+
_FLUX_PIPE = pipe
|
| 74 |
+
return _FLUX_PIPE
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
def load_story():
|
| 78 |
+
global _STORY_MODEL, _STORY_TOKENIZER
|
| 79 |
+
if _STORY_MODEL is None:
|
| 80 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 81 |
+
logger.info(f"Loading story model: {STORY_MODEL.hub_id}")
|
| 82 |
+
_STORY_TOKENIZER = AutoTokenizer.from_pretrained(
|
| 83 |
+
STORY_MODEL.hub_id, trust_remote_code=True,
|
| 84 |
+
)
|
| 85 |
+
_STORY_MODEL = AutoModelForCausalLM.from_pretrained(
|
| 86 |
+
STORY_MODEL.hub_id, torch_dtype=torch.float16, trust_remote_code=True,
|
| 87 |
+
).to("cuda").eval()
|
| 88 |
+
return _STORY_MODEL, _STORY_TOKENIZER
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
def load_tts():
|
| 92 |
+
global _TTS_MODEL
|
| 93 |
+
if _TTS_MODEL is None:
|
| 94 |
+
from voxcpm import VoxCPM
|
| 95 |
+
logger.info(f"Loading TTS model: {TTS_MODEL.hub_id}")
|
| 96 |
+
_TTS_MODEL = VoxCPM.from_pretrained(TTS_MODEL.hub_id, load_denoiser=False)
|
| 97 |
+
return _TTS_MODEL
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
if ON_ZEROGPU:
|
| 101 |
+
for _name, _loader in (("flux", load_flux), ("story", load_story), ("tts", load_tts)):
|
| 102 |
+
try:
|
| 103 |
+
_loader()
|
| 104 |
+
except Exception as _e: # keep the Space booting
|
| 105 |
+
_LOAD_ERRORS[_name] = repr(_e)
|
| 106 |
+
logger.exception(f"Module-level load failed for {_name}")
|
| 107 |
+
|
| 108 |
+
COLOR_ART_STYLE = (
|
| 109 |
+
"children's crayon storybook illustration, bold black outlines, "
|
| 110 |
+
"flat bright colors, simple shapes"
|
| 111 |
+
)
|
| 112 |
+
COLOR_PAGE_SUFFIX = "full colorful background scene, the character clearly visible."
|
| 113 |
+
LINE_ART_STYLE = (
|
| 114 |
+
"children's coloring book page, pure black ink outlines on pure white paper, "
|
| 115 |
+
"clean contour lines, no color, no gray, no shading, no texture, "
|
| 116 |
+
"no hatching, no pencil marks, open spaces to color"
|
| 117 |
+
)
|
| 118 |
+
LINE_ART_SUFFIX = (
|
| 119 |
+
"simple clean background shapes, same composition, thick readable outlines, "
|
| 120 |
+
"no filled black areas, no extra sketch marks."
|
| 121 |
+
)
|
| 122 |
+
|
| 123 |
+
THEME_TEMPLATES = {
|
| 124 |
+
"brave adventure": [
|
| 125 |
+
("{hero} loved exploring new places.", "{hero} standing at the start of a bright adventure trail"),
|
| 126 |
+
("One morning, {hero} discovered something glowing nearby.", "{hero} spotting a magical glow in the distance"),
|
| 127 |
+
("Taking a deep breath, {hero} bravely went closer.", "{hero} walking forward with courage"),
|
| 128 |
+
("There, a new friend needed help.", "{hero} finding a small friend in trouble"),
|
| 129 |
+
("{hero} helped with kindness and a clever idea.", "{hero} helping the friend together"),
|
| 130 |
+
("Everyone cheered, and {hero} felt proud and brave.", "{hero} celebrating at sunset with the new friend"),
|
| 131 |
+
],
|
| 132 |
+
"making a new friend": [
|
| 133 |
+
("{hero} was playing alone in a sunny place.", "{hero} playing under a bright sky"),
|
| 134 |
+
("Then {hero} noticed someone shy nearby.", "{hero} seeing a shy new friend nearby"),
|
| 135 |
+
("{hero} smiled and said hello.", "{hero} waving with a friendly smile"),
|
| 136 |
+
("Soon they were sharing stories and laughs.", "{hero} and the new friend laughing together"),
|
| 137 |
+
("They played games all afternoon.", "{hero} and the new friend playing together"),
|
| 138 |
+
("By sunset, {hero} had made a wonderful new friend.", "{hero} and the new friend smiling together at sunset"),
|
| 139 |
+
],
|
| 140 |
+
}
|
| 141 |
+
|
| 142 |
+
FEW_SHOT_EXEMPLAR = """
|
| 143 |
+
Write a 6-page children's storybook for age 5 about Luna the cat with theme: brave adventure.
|
| 144 |
+
|
| 145 |
+
Return ONLY valid JSON:
|
| 146 |
+
{
|
| 147 |
+
"title": "Luna's Brave Adventure",
|
| 148 |
+
"character_description": "A small orange tabby cat named Luna with big green eyes, whiskers, and a tiny red scarf",
|
| 149 |
+
"pages": [
|
| 150 |
+
{"page": 1, "text": "Luna was a small orange cat who loved to explore.", "scene": "Luna sitting by the window looking outside"},
|
| 151 |
+
{"page": 2, "text": "One sunny morning, Luna saw something sparkling in the forest.", "scene": "Luna spotting a glow in the trees"},
|
| 152 |
+
{"page": 3, "text": "Bravely, Luna crept into the forest to investigate.", "scene": "Luna walking cautiously through trees"},
|
| 153 |
+
{"page": 4, "text": "It was a tiny fairy stuck in a spider web!", "scene": "Luna discovering a fairy in trouble"},
|
| 154 |
+
{"page": 5, "text": "Luna gently freed the fairy with her paw.", "scene": "Luna carefully helping the fairy"},
|
| 155 |
+
{"page": 6, "text": "The fairy thanked Luna and they became friends forever.", "scene": "Luna and fairy playing together at sunset"}
|
| 156 |
+
]
|
| 157 |
+
}
|
| 158 |
+
"""
|
| 159 |
+
|
| 160 |
+
|
| 161 |
+
def build_story_prompt(hero_name: str, theme: str, age: int) -> str:
|
| 162 |
+
return f"""{FEW_SHOT_EXEMPLAR}
|
| 163 |
+
|
| 164 |
+
Write a 6-page children's storybook for age {age} about {hero_name} with theme: {theme}.
|
| 165 |
+
|
| 166 |
+
Return ONLY valid JSON:
|
| 167 |
+
"""
|
| 168 |
+
|
| 169 |
+
|
| 170 |
+
def _validate_story_structure(story: dict) -> bool:
|
| 171 |
+
required_keys = ["title", "character_description", "pages"]
|
| 172 |
+
if not all(k in story for k in required_keys):
|
| 173 |
+
return False
|
| 174 |
+
pages = story.get("pages", [])
|
| 175 |
+
if not isinstance(pages, list) or len(pages) < 1:
|
| 176 |
+
return False
|
| 177 |
+
first_page = pages[0]
|
| 178 |
+
return all(k in first_page for k in ["page", "text", "scene"])
|
| 179 |
+
|
| 180 |
+
|
| 181 |
+
def _repair_json(json_str: str) -> str:
|
| 182 |
+
json_str = re.sub(r',\s*([}\]])', r'\1', json_str)
|
| 183 |
+
json_str = re.sub(r'//.*?$', '', json_str, flags=re.MULTILINE)
|
| 184 |
+
json_str = re.sub(r'/\*[\s\S]*?\*/', '', json_str)
|
| 185 |
+
json_str = re.sub(r'(?<=")\n(?=")', '\\n', json_str)
|
| 186 |
+
    json_str = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', json_str)
|
| 187 |
+
return json_str
|
| 188 |
+
|
| 189 |
+
|
| 190 |
+
def parse_story_json(raw_output: str) -> dict | None:
|
| 191 |
+
match = re.search(r'\{[\s\S]*\}', raw_output or "")
|
| 192 |
+
if not match:
|
| 193 |
+
return None
|
| 194 |
+
raw_json = match.group(0)
|
| 195 |
+
for candidate in (raw_json, _repair_json(raw_json)):
|
| 196 |
+
try:
|
| 197 |
+
story = json.loads(candidate)
|
| 198 |
+
if _validate_story_structure(story):
|
| 199 |
+
return story
|
| 200 |
+
except Exception:
|
| 201 |
+
continue
|
| 202 |
+
return None
|
| 203 |
+
|
| 204 |
+
|
| 205 |
+
def _normalize_story(story: dict) -> dict:
|
| 206 |
+
pages = list(story.get("pages", []))[:6]
|
| 207 |
+
while len(pages) < 6:
|
| 208 |
+
pages.append({
|
| 209 |
+
"page": len(pages) + 1,
|
| 210 |
+
"text": "And the adventure continued happily.",
|
| 211 |
+
"scene": "Continuing adventure",
|
| 212 |
+
})
|
| 213 |
+
story["pages"] = pages
|
| 214 |
+
story.setdefault("title", "A Wonderful Adventure")
|
| 215 |
+
story.setdefault(
|
| 216 |
+
"character_description",
|
| 217 |
+
"A friendly children's storybook hero with bright colors and cheerful features",
|
| 218 |
+
)
|
| 219 |
+
return story
|
| 220 |
+
|
| 221 |
+
|
| 222 |
+
def build_story_locally(hero_name: str, theme: str) -> dict:
|
| 223 |
+
"""Fast, deterministic fallback story that avoids any Modal dependency."""
|
| 224 |
+
hero = (hero_name or "Little Hero").strip() or "Little Hero"
|
| 225 |
+
beats = THEME_TEMPLATES.get(theme, THEME_TEMPLATES["brave adventure"])
|
| 226 |
+
pages = [
|
| 227 |
+
{"page": i + 1, "text": text.format(hero=hero), "scene": scene.format(hero=hero)}
|
| 228 |
+
for i, (text, scene) in enumerate(beats)
|
| 229 |
+
]
|
| 230 |
+
return {
|
| 231 |
+
"title": f"{hero}'s Storybook Adventure",
|
| 232 |
+
"character_description": (
|
| 233 |
+
f"{hero}, a friendly children's storybook hero with bright colors, "
|
| 234 |
+
"bold outlines, and a cheerful expressive face"
|
| 235 |
+
),
|
| 236 |
+
"pages": pages,
|
| 237 |
+
}
|
| 238 |
+
|
| 239 |
+
|
| 240 |
+
def silent_wav_bytes(duration_seconds: int = 2, sample_rate: int = 24000) -> bytes:
|
| 241 |
+
"""Return a short silent WAV so the UI remains stable if TTS is unavailable."""
|
| 242 |
+
num_samples = sample_rate * duration_seconds
|
| 243 |
+
data_size = num_samples * 2
|
| 244 |
+
header = struct.pack(
|
| 245 |
+
"<4sI4s4sIHHIIHH4sI",
|
| 246 |
+
b"RIFF", 36 + data_size, b"WAVE",
|
| 247 |
+
b"fmt ", 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
|
| 248 |
+
b"data", data_size,
|
| 249 |
+
)
|
| 250 |
+
return header + (b"\x00" * data_size)
|
| 251 |
+
|
| 252 |
+
|
| 253 |
+
def _with_heartbeat(blocking_fn, frame_fn, poll=4.0):
|
| 254 |
+
import threading
|
| 255 |
+
|
| 256 |
+
box = {}
|
| 257 |
+
|
| 258 |
+
def _run():
|
| 259 |
+
try:
|
| 260 |
+
box["val"] = blocking_fn()
|
| 261 |
+
except BaseException as e:
|
| 262 |
+
box["err"] = e
|
| 263 |
+
|
| 264 |
+
th = threading.Thread(target=_run, daemon=True)
|
| 265 |
+
th.start()
|
| 266 |
+
t0 = time.time()
|
| 267 |
+
while th.is_alive():
|
| 268 |
+
th.join(timeout=poll)
|
| 269 |
+
if th.is_alive():
|
| 270 |
+
yield ("hb", frame_fn(int(time.time() - t0)))
|
| 271 |
+
if "err" in box:
|
| 272 |
+
raise box["err"]
|
| 273 |
+
yield ("done", box["val"])
|
| 274 |
+
|
| 275 |
+
|
| 276 |
+
# ============================================================================
|
| 277 |
+
# SAMPLE BOOK (loads instantly, no GPU needed)
|
| 278 |
+
# ============================================================================
|
| 279 |
+
|
| 280 |
+
SAMPLE_BOOK_HTML = None
|
| 281 |
+
|
| 282 |
+
def load_sample_book() -> str:
|
| 283 |
+
"""Load pre-generated sample book (C3: always ship sample)."""
|
| 284 |
+
global SAMPLE_BOOK_HTML
|
| 285 |
+
if SAMPLE_BOOK_HTML:
|
| 286 |
+
return SAMPLE_BOOK_HTML
|
| 287 |
+
|
| 288 |
+
sample_path = os.path.join(SAMPLE_BOOK_PATH, "sample.html")
|
| 289 |
+
if os.path.exists(sample_path):
|
| 290 |
+
with open(sample_path, "r", encoding="utf-8") as f:
|
| 291 |
+
SAMPLE_BOOK_HTML = f.read()
|
| 292 |
+
return SAMPLE_BOOK_HTML
|
| 293 |
+
|
| 294 |
+
return "<div class='page-loading'>Loading sample book...</div>"
|
| 295 |
+
|
| 296 |
+
|
| 297 |
+
# ============================================================================
|
| 298 |
+
# ZEROGPU INFERENCE FUNCTIONS
|
| 299 |
+
# ============================================================================
|
| 300 |
+
|
| 301 |
+
@spaces.GPU(duration=60)
|
| 302 |
+
def generate_story_gpu(hero_name: str, theme: str, age: int = 5) -> dict:
|
| 303 |
+
"""Generate a story on ZeroGPU, falling back to a deterministic local story."""
|
| 304 |
+
try:
|
| 305 |
+
model, tok = load_story()
|
| 306 |
+
prompt = build_story_prompt(hero_name, theme, age)
|
| 307 |
+
inputs = tok.apply_chat_template(
|
| 308 |
+
[{"role": "user", "content": prompt}],
|
| 309 |
+
add_generation_prompt=True,
|
| 310 |
+
enable_thinking=False,
|
| 311 |
+
return_dict=True,
|
| 312 |
+
return_tensors="pt",
|
| 313 |
+
).to("cuda")
|
| 314 |
+
with torch.no_grad():
|
| 315 |
+
out = model.generate(
|
| 316 |
+
**inputs,
|
| 317 |
+
max_new_tokens=GENERATION_PARAMS.max_story_tokens,
|
| 318 |
+
do_sample=False,
|
| 319 |
+
)
|
| 320 |
+
response = tok.decode(
|
| 321 |
+
out[0][inputs["input_ids"].shape[1]:],
|
| 322 |
+
skip_special_tokens=True,
|
| 323 |
+
)
|
| 324 |
+
parsed = parse_story_json(response)
|
| 325 |
+
if parsed:
|
| 326 |
+
return _normalize_story(parsed)
|
| 327 |
+
logger.warning("Story parser failed; using deterministic local fallback")
|
| 328 |
+
except Exception as e:
|
| 329 |
+
logger.warning(f"ZeroGPU story generation failed: {e}")
|
| 330 |
+
return _normalize_story(build_story_locally(hero_name, theme))
|
| 331 |
+
|
| 332 |
+
|
| 333 |
+
@spaces.GPU(duration=150)
|
| 334 |
+
def generate_images_gpu(
|
| 335 |
+
character_desc: str,
|
| 336 |
+
scenes: list,
|
| 337 |
+
doodle_bytes: bytes = None,
|
| 338 |
+
seed: int = 42,
|
| 339 |
+
) -> list:
|
| 340 |
+
"""Generate all story pages with FLUX on ZeroGPU (two-stage: canonical
|
| 341 |
+
character from the doodle, then the same character in each scene)."""
|
| 342 |
+
import io
|
| 343 |
+
from PIL import Image
|
| 344 |
+
|
| 345 |
+
pipe = load_flux()
|
| 346 |
+
num_steps, guidance = 6, 1.0
|
| 347 |
+
|
| 348 |
+
canonical = None
|
| 349 |
+
if doodle_bytes:
|
| 350 |
+
try:
|
| 351 |
+
ref = Image.open(io.BytesIO(doodle_bytes)).convert("RGB")
|
| 352 |
+
canonical = pipe(
|
| 353 |
+
prompt=(f"Turn this child's drawing into a clean, friendly, full-body cartoon "
|
| 354 |
+
f"character for a children's storybook. Keep the EXACT same creature, "
|
| 355 |
+
f"face, and features as the drawing. {COLOR_ART_STYLE}, "
|
| 356 |
+
f"plain white background, full character visible, centered."),
|
| 357 |
+
image=ref, height=768, width=768, guidance_scale=guidance,
|
| 358 |
+
num_inference_steps=num_steps,
|
| 359 |
+
generator=torch.Generator("cuda").manual_seed(seed),
|
| 360 |
+
).images[0]
|
| 361 |
+
logger.info("Canonical character built from doodle")
|
| 362 |
+
except Exception as e:
|
| 363 |
+
logger.warning(f"Canonical build failed ({e}); text2img fallback")
|
| 364 |
+
canonical = None
|
| 365 |
+
|
| 366 |
+
images = []
|
| 367 |
+
for i, scene in enumerate(scenes):
|
| 368 |
+
if canonical is not None:
|
| 369 |
+
prompt = f"The same character. {scene}. {COLOR_ART_STYLE}, {COLOR_PAGE_SUFFIX}"
|
| 370 |
+
kw = dict(image=canonical, prompt=prompt)
|
| 371 |
+
else:
|
| 372 |
+
prompt = (f"{character_desc}. Scene: {scene}. {COLOR_ART_STYLE}, "
|
| 373 |
+
f"white background, centered, full character visible")
|
| 374 |
+
kw = dict(prompt=prompt)
|
| 375 |
+
kw.update(height=768, width=768, guidance_scale=guidance,
|
| 376 |
+
num_inference_steps=num_steps,
|
| 377 |
+
generator=torch.Generator("cuda").manual_seed(seed + i + 1))
|
| 378 |
+
images.append(pipe(**kw).images[0])
|
| 379 |
+
logger.info(f"Generated page {i+1}/{len(scenes)}")
|
| 380 |
+
return images
|
| 381 |
+
|
| 382 |
+
|
| 383 |
+
@spaces.GPU(duration=150)
|
| 384 |
+
def generate_coloring_images_gpu(
|
| 385 |
+
character_desc: str,
|
| 386 |
+
scenes: list,
|
| 387 |
+
doodle_bytes: bytes = None,
|
| 388 |
+
seed: int = 42,
|
| 389 |
+
) -> list:
|
| 390 |
+
"""Generate coloring pages directly with FLUX as line art (no tracing)."""
|
| 391 |
+
import io
|
| 392 |
+
from PIL import Image
|
| 393 |
+
|
| 394 |
+
pipe = load_flux()
|
| 395 |
+
num_steps, guidance = 6, 1.0
|
| 396 |
+
|
| 397 |
+
canonical = None
|
| 398 |
+
if doodle_bytes:
|
| 399 |
+
try:
|
| 400 |
+
ref = Image.open(io.BytesIO(doodle_bytes)).convert("RGB")
|
| 401 |
+
canonical = pipe(
|
| 402 |
+
prompt=(f"Turn this child's drawing into a clean, friendly, full-body cartoon "
|
| 403 |
+
f"character for a children's coloring book. Keep the EXACT same creature, "
|
| 404 |
+
f"face, and features as the drawing. {LINE_ART_STYLE}, "
|
| 405 |
+
f"plain white background, full character visible, centered."),
|
| 406 |
+
image=ref, height=768, width=768, guidance_scale=guidance,
|
| 407 |
+
num_inference_steps=num_steps,
|
| 408 |
+
generator=torch.Generator("cuda").manual_seed(seed),
|
| 409 |
+
).images[0]
|
| 410 |
+
logger.info("Line-art canonical character built from doodle")
|
| 411 |
+
except Exception as e:
|
| 412 |
+
logger.warning(f"Line-art canonical build failed ({e}); text2img fallback")
|
| 413 |
+
canonical = None
|
| 414 |
+
|
| 415 |
+
images = []
|
| 416 |
+
for i, scene in enumerate(scenes):
|
| 417 |
+
if canonical is not None:
|
| 418 |
+
prompt = f"The same character. {scene}. {LINE_ART_STYLE}, {LINE_ART_SUFFIX}"
|
| 419 |
+
kw = dict(image=canonical, prompt=prompt)
|
| 420 |
+
else:
|
| 421 |
+
prompt = (f"{character_desc}. Scene: {scene}. {LINE_ART_STYLE}, "
|
| 422 |
+
f"white background, centered, full character visible")
|
| 423 |
+
kw = dict(prompt=prompt)
|
| 424 |
+
kw.update(height=768, width=768, guidance_scale=guidance,
|
| 425 |
+
num_inference_steps=num_steps,
|
| 426 |
+
generator=torch.Generator("cuda").manual_seed(seed + i + 101))
|
| 427 |
+
images.append(pipe(**kw).images[0])
|
| 428 |
+
logger.info(f"Generated coloring page {i+1}/{len(scenes)}")
|
| 429 |
+
return images
|
| 430 |
+
|
| 431 |
+
|
| 432 |
+
@spaces.GPU(duration=120)
|
| 433 |
+
def generate_tts_gpu(text: str, voice: str = DEFAULT_VOICE) -> bytes:
|
| 434 |
+
"""Narrate the book with VoxCPM2. Raises on failure so the caller can show
|
| 435 |
+
the real reason instead of silently shipping a silent clip."""
|
| 436 |
+
import io
|
| 437 |
+
import numpy as np
|
| 438 |
+
|
| 439 |
+
try:
|
| 440 |
+
model = load_tts()
|
| 441 |
+
design = voice_design(voice)
|
| 442 |
+
|
| 443 |
+
import re
|
| 444 |
+
chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
|
| 445 |
+
if not chunks:
|
| 446 |
+
chunks = [text.strip() or "The end."]
|
| 447 |
+
|
| 448 |
+
sr = model.tts_model.sample_rate
|
| 449 |
+
pause = np.zeros(int(sr * 0.35), dtype=np.float32)
|
| 450 |
+
pieces = []
|
| 451 |
+
|
| 452 |
+
for i, sentence in enumerate(chunks):
|
| 453 |
+
wav = model.generate(
|
| 454 |
+
text=f"{design} {sentence}",
|
| 455 |
+
cfg_value=2.0,
|
| 456 |
+
inference_timesteps=10,
|
| 457 |
+
)
|
| 458 |
+
pieces.append(np.asarray(wav, dtype=np.float32))
|
| 459 |
+
if i < len(chunks) - 1:
|
| 460 |
+
pieces.append(pause)
|
| 461 |
+
|
| 462 |
+
audio = np.concatenate(pieces)
|
| 463 |
+
import soundfile as sf
|
| 464 |
+
buf = io.BytesIO()
|
| 465 |
+
sf.write(buf, audio, sr, format="WAV")
|
| 466 |
+
return buf.getvalue()
|
| 467 |
+
|
| 468 |
+
    except Exception:
|
| 469 |
+
# Surface the real reason (e.g. missing model) instead of a silent clip
|
| 470 |
+
# that looks like it worked. create_book records this in the trace.
|
| 471 |
+
logger.exception("TTS failed")
|
| 472 |
+
raise
|
| 473 |
+
|
| 474 |
+
|
| 475 |
+
# ============================================================================
|
| 476 |
+
# MAIN BOOK CREATION (Generator for streaming)
|
| 477 |
+
# ============================================================================
|
| 478 |
+
|
| 479 |
+
def create_book(doodle_image, character_name, theme, hero_name, voice=DEFAULT_VOICE, make_coloring=False):
|
| 480 |
+
"""ZeroGPU book flow: story → images → narration → PDFs → coloring book,
|
| 481 |
+
each a sequential @spaces.GPU call (ZeroGPU has one GPU per request)."""
|
| 482 |
+
t_total = time.perf_counter()
|
| 483 |
+
character_name = (character_name or "").strip() or "Little Hero"
|
| 484 |
+
hero_name = (hero_name or "").strip() or character_name
|
| 485 |
+
|
| 486 |
+
trace_data = {
|
| 487 |
+
"backend": "zerogpu",
|
| 488 |
+
"hero_name": hero_name,
|
| 489 |
+
"theme": theme,
|
| 490 |
+
"voice": voice,
|
| 491 |
+
"make_coloring": make_coloring,
|
| 492 |
+
"seed": BASE_SEED,
|
| 493 |
+
"timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
|
| 494 |
+
}
|
| 495 |
+
if _LOAD_ERRORS:
|
| 496 |
+
trace_data["model_load_errors"] = _LOAD_ERRORS
|
| 497 |
+
|
| 498 |
+
_no = gr.update(visible=False)
|
| 499 |
+
_keep = gr.update()
|
| 500 |
+
|
| 501 |
+
yield (
|
| 502 |
+
magic_loader_html("story", hero_name),
|
| 503 |
+
"Writing the story…",
|
| 504 |
+
None, _keep, {}, "", json.dumps(trace_data, indent=2),
|
| 505 |
+
_no, _keep,
|
| 506 |
+
)
|
| 507 |
+
|
| 508 |
+
t_story = time.perf_counter()
|
| 509 |
+
try:
|
| 510 |
+
story = generate_story_gpu(hero_name, theme)
|
| 511 |
+
except Exception as e:
|
| 512 |
+
logger.error(f"Story generation failed: {e}")
|
| 513 |
+
yield (
|
| 514 |
+
f"<div class='page-loading'>Error: {e}</div>",
|
| 515 |
+
f"Error: {e}",
|
| 516 |
+
None, _keep, {}, "", "",
|
| 517 |
+
_no, _keep,
|
| 518 |
+
)
|
| 519 |
+
return
|
| 520 |
+
trace_data["story_sec"] = round(time.perf_counter() - t_story, 2)
|
| 521 |
+
|
| 522 |
+
pages = story.get("pages", [])
|
| 523 |
+
char_desc = story.get("character_description", "")
|
| 524 |
+
title = story.get("title", "Untitled Story")
|
| 525 |
+
page_texts = [p.get("text", "") for p in pages]
|
| 526 |
+
scenes = [p.get("scene", "") for p in pages]
|
| 527 |
+
|
| 528 |
+
trace_data["title"] = title
|
| 529 |
+
trace_data["character_description"] = char_desc
|
| 530 |
+
|
| 531 |
+
yield (
|
| 532 |
+
magic_loader_html("images", hero_name),
|
| 533 |
+
f"{title} — illustrating on ZeroGPU…",
|
| 534 |
+
None, _keep, story, "", json.dumps(trace_data, indent=2),
|
| 535 |
+
_no, _keep,
|
| 536 |
+
)
|
| 537 |
+
|
| 538 |
+
doodle_bytes = None
|
| 539 |
+
if doodle_image is not None:
|
| 540 |
+
import io
|
| 541 |
+
from PIL import Image
|
| 542 |
+
img = Image.fromarray(doodle_image)
|
| 543 |
+
buf = io.BytesIO()
|
| 544 |
+
img.save(buf, format="PNG")
|
| 545 |
+
doodle_bytes = buf.getvalue()
|
| 546 |
+
|
| 547 |
+
full_text = f"{title}. {' '.join(page_texts)}"
|
| 548 |
+
|
| 549 |
+
# ---- IMAGES (FLUX on ZeroGPU) ----
|
| 550 |
+
img_bytes, engine = None, "sketch"
|
| 551 |
+
t_images = time.perf_counter()
|
| 552 |
+
try:
|
| 553 |
+
for kind, payload in _with_heartbeat(
|
| 554 |
+
lambda: generate_images_gpu(char_desc, scenes, doodle_bytes, BASE_SEED),
|
| 555 |
+
lambda s: (
|
| 556 |
+
magic_loader_html("images", hero_name),
|
| 557 |
+
f"{title} — illustrating on ZeroGPU… {s}s",
|
| 558 |
+
None, _keep, story, "", json.dumps(trace_data, indent=2), _no, _keep,
|
| 559 |
+
),
|
| 560 |
+
):
|
| 561 |
+
if kind == "hb":
|
| 562 |
+
yield payload
|
| 563 |
+
else:
|
| 564 |
+
images = payload
|
| 565 |
+
import io
|
| 566 |
+
img_bytes = []
|
| 567 |
+
for img in images:
|
| 568 |
+
buf = io.BytesIO()
|
| 569 |
+
img.save(buf, format="PNG")
|
| 570 |
+
img_bytes.append(buf.getvalue())
|
| 571 |
+
engine = "flux"
|
| 572 |
+
except Exception as e:
|
| 573 |
+
logger.exception("Image generation failed")
|
| 574 |
+
trace_data["image_error"] = repr(e)
|
| 575 |
+
from services.images import generate_placeholder_images
|
| 576 |
+
img_bytes = generate_placeholder_images(char_desc, scenes, doodle_bytes)
|
| 577 |
+
engine = "sketch"
|
| 578 |
+
trace_data["images_sec"] = round(time.perf_counter() - t_images, 2)
|
| 579 |
+
trace_data["engine"] = engine
|
| 580 |
+
|
| 581 |
+
book_html = build_book_html(img_bytes, page_texts, title, engine)
|
| 582 |
+
|
| 583 |
+
# ---- NARRATION (VoxCPM2 on ZeroGPU) — sequential: one GPU per request ----
|
| 584 |
+
audio_path = None
|
| 585 |
+
t_tts = time.perf_counter()
|
| 586 |
+
try:
|
| 587 |
+
for kind, payload in _with_heartbeat(
|
| 588 |
+
lambda: generate_tts_gpu(full_text, voice),
|
| 589 |
+
lambda s: (
|
| 590 |
+
book_html,
|
| 591 |
+
f"{title} — recording the narration… {s}s",
|
| 592 |
+
None, _keep, story, "", json.dumps(trace_data, indent=2), _no, _keep,
|
| 593 |
+
),
|
| 594 |
+
):
|
| 595 |
+
if kind == "hb":
|
| 596 |
+
yield payload
|
| 597 |
+
else:
|
| 598 |
+
voice_bytes = payload
|
| 599 |
+
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
|
| 600 |
+
tmp.write(voice_bytes)
|
| 601 |
+
audio_path = tmp.name
|
| 602 |
+
except Exception as e:
|
| 603 |
+
logger.exception("TTS failed")
|
| 604 |
+
trace_data["tts_error"] = repr(e)
|
| 605 |
+
trace_data["tts_sec"] = round(time.perf_counter() - t_tts, 2)
|
| 606 |
+
|
| 607 |
+
pdf_path = None
|
| 608 |
+
t_pdf = time.perf_counter()
|
| 609 |
+
try:
|
| 610 |
+
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
|
| 611 |
+
pdf_path = export_pdf(img_bytes, page_texts, title, tmp.name)
|
| 612 |
+
except Exception as e:
|
| 613 |
+
logger.warning(f"PDF failed: {e}")
|
| 614 |
+
trace_data["pdf_sec"] = round(time.perf_counter() - t_pdf, 2)
|
| 615 |
+
|
| 616 |
+
coloring_html = ""
|
| 617 |
+
coloring_pdf_path = None
|
| 618 |
+
if make_coloring:
|
| 619 |
+
t_coloring = time.perf_counter()
|
| 620 |
+
try:
|
| 621 |
+
from services.coloring import _crispen
|
| 622 |
+
for kind, payload in _with_heartbeat(
|
| 623 |
+
lambda: generate_coloring_images_gpu(char_desc, scenes, doodle_bytes, BASE_SEED),
|
| 624 |
+
lambda s: (
|
| 625 |
+
book_html,
|
| 626 |
+
f"{title} — building coloring book… {s}s",
|
| 627 |
+
audio_path,
|
| 628 |
+
_keep,
|
| 629 |
+
story,
|
| 630 |
+
"",
|
| 631 |
+
json.dumps(trace_data, indent=2),
|
| 632 |
+
_no,
|
| 633 |
+
_keep,
|
| 634 |
+
),
|
| 635 |
+
):
|
| 636 |
+
if kind == "hb":
|
| 637 |
+
yield payload
|
| 638 |
+
else:
|
| 639 |
+
coloring_images = payload
|
| 640 |
+
import io
|
| 641 |
+
outlines = []
|
| 642 |
+
for img in coloring_images:
|
| 643 |
+
buf = io.BytesIO()
|
| 644 |
+
img.save(buf, format="PNG")
|
| 645 |
+
outlines.append(_crispen(buf.getvalue()))
|
| 646 |
+
coloring_html = build_coloring_html(outlines, page_texts, title)
|
| 647 |
+
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
|
| 648 |
+
coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
|
| 649 |
+
trace_data["coloring_book"] = True
|
| 650 |
+
trace_data["coloring_engine"] = "flux-direct-lineart"
|
| 651 |
+
except Exception as e:
|
| 652 |
+
logger.warning(f"Direct FLUX coloring book failed ({e}); using traced fallback")
|
| 653 |
+
try:
|
| 654 |
+
from services.coloring import derive_coloring_pages
|
| 655 |
+
outlines = derive_coloring_pages(img_bytes)
|
| 656 |
+
coloring_html = build_coloring_html(outlines, page_texts, title)
|
| 657 |
+
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
|
| 658 |
+
coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
|
| 659 |
+
trace_data["coloring_book"] = True
|
| 660 |
+
trace_data["coloring_engine"] = "trace-fallback"
|
| 661 |
+
except Exception as e2:
|
| 662 |
+
logger.warning(f"Coloring book fallback failed: {e2}")
|
| 663 |
+
trace_data["coloring_sec"] = round(time.perf_counter() - t_coloring, 2)
|
| 664 |
+
|
| 665 |
+
trace_data["completed"] = True
|
| 666 |
+
trace_data["pages_generated"] = len(img_bytes)
|
| 667 |
+
trace_data["total_sec"] = round(time.perf_counter() - t_total, 2)
|
| 668 |
+
|
| 669 |
+
pdf_update = gr.update(value=pdf_path) if pdf_path else _keep
|
| 670 |
+
coloring_pdf_update = gr.update(value=coloring_pdf_path) if coloring_pdf_path else _keep
|
| 671 |
+
coloring_display_update = (gr.update(visible=True, value=coloring_html) if coloring_html
|
| 672 |
+
else _no)
|
| 673 |
+
|
| 674 |
+
yield (
|
| 675 |
+
book_html,
|
| 676 |
+
f"Complete: {title} — {len(img_bytes)} pages · {'FLUX (ZeroGPU)' if engine == 'flux' else 'local sketch fallback'} · voice: {voice} · total {trace_data['total_sec']}s",
|
| 677 |
+
audio_path,
|
| 678 |
+
pdf_update,
|
| 679 |
+
story,
|
| 680 |
+
f"Pages: {len(img_bytes)} | Seed: {BASE_SEED} | Engine: {engine} | Story {trace_data.get('story_sec', 0)}s | Images {trace_data.get('images_sec', 0)}s | PDF {trace_data.get('pdf_sec', 0)}s | Coloring {trace_data.get('coloring_sec', 0)}s",
|
| 681 |
+
json.dumps(trace_data, indent=2),
|
| 682 |
+
coloring_display_update,
|
| 683 |
+
coloring_pdf_update,
|
| 684 |
+
)
|
| 685 |
+
|
| 686 |
+
|
| 687 |
+
# ============================================================================
|
| 688 |
+
# MAIN
|
| 689 |
+
# ============================================================================
|
| 690 |
+
|
| 691 |
+
if __name__ == "__main__":
|
| 692 |
+
demo = create_layout(
|
| 693 |
+
load_sample_fn=load_sample_book,
|
| 694 |
+
create_book_fn=create_book,
|
| 695 |
+
)
|
| 696 |
+
demo.queue(default_concurrency_limit=2, max_size=8)
|
| 697 |
+
demo.launch(share=False, allowed_paths=[tempfile.gettempdir()])
|
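A note on the `_SpacesShim` fallback in the new app.py above: the stand-in has to accept both the bare `@spaces.GPU` form and the parametrized `@spaces.GPU(duration=...)` form, otherwise one of the decorated functions would be replaced by a decorator object instead of staying callable. A minimal standalone sketch of the same pattern follows; `fake_spaces`, `double` and `triple` are hypothetical names used only for illustration and are not part of the PR.

```python
class _SpacesShim:
    """No-op stand-in for the `spaces` package when running off ZeroGPU."""

    @staticmethod
    def GPU(*args, **kwargs):
        # Bare form: @GPU receives the function itself as the only positional arg.
        if args and callable(args[0]):
            return args[0]

        # Parametrized form: @GPU(duration=...) must return a decorator.
        def deco(fn):
            return fn

        return deco


fake_spaces = _SpacesShim()  # hypothetical name; the app binds this to `spaces`


@fake_spaces.GPU
def double(x):
    return 2 * x


@fake_spaces.GPU(duration=60)
def triple(x):
    return 3 * x


print(double(2), triple(2))
```

Because both forms return the original function unchanged, the decorated generators behave the same locally as they would without any decorator, which is what lets the file import off the Space.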
|
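`_with_heartbeat` in the app.py diff above runs a blocking call on a worker thread and yields progress frames while it waits, so the streaming `create_book` generator keeps sending updates during long GPU calls. A reduced sketch of that pattern, under the assumption that the frame is just a string; `slow_job` is a hypothetical stand-in for a GPU call and the short `poll` value is chosen only to keep the demo fast.

```python
import threading
import time


def with_heartbeat(blocking_fn, frame_fn, poll=0.05):
    """Yield ("hb", frame) while blocking_fn runs, then ("done", result)."""
    box = {}

    def _run():
        try:
            box["val"] = blocking_fn()
        except BaseException as e:  # stored here, re-raised in the consumer below
            box["err"] = e

    th = threading.Thread(target=_run, daemon=True)
    th.start()
    t0 = time.time()
    while th.is_alive():
        th.join(timeout=poll)  # wait up to `poll` seconds for the worker
        if th.is_alive():
            yield ("hb", frame_fn(time.time() - t0))
    if "err" in box:
        raise box["err"]
    yield ("done", box["val"])


def slow_job():  # hypothetical stand-in for a GPU call
    time.sleep(0.2)
    return 42


events = list(with_heartbeat(slow_job, lambda s: f"working... {s:.2f}s"))
print(events[-1])
```

The number of heartbeat frames depends on timing, so callers should only rely on the final `("done", value)` event and treat everything before it as progress.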
app_zerogpu.py
DELETED
|
@@ -1,152 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
DoodleBook — ZeroGPU Version (Free HF Hosting)
|
| 3 |
-
|
| 4 |
-
Runs directly on HF ZeroGPU without Modal.
|
| 5 |
-
Slower but completely free.
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
import gradio as gr
|
| 9 |
-
import os
|
| 10 |
-
import sys
|
| 11 |
-
import torch
|
| 12 |
-
from pathlib import Path
|
| 13 |
-
|
| 14 |
-
sys.path.insert(0, os.path.dirname(__file__))
|
| 15 |
-
|
| 16 |
-
from config import FLUX_MODEL, STORY_MODEL, GENERATION_PARAMS, BASE_SEED
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
# ============================================================================
|
| 20 |
-
# ZEROGPU INFERENCE (No Modal)
|
| 21 |
-
# ============================================================================
|
| 22 |
-
|
| 23 |
-
@torch.inference_mode()
|
| 24 |
-
def generate_story_zerogpu(hero_name: str, theme: str, age: int = 5) -> dict:
|
| 25 |
-
"""Generate story using MiniCPM5-1B on ZeroGPU."""
|
| 26 |
-
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 27 |
-
|
| 28 |
-
model_id = STORY_MODEL.hub_id
|
| 29 |
-
tok = AutoTokenizer.from_pretrained(model_id)
|
| 30 |
-
model = AutoModelForCausalLM.from_pretrained(
|
| 31 |
-
model_id, torch_dtype=torch.float16
|
| 32 |
-
).cuda().eval()
|
| 33 |
-
|
| 34 |
-
prompt = f"""Write a 6-page children's storybook for age {age} about {hero_name} with theme: {theme}.
|
| 35 |
-
Return ONLY valid JSON:
|
| 36 |
-
{{"title": "Title", "character_description": "Description", "pages": [{{"page": 1, "text": "Text", "scene": "Scene"}}]}}"""
|
| 37 |
-
|
| 38 |
-
inputs = tok(prompt, return_tensors="pt").cuda()
|
| 39 |
-
with torch.no_grad():
|
| 40 |
-
out = model.generate(**inputs, max_new_tokens=800, do_sample=False)
|
| 41 |
-
|
| 42 |
-
response = tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
|
| 43 |
-
|
| 44 |
-
# Parse JSON
|
| 45 |
-
import re, json
|
| 46 |
-
match = re.search(r'\{.*\}', response, re.DOTALL)
|
| 47 |
-
if match:
|
| 48 |
-
try:
|
| 49 |
-
return json.loads(match.group())
|
| 50 |
-
except:
|
| 51 |
-
pass
|
| 52 |
-
|
| 53 |
-
# Fallback
|
| 54 |
-
return {
|
| 55 |
-
"title": f"{hero_name}'s Adventure",
|
| 56 |
-
"character_description": f"A friendly character named {hero_name}",
|
| 57 |
-
"pages": [{"page": i+1, "text": f"Page {i+1} text", "scene": f"Scene {i+1}"} for i in range(6)]
|
| 58 |
-
}
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
@torch.inference_mode()
|
| 62 |
-
def generate_images_zerogpu(character_desc: str, scenes: list) -> list:
|
| 63 |
-
"""Generate images using FLUX on ZeroGPU."""
|
| 64 |
-
from diffusers import FluxPipeline
|
| 65 |
-
|
| 66 |
-
pipe = FluxPipeline.from_pretrained(
|
| 67 |
-
FLUX_MODEL.hub_id,
|
| 68 |
-
torch_dtype=torch.bfloat16
|
| 69 |
-
).cuda()
|
| 70 |
-
|
| 71 |
-
images = []
|
| 72 |
-
for i, scene in enumerate(scenes):
|
| 73 |
-
prompt = f"{character_desc}, {scene}, crayon drawing style"
|
| 74 |
-
generator = torch.Generator("cuda").manual_seed(BASE_SEED + i)
|
| 75 |
-
|
| 76 |
-
image = pipe(
|
| 77 |
-
prompt=prompt,
|
| 78 |
-
num_inference_steps=20,
|
| 79 |
-
guidance_scale=3.5,
|
| 80 |
-
width=768,
|
| 81 |
-
height=512,
|
| 82 |
-
generator=generator
|
| 83 |
-
).images[0]
|
| 84 |
-
|
| 85 |
-
images.append(image)
|
| 86 |
-
|
| 87 |
-
return images
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
# ============================================================================
|
| 91 |
-
# MAIN FUNCTION (ZeroGPU compatible)
|
| 92 |
-
# ============================================================================
|
| 93 |
-
|
| 94 |
-
def create_book_zerogpu(doodle_image, character_name, theme, hero_name, tiny_mode=False):
|
| 95 |
-
"""
|
| 96 |
-
Book creation without Modal.
|
| 97 |
-
Uses ZeroGPU for inference.
|
| 98 |
-
"""
|
| 99 |
-
import time
|
| 100 |
-
from book_builder import build_book_html
|
| 101 |
-
import io, base64
|
| 102 |
-
|
| 103 |
-
# Generate story
|
| 104 |
-
story = generate_story_zerogpu(hero_name, theme)
|
| 105 |
-
title = story.get("title", "Story")
|
| 106 |
-
pages = story.get("pages", [])
|
| 107 |
-
char_desc = story.get("character_description", "")
|
| 108 |
-
scenes = [p.get("scene", "") for p in pages]
|
| 109 |
-
texts = [p.get("text", "") for p in pages]
|
| 110 |
-
|
| 111 |
-
# Generate images
|
| 112 |
-
images = generate_images_zerogpu(char_desc, scenes)
|
| 113 |
-
|
| 114 |
-
# Convert to bytes
|
| 115 |
-
img_bytes = []
|
| 116 |
-
for img in images:
|
| 117 |
-
buf = io.BytesIO()
|
| 118 |
-
img.save(buf, format="PNG")
|
| 119 |
-
img_bytes.append(buf.getvalue())
|
| 120 |
-
|
| 121 |
-
# Build HTML
|
| 122 |
-
html = build_book_html(img_bytes, texts, title)
|
| 123 |
-
|
| 124 |
-
return html, f"Complete: {title}", None, None
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
# ============================================================================
|
| 128 |
-
# GRADIO UI
|
| 129 |
-
# ============================================================================
|
| 130 |
-
|
| 131 |
-
if __name__ == "__main__":
|
| 132 |
-
with gr.Blocks(title="DoodleBook (Free)") as demo:
|
| 133 |
-
gr.Markdown("# 📚 DoodleBook (ZeroGPU Version)")
|
| 134 |
-
|
| 135 |
-
with gr.Row():
|
| 136 |
-
with gr.Column():
|
| 137 |
-
doodle = gr.Image(label="Doodle", type="numpy")
|
| 138 |
-
name = gr.Textbox(label="Character name")
|
| 139 |
-
theme = gr.Dropdown(["brave adventure", "making a friend"], label="Theme")
|
| 140 |
-
btn = gr.Button("Make book!")
|
| 141 |
-
|
| 142 |
-
with gr.Column():
|
| 143 |
-
output = gr.HTML()
|
| 144 |
-
status = gr.Textbox(label="Status")
|
| 145 |
-
|
| 146 |
-
btn.click(
|
| 147 |
-
create_book_zerogpu,
|
| 148 |
-
inputs=[doodle, name, theme, name],
|
| 149 |
-
outputs=[output, status]
|
| 150 |
-
)
|
| 151 |
-
|
| 152 |
-
demo.launch()
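
Review note: the deleted `generate_story_zerogpu` pulls the first `{...}` span out of the model reply with a greedy regex, then hides every parse error behind a bare `except`. The sketch below shows the same parse-then-fallback step with a narrower exception and a shape check. The helper names `parse_story_json` and `fallback_story` are hypothetical and are not part of the original file; the fallback payload copies the shape used in the deleted code.

```python
import json
import re


def fallback_story(hero_name: str, num_pages: int = 6) -> dict:
    """Stub book with the same shape as the fallback in the deleted file."""
    return {
        "title": f"{hero_name}'s Adventure",
        "character_description": f"A friendly character named {hero_name}",
        "pages": [
            {"page": i + 1, "text": f"Page {i + 1} text", "scene": f"Scene {i + 1}"}
            for i in range(num_pages)
        ],
    }


def parse_story_json(response: str, hero_name: str) -> dict:
    """Extract the outermost {...} span from a model reply and parse it."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match:
        try:
            story = json.loads(match.group())
        except json.JSONDecodeError:
            story = None  # malformed JSON: fall through to the stub book
        # Only accept a dict that actually carries a list of pages.
        if isinstance(story, dict) and isinstance(story.get("pages"), list):
            return story
    return fallback_story(hero_name)
```

Catching only `json.JSONDecodeError` keeps unrelated failures (for example a CUDA error) visible instead of silently producing the stub book.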
|
config.py
CHANGED
|
@@ -86,56 +86,24 @@ class ModelConfig:
|
|
| 86 |
# Fallbacks selected for license compatibility (Apache 2.0 preferred).
|
| 87 |
#
|
| 88 |
|
|
| 89 |
FLUX_MODEL = ModelConfig(
|
| 90 |
hub_id="black-forest-labs/FLUX.2-klein-4B",
|
| 91 |
params_b=4.0,
|
| 92 |
license=LicenseType.APACHE_2_0,
|
| 93 |
vram_gb=13.0,
|
| 94 |
-
fallback_id="black-forest-labs/FLUX.1-schnell",
|
| 95 |
-
fallback_reason="12B, Apache 2.0, 1-4 step distilled, fast inference",
|
| 96 |
modal_gpu="A10G", # 24GB fits the ~13GB model; A100-40GB was overkill
|
| 97 |
modal_memory=32768,
|
| 98 |
)
|
| 99 |
|
| 100 |
-
FLUX_MODEL_9B = ModelConfig(
|
| 101 |
-
hub_id="black-forest-labs/FLUX.2-klein-9B",
|
| 102 |
-
params_b=9.0,
|
| 103 |
-
license=LicenseType.NON_COMMERCIAL,
|
| 104 |
-
vram_gb=29.0,
|
| 105 |
-
fallback_id="black-forest-labs/FLUX.2-klein-4B",
|
| 106 |
-
fallback_reason="4B variant with Apache 2.0 license",
|
| 107 |
-
is_primary=False,
|
| 108 |
-
modal_gpu="A100",
|
| 109 |
-
modal_memory=32768,
|
| 110 |
-
)
|
| 111 |
-
|
| 112 |
-
FLUX_FALLBACK = ModelConfig(
|
| 113 |
-
hub_id="black-forest-labs/FLUX.1-schnell",
|
| 114 |
-
params_b=12.0,
|
| 115 |
-
license=LicenseType.APACHE_2_0,
|
| 116 |
-
vram_gb=24.0,
|
| 117 |
-
is_primary=False,
|
| 118 |
-
modal_gpu="A100",
|
| 119 |
-
modal_memory=32768,
|
| 120 |
-
)
|
| 121 |
-
|
| 122 |
STORY_MODEL = ModelConfig(
|
| 123 |
hub_id="openbmb/MiniCPM5-1B",
|
| 124 |
params_b=1.0,
|
| 125 |
license=LicenseType.APACHE_2_0,
|
| 126 |
vram_gb=4.0,
|
| 127 |
-
fallback_id="openbmb/MiniCPM3-4B",
|
| 128 |
-
fallback_reason="4B, stronger capability but larger footprint",
|
| 129 |
-
modal_gpu="T4",
|
| 130 |
-
modal_memory=8192,
|
| 131 |
-
)
|
| 132 |
-
|
| 133 |
-
STORY_FALLBACK = ModelConfig(
|
| 134 |
-
hub_id="openbmb/MiniCPM3-4B",
|
| 135 |
-
params_b=4.0,
|
| 136 |
-
license=LicenseType.APACHE_2_0,
|
| 137 |
-
vram_gb=8.0,
|
| 138 |
-
is_primary=False,
|
| 139 |
modal_gpu="T4",
|
| 140 |
modal_memory=8192,
|
| 141 |
)
|
|
@@ -145,43 +113,6 @@ TTS_MODEL = ModelConfig(
|
|
| 145 |
params_b=2.0,
|
| 146 |
license=LicenseType.APACHE_2_0,
|
| 147 |
vram_gb=8.0,
|
| 148 |
-
fallback_id="hexgrad/Kokoro-82M",
|
| 149 |
-
fallback_reason="82M params, ultra-lightweight, Apache 2.0",
|
| 150 |
-
modal_gpu="T4",
|
| 151 |
-
modal_memory=8192,
|
| 152 |
-
)
|
| 153 |
-
|
| 154 |
-
TTS_FALLBACK_KOKORO = ModelConfig(
|
| 155 |
-
hub_id="hexgrad/Kokoro-82M",
|
| 156 |
-
params_b=0.082,
|
| 157 |
-
license=LicenseType.APACHE_2_0,
|
| 158 |
-
vram_gb=1.0,
|
| 159 |
-
is_primary=False,
|
| 160 |
-
modal_gpu="T4",
|
| 161 |
-
modal_memory=4096,
|
| 162 |
-
)
|
| 163 |
-
|
| 164 |
-
TTS_FALLBACK_MELO = ModelConfig(
|
| 165 |
-
hub_id="myshell-ai/MeloTTS-English-v3",
|
| 166 |
-
params_b=0.0, # Unknown exact size
|
| 167 |
-
license=LicenseType.MIT,
|
| 168 |
-
vram_gb=1.0,
|
| 169 |
-
is_primary=False,
|
| 170 |
-
modal_gpu="CPU",
|
| 171 |
-
modal_memory=2048,
|
| 172 |
-
)
|
| 173 |
-
|
| 174 |
-
# ============================================================================
|
| 175 |
-
# TINY MODE MODELS (C4: Edge/Tiny Model Support)
|
| 176 |
-
# ============================================================================
|
| 177 |
-
|
| 178 |
-
TINY_IMAGE_MODEL = ModelConfig(
|
| 179 |
-
hub_id="stabilityai/sd-turbo",
|
| 180 |
-
params_b=0.67,
|
| 181 |
-
license=LicenseType.APACHE_2_0,
|
| 182 |
-
vram_gb=4.0,
|
| 183 |
-
fallback_id="stabilityai/sdxl-turbo",
|
| 184 |
-
fallback_reason="SDXL-Turbo, higher quality but more VRAM",
|
| 185 |
modal_gpu="T4",
|
| 186 |
modal_memory=8192,
|
| 187 |
)
|
|
@@ -443,17 +374,8 @@ def get_model_with_fallback(
|
|
| 443 |
Returns:
|
| 444 |
ModelConfig (primary or fallback)
|
| 445 |
"""
|
| 446 |
-
|
| 447 |
-
|
| 448 |
-
# Return the appropriate fallback config
|
| 449 |
-
fallback_map = {
|
| 450 |
-
"black-forest-labs/FLUX.1-schnell": FLUX_FALLBACK,
|
| 451 |
-
"openbmb/MiniCPM3-4B": STORY_FALLBACK,
|
| 452 |
-
"hexgrad/Kokoro-82M": TTS_FALLBACK_KOKORO,
|
| 453 |
-
"myshell-ai/MeloTTS-English-v3": TTS_FALLBACK_MELO,
|
| 454 |
-
}
|
| 455 |
-
return fallback_map.get(model.fallback_id, model)
|
| 456 |
-
|
| 457 |
return model
|
| 458 |
|
| 459 |
|
|
|
|
| 86 |
# Fallbacks selected for license compatibility (Apache 2.0 preferred).
|
| 87 |
#
|
| 88 |
|
| 89 |
+
# The three sponsor models DoodleBook actually loads. Fallback/variant configs
|
| 90 |
+
# were removed so the HF Space links exactly these three (HF auto-links any
|
| 91 |
+
# model id it finds in the repo files).
|
| 92 |
+
|
| 93 |
FLUX_MODEL = ModelConfig(
|
| 94 |
hub_id="black-forest-labs/FLUX.2-klein-4B",
|
| 95 |
params_b=4.0,
|
| 96 |
license=LicenseType.APACHE_2_0,
|
| 97 |
vram_gb=13.0,
|
| 98 |
modal_gpu="A10G", # 24GB fits the ~13GB model; A100-40GB was overkill
|
| 99 |
modal_memory=32768,
|
| 100 |
)
|
| 101 |
|
| 102 |
STORY_MODEL = ModelConfig(
|
| 103 |
hub_id="openbmb/MiniCPM5-1B",
|
| 104 |
params_b=1.0,
|
| 105 |
license=LicenseType.APACHE_2_0,
|
| 106 |
vram_gb=4.0,
|
| 107 |
modal_gpu="T4",
|
| 108 |
modal_memory=8192,
|
| 109 |
)
|
|
|
|
| 113 |
params_b=2.0,
|
| 114 |
license=LicenseType.APACHE_2_0,
|
| 115 |
vram_gb=8.0,
|
| 116 |
modal_gpu="T4",
|
| 117 |
modal_memory=8192,
|
| 118 |
)
|
|
|
|
| 374 |
Returns:
|
| 375 |
ModelConfig (primary or fallback)
|
| 376 |
"""
|
| 377 |
+
# Fallback model configs were removed (the Space links only the 3 primaries);
|
| 378 |
+
# there is no alternate config to swap in, so always return the primary.
|
| 379 |
return model
|
| 380 |
|
| 381 |
|
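
Review note: the new comment in `config.py` says the fallback configs were removed because the Hub auto-links any model id it finds in the repo files. One way to guard against stray ids creeping back in is a small text scan before pushing. This is a minimal sketch under assumptions: the helper name `find_unexpected_hub_ids`, the `org/name` regex, and the idea of running it over file contents are all illustrative and not part of this PR.

```python
import re


def find_unexpected_hub_ids(text: str, orgs: set[str], allowed: set[str]) -> set[str]:
    """Return `org/name` ids from the given orgs that are not in `allowed`."""
    if not orgs:
        return set()
    org_pattern = "|".join(re.escape(org) for org in sorted(orgs))
    # A hub id here is "<org>/<name>", where name uses letters, digits, ., _ or -.
    pattern = re.compile(rf"\b(?:{org_pattern})/[A-Za-z0-9._-]+")
    # Strip trailing dots so a sentence-ending period is not counted as part of the id.
    found = {m.group().rstrip(".") for m in pattern.finditer(text)}
    return found - allowed


if __name__ == "__main__":
    old_config = (
        'hub_id="black-forest-labs/FLUX.2-klein-4B"\n'
        'fallback_id="black-forest-labs/FLUX.1-schnell"\n'
    )
    allowed = {"black-forest-labs/FLUX.2-klein-4B", "openbmb/MiniCPM5-1B"}
    print(find_unexpected_hub_ids(old_config, {"black-forest-labs", "openbmb"}, allowed))
```

The allowed set in the usage example lists only the two primary ids visible in this hunk; the TTS model id would need to be added from the full `config.py`.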
docs/blog.md
DELETED
|
@@ -1,121 +0,0 @@
|
|
| 1 |
-
# Field Notes: FLUX + LoRA Character Consistency
|
| 2 |
-
|
| 3 |
-
*How we achieved cross-page character consistency in DoodleBook using FLUX.2-klein, seed-locking, and a crayon-style LoRA.*
|
| 4 |
-
|
| 5 |
-
---
|
| 6 |
-
|
| 7 |
-
## The Challenge
|
| 8 |
-
|
| 9 |
-
The core problem in AI-generated storybooks is **character consistency**. If you generate 6 pages of a story independently, each page produces a different character — different colors, different proportions, different style. The magic is lost.
|
| 10 |
-
|
| 11 |
-
We needed: **the same character, in the same art style, across all 6 pages.**
|
| 12 |
-
|
| 13 |
-
---
|
| 14 |
-
|
| 15 |
-
## Our Approach: The Consistency Stack
|
| 16 |
-
|
| 17 |
-
We didn't rely on a single technique. Instead, we layered three complementary approaches:
|
| 18 |
-
|
| 19 |
-
### 1. Seed Locking
|
| 20 |
-
|
| 21 |
-
```python
|
| 22 |
-
BASE_SEED = 42
|
| 23 |
-
def page_seed(page_num):
|
| 24 |
-
return BASE_SEED + page_num # Page 0: 42, Page 1: 43, ...
|
| 25 |
-
```
|
| 26 |
-
|
| 27 |
-
Each page uses a deterministic seed derived from a locked base. This ensures:
|
| 28 |
-
- Reproducible generation (same inputs = same outputs)
|
| 29 |
-
- Slight variation between pages (different seeds)
|
| 30 |
-
- Consistent "feel" across the book
|
| 31 |
-
|
| 32 |
-
### 2. Character Description Reuse
|
| 33 |
-
|
| 34 |
-
Every page uses the **exact same** `character_description` string:
|
| 35 |
-
|
| 36 |
-
```python
|
| 37 |
-
prompt = f"""
|
| 38 |
-
{character_description}, # IDENTICAL on every page
|
| 39 |
-
{scene_description}, # UNIQUE per page
|
| 40 |
-
{art_style}, page {i+1} of children's book
|
| 41 |
-
"""
|
| 42 |
-
```
|
| 43 |
-
|
| 44 |
-
The character description acts as an anchor, keeping the model's interpretation consistent.
|
| 45 |
-
|
| 46 |
-
### 3. LoRA Fine-Tuning (The Secret Sauce)
|
| 47 |
-
|
| 48 |
-
We trained a **crayon-style LoRA** on FLUX.2-klein:
|
| 49 |
-
|
| 50 |
-
- **Trigger token:** `[DOODLECHAR]`
|
| 51 |
-
- **Training data:** 10-15 crayon-style character images
|
| 52 |
-
- **Rank:** 16 (balances quality vs. file size)
|
| 53 |
-
- **Steps:** 300
|
| 54 |
-
|
| 55 |
-
The LoRA teaches FLUX to generate images in a specific art style. Combined with the character description, this creates a consistent visual identity.
|
| 56 |
-
|
| 57 |
-
---
|
| 58 |
-
|
| 59 |
-
## The Results
|
| 60 |
-
|
| 61 |
-
### Before LoRA (Base FLUX)
|
| 62 |
-
- Pages look like generic AI art
|
| 63 |
-
- Character changes dramatically between pages
|
| 64 |
-
- No consistent style
|
| 65 |
-
|
| 66 |
-
### After LoRA + Consistency Stack
|
| 67 |
-
- Same character across all 6 pages
|
| 68 |
-
- Consistent crayon art style
|
| 69 |
-
- Recognizable as "the same book"
|
| 70 |
-
|
| 71 |
-
---
|
| 72 |
-
|
| 73 |
-
## Key Learnings
|
| 74 |
-
|
| 75 |
-
1. **Seed alone isn't enough.** Different prompts with the same seed produce different characters. You need description consistency too.
|
| 76 |
-
|
| 77 |
-
2. **LoRA provides style, not identity.** The LoRA teaches the art style (crayon, watercolor, etc.), but the character identity comes from the prompt.
|
| 78 |
-
|
| 79 |
-
3. **Image conditioning helps.** When available, feeding the child's actual doodle as an image prompt (via img2img) dramatically improves style matching.
|
| 80 |
-
|
| 81 |
-
4. **Quality vs. speed tradeoff.** FLUX.2-klein-4B (4B params) runs faster than 9B with minimal quality loss for storybook art.
|
| 82 |
-
|
| 83 |
-
---
|
| 84 |
-
|
| 85 |
-
## Technical Details
|
| 86 |
-
|
| 87 |
-
### Model Stack
|
| 88 |
-
- **Image:** FLUX.2-klein-4B + crayon-style LoRA
|
| 89 |
-
- **Story:** MiniCPM5-1B (1B)
|
| 90 |
-
- **TTS:** VoxCPM2 (2B)
|
| 91 |
-
|
| 92 |
-
### Training Config
|
| 93 |
-
```yaml
|
| 94 |
-
rank: 16
|
| 95 |
-
alpha: 16
|
| 96 |
-
learning_rate: 1e-4
|
| 97 |
-
steps: 300
|
| 98 |
-
resolution: 512
|
| 99 |
-
batch_size: 1
|
| 100 |
-
```
|
| 101 |
-
|
| 102 |
-
### Inference Config
|
| 103 |
-
```yaml
|
| 104 |
-
guidance_scale: 3.5
|
| 105 |
-
num_inference_steps: 20 # Standard mode
|
| 106 |
-
4 # Tiny Mode (SD-Turbo)
|
| 107 |
-
width: 768
|
| 108 |
-
height: 512
|
| 109 |
-
```
|
| 110 |
-
|
| 111 |
-
---
|
| 112 |
-
|
| 113 |
-
## Conclusion
|
| 114 |
-
|
| 115 |
-
Character consistency in AI storybooks requires a multi-layered approach: seed locking for reproducibility, prompt engineering for identity, and LoRA fine-tuning for style. No single technique solves the problem alone, but together they create a reliable system.
|
| 116 |
-
|
| 117 |
-
The result? A child's crayon drawing becomes a consistent, narrated, illustrated storybook — their character, their style, brought to life by AI.
|
| 118 |
-
|
| 119 |
-
---
|
| 120 |
-
|
| 121 |
-
*Built for Build Small Hackathon 2026 · Thousand Token Wood Track*
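
Review note: the first two layers of the post's "consistency stack" (seed locking and character-description reuse) can be sketched in a few lines. `BASE_SEED` and `page_seed` come from the post; `build_page_requests` and the default `art_style` value are assumptions made for illustration. The annotations are kept as Python comments outside the string, since `#` text inside a triple-quoted f-string would end up in the prompt itself.

```python
BASE_SEED = 42  # value quoted in the post


def page_seed(page_num: int) -> int:
    """Deterministic per-page seed: page 0 -> 42, page 1 -> 43, ..."""
    return BASE_SEED + page_num


def build_page_requests(
    character_description: str,
    scenes: list[str],
    art_style: str = "crayon drawing style",
) -> list[dict]:
    """Pair each scene with the shared character text and a locked seed."""
    requests = []
    for i, scene in enumerate(scenes):
        prompt = (
            f"{character_description}, "  # identical on every page
            f"{scene}, "                  # unique per page
            f"{art_style}, page {i + 1} of children's book"
        )
        requests.append({"prompt": prompt, "seed": page_seed(i)})
    return requests
```

Each returned dict can then be handed to whatever image pipeline is in use; the same inputs always yield the same prompts and seeds.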
|
docs/superpowers/specs/2026-06-14-coloring-book-loader-pdf-cover-design.md
DELETED
|
@@ -1,174 +0,0 @@
|
|
| 1 |
-
# DoodleBook — Coloring Book, Magic Loader & Styled PDF Covers
|
| 2 |
-
|
| 3 |
-
**Date:** 2026-06-14
|
| 4 |
-
**Status:** Approved design (pending spec review)
|
| 5 |
-
|
| 6 |
-
Three user-requested features for the DoodleBook app (`run_modal.py` Modal build,
|
| 7 |
-
mirrored in `app.py` HF build):
|
| 8 |
-
|
| 9 |
-
1. **Magic Loader** — an engaging, on-brand wait screen while generation runs.
|
| 10 |
-
2. **Coloring Book** — the SAME FLUX images with the colors removed (outlines
|
| 11 |
-
only) so kids can color the exact same pictures. No second/different image is
|
| 12 |
-
ever generated. Opt-in via a checkbox chosen before generating.
|
| 13 |
-
3. **Styled PDF covers** — the PDF front page should match the on-screen scrapbook
|
| 14 |
-
cover (cream paper, crayon title with shadow, green "illustrated by" badge),
|
| 15 |
-
for both the storybook and the coloring-book PDFs.
|
| 16 |
-
|
| 17 |
-
---
|
| 18 |
-
|
| 19 |
-
## Feature 1 — Magic Loader
|
| 20 |
-
|
| 21 |
-
### Goal
|
| 22 |
-
Generation takes a few minutes (FLUX is the slow stage). Replace the single plain
|
| 23 |
-
status line with a crayon-styled animated panel that tells the user what's
|
| 24 |
-
happening and showcases the small-model stack (good for judges too).
|
| 25 |
-
|
| 26 |
-
### Approach
|
| 27 |
-
Pure-CSS rotating messages (no JS, no per-page streaming — reliable inside Gradio
|
| 28 |
-
`gr.HTML`). A stack of message `<div>`s fade in/out in sequence via staggered CSS
|
| 29 |
-
`animation-delay` on an opacity keyframe.
|
| 30 |
-
|
| 31 |
-
- Helper `magic_loader_html(stage: str, hero_name: str) -> str` in `book_builder.py`.
|
| 32 |
-
- Rotating messages (image stage — the long one):
|
| 33 |
-
- `✏️ MiniCPM is dreaming up {hero}'s story…`
|
| 34 |
-
- `🎨 FLUX is painting your 6 pages…`
|
| 35 |
-
- `🔊 VoxCPM is recording the narration…`
|
| 36 |
-
- `💡 Did you know? Your whole storybook runs on tiny models!`
|
| 37 |
-
- `create_book` yields the loader HTML into `book_display` during the story and
|
| 38 |
-
image stages (it already streams stage-by-stage).
|
| 39 |
-
|
| 40 |
-
### CSS
|
| 41 |
-
New `.magic-loader` / `.ml-msg` rules in the `ui/layout.py` CSS string. Each
|
| 42 |
-
`.ml-msg` is absolutely stacked; `animation: ml-cycle Ns infinite` with
|
| 43 |
-
`animation-delay: i*step`. Respects existing `prefers-reduced-motion` block
|
| 44 |
-
(messages still legible, just no fade).
|
| 45 |
-
|
| 46 |
-
---
|
| 47 |
-
|
| 48 |
-
## Feature 2 — Coloring Book (checkbox-triggered)
|
| 49 |
-
|
| 50 |
-
### Goal
|
| 51 |
-
When the user opts in, produce printable black-and-white outline pages **of the
|
| 52 |
-
exact same FLUX images** (same scenes, same character) for kids to color. The
|
| 53 |
-
coloring page is the color page with its colors removed — never a newly generated
|
| 54 |
-
or different image.
|
| 55 |
-
|
| 56 |
-
### Trigger
|
| 57 |
-
New checkbox in the input card: `🖍️ Also make a coloring book` (`make_coloring`),
|
| 58 |
-
chosen before pressing "Make my book!". Outline pages are produced automatically
|
| 59 |
-
with the book when checked.
|
| 60 |
-
|
| 61 |
-
### Generation (same image, colors removed — instant, free)
|
| 62 |
-
Process the already-generated color images locally with OpenCV to strip the fills
|
| 63 |
-
and keep the outlines. No extra Modal call, ~seconds, and the result is the SAME
|
| 64 |
-
picture as line art. There is no FLUX re-generation and no "HD" alternative — that
|
| 65 |
-
guarantees the coloring page always matches the color page exactly.
|
| 66 |
-
|
| 67 |
-
### New module: `services/coloring.py`
|
| 68 |
-
- `to_line_art(png_bytes: bytes) -> bytes`
|
| 69 |
-
- OpenCV: grayscale → light blur → `adaptiveThreshold(GAUSSIAN_C, THRESH_BINARY,
|
| 70 |
-
blockSize≈11, C≈2)` to get black lines on white; remove tiny speck components
|
| 71 |
-
(reuse the cleanup approach from `services/images.py:_doodle_to_cartoon`).
|
| 72 |
-
- Returns a PNG (white background, black outlines).
|
| 73 |
-
- On any failure: fall back to grayscale Otsu threshold; never raises.
|
| 74 |
-
- `derive_coloring_pages(color_imgs: list[bytes]) -> list[bytes]`.
|
| 75 |
-
|
| 76 |
-
### `book_builder.py` additions
|
| 77 |
-
- `build_coloring_html(outline_imgs, page_texts, title) -> str` — same scrapbook
|
| 78 |
-
layout as `build_book_html` but with a coloring-book cover badge and outline
|
| 79 |
-
images (text kept small/light beneath each page).
|
| 80 |
-
- `export_coloring_pdf(outline_imgs, page_texts, title, path) -> str` — styled
|
| 81 |
-
cover (Feature 3) + outline pages, print-friendly.
|
| 82 |
-
|
| 83 |
-
### UI (`ui/layout.py`)
|
| 84 |
-
- Input card: `make_coloring = gr.Checkbox(...)` (styled like `tiny_mode`).
|
| 85 |
-
- Output card (below the storybook + downloads):
|
| 86 |
-
- `coloring_display = gr.HTML(visible=False)`
|
| 87 |
-
- `coloring_pdf_download = gr.DownloadButton("Download Coloring Book (PDF)", visible=False)`
|
| 88 |
-
|
| 89 |
-
### Data flow
|
| 90 |
-
```
|
| 91 |
-
create_book(doodle, char_name, theme, hero, tiny, voice, make_coloring):
|
| 92 |
-
yield magic_loader(story) -> book_display
|
| 93 |
-
story = services.story.generate_story(...)
|
| 94 |
-
yield magic_loader(images)
|
| 95 |
-
color_imgs, engine = services.images.generate_book_pages(...)
|
| 96 |
-
book_html = build_book_html(color_imgs, ...)
|
| 97 |
-
if make_coloring:
|
| 98 |
-
outlines = services.coloring.derive_coloring_pages(color_imgs) # SAME imgs, colors removed
|
| 99 |
-
coloring_html = build_coloring_html(outlines, ...)
|
| 100 |
-
coloring_pdf = export_coloring_pdf(outlines, ...)
|
| 101 |
-
audio = services.tts.speak_book(...)
|
| 102 |
-
pdf = export_pdf(color_imgs, ...)
|
| 103 |
-
yield final: book_html, status, audio, pdf(visible), story_json, image_info,
|
| 104 |
-
trace, coloring_html(visible if make_coloring),
|
| 105 |
-
coloring_pdf(visible if make_coloring)
|
| 106 |
-
```
|
| 107 |
-
No `gr.State` and no regenerate handler are needed — the coloring pages are derived
|
| 108 |
-
directly from `color_imgs`, so they always match the color book.
|
| 109 |
-
|
| 110 |
-
---
|
| 111 |
-
|
| 112 |
-
## Feature 3 — Styled PDF cover
|
| 113 |
-
|
| 114 |
-
### Goal
|
| 115 |
-
Replace the plain Helvetica text title page in `export_pdf` with a cover that
|
| 116 |
-
matches the on-screen scrapbook cover.
|
| 117 |
-
|
| 118 |
-
### Approach
|
| 119 |
-
Render the cover as a full-page **PIL image** and place it as PDF page 1 (PIL is
|
| 120 |
-
already a dependency; works on HF with no browser).
|
| 121 |
-
|
| 122 |
-
- `book_builder.render_cover_image(title, badge_text, kind="story") -> bytes`
|
| 123 |
-
- Canvas ~1240×1754 (A4 @150dpi). Cream fill (`#fff8e6`) + subtle speckle.
|
| 124 |
-
- Kicker `a DoodleBook story` — Caveat, berry (`#d6517a`), centered upper third.
|
| 125 |
-
- Title — Gaegu Bold, large, centered, wrapped to ≤2 lines, layered shadow:
|
| 126 |
-
offset draw in crayon-sun (`#f4c64a`) then ink (`#2e2a26`) on top.
|
| 127 |
-
- Badge — rounded rect, crayon-leaf (`#74b85a`), white Gaegu text `badge_text`.
|
| 128 |
-
- `badge_text`: story = `illustrated by FLUX.2-klein`; coloring =
|
| 129 |
-
`a coloring book to color in`.
|
| 130 |
-
- Fonts from `assets/fonts/Gaegu-Bold.ttf` + `Caveat.ttf` via
|
| 131 |
-
`ImageFont.truetype`; fall back to `ImageFont.load_default()` if missing.
|
| 132 |
-
- `export_pdf` and `export_coloring_pdf` use `render_cover_image(...)` for page 1.
|
| 133 |
-
|
| 134 |
-
### Fonts to bundle (OFL, free to redistribute)
|
| 135 |
-
- `assets/fonts/Gaegu-Bold.ttf`
|
| 136 |
-
- `assets/fonts/Caveat.ttf` (Regular or SemiBold)
|
| 137 |
-
|
| 138 |
-
---
|
| 139 |
-
|
| 140 |
-
## Error handling (never crash a generation)
|
| 141 |
-
- `to_line_art`: OpenCV failure → Otsu threshold fallback → original image.
|
| 142 |
-
- `render_cover_image`: missing font → default font; PIL failure → old text cover.
|
| 143 |
-
- `create_book` body wrapped so exceptions yield an error state, not a throw.
|
| 144 |
-
|
| 145 |
-
## Testing
|
| 146 |
-
- Unit:
|
| 147 |
-
- `to_line_art` → output PNG is mostly white with a meaningful fraction of dark
|
| 148 |
-
pixels (line art), valid dimensions.
|
| 149 |
-
- `render_cover_image` → returns a valid PNG of expected size; runs with fonts
|
| 150 |
-
present and with fonts removed (fallback path).
|
| 151 |
-
- `export_coloring_pdf` / `export_pdf` → produce a non-empty PDF whose page 1 is
|
| 152 |
-
the cover image.
|
| 153 |
-
- Manual (run_modal.py): checkbox ON → loader animates → coloring section +
|
| 154 |
-
"Download Coloring Book (PDF)" appear; the outline pages are visibly the SAME
|
| 155 |
-
scenes as the color book; both PDFs open with the styled cover.
|
| 156 |
-
|
| 157 |
-
## Files
|
| 158 |
-
- **New:** `services/coloring.py`; `assets/fonts/Gaegu-Bold.ttf`,
|
| 159 |
-
`assets/fonts/Caveat.ttf`; this spec.
|
| 160 |
-
- **Changed:** `ui/layout.py` (loader CSS, checkbox, coloring outputs, wiring);
|
| 161 |
-
`run_modal.py` (`create_book`) is the primary target; `app.py` mirrors the same
|
| 162 |
-
three features (loader, coloring from its own color images, styled cover) for HF
|
| 163 |
-
parity; `book_builder.py` (`render_cover_image`, `build_coloring_html`,
|
| 164 |
-
`export_coloring_pdf`, `magic_loader_html`, `export_pdf` cover).
|
| 165 |
-
No Modal worker changes are needed.
|
| 166 |
-
|
| 167 |
-
## Out of scope (YAGNI)
|
| 168 |
-
- Regenerating a separate line-art via FLUX. The user wants the EXACT same image,
|
| 169 |
-
just without color — so the coloring page is always derived from the color image,
|
| 170 |
-
never re-generated. (No "HD lines" button, no second FLUX run.)
|
| 171 |
-
- Per-page progress bar (needs Modal streaming; loader covers the need).
|
| 172 |
-
- Saving/downloading the narration audio (separate future ask).
|
| 173 |
-
- Animated illustrated pipeline loader (chose the lighter CSS version).
|
| 174 |
-
</content>
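
Review note: Feature 1 of the spec describes a pure-CSS loader in which stacked message `<div>`s fade in sequence through staggered `animation-delay` values. A minimal sketch of the HTML side follows. The function name `loader_html_sketch`, the `step_seconds` parameter, and the use of inline styles are assumptions (the spec puts the rules in the `ui/layout.py` CSS string and names the real helper `magic_loader_html(stage, hero_name)`); the `ml-cycle` keyframes still have to be defined in that CSS.

```python
import html

# Messages quoted from the spec; {hero} is filled with the child's hero name.
LOADER_MESSAGES = [
    "✏️ MiniCPM is dreaming up {hero}'s story…",
    "🎨 FLUX is painting your 6 pages…",
    "🔊 VoxCPM is recording the narration…",
    "💡 Did you know? Your whole storybook runs on tiny models!",
]


def loader_html_sketch(hero_name: str, step_seconds: float = 3.0) -> str:
    """Build stacked .ml-msg divs whose fades are offset by animation-delay."""
    total = step_seconds * len(LOADER_MESSAGES)  # one full cycle through all messages
    rows = []
    for i, template in enumerate(LOADER_MESSAGES):
        # Escape after formatting so user-supplied names cannot inject markup.
        text = html.escape(template.format(hero=hero_name))
        rows.append(
            f'<div class="ml-msg" style="animation: ml-cycle {total:g}s infinite; '
            f'animation-delay: {i * step_seconds:g}s">{text}</div>'
        )
    return '<div class="magic-loader">' + "".join(rows) + "</div>"
```

Escaping the hero name matters because the string is rendered through `gr.HTML`.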
|
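
Review note: the spec's error-handling rule for `to_line_art` is a chain: OpenCV path, then an Otsu-threshold fallback, then the original image, never raising. The sketch below shows that pattern in isolation. `run_with_fallbacks` is a hypothetical helper, and the converters are passed in as plain callables so the example does not depend on OpenCV.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def run_with_fallbacks(
    png_bytes: bytes,
    steps: Sequence[Callable[[bytes], bytes]],
) -> bytes:
    """Try each converter in order; return the input unchanged if all fail."""
    for step in steps:
        try:
            return step(png_bytes)
        except Exception:  # broad on purpose: a generation must never crash
            logger.exception(
                "%s failed, trying next fallback", getattr(step, "__name__", step)
            )
    return png_bytes
```

Logging each failure keeps the degraded path visible in the Space logs even though the user still gets a page.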
docs/superpowers/specs/2026-06-14-flux-lineart-coloring-design.md
DELETED
|
@@ -1,93 +0,0 @@
|
|
| 1 |
-
# FLUX-generated line art for coloring pages
|
| 2 |
-
|
| 3 |
-
**Date:** 2026-06-14
|
| 4 |
-
**Status:** Approved, pending implementation
|
| 5 |
-
|
| 6 |
-
## Problem
|
| 7 |
-
|
| 8 |
-
Coloring pages are derived from the finished full-color crayon page by
|
| 9 |
-
`services/coloring.py:to_line_art()` (OpenCV: bilateral filter → k-means
|
| 10 |
-
quantize → region-boundary trace). On textured/busy backgrounds (sand, hills,
|
| 11 |
-
crayon-shaded sky) the crayon strokes fragment into hundreds of stray edges, so
|
| 12 |
-
the coloring page is speckled and uncolorable (see the "rolling sand dunes" and
|
| 13 |
-
"deep breath" pages). The source was never line art, so tracing it cleanly is
|
| 14 |
-
fundamentally hard.
|
| 15 |
-
|
| 16 |
-
The full-COLOR storybook pages look good and are NOT changing.
|
| 17 |
-
|
| 18 |
-
## Approach (Option B — keep color pipeline, let FLUX draw the outline)
|
| 19 |
-
|
| 20 |
-
Stop guessing outlines from pixels. Hand the finished color page back to FLUX
|
| 21 |
-
and have it **redraw the scene as clean line art** via img2img. FLUX understands
|
| 22 |
-
the scene semantically (character + clouds + hills) so it draws shape boundaries
|
| 23 |
-
instead of tracing crayon texture.
|
| 24 |
-
|
| 25 |
-
### Pipeline
|
| 26 |
-
|
| 27 |
-
```
|
| 28 |
-
canonical char ─┐
|
| 29 |
-
├─► render_page (color, A10G) ──► STORYBOOK page (unchanged)
|
| 30 |
-
story beat ───┘ │
|
| 31 |
-
└─► render_lineart (FLUX img2img, A10G)
|
| 32 |
-
│
|
| 33 |
-
└─► threshold B/W + despeckle ──► COLORING page
|
| 34 |
-
```
|
| 35 |
-
|
| 36 |
-
### Components
|
| 37 |
-
|
| 38 |
-
1. **`modal_workers/modal_image_gen.py` — new `render_lineart(color_png) -> bytes`**
|
| 39 |
-
- A10G FLUX function (uses shared `GPU_FN` decorator → A10G, 300s timeout,
|
| 40 |
-
120s scaledown).
|
| 41 |
-
- img2img with the **color page as the image reference** (so the coloring
|
| 42 |
-
page matches the story picture exactly: same pose, same composition).
|
| 43 |
-
- Prompt: *"black and white coloring book line drawing, clean bold black
|
| 44 |
-
outlines on a pure white background, no shading, no color, no crayon
|
| 45 |
-
texture, simple shapes a child can color in."*
|
| 46 |
-
- Denoising `strength` tuned high enough to redraw as line art but low enough
|
| 47 |
-
to keep composition. **Exact param verified against the live
|
| 48 |
-
Flux2KleinPipeline and tuned on a real render before sign-off.**
|
| 49 |
-
|
| 50 |
-
2. **`services/coloring.py` — cleanup + orchestration**
|
| 51 |
-
- Keep a small local `_crispen(png)`: threshold the FLUX output to pure
|
| 52 |
-
black-on-white + despeckle (FLUX may leave faint gray; coloring pages must
|
| 53 |
-
be crisp). Fast, no GPU.
|
| 54 |
-
- `derive_coloring_pages(color_imgs)`: fan the pages out via
|
| 55 |
-
`modal.Function.from_name("doodlebook-image-gen", "render_lineart").starmap`
|
| 56 |
-
(concurrent, like `render_page`), then `_crispen` each.
|
| 57 |
-
- **Fallback:** if the Modal call fails, fall back to the existing OpenCV
|
| 58 |
-
`to_line_art` (renamed `_to_line_art_opencv`) — so it degrades gracefully,
|
| 59 |
-
never worse than today.
|
| 60 |
-
|
| 61 |
-
### Decisions (confirmed)
|
| 62 |
-
|
| 63 |
-
- **Trace source:** the finished color page (matches the story picture).
|
| 64 |
-
- **When it runs:** only when "Also make a coloring book" is checked — no extra
|
| 65 |
-
cost/time otherwise. `run_modal.py:182` already gates `derive_coloring_pages`
|
| 66 |
-
behind `make_coloring`, so no caller change needed.
|
| 67 |
-
|
| 68 |
-
### Cost / latency
|
| 69 |
-
|
| 70 |
-
+1 A10G render per page (~6 extra renders/book), opt-in. Cheap on A10G.
|
| 71 |
-
|
| 72 |
-
## Ships with the GPU/deadlock fix
|
| 73 |
-
|
| 74 |
-
Already edited (pending `modal deploy`): image-gen GPU A100→A10G,
|
| 75 |
-
`scaledown_window` 1200→120s, per-call `timeout` 24h→300s. The deploy that ships
|
| 76 |
-
line art also activates these. See the separate deadlock diagnosis (9 idle A100
|
| 77 |
-
containers pinning the 10-GPU account quota → queued calls never timed out →
|
| 78 |
-
app spun forever).
|
| 79 |
-
|
| 80 |
-
## Testing
|
| 81 |
-
|
| 82 |
-
1. Verify Flux2KleinPipeline img2img accepts a `strength`/denoising param;
|
| 83 |
-
confirm the exact name.
|
| 84 |
-
2. Render ONE real book locally with coloring enabled; save the color page and
|
| 85 |
-
the line-art coloring page side by side.
|
| 86 |
-
3. Confirm: clean colorable regions on the previously-bad busy pages (sand
|
| 87 |
-
dunes, hills), character preserved, pure black-on-white, no speckle.
|
| 88 |
-
4. Tune `strength` if texture survives (raise) or composition drifts (lower).
|
| 89 |
-
|
| 90 |
-
## Out of scope
|
| 91 |
-
|
| 92 |
-
- Changing the color storybook render (looks good).
|
| 93 |
-
- Programmatic flat-fill colorizing (not needed; color pipeline stays).
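
Review note: the spec's `_crispen` step thresholds the FLUX output to pure black-on-white and then despeckles it. The sketch below illustrates those two operations on a plain nested list of grayscale values so it runs without OpenCV; the real function works on PNG bytes, and the name `crispen`, the `threshold` default, and the `min_speck` cut-off are assumptions made for illustration.

```python
from collections import deque


def crispen(
    gray: list[list[int]], threshold: int = 128, min_speck: int = 3
) -> list[list[int]]:
    """Binarise a grayscale grid to 0/255 and erase dark blobs smaller than min_speck."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    # Step 1: hard threshold so faint gray becomes pure white or pure black.
    out = [[0 if px < threshold else 255 for px in row] for row in gray]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != 0 or seen[y][x]:
                continue
            # Step 2: flood-fill one 4-connected dark component.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Step 3: paint components below the size cut-off white.
            if len(blob) < min_speck:
                for by, bx in blob:
                    out[by][bx] = 255
    return out
```

On real pages the size cut-off would need tuning against actual renders, in the same way the spec asks for `strength` to be tuned.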
|
lora_finetune/dataset_prep.py
DELETED
@@ -1,39 +0,0 @@
-"""
-Dataset preparation for LoRA training.
-
-Prepares character images for DreamBooth-style fine-tuning.
-"""
-
-import os
-from PIL import Image
-
-
-def prepare_training_data(
-    input_dir: str,
-    output_dir: str = "./training_data",
-    target_size: int = 512,
-    num_augmentations: int = 5
-):
-    """
-    Prepare training images for LoRA fine-tuning.
-
-    Steps:
-    1. Load original doodle images
-    2. Resize to target size
-    3. Create variations (flip, rotate, color shift)
-    4. Save with consistent naming
-
-    Args:
-        input_dir: Directory with original doodle images
-        output_dir: Output directory for prepared data
-        target_size: Target image size (512x512 for FLUX)
-        num_augmentations: Number of augmented versions per image
-    """
-    os.makedirs(output_dir, exist_ok=True)
-
-    # Phase 5: Full implementation with augmentations
-    raise NotImplementedError("Phase 5: Dataset preparation")
-
-
-if __name__ == "__main__":
-    print("Run this script to prepare training data for LoRA.")
lora_finetune/train_lora.py
DELETED
@@ -1,287 +0,0 @@
-"""
-FLUX LoRA Training Script — Phase 5
-
-Trains a crayon-style LoRA for character consistency on FLUX.2-klein.
-Uses DreamBooth-style fine-tuning with trigger token [DOODLECHAR].
-
-Usage:
-    python train_lora.py --images_dir ./training_images --output_dir ./lora-weights
-
-Requirements:
-    - diffusers>=0.28
-    - peft
-    - torch
-    - accelerate
-"""
-
-import argparse
-import os
-import json
-import logging
-from pathlib import Path
-
-logger = logging.getLogger(__name__)
-
-
-# ============================================================================
-# TRAINING CONFIGURATION
-# ============================================================================
-
-LORA_CONFIG = {
-    "rank": 16,
-    "alpha": 16,
-    "target_modules": [
-        "to_q", "to_k", "to_v", "to_out.0",
-        "add_q_proj", "add_k_proj", "add_v_proj", "to_add_out"
-    ],
-    "instance_prompt": "photo of [DOODLECHAR] character, crayon drawing style",
-    "class_prompt": "photo of a character",
-    "pretrained_model": "black-forest-labs/FLUX.2-klein-4B",
-    "resolution": 512,
-    "train_batch_size": 1,
-    "gradient_accumulation_steps": 4,
-    "learning_rate": 1e-4,
-    "max_train_steps": 300,
-    "checkpointing_steps": 50,
-    "seed": 42,
-}
-
-
-def train_lora(
-    training_images: list[str],
-    output_dir: str = "./lora-weights",
-    num_steps: int = 300,
-    learning_rate: float = 1e-4,
-    rank: int = 16,
-    alpha: int = 16,
-):
-    """
-    Train LoRA on FLUX.2-klein for crayon-style character consistency.
-
-    Uses DreamBooth-style fine-tuning:
-    - Trigger token: [DOODLECHAR]
-    - Target: Cross-attention layers in UNet
-    - Loss: Cross-entropy with instance prompt
-
-    Args:
-        training_images: Paths to training images (10-15 images recommended)
-        output_dir: Where to save LoRA weights
-        num_steps: Training steps (200-400 recommended)
-        learning_rate: Learning rate (1e-4 default)
-        rank: LoRA rank (16 recommended)
-        alpha: LoRA alpha (16 recommended, equals rank for scaling=1.0)
-
-    Returns:
-        Path to saved LoRA weights
-    """
-    import torch
-    from diffusers import FluxPipeline
-    from peft import LoraConfig, get_peft_model
-    from torch.utils.data import Dataset, DataLoader
-    from PIL import Image
-    import torchvision.transforms as transforms
-
-    # Create output directory
-    os.makedirs(output_dir, exist_ok=True)
-
-    logger.info(f"Training LoRA with {len(training_images)} images")
-    logger.info(f"Config: rank={rank}, alpha={alpha}, steps={num_steps}, lr={learning_rate}")
-
-    # Load FLUX pipeline
-    logger.info("Loading FLUX.2-klein pipeline...")
-    pipe = FluxPipeline.from_pretrained(
-        LORA_CONFIG["pretrained_model"],
-        torch_dtype=torch.bfloat16
-    )
-
-    # Configure LoRA
-    lora_config = LoraConfig(
-        r=rank,
-        lora_alpha=alpha,
-        target_modules=LORA_CONFIG["target_modules"],
-        lora_dropout=0.0,
-        bias="none",
-    )
-
-    # Apply LoRA to UNet
-    logger.info("Applying LoRA to UNet...")
-    pipe.unet = get_peft_model(pipe.unet, lora_config)
-    pipe.unet.print_trainable_parameters()
-
-    # Create dataset
-    class CrayonDataset(Dataset):
-        def __init__(self, image_paths, transform=None):
-            self.image_paths = image_paths
-            self.transform = transform or transforms.Compose([
-                transforms.Resize((512, 512)),
-                transforms.ToTensor(),
-                transforms.Normalize([0.5], [0.5])
-            ])
-
-        def __len__(self):
-            return len(self.image_paths)
-
-        def __getitem__(self, idx):
-            img = Image.open(self.image_paths[idx]).convert("RGB")
-            return self.transform(img)
-
-    dataset = CrayonDataset(training_images)
-    dataloader = DataLoader(dataset, batch_size=1, shuffle=True)
-
-    # Training loop
-    logger.info("Starting training...")
-    pipe.unet.train()
-
-    optimizer = torch.optim.AdamW(pipe.unet.parameters(), lr=learning_rate)
-
-    for step in range(num_steps):
-        for batch in dataloader:
-            # Forward pass with noise
-            noise = torch.randn_like(batch)
-            timesteps = torch.randint(0, 1000, (batch.shape[0],), device=batch.device)
-
-            # Simple training step (simplified for demonstration)
-            optimizer.zero_grad()
-            loss = torch.tensor(0.0, requires_grad=True)  # Placeholder
-            loss.backward()
-            optimizer.step()
-
-        if (step + 1) % LORA_CONFIG["checkpointing_steps"] == 0:
-            logger.info(f"Step {step + 1}/{num_steps}")
-
-    # Save LoRA weights
-    logger.info("Saving LoRA weights...")
-    pipe.unet.save_pretrained(output_dir)
-
-    # Save training config
-    config_path = os.path.join(output_dir, "training_config.json")
-    with open(config_path, "w") as f:
-        json.dump({
-            **LORA_CONFIG,
-            "rank": rank,
-            "alpha": alpha,
-            "num_steps": num_steps,
-            "learning_rate": learning_rate,
-            "training_images": len(training_images),
-        }, f, indent=2)
-
-    logger.info(f"LoRA saved to: {output_dir}")
-    return output_dir
-
-
-def publish_to_hf(
-    local_path: str,
-    repo_id: str = "build-small-hackathon/doodlebook-flux-lora"
-):
-    """Upload trained LoRA to HuggingFace Hub (Well-Tuned badge)."""
-    from huggingface_hub import HfApi
-
-    api = HfApi()
-
-    # Create repo if it doesn't exist
-    api.create_repo(repo_id, repo_type="model", exist_ok=True)
-
-    # Upload files
-    api.upload_folder(
-        folder_path=local_path,
-        repo_id=repo_id,
-        repo_type="model",
-        commit_message="Upload crayon-style LoRA for DoodleBook"
-    )
-
-    logger.info(f"Published LoRA to: https://huggingface.co/{repo_id}")
-    return f"https://huggingface.co/{repo_id}"
-
-
-def prepare_training_data(
-    input_dir: str,
-    output_dir: str = "./training_data",
-    target_size: int = 512,
-):
-    """
-    Prepare training images for LoRA fine-tuning.
-
-    Steps:
-    1. Load original doodle images
-    2. Resize to target size
-    3. Create augmentations (flip, rotate)
-    4. Save with consistent naming
-    """
-    from PIL import Image, ImageEnhance
-    import random
-
-    os.makedirs(output_dir, exist_ok=True)
-
-    image_extensions = {'.jpg', '.jpeg', '.png', '.webp'}
-    image_files = [
-        f for f in Path(input_dir).iterdir()
-        if f.suffix.lower() in image_extensions
-    ]
-
-    logger.info(f"Found {len(image_files)} images in {input_dir}")
-
-    output_idx = 0
-
-    for img_path in image_files:
-        img = Image.open(img_path).convert("RGB")
-        img = img.resize((target_size, target_size), Image.LANCZOS)
-
-        # Save original
-        img.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
-        output_idx += 1
-
-        # Create augmented versions
-        # Horizontal flip
-        flipped = img.transpose(Image.FLIP_LEFT_RIGHT)
-        flipped.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
-        output_idx += 1
-
-        # Slight rotation
-        rotated = img.rotate(random.uniform(-10, 10), fillcolor=(255, 255, 255))
-        rotated.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
-        output_idx += 1
-
-        # Color variation
-        enhancer = ImageEnhance.Color(img)
-        varied = enhancer.enhance(random.uniform(0.8, 1.2))
-        varied.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
-        output_idx += 1
-
-    logger.info(f"Prepared {output_idx} training images in {output_dir}")
-    return output_idx
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Train crayon-style LoRA for DoodleBook")
-    parser.add_argument("--images_dir", required=True, help="Directory with training images")
-    parser.add_argument("--output_dir", default="./lora-weights", help="Output directory")
-    parser.add_argument("--num_steps", type=int, default=300, help="Training steps")
-    parser.add_argument("--learning_rate", type=float, default=1e-4, help="Learning rate")
-    parser.add_argument("--rank", type=int, default=16, help="LoRA rank")
-    parser.add_argument("--publish", action="store_true", help="Publish to HF Hub")
-
-    args = parser.parse_args()
-
-    logging.basicConfig(level=logging.INFO)
-
-    # Prepare training data
-    training_dir = "./training_data"
-    prepare_training_data(args.images_dir, training_dir)
-
-    # Get training images
-    training_images = list(Path(training_dir).glob("*.png"))
-
-    # Train LoRA
-    output_dir = train_lora(
-        [str(p) for p in training_images],
-        args.output_dir,
-        args.num_steps,
-        args.learning_rate,
-        args.rank
-    )
-
-    # Publish if requested
-    if args.publish:
-        publish_to_hf(output_dir)
-
-    print("Training complete!")
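The `train_lora` docstring notes that setting `alpha` equal to `rank` gives a scaling of 1.0. That follows from the standard LoRA formulation, where the low-rank update is multiplied by `alpha / rank` (PEFT's default behaviour when rank-stabilized scaling is not enabled). A small worked sketch in plain Python follows; the helper name `lora_scaling` is hypothetical and does not appear in the deleted script.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scale factor applied to the low-rank update in standard LoRA."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / rank
```

With the script's defaults (`rank=16`, `alpha=16`) the update is applied unscaled. Doubling `alpha` while keeping `rank` fixed would double the adapter's contribution.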
modal_workers/__init__.py
DELETED
File without changes
modal_workers/modal_image_gen.py
DELETED
@@ -1,332 +0,0 @@
-"""
-Modal image generation — FLUX.2-klein-4B (verified API) on A100.
-
-Verified against the live model card (June 2026):
-    from diffusers import Flux2KleinPipeline
-    pipe(prompt=..., guidance_scale=1.0, num_inference_steps=4)  # fast distilled
-
-Character consistency (C1): identical character_description on every page +
-locked seed S, page i uses S+i. The character_description is produced upstream
-by the vision worker (reads the child's doodle) so the hero matches their drawing.
-"""
-
-import os
-import io
-import logging
-
-import modal
-
-logger = logging.getLogger(__name__)
-
-app = modal.App("doodlebook-image-gen")
-
-CACHE = "/cache"
-vol = modal.Volume.from_name("doodlebook-hf-cache", create_if_missing=True)
-HF_SECRET = modal.Secret.from_name("huggingface")
-
-flux_image = (
-    modal.Image.debian_slim(python_version="3.11")
-    .pip_install(
-        "torch", "diffusers", "transformers", "accelerate",
-        "sentencepiece", "pillow", "huggingface_hub",
-    )
-    .env({"HF_HOME": CACHE})
-)
-
-MIN_CONTAINERS = int(os.environ.get("DOODLEBOOK_KEEP_WARM", "0"))
-# FLUX.2-klein-4B is ~13GB in bf16 (see config.FLUX_MODEL.vram_gb) — fits an
-# A10G (24GB) with room to spare, ~4-5x cheaper than A100-40GB and far more
-# available, so books stop exhausting the scarce A100 pool.
-GPU = "A10G"
-# A real per-call timeout: if a render can't get a GPU slot (account GPU quota
-# exhausted) it FAILS instead of queuing for 24h, so services/images.py falls
-# back to the local sketch instead of the app spinning forever.
-RENDER_TIMEOUT = 300  # 5 min — generous for a cold start + 6-step render
-# Short scaledown so idle containers release their GPU quota quickly instead of
-# pinning it for 20 min and blocking the next book (this was the deadlock).
-SCALEDOWN = 120
-FLUX_ID = "black-forest-labs/FLUX.2-klein-4B"
-DEFAULT_ART_STYLE = (
-    "children's crayon storybook illustration, bold black outlines, "
-    "flat bright colors, simple shapes"
-)
-DEFAULT_COLORING_STYLE = (
-    "children's coloring book page, pure black ink outlines on pure white paper, "
-    "clean contour lines, no color, no gray, no shading, no texture, "
-    "no hatching, no pencil marks, open spaces to color"
-)
-GPU_FN = dict(  # shared decorator kwargs for the FLUX functions
-    gpu=GPU, image=flux_image, volumes={CACHE: vol}, secrets=[HF_SECRET],
-    timeout=RENDER_TIMEOUT, min_containers=MIN_CONTAINERS, scaledown_window=SCALEDOWN,
-)
-
-# loaded once per warm container, reused across calls
-_PIPE = None
-
-
-def _get_pipe():
-    global _PIPE
-    if _PIPE is None:
-        import torch
-        from diffusers import Flux2KleinPipeline
-        logger.info("Loading FLUX.2-klein-4B…")
-        _PIPE = Flux2KleinPipeline.from_pretrained(
-            FLUX_ID, torch_dtype=torch.bfloat16, cache_dir=CACHE,
-        )
-        _PIPE.enable_model_cpu_offload()
-        logger.info("FLUX ready.")
-    return _PIPE
-
-
-@app.function(**GPU_FN)
-def generate_book_pages(
-    character_desc: str,
-    story_beats: list[str],
-    doodle: bytes = None,
-    art_style: str = "children's crayon storybook illustration, bold black outlines, flat bright colors, simple shapes",
-    seed: int = 42,
-    lora_repo: str = None,
-    tiny: bool = False,
-) -> list[bytes]:
-    """
-    Render all 6 pages so the hero MATCHES THE CHILD'S DRAWING.
-
-    Two-stage when a doodle is provided (FLUX.2-klein image reference):
-      Stage 1: doodle -> canonical full-body character (same creature, colorized)
-      Stage 2: canonical -> the SAME character placed into each story scene
-    Falls back to text2img from `character_desc` only when no doodle is given.
-    """
-    import torch
-    from PIL import Image
-
-    pipe = _get_pipe()
-    if lora_repo:
-        try:
-            pipe.load_lora_weights(lora_repo)
-            logger.info(f"LoRA loaded: {lora_repo}")
-        except Exception as e:
-            logger.warning(f"LoRA load failed ({e}); base model")
-
-    steps = 4 if tiny else 6
-
-    def _gen(image, prompt, s):
-        kw = dict(prompt=prompt, height=768, width=768, guidance_scale=1.0,
-                  num_inference_steps=steps,
-                  generator=torch.Generator("cuda").manual_seed(s))
-        if image is not None:
-            kw["image"] = image
-        return pipe(**kw).images[0]
-
-    # --- Stage 1: canonical character from the actual drawing ---
-    canonical = None
-    if doodle:
-        try:
-            ref = Image.open(io.BytesIO(doodle)).convert("RGB")
-            canonical = _gen(
-                ref,
-                ("Turn this child's drawing into a clean, friendly, full-body cartoon "
-                 "character for a children's storybook. Keep the EXACT same creature, "
-                 "face, and features as the drawing. " + art_style +
-                 ", plain white background, full character visible, centered."),
-                seed,
-            )
-            logger.info("canonical character built from doodle")
-        except Exception as e:
-            logger.warning(f"canonical build failed ({e}); text2img fallback")
-            canonical = None
-
-    # --- Stage 2: place the SAME character into each scene ---
-    pages = []
-    for i, beat in enumerate(story_beats):
-        if canonical is not None:
-            prompt = (
-                f"The same character. {beat}. {art_style}, "
-                f"full colorful background scene, the character clearly visible."
-            )
-            img = _gen(canonical, prompt, seed + i + 1)
-        else:
-            prompt = (
-                f"{character_desc}. Scene: {beat}. {art_style}, white background, "
-                f"centered, full character visible, same character design throughout"
-            )
-            img = _gen(None, prompt, seed + i + 1)
-        buf = io.BytesIO(); img.save(buf, format="PNG")
-        pages.append(buf.getvalue())
-        logger.info(f"page {i+1}/{len(story_beats)} done")
-
-    if lora_repo:
-        try:
-            pipe.unload_lora_weights()
-        except Exception:
-            pass
-    vol.commit()
-    return pages
-
-
-# ============================================================================
-# PARALLEL PATH — split into canonical (1 call) + per-page (fan out via .starmap)
-# so the 6 scenes render concurrently across warm containers instead of one
-# container doing 7 inferences back-to-back. Orchestrated by services/images.py.
-# ============================================================================
-
-# canonical runs once per book, so keep at most ONE warm (don't double the bill)
-_CANON_FN = {**GPU_FN, "min_containers": min(1, MIN_CONTAINERS)}
-
-
-@app.function(**_CANON_FN)
-def build_canonical(
-    doodle: bytes,
-    art_style: str = DEFAULT_ART_STYLE,
-    seed: int = 42,
-    tiny: bool = False,
-) -> bytes:
-    """Stage 1: child's drawing -> canonical full-body character (PNG bytes).
-    Returns b"" when no doodle is given (caller then renders text2img per page)."""
-    if not doodle:
-        return b""
-    import io
-    import torch
-    from PIL import Image
-
-    pipe = _get_pipe()
-    ref = Image.open(io.BytesIO(doodle)).convert("RGB")
-    img = pipe(
-        prompt=("Turn this child's drawing into a clean, friendly, full-body cartoon "
-                "character for a children's storybook. Keep the EXACT same creature, "
-                "face, and features as the drawing. " + art_style +
-                ", plain white background, full character visible, centered."),
-        image=ref, height=768, width=768, guidance_scale=1.0,
-        num_inference_steps=4 if tiny else 6,
-        generator=torch.Generator("cuda").manual_seed(seed),
-    ).images[0]
-    buf = io.BytesIO(); img.save(buf, format="PNG")
-    return buf.getvalue()
-
-
-@app.function(**GPU_FN)
-def render_page(
-    canonical: bytes,
-    character_desc: str,
-    beat: str,
-    art_style: str = DEFAULT_ART_STYLE,
-    seed: int = 42,
-    tiny: bool = False,
-) -> bytes:
-    """Stage 2: render ONE scene. Uses the canonical character as an image
-    reference when provided (consistency), else text2img from character_desc."""
-    import io
-    import torch
-    from PIL import Image
-
-    pipe = _get_pipe()
-    if canonical:
-        ref = Image.open(io.BytesIO(canonical)).convert("RGB")
-        prompt = (f"The same character. {beat}. {art_style}, "
-                  f"full colorful background scene, the character clearly visible.")
-        kw = dict(prompt=prompt, image=ref)
-    else:
-        prompt = (f"{character_desc}. Scene: {beat}. {art_style}, white background, "
-                  f"centered, full character visible, same character design throughout")
-        kw = dict(prompt=prompt)
-    kw.update(height=768, width=768, guidance_scale=1.0,
-              num_inference_steps=4 if tiny else 6,
-              generator=torch.Generator("cuda").manual_seed(seed))
-    img = pipe(**kw).images[0]
-    buf = io.BytesIO(); img.save(buf, format="PNG")
-    return buf.getvalue()
-
-
-@app.function(**GPU_FN)
-def render_coloring_page(
-    canonical: bytes,
-    character_desc: str,
-    beat: str,
-    art_style: str = DEFAULT_COLORING_STYLE,
-    seed: int = 42,
-    tiny: bool = False,
-) -> bytes:
-    """Stage 2 alternate render: same scene, but directly as clean line art."""
-    import io
-    import torch
-    from PIL import Image
-
-    pipe = _get_pipe()
-    if canonical:
-        ref = Image.open(io.BytesIO(canonical)).convert("RGB")
-        prompt = (
-            f"The same character. {beat}. {art_style}, simple clean background shapes, "
-            f"same composition, thick readable outlines, no filled black areas, "
-            f"no extra sketch marks."
-        )
-        kw = dict(prompt=prompt, image=ref)
-    else:
-        prompt = (
-            f"{character_desc}. Scene: {beat}. {art_style}, white background, "
-            f"centered, full character visible, same character design throughout"
-        )
-        kw = dict(prompt=prompt)
-    kw.update(
-        height=768,
-        width=768,
-        guidance_scale=1.0,
-        num_inference_steps=4 if tiny else 6,
-        generator=torch.Generator("cuda").manual_seed(seed),
-    )
-    img = pipe(**kw).images[0]
-    buf = io.BytesIO()
-    img.save(buf, format="PNG")
-    return buf.getvalue()
-
-
-LINEART_PROMPT = (
-    "black and white coloring book line art, clean bold contour lines only on a "
-    "pure white background, no shading, no gray, no color, no fill, no crayon "
-    "texture, no crosshatching, no tiny details, simple shapes a child can color"
-)
-
-LINEART_NEGATIVE_PROMPT = (
-    "color, grayscale, shadows, shading, gradients, texture, speckles, noise, "
-    "blur, sketch shading, hatch marks, crosshatching, filled shapes, busy background"
-)
-
-
-@app.function(**GPU_FN)
-def render_lineart(color_png: bytes, seed: int = 42) -> bytes:
-    """Turn a finished COLOR page into clean coloring-book line art.
-
-    img2img from the color page so the coloring page matches the story picture
-    (same pose/composition), but FLUX REDRAWS it as outlines — it understands the
-    scene semantically (kid + clouds + hills) and traces shape boundaries instead
-    of the crayon texture that wrecked the old OpenCV edge-trace.
-
-    `strength` controls how far it departs from the source: high enough to redraw
-    as flat line art, low enough to keep the composition. Flux2KleinPipeline may
-    not expose `strength` (it's a unified edit/reference pipeline), so we pass it
-    when accepted and silently retry without it.
-    """
-    import io
-    import torch
-    from PIL import Image
-
-    pipe = _get_pipe()
-    ref = Image.open(io.BytesIO(color_png)).convert("RGB")
-    base = dict(
-        prompt=LINEART_PROMPT, image=ref, height=768, width=768,
-        guidance_scale=1.0, num_inference_steps=6,
-        generator=torch.Generator("cuda").manual_seed(seed),
-    )
-    try:
-        img = pipe(**base, strength=0.68, negative_prompt=LINEART_NEGATIVE_PROMPT).images[0]
-    except TypeError:
-        logger.info("pipeline rejected `strength`; retrying without it")
-        try:
-            img = pipe(**base, negative_prompt=LINEART_NEGATIVE_PROMPT).images[0]
-        except TypeError:
-            img = pipe(**base).images[0]
-    buf = io.BytesIO(); img.save(buf, format="PNG")
-    return buf.getvalue()
-
-
-@app.function(image=flux_image, timeout=30)
-def health_check() -> str:
-    return "image_gen_healthy"
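The seed scheme in `generate_book_pages` (canonical character rendered with the base seed `S`, scene `i` of the loop rendered with `S + i + 1`) can be written out as a small plan. This is a minimal sketch of that arithmetic only. `page_seeds` is a hypothetical helper, and whether `services/images.py` derives seeds the same way for the parallel `render_page` path is an assumption, since that file is not shown here.

```python
def page_seeds(base_seed: int, num_pages: int) -> dict:
    """Seed plan: canonical uses S, page p (1-based) uses S + p."""
    plan = {"canonical": base_seed}
    for page in range(1, num_pages + 1):
        plan[f"page_{page}"] = base_seed + page
    return plan
```

Keeping the seeds a fixed offset from one base value means the same doodle, beats, and base seed request the same per-page seeds on every run, while no two pages share a seed.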
modal_workers/modal_story_gen.py
DELETED
@@ -1,397 +0,0 @@
-"""
-Modal story generation — MiniCPM5-1B on T4 GPU.
-
-C2 Compliance: 3-layer JSON parser + template fallback
-- Layer 1: Regex extraction of {...} block
-- Layer 2: json-repair / json5 parsing
-- Layer 3: Deterministic template fallback (NEVER crashes)
-
-Few-shot prompt with ONE full exemplar for reliable JSON output.
-Greedy decode (do_sample=False) for determinism.
-"""
-
-import modal
-import re
-import json
-import logging
-
-logger = logging.getLogger(__name__)
-
-app = modal.App("doodlebook-story")
-
-story_env = modal.Image.debian_slim().pip_install(
-    "transformers>=4.40", "torch", "accelerate", "sentencepiece"
-)
-
-
-# ============================================================================
-# FEW-SHOT EXEMPLAR
-# ============================================================================
-
-FEW_SHOT_EXEMPLAR = """
-Write a 6-page children's storybook for age 5 about Luna the cat with theme: brave adventure.
-
-Return ONLY valid JSON:
-{
-  "title": "Luna's Brave Adventure",
-  "character_description": "A small orange tabby cat named Luna with big green eyes, whiskers, and a tiny red scarf",
-  "pages": [
-    {"page": 1, "text": "Luna was a small orange cat who loved to explore.", "scene": "Luna sitting by the window looking outside"},
-    {"page": 2, "text": "One sunny morning, Luna saw something sparkling in the forest.", "scene": "Luna spotting a glow in the trees"},
-    {"page": 3, "text": "Bravely, Luna crept into the forest to investigate.", "scene": "Luna walking cautiously through trees"},
-    {"page": 4, "text": "It was a tiny fairy stuck in a spider web!", "scene": "Luna discovering a fairy in trouble"},
-    {"page": 5, "text": "Luna gently freed the fairy with her paw.", "scene": "Luna carefully helping the fairy"},
-    {"page": 6, "text": "The fairy thanked Luna and they became friends forever.", "scene": "Luna and fairy playing together at sunset"}
-  ]
-}
-"""
-
-
-# ============================================================================
-# STORY GENERATION PROMPT
-# ============================================================================
-
-def build_prompt(hero_name: str, theme: str, age: int) -> str:
-    """Build few-shot prompt for story generation."""
-    return f"""{FEW_SHOT_EXEMPLAR}
-
-Write a 6-page children's storybook for age {age} about {hero_name} with theme: {theme}.
-
-Return ONLY valid JSON:
-"""
-
-
-# ============================================================================
-# 3-LAYER JSON PARSER (C2)
-# ============================================================================
-
-def parse_story_json(raw_output: str) -> dict:
-    """
-    3-layer parser: regex → json5/repair → template fallback.
|
| 71 |
-
|
| 72 |
-
Layer 1: Extract {...} block with regex
|
| 73 |
-
Layer 2: Parse with json.loads, repair common issues
|
| 74 |
-
Layer 3: Return deterministic template (NEVER crashes)
|
| 75 |
-
|
| 76 |
-
Args:
|
| 77 |
-
raw_output: Raw model output string
|
| 78 |
-
|
| 79 |
-
Returns:
|
| 80 |
-
Parsed story dict with keys: title, character_description, pages
|
| 81 |
-
"""
|
| 82 |
-
# Layer 1: Regex extraction
|
| 83 |
-
story = _layer1_regex_extract(raw_output)
|
| 84 |
-
if story:
|
| 85 |
-
return story
|
| 86 |
-
|
| 87 |
-
# Layer 2: JSON repair
|
| 88 |
-
story = _layer2_json_repair(raw_output)
|
| 89 |
-
if story:
|
| 90 |
-
return story
|
| 91 |
-
|
| 92 |
-
# Layer 3: Template fallback (NEVER crashes)
|
| 93 |
-
logger.warning("All parsing failed, using template fallback")
|
| 94 |
-
return _layer3_template_fallback(raw_output)
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
def _layer1_regex_extract(text: str) -> dict | None:
|
| 98 |
-
"""Layer 1: Extract {...} block with regex."""
|
| 99 |
-
try:
|
| 100 |
-
# Find the outermost {...} block
|
| 101 |
-
match = re.search(r'\{[\s\S]*\}', text)
|
| 102 |
-
if not match:
|
| 103 |
-
return None
|
| 104 |
-
|
| 105 |
-
json_str = match.group(0)
|
| 106 |
-
story = json.loads(json_str)
|
| 107 |
-
|
| 108 |
-
# Validate structure
|
| 109 |
-
if _validate_story_structure(story):
|
| 110 |
-
return story
|
| 111 |
-
return None
|
| 112 |
-
except (json.JSONDecodeError, KeyError, TypeError):
|
| 113 |
-
return None
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
def _layer2_json_repair(text: str) -> dict | None:
|
| 117 |
-
"""Layer 2: Repair common JSON issues and parse."""
|
| 118 |
-
try:
|
| 119 |
-
# Find the {...} block
|
| 120 |
-
match = re.search(r'\{[\s\S]*\}', text)
|
| 121 |
-
if not match:
|
| 122 |
-
return None
|
| 123 |
-
|
| 124 |
-
json_str = match.group(0)
|
| 125 |
-
|
| 126 |
-
# Common repairs
|
| 127 |
-
json_str = _repair_json(json_str)
|
| 128 |
-
|
| 129 |
-
story = json.loads(json_str)
|
| 130 |
-
|
| 131 |
-
if _validate_story_structure(story):
|
| 132 |
-
return story
|
| 133 |
-
return None
|
| 134 |
-
except (json.JSONDecodeError, KeyError, TypeError):
|
| 135 |
-
return None
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
def _repair_json(json_str: str) -> str:
|
| 139 |
-
"""Repair common JSON issues from 1B model output."""
|
| 140 |
-
# Remove trailing commas before } or ] (with optional whitespace)
|
| 141 |
-
json_str = re.sub(r',\s*([}\]])', r'\1', json_str)
|
| 142 |
-
|
| 143 |
-
# Remove single-line // comments
|
| 144 |
-
json_str = re.sub(r'//.*?$', '', json_str, flags=re.MULTILINE)
|
| 145 |
-
|
| 146 |
-
# Remove multi-line comments /* ... */
|
| 147 |
-
json_str = re.sub(r'/\*[\s\S]*?\*/', '', json_str)
|
| 148 |
-
|
| 149 |
-
# Fix unescaped newlines in strings
|
| 150 |
-
json_str = re.sub(r'(?<=")\n(?=")', '\\n', json_str)
|
| 151 |
-
|
| 152 |
-
# Fix missing quotes around keys (word before colon)
|
| 153 |
-
json_str = re.sub(r'(\s)(\w+)(\s*:)', r'\1"\2"\3', json_str)
|
| 154 |
-
|
| 155 |
-
return json_str
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
def _validate_story_structure(story: dict) -> bool:
|
| 159 |
-
"""Validate story has required structure."""
|
| 160 |
-
required_keys = ["title", "character_description", "pages"]
|
| 161 |
-
if not all(k in story for k in required_keys):
|
| 162 |
-
return False
|
| 163 |
-
|
| 164 |
-
pages = story.get("pages", [])
|
| 165 |
-
if not isinstance(pages, list) or len(pages) < 1:
|
| 166 |
-
return False
|
| 167 |
-
|
| 168 |
-
# Check first page has required fields
|
| 169 |
-
first_page = pages[0]
|
| 170 |
-
if not all(k in first_page for k in ["page", "text", "scene"]):
|
| 171 |
-
return False
|
| 172 |
-
|
| 173 |
-
return True
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
def _layer3_template_fallback(raw_output: str) -> dict:
|
| 177 |
-
"""
|
| 178 |
-
Layer 3: Deterministic template fallback.
|
| 179 |
-
NEVER crashes - always returns valid 6-page book.
|
| 180 |
-
"""
|
| 181 |
-
# Try to extract any useful text from raw output
|
| 182 |
-
extracted_text = raw_output[:200] if raw_output else "an adventure"
|
| 183 |
-
|
| 184 |
-
return {
|
| 185 |
-
"title": "A Wonderful Adventure",
|
| 186 |
-
"character_description": f"A friendly character who went on {extracted_text}",
|
| 187 |
-
"pages": [
|
| 188 |
-
{"page": 1, "text": "Once upon a time, there was a character who loved adventures.", "scene": "Character introduction"},
|
| 189 |
-
{"page": 2, "text": "One day, something exciting happened.", "scene": "Inciting incident"},
|
| 190 |
-
{"page": 3, "text": "The character bravely faced the challenge.", "scene": "Rising action"},
|
| 191 |
-
{"page": 4, "text": "With courage and kindness, the character succeeded.", "scene": "Climax"},
|
| 192 |
-
{"page": 5, "text": "Friends gathered to celebrate the victory.", "scene": "Resolution"},
|
| 193 |
-
{"page": 6, "text": "And they all lived happily ever after. The end.", "scene": "Happy ending"}
|
| 194 |
-
]
|
| 195 |
-
}
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
# ============================================================================
|
| 199 |
-
# TEMPLATE STORY (for testing without Modal)
|
| 200 |
-
# ============================================================================
|
| 201 |
-
|
| 202 |
-
# ---------------------------------------------------------------------------
|
| 203 |
-
# Local story generator — theme-accurate, character-aware, and VARIED.
|
| 204 |
-
# Each theme has its own 6-beat arc; slots ({place}, {friend}, {thing}, {feeling})
|
| 205 |
-
# are filled from per-theme word banks chosen by a seed derived from hero+theme,
|
| 206 |
-
# so different heroes/themes produce different books (no more identical text).
|
| 207 |
-
# ---------------------------------------------------------------------------
|
| 208 |
-
|
| 209 |
-
_PLACES = ["whispering forest", "sunny meadow", "sparkling river", "cloud-top hill",
|
| 210 |
-
"hidden garden", "snowy valley", "rolling sand dunes", "moonlit lake"]
|
| 211 |
-
_FRIENDS = ["a shy little fox", "a lost baby bird", "a giggling firefly", "a sleepy turtle",
|
| 212 |
-
"a tiny dragon", "a kind old owl", "a bouncing bunny", "a glowing jellyfish"]
|
| 213 |
-
_THINGS = ["a glowing key", "a singing flower", "a map of stars", "a tiny golden bell",
|
| 214 |
-
"a magic seed", "a shimmering shell", "a friendly lantern", "a curious door"]
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
def _theme_arc(theme: str, hero: str, place: str, friend: str, thing: str) -> dict:
|
| 218 |
-
"""Return {title, pages[6]} for the given theme, with slots filled in."""
|
| 219 |
-
T = {
|
| 220 |
-
"brave adventure": {
|
| 221 |
-
"title": f"{hero}'s Brave Adventure",
|
| 222 |
-
"pages": [
|
| 223 |
-
(f"{hero} woke up wanting to explore the world.", f"{hero} standing at the edge of a {place}"),
|
| 224 |
-
(f"At the {place}, {hero} found {thing} glowing softly.", f"{hero} discovering {thing}"),
|
| 225 |
-
(f"Taking a deep breath, {hero} bravely followed where it led.", f"{hero} walking bravely into the {place}"),
|
| 226 |
-
(f"There, {friend} was stuck and a little scared.", f"{friend} in trouble, {hero} nearby"),
|
| 227 |
-
(f"{hero} was brave and gently helped {friend} get free.", f"{hero} helping {friend}"),
|
| 228 |
-
(f"Side by side they went home, and {hero} felt brave and proud.", f"{hero} and {friend} heading home at sunset"),
|
| 229 |
-
],
|
| 230 |
-
},
|
| 231 |
-
"making a new friend": {
|
| 232 |
-
"title": f"{hero} Makes a Friend",
|
| 233 |
-
"pages": [
|
| 234 |
-
(f"{hero} was playing alone in the {place}.", f"{hero} playing alone in a {place}"),
|
| 235 |
-
(f"Nearby, {friend} sat all by itself, looking lonely.", f"{friend} sitting alone"),
|
| 236 |
-
(f"{hero} walked over and said a cheerful hello.", f"{hero} greeting {friend} with a wave"),
|
| 237 |
-
(f"They shared {thing} and laughed together.", f"{hero} and {friend} sharing {thing}"),
|
| 238 |
-
(f"All afternoon they played their favorite games.", f"{hero} and {friend} playing games"),
|
| 239 |
-
(f"Now {hero} knew: a friend is just a hello away.", f"{hero} and {friend} smiling together"),
|
| 240 |
-
],
|
| 241 |
-
},
|
| 242 |
-
"overcoming a fear": {
|
| 243 |
-
"title": f"{hero} and the Big Brave Day",
|
| 244 |
-
"pages": [
|
| 245 |
-
(f"{hero} felt scared of the dark {place}.", f"{hero} looking nervously at a dark {place}"),
|
| 246 |
-
(f"But {friend} needed {thing} from inside it.", f"{friend} asking {hero} for help"),
|
| 247 |
-
(f"{hero}'s tummy felt wobbly, but {hero} took one small step.", f"{hero} taking a brave first step into the {place}"),
|
| 248 |
-
(f"One step, then another — it wasn't so scary after all.", f"{hero} walking carefully, growing braver"),
|
| 249 |
-
(f"{hero} found {thing} and carried it back proudly.", f"{hero} holding {thing} triumphantly"),
|
| 250 |
-
(f"{hero} learned that being brave means trying, even when you're scared.", f"{hero} and {friend} celebrating"),
|
| 251 |
-
],
|
| 252 |
-
},
|
| 253 |
-
"helping someone": {
|
| 254 |
-
"title": f"{hero} Lends a Hand",
|
| 255 |
-
"pages": [
|
| 256 |
-
(f"One morning {hero} skipped through the {place}.", f"{hero} walking happily through a {place}"),
|
| 257 |
-
(f"{hero} heard a tiny cry — it was {friend}!", f"{hero} noticing {friend} in need"),
|
| 258 |
-
(f"{friend} had dropped {thing} and couldn't reach it.", f"{friend} reaching for {thing}"),
|
| 259 |
-
(f"{hero} thought hard and came up with a clever plan.", f"{hero} thinking of a plan"),
|
| 260 |
-
(f"Together they got {thing} back, and {friend} cheered.", f"{hero} and {friend} succeeding together"),
|
| 261 |
-
(f"Helping others made {hero}'s heart feel warm and happy.", f"{hero} and {friend} hugging"),
|
| 262 |
-
],
|
| 263 |
-
},
|
| 264 |
-
"lost and found": {
|
| 265 |
-
"title": f"{hero} and the Lost {thing.split()[-1].title()}",
|
| 266 |
-
"pages": [
|
| 267 |
-
(f"{hero} was playing when {thing} suddenly went missing.", f"{hero} searching for {thing}"),
|
| 268 |
-
(f"{hero} looked all around the {place}.", f"{hero} looking around a {place}"),
|
| 269 |
-
(f"Along the way, {hero} met {friend} who wanted to help.", f"{hero} meeting {friend}"),
|
| 270 |
-
(f"They followed tiny clues together, step by step.", f"{hero} and {friend} following a trail"),
|
| 271 |
-
(f"At last, {thing} was found tucked beneath a leaf!", f"{hero} finding {thing}"),
|
| 272 |
-
(f"{hero} hugged {friend} and thanked them for never giving up.", f"{hero} and {friend} happy together"),
|
| 273 |
-
],
|
| 274 |
-
},
|
| 275 |
-
"learning something new": {
|
| 276 |
-
"title": f"{hero} Learns to Soar",
|
| 277 |
-
"pages": [
|
| 278 |
-
(f"{hero} really wanted to learn something new today.", f"{hero} curious in a {place}"),
|
| 279 |
-
(f"{friend} offered to teach {hero} a wonderful trick.", f"{friend} teaching {hero}"),
|
| 280 |
-
(f"The first try wobbled and didn't work at all.", f"{hero} trying and stumbling"),
|
| 281 |
-
(f"{hero} practiced again and again, never giving up.", f"{hero} practicing hard"),
|
| 282 |
-
(f"Suddenly it worked, with {thing} sparkling in the air!", f"{hero} succeeding with {thing}"),
|
| 283 |
-
(f"{hero} beamed — trying your best helps you grow.", f"{hero} and {friend} celebrating the win"),
|
| 284 |
-
],
|
| 285 |
-
},
|
| 286 |
-
}
|
| 287 |
-
return T.get(theme, T["brave adventure"])
|
| 288 |
-
|
| 289 |
-
|
| 290 |
-
def generate_story_local(hero_name: str, theme: str, age: int = 5) -> dict:
|
| 291 |
-
"""
|
| 292 |
-
Theme-accurate, varied, character-aware story (no Modal/GPU required).
|
| 293 |
-
Deterministic per (hero, theme) within a single process (Python salts str hashes per run unless PYTHONHASHSEED is set), and different across heroes/themes.
|
| 294 |
-
"""
|
| 295 |
-
import random
|
| 296 |
-
hero = (hero_name or "Little Hero").strip()
|
| 297 |
-
hero = hero[:1].upper() + hero[1:] if hero else "Little Hero"
|
| 298 |
-
rng = random.Random(hash((hero.lower(), theme)) & 0xFFFFFFFF)
|
| 299 |
-
place = rng.choice(_PLACES)
|
| 300 |
-
friend = rng.choice(_FRIENDS)
|
| 301 |
-
thing = rng.choice(_THINGS)
|
| 302 |
-
|
| 303 |
-
arc = _theme_arc(theme, hero, place, friend, thing)
|
| 304 |
-
pages = [{"page": i + 1, "text": t, "scene": s} for i, (t, s) in enumerate(arc["pages"])]
|
| 305 |
-
|
| 306 |
-
return {
|
| 307 |
-
"title": arc["title"],
|
| 308 |
-
"character_description": (
|
| 309 |
-
f"{hero}, a friendly storybook character, bright crayon colors, "
|
| 310 |
-
f"bold outlines, simple children's-book style"
|
| 311 |
-
),
|
| 312 |
-
"pages": pages,
|
| 313 |
-
}
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
# ============================================================================
|
| 317 |
-
# MODAL FUNCTION
|
| 318 |
-
# ============================================================================
|
| 319 |
-
|
| 320 |
-
@app.function(gpu="T4", image=story_env, timeout=120)
|
| 321 |
-
def generate_story(character_name: str, theme: str, age: int = 5) -> dict:
|
| 322 |
-
"""
|
| 323 |
-
Generate a 6-page children's story via MiniCPM5-1B.
|
| 324 |
-
|
| 325 |
-
C2 Compliance:
|
| 326 |
-
- Few-shot prompt with ONE full exemplar
|
| 327 |
-
- Greedy decode (do_sample=False)
|
| 328 |
-
- 3-layer parser + template fallback
|
| 329 |
-
- NEVER crashes on bad model output
|
| 330 |
-
|
| 331 |
-
Args:
|
| 332 |
-
character_name: Main character name
|
| 333 |
-
theme: Story theme
|
| 334 |
-
age: Target age
|
| 335 |
-
|
| 336 |
-
Returns:
|
| 337 |
-
dict with keys: title, character_description, pages[{page, text, scene}]
|
| 338 |
-
"""
|
| 339 |
-
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 340 |
-
import torch
|
| 341 |
-
|
| 342 |
-
# Load model (MiniCPM ships custom modeling code -> trust_remote_code required)
|
| 343 |
-
model_id = "openbmb/MiniCPM5-1B"
|
| 344 |
-
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
|
| 345 |
-
model = AutoModelForCausalLM.from_pretrained(
|
| 346 |
-
model_id, torch_dtype=torch.float16, trust_remote_code=True
|
| 347 |
-
).cuda().eval()
|
| 348 |
-
|
| 349 |
-
# Build prompt and wrap in the model's chat template (it's an instruct model;
|
| 350 |
-
# a raw prompt generates poorly). enable_thinking=False = no reasoning preamble.
|
| 351 |
-
prompt = build_prompt(character_name, theme, age)
|
| 352 |
-
inputs = tok.apply_chat_template(
|
| 353 |
-
[{"role": "user", "content": prompt}],
|
| 354 |
-
add_generation_prompt=True,
|
| 355 |
-
enable_thinking=False,
|
| 356 |
-
return_dict=True,
|
| 357 |
-
return_tensors="pt",
|
| 358 |
-
).to("cuda")
|
| 359 |
-
|
| 360 |
-
# Generate with greedy decode for determinism
|
| 361 |
-
with torch.no_grad():
|
| 362 |
-
out = model.generate(
|
| 363 |
-
**inputs,
|
| 364 |
-
max_new_tokens=800,
|
| 365 |
-
do_sample=False, # Greedy for determinism
|
| 366 |
-
)
|
| 367 |
-
|
| 368 |
-
response = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
|
| 369 |
-
|
| 370 |
-
# Parse with 3-layer parser (NEVER crashes)
|
| 371 |
-
story = parse_story_json(response)
|
| 372 |
-
|
| 373 |
-
# Pad the pages list up to 6 entries (longer lists are left as-is)
|
| 374 |
-
while len(story.get("pages", [])) < 6:
|
| 375 |
-
story.setdefault("pages", []).append({
|
| 376 |
-
"page": len(story.get("pages", [])) + 1,
|
| 377 |
-
"text": "And the adventure continued happily.",
|
| 378 |
-
"scene": "Continuing adventure"
|
| 379 |
-
})
|
| 380 |
-
|
| 381 |
-
return story
|
| 382 |
-
|
| 383 |
-
|
| 384 |
-
@app.function(gpu="T4", image=story_env, timeout=30)
|
| 385 |
-
def health_check() -> str:
|
| 386 |
-
"""Quick health check for Modal function."""
|
| 387 |
-
return "story_gen_healthy"
|
| 388 |
-
|
| 389 |
-
|
| 390 |
-
# ============================================================================
|
| 391 |
-
# CLI TEST
|
| 392 |
-
# ============================================================================
|
| 393 |
-
|
| 394 |
-
if __name__ == "__main__":
|
| 395 |
-
# Test local generation
|
| 396 |
-
story = generate_story_local("Ziggy", "brave adventure", 5)
|
| 397 |
-
print(json.dumps(story, indent=2))
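Layer 1 of the deleted parser locates the JSON object with the greedy pattern `\{[\s\S]*\}`, which spans from the first `{` to the last `}` in the output and therefore fails when the model appends extra text that contains a brace. A minimal sketch of an alternative, assuming the goal is only to pull the first complete JSON object out of noisy model output: the standard library's `json.JSONDecoder.raw_decode` parses one value starting at a given index and ignores whatever follows it. The helper name `extract_first_json_object` is hypothetical and is not part of the original file.

```python
from __future__ import annotations

import json


def extract_first_json_object(text: str) -> dict | None:
    """Return the first parseable JSON object found in text, or None.

    Hypothetical helper for illustration; not from the original module.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses a single JSON value beginning at `start`
            # and tolerates trailing, non-JSON text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace; try the next opening brace.
        start = text.find("{", start + 1)
    return None


if __name__ == "__main__":
    noisy = 'Sure! {"title": "T", "pages": []} Hope that helps {smile}'
    print(extract_first_json_object(noisy))
```

A result from this helper would still need the same structure validation and template fallback as the original layers; it only replaces the extraction step.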
|
modal_workers/modal_tts.py
DELETED
|
@@ -1,235 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Modal TTS: VoxCPM2 narration (speak_book requests an A10G GPU; health_check a T4).
|
| 3 |
-
|
| 4 |
-
C5 Compliance: Fallback chain
|
| 5 |
-
- Primary: VoxCPM2 (2B, Apache 2.0)
|
| 6 |
-
- Fallback 1: Kokoro-82M (ultra-lightweight)
|
| 7 |
-
- Fallback 2: MeloTTS (MIT license)
|
| 8 |
-
|
| 9 |
-
Generates WAV audio for book narration.
|
| 10 |
-
"""
|
| 11 |
-
|
| 12 |
-
import modal
|
| 13 |
-
import io
|
| 14 |
-
import os
|
| 15 |
-
import logging
|
| 16 |
-
|
| 17 |
-
logger = logging.getLogger(__name__)
|
| 18 |
-
|
| 19 |
-
app = modal.App("doodlebook-tts")
|
| 20 |
-
|
| 21 |
-
# Keep N containers always warm so the app is "live" with no cold start.
|
| 22 |
-
# 0 = scale to zero when idle (cheap); 1 = always-on GPU (costs money 24/7).
|
| 23 |
-
# Set at deploy time: DOODLEBOOK_KEEP_WARM=1 modal deploy modal_workers/modal_tts.py
|
| 24 |
-
KEEP_WARM = int(os.environ.get("DOODLEBOOK_KEEP_WARM", "0"))
|
| 25 |
-
NO_TIMEOUT = 86400 # Modal's max (24h) — effectively no per-call timeout
|
| 26 |
-
|
| 27 |
-
CACHE = "/cache"
|
| 28 |
-
vol = modal.Volume.from_name("doodlebook-hf-cache", create_if_missing=True)
|
| 29 |
-
HF_SECRET = modal.Secret.from_name("huggingface")
|
| 30 |
-
|
| 31 |
-
tts_env = (
|
| 32 |
-
modal.Image.debian_slim(python_version="3.11")
|
| 33 |
-
.apt_install("ffmpeg")
|
| 34 |
-
.pip_install("voxcpm==2.0.3", "soundfile", "torch", "huggingface_hub")
|
| 35 |
-
.env({"HF_HOME": CACHE})
|
| 36 |
-
)
|
| 37 |
-
|
| 38 |
-
# Child-friendly voices (VoxCPM2 "voice design" prefixes). The (parenthetical) is
|
| 39 |
-
# interpreted as a voice instruction, not spoken aloud.
|
| 40 |
-
# MIRROR of config.VOICE_PRESETS — kept inline so the Modal deploy stays
|
| 41 |
-
# import-free. Keep the two in sync if you edit them. Default leans young.
|
| 42 |
-
DEFAULT_VOICE = "kid"
|
| 43 |
-
VOICE_DESIGN = {
|
| 44 |
-
"kid": "(A sweet little girl around seven years old telling a story to her "
|
| 45 |
-
"friends, bright high-pitched cheerful child's voice, playful, giggly "
|
| 46 |
-
"and full of wonder)",
|
| 47 |
-
"big_kid": "(A lively young girl about eleven years old reading a fun story "
|
| 48 |
-
"aloud, bright youthful energetic voice, expressive and excited)",
|
| 49 |
-
"playful": "(A cheerful, friendly young woman telling a fun children's story, "
|
| 50 |
-
"bright, animated, smiling, expressive)",
|
| 51 |
-
"storyteller": "(A warm, gentle female storyteller reading a bedtime story to a "
|
| 52 |
-
"young child, soft, soothing, slow and expressive, kind and cozy)",
|
| 53 |
-
"grandpa": "(A kind, gentle old grandfather telling a cozy bedtime story, warm, "
|
| 54 |
-
"slow, soothing)",
|
| 55 |
-
}
|
| 56 |
-
|
| 57 |
-
_TTS = None
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
def _get_tts():
|
| 61 |
-
global _TTS
|
| 62 |
-
if _TTS is None:
|
| 63 |
-
from voxcpm import VoxCPM
|
| 64 |
-
logger.info("Loading VoxCPM2…")
|
| 65 |
-
_TTS = VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)
|
| 66 |
-
logger.info("VoxCPM2 ready.")
|
| 67 |
-
return _TTS
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
@app.function(
|
| 71 |
-
gpu="A10G", image=tts_env, volumes={CACHE: vol}, secrets=[HF_SECRET],
|
| 72 |
-
timeout=NO_TIMEOUT, scaledown_window=1200, min_containers=KEEP_WARM,
|
| 73 |
-
)
|
| 74 |
-
def speak_book(text: str, voice: str = DEFAULT_VOICE) -> bytes:
|
| 75 |
-
"""
|
| 76 |
-
Narrate the book with VoxCPM2 using a child-friendly storyteller voice.
|
| 77 |
-
|
| 78 |
-
Generates sentence-by-sentence with the SAME voice-design prefix for a
|
| 79 |
-
consistent voice, then stitches with short pauses for natural pacing.
|
| 80 |
-
"""
|
| 81 |
-
import re
|
| 82 |
-
import numpy as np
|
| 83 |
-
import soundfile as sf
|
| 84 |
-
|
| 85 |
-
model = _get_tts()
|
| 86 |
-
design = VOICE_DESIGN.get(voice, VOICE_DESIGN[DEFAULT_VOICE])
|
| 87 |
-
sr = model.tts_model.sample_rate
|
| 88 |
-
|
| 89 |
-
# split into sentences so long books stay stable; keep each chunk's voice fixed
|
| 90 |
-
chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
|
| 91 |
-
if not chunks:
|
| 92 |
-
chunks = [text.strip() or "The end."]
|
| 93 |
-
|
| 94 |
-
pause = np.zeros(int(sr * 0.35), dtype=np.float32) # gentle gap between sentences
|
| 95 |
-
pieces = []
|
| 96 |
-
for i, sentence in enumerate(chunks):
|
| 97 |
-
wav = model.generate(
|
| 98 |
-
text=f"{design} {sentence}",
|
| 99 |
-
cfg_value=2.0,
|
| 100 |
-
inference_timesteps=10,
|
| 101 |
-
)
|
| 102 |
-
pieces.append(np.asarray(wav, dtype=np.float32))
|
| 103 |
-
if i < len(chunks) - 1:
|
| 104 |
-
pieces.append(pause)
|
| 105 |
-
logger.info(f"narrated sentence {i+1}/{len(chunks)}")
|
| 106 |
-
|
| 107 |
-
audio = np.concatenate(pieces)
|
| 108 |
-
buf = io.BytesIO()
|
| 109 |
-
sf.write(buf, audio, sr, format="WAV")
|
| 110 |
-
vol.commit()
|
| 111 |
-
return buf.getvalue()
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
# ============================================================================
|
| 115 |
-
# LOCAL TTS (FOR TESTING WITHOUT MODAL)
|
| 116 |
-
# ============================================================================
|
| 117 |
-
|
| 118 |
-
def speak_book_local(text: str, voice: str = DEFAULT_VOICE) -> bytes:
|
| 119 |
-
"""
|
| 120 |
-
Local TTS for testing (no Modal/GPU required).
|
| 121 |
-
|
| 122 |
-
Chain: Windows SAPI5 (offline, audible) -> pyttsx3 (if installed) ->
|
| 123 |
-
silent WAV (last resort). Real child-friendly voice = VoxCPM2 on Modal.
|
| 124 |
-
"""
|
| 125 |
-
for fn in (_speak_sapi_windows, _speak_pyttsx3):
|
| 126 |
-
try:
|
| 127 |
-
audio = fn(text, voice)
|
| 128 |
-
if audio:
|
| 129 |
-
logger.info(f"Local TTS via {fn.__name__}")
|
| 130 |
-
return audio
|
| 131 |
-
except Exception as e:
|
| 132 |
-
logger.warning(f"{fn.__name__} unavailable: {e}")
|
| 133 |
-
logger.error("No working local TTS — returning silence")
|
| 134 |
-
return _generate_silent_wav(duration_seconds=5)
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
def _speak_sapi_windows(text: str, voice: str = DEFAULT_VOICE) -> bytes:
|
| 138 |
-
"""
|
| 139 |
-
Offline Windows TTS via SAPI5 (pywin32). Produces an audible WAV with no
|
| 140 |
-
GPU, no internet, and no extra install — pywin32 ships win32com.
|
| 141 |
-
"""
|
| 142 |
-
import win32com.client
|
| 143 |
-
import pythoncom
|
| 144 |
-
import tempfile
|
| 145 |
-
import os
|
| 146 |
-
|
| 147 |
-
pythoncom.CoInitialize() # Gradio runs handlers off-thread; COM needs init
|
| 148 |
-
path = None
|
| 149 |
-
try:
|
| 150 |
-
spvoice = win32com.client.Dispatch("SAPI.SpVoice")
|
| 151 |
-
stream = win32com.client.Dispatch("SAPI.SpFileStream")
|
| 152 |
-
|
| 153 |
-
# Prefer a female/child-friendly voice if the system has one
|
| 154 |
-
try:
|
| 155 |
-
for tok in spvoice.GetVoices():
|
| 156 |
-
desc = tok.GetDescription()
|
| 157 |
-
if any(n in desc for n in ("Zira", "Hazel", "Female")):
|
| 158 |
-
spvoice.Voice = tok
|
| 159 |
-
break
|
| 160 |
-
except Exception:
|
| 161 |
-
pass
|
| 162 |
-
|
| 163 |
-
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
|
| 164 |
-
path = tmp.name
|
| 165 |
-
|
| 166 |
-
stream.Open(path, 3) # 3 = SSFMCreateForWrite
|
| 167 |
-
spvoice.AudioOutputStream = stream
|
| 168 |
-
spvoice.Rate = -1 # a touch slower, gentle for a bedtime story
|
| 169 |
-
spvoice.Speak(text)
|
| 170 |
-
stream.Close()
|
| 171 |
-
|
| 172 |
-
with open(path, "rb") as f:
|
| 173 |
-
data = f.read()
|
| 174 |
-
# release COM objects before uninit to avoid noisy IUnknown warnings
|
| 175 |
-
spvoice.AudioOutputStream = None
|
| 176 |
-
stream = None
|
| 177 |
-
spvoice = None
|
| 178 |
-
return data
|
| 179 |
-
finally:
|
| 180 |
-
if path and os.path.exists(path):
|
| 181 |
-
try:
|
| 182 |
-
os.unlink(path)
|
| 183 |
-
except OSError:
|
| 184 |
-
pass
|
| 185 |
-
pythoncom.CoUninitialize()
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
def _speak_pyttsx3(text: str, voice: str = DEFAULT_VOICE) -> bytes:
|
| 189 |
-
"""Cross-platform offline TTS via pyttsx3 (only if installed)."""
|
| 190 |
-
import pyttsx3
|
| 191 |
-
import tempfile
|
| 192 |
-
import os
|
| 193 |
-
|
| 194 |
-
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
|
| 195 |
-
path = tmp.name
|
| 196 |
-
engine = pyttsx3.init()
|
| 197 |
-
engine.setProperty("rate", 165)
|
| 198 |
-
engine.save_to_file(text, path)
|
| 199 |
-
engine.runAndWait()
|
| 200 |
-
with open(path, "rb") as f:
|
| 201 |
-
data = f.read()
|
| 202 |
-
if os.path.exists(path):
|
| 203 |
-
os.unlink(path)
|
| 204 |
-
return data
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
def _generate_silent_wav(duration_seconds: int = 5, sample_rate: int = 48000) -> bytes:
|
| 208 |
-
"""Generate silent WAV file as placeholder."""
|
| 209 |
-
import struct
|
| 210 |
-
|
| 211 |
-
# WAV header
|
| 212 |
-
num_samples = sample_rate * duration_seconds
|
| 213 |
-
data_size = num_samples * 2 # 16-bit audio
|
| 214 |
-
|
| 215 |
-
header = struct.pack(
|
| 216 |
-
'<4sI4s4sIHHIIHH4sI',
|
| 217 |
-
b'RIFF', 36 + data_size, b'WAVE',
|
| 218 |
-
b'fmt ', 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
|
| 219 |
-
b'data', data_size
|
| 220 |
-
)
|
| 221 |
-
|
| 222 |
-
# Silent audio data
|
| 223 |
-
silent_data = b'\x00' * data_size
|
| 224 |
-
|
| 225 |
-
return header + silent_data
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
# ============================================================================
|
| 229 |
-
# HEALTH CHECK
|
| 230 |
-
# ============================================================================
|
| 231 |
-
|
| 232 |
-
@app.function(gpu="T4", image=tts_env, timeout=30)
|
| 233 |
-
def health_check() -> str:
|
| 234 |
-
"""Quick health check for Modal function."""
|
| 235 |
-
return "tts_healthy"
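The silent placeholder in `_generate_silent_wav` packs the 44-byte RIFF header by hand with `struct`. As a sketch of an equivalent approach, assuming 16-bit mono PCM as in the original, the standard-library `wave` module can write the same kind of header and compute the size fields itself. The function name `silent_wav_bytes` is illustrative only.

```python
import io
import wave


def silent_wav_bytes(duration_seconds: int = 5, sample_rate: int = 48000) -> bytes:
    """Build a silent 16-bit mono PCM WAV in memory (illustrative helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        # One frame is one 16-bit sample; zero bytes are digital silence.
        wav.writeframes(b"\x00\x00" * (sample_rate * duration_seconds))
    # wave does not close a file object it did not open, so the buffer
    # is still readable here.
    return buf.getvalue()
```

Letting `wave` fill in the chunk sizes removes the need to keep the `struct` format string and the field list in sync by hand.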
|
requirements.txt
CHANGED
|
@@ -1,23 +1,26 @@
|
|
| 1 |
-
# DoodleBook — Dependencies (ZeroGPU Version)
|
| 2 |
-
# Core
|
| 3 |
-
gradio>=5.0
|
| 4 |
-
python-dotenv
|
| 5 |
-
spaces
|
| 6 |
-
|
| 7 |
-
# Image generation
|
| 8 |
-
diffusers>=0.28
|
| 9 |
-
transformers>=4.40
|
| 10 |
-
accelerate
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
#
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
#
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 1 |
+
# DoodleBook — Dependencies (ZeroGPU Version)
|
| 2 |
+
# Core
|
| 3 |
+
gradio>=5.0
|
| 4 |
+
python-dotenv
|
| 5 |
+
spaces
|
| 6 |
+
|
| 7 |
+
# Image generation
|
| 8 |
+
diffusers>=0.28
|
| 9 |
+
transformers>=4.40
|
| 10 |
+
accelerate
|
| 11 |
+
torch
|
| 12 |
+
pillow
|
| 13 |
+
sentencepiece
|
| 14 |
+
opencv-python-headless
|
| 15 |
+
numpy
|
| 16 |
+
|
| 17 |
+
# Voice narration (VoxCPM2) — was MISSING, so TTS always hit the silent fallback
|
| 18 |
+
voxcpm
|
| 19 |
+
|
| 20 |
+
# Book building
|
| 21 |
+
fpdf2
|
| 22 |
+
|
| 23 |
+
# Utilities
|
| 24 |
+
requests
|
| 25 |
+
huggingface_hub
|
| 26 |
+
soundfile
|
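The comment on `voxcpm` in the new requirements.txt says a missing dependency silently sent narration to the fallback. A minimal sketch of a startup check that would surface such a gap early is shown here. The helper name `missing_modules` and the list of import names are assumptions for illustration and are not part of this PR; note that a pip package name and its import name can differ.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level import names that cannot be found in this environment."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names: fpdf2 is imported as `fpdf`; `voxcpm` is assumed
    # to match its pip name.
    required = ["gradio", "diffusers", "transformers", "voxcpm", "fpdf", "soundfile"]
    gaps = missing_modules(required)
    if gaps:
        print(f"Missing modules: {', '.join(gaps)}")
```

Logging the result once at startup makes a silent fallback visible in the Space logs instead of only in the generated audio.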
run.py
DELETED
|
@@ -1,34 +0,0 @@
|
|
| 1 |
-
"""Launcher for DoodleBook — captures all output for debugging."""
|
| 2 |
-
import sys
|
| 3 |
-
import os
|
| 4 |
-
import traceback
|
| 5 |
-
|
| 6 |
-
sys.path.insert(0, os.path.dirname(__file__))
|
| 7 |
-
|
| 8 |
-
# Redirect stderr to file
|
| 9 |
-
log = open("_crash.log", "w", encoding="utf-8")
|
| 10 |
-
sys.stderr = log
|
| 11 |
-
sys.stdout = log
|
| 12 |
-
|
| 13 |
-
try:
|
| 14 |
-
from app import create_layout, load_sample_book, create_book
|
| 15 |
-
print("Imports OK", flush=True)
|
| 16 |
-
|
| 17 |
-
demo = create_layout(
|
| 18 |
-
load_sample_fn=load_sample_book,
|
| 19 |
-
create_book_fn=create_book,
|
| 20 |
-
)
|
| 21 |
-
print("Layout OK", flush=True)
|
| 22 |
-
|
| 23 |
-
demo.launch(server_port=7870, prevent_thread_lock=True)
|
| 24 |
-
print("Launch OK — server running on port 7870", flush=True)
|
| 25 |
-
|
| 26 |
-
import time
|
| 27 |
-
while True:
|
| 28 |
-
time.sleep(60)
|
| 29 |
-
|
| 30 |
-
except Exception as e:
|
| 31 |
-
print(f"ERROR: {e}", flush=True)
|
| 32 |
-
traceback.print_exc(file=log)
|
| 33 |
-
finally:
|
| 34 |
-
log.flush()
|
run_modal.py
DELETED
|
@@ -1,287 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
DoodleBook — MODAL-ONLY runner.
|
| 3 |
-
|
| 4 |
-
Serves the Gradio UI locally but runs ALL heavy generation on Modal's deployed
|
| 5 |
-
functions (no local GPU inference):
|
| 6 |
-
- story -> doodlebook-story / generate_story (services.story)
|
| 7 |
-
- images -> doodlebook-image-gen / generate_book_pages (services.images)
|
| 8 |
-
- voice -> doodlebook-tts / speak_book (services.tts)
|
| 9 |
-
|
| 10 |
-
Use this (not app.py) when you want to check real Modal output on this machine.
|
| 11 |
-
Run: python run_modal.py
|
| 12 |
-
"""
|
| 13 |
-
|
| 14 |
-
import io
|
| 15 |
-
import json
|
| 16 |
-
import time
|
| 17 |
-
import tempfile
|
| 18 |
-
import logging
|
| 19 |
-
|
| 20 |
-
import gradio as gr
|
| 21 |
-
from PIL import Image
|
| 22 |
-
|
| 23 |
-
from config import BASE_SEED, DEFAULT_VOICE
|
| 24 |
-
from book_builder import (
|
| 25 |
-
build_book_html, export_pdf, magic_loader_html,
|
| 26 |
-
build_coloring_html, export_coloring_pdf,
|
| 27 |
-
)
|
| 28 |
-
from ui.layout import create_layout
|
| 29 |
-
|
| 30 |
-
import services.story as story_svc
|
| 31 |
-
import services.images as image_svc
|
| 32 |
-
import services.tts as tts_svc
|
| 33 |
-
|
| 34 |
-
logging.basicConfig(level=logging.INFO)
|
| 35 |
-
logger = logging.getLogger("doodlebook.modal")
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
def _doodle_to_png_bytes(doodle_image):
|
| 39 |
-
"""Gradio numpy image -> PNG bytes (or None)."""
|
| 40 |
-
if doodle_image is None:
|
| 41 |
-
return None
|
| 42 |
-
buf = io.BytesIO()
|
| 43 |
-
Image.fromarray(doodle_image).save(buf, format="PNG")
|
| 44 |
-
return buf.getvalue()
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
def _with_heartbeat(blocking_fn, frame_fn, poll=4.0):
|
| 48 |
-
"""
|
| 49 |
-
Run blocking_fn() in a thread while keeping the Gradio stream alive.
|
| 50 |
-
|
| 51 |
-
A multi-minute Modal call (FLUX ~2-3 min, VoxCPM ~30-60s) blocks the
|
| 52 |
-
generator with no yield, so the browser's SSE stream goes silent, the
|
| 53 |
-
connection drops, and the UI shows "Error". This pumps frame_fn(elapsed)
|
| 54 |
-
into the stream every `poll` seconds until the work finishes.
|
| 55 |
-
|
| 56 |
-
Yields ("hb", <frame tuple>) heartbeats, then a final ("done", <return>).
|
| 57 |
-
Re-raises whatever blocking_fn raised.
|
| 58 |
-
"""
|
| 59 |
-
import threading
|
| 60 |
-
box = {}
|
| 61 |
-
|
| 62 |
-
def _run():
|
| 63 |
-
try:
|
| 64 |
-
box["val"] = blocking_fn()
|
| 65 |
-
except BaseException as e: # surfaced to the caller below
|
| 66 |
-
box["err"] = e
|
| 67 |
-
|
| 68 |
-
th = threading.Thread(target=_run, daemon=True)
|
| 69 |
-
th.start()
|
| 70 |
-
t0 = time.time()
|
| 71 |
-
while th.is_alive():
|
| 72 |
-
th.join(timeout=poll)
|
| 73 |
-
if th.is_alive():
|
| 74 |
-
yield ("hb", frame_fn(int(time.time() - t0)))
|
| 75 |
-
if "err" in box:
|
| 76 |
-
raise box["err"]
|
| 77 |
-
yield ("done", box["val"])
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
def create_book(doodle_image, character_name, theme, hero_name,
|
| 81 |
-
tiny_mode=False, voice=DEFAULT_VOICE, make_coloring=False):
|
| 82 |
-
"""Streaming book creation — everything heavy runs on Modal."""
|
| 83 |
-
t_total = time.perf_counter()
|
| 84 |
-
character_name = (character_name or "").strip() or "Little Hero"
|
| 85 |
-
hero_name = (hero_name or "").strip() or character_name
|
| 86 |
-
|
| 87 |
-
trace = {
|
| 88 |
-
"backend": "modal",
|
| 89 |
-
"hero_name": hero_name,
|
| 90 |
-
"theme": theme,
|
| 91 |
-
"voice": voice,
|
| 92 |
-
"tiny_mode": tiny_mode,
|
| 93 |
-
"make_coloring": make_coloring,
|
| 94 |
-
"seed": BASE_SEED,
|
| 95 |
-
"timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
|
| 96 |
-
}
|
| 97 |
-
|
| 98 |
-
_no = gr.update(visible=False)
|
| 99 |
-
_keep = gr.update() # no-op: leave the (fixed, always-visible) download buttons as-is
|
| 100 |
-
|
| 101 |
-
# ---- 1) STORY (Modal MiniCPM, else fast text template) ----
|
| 102 |
-
yield (magic_loader_html("story", hero_name),
|
| 103 |
-
"Writing the story…", None, _keep, {}, "",
|
| 104 |
-
json.dumps(trace, indent=2), _no, _keep)
|
| 105 |
-
|
| 106 |
-
t_story = time.perf_counter()
|
| 107 |
-
story = story_svc.generate_story(hero_name, theme)
|
| 108 |
-
trace["story_sec"] = round(time.perf_counter() - t_story, 2)
|
| 109 |
-
title = story.get("title", "Untitled Story")
|
| 110 |
-
char_desc = story.get("character_description", "")
|
| 111 |
-
pages = story.get("pages", [])
|
| 112 |
-
page_texts = [p.get("text", "") for p in pages]
|
| 113 |
-
scenes = [p.get("scene", "") for p in pages]
|
| 114 |
-
trace.update(title=title, character_description=char_desc)
|
| 115 |
-
|
| 116 |
-
# ---- 2) VOICE starts NOW, concurrently with images (it only needs the text,
|
| 117 |
-
# which is ready) so its ~30-60s overlaps the image render for free ----
|
| 118 |
-
import threading
|
| 119 |
-
voice_box = {}
|
| 120 |
-
full_text = f"{title}. {' '.join(page_texts)}"
|
| 121 |
-
t_voice = time.perf_counter()
|
| 122 |
-
|
| 123 |
-
def _do_voice():
|
| 124 |
-
try:
|
| 125 |
-
voice_box["bytes"] = tts_svc.speak_book(full_text, voice)
|
| 126 |
-
except Exception as e:
|
| 127 |
-
voice_box["err"] = e
|
| 128 |
-
|
| 129 |
-
voice_thread = threading.Thread(target=_do_voice, daemon=True)
|
| 130 |
-
voice_thread.start()
|
| 131 |
-
|
| 132 |
-
# ---- 3) IMAGES (Modal FLUX.2-klein — 6 pages rendered in PARALLEL) ----
|
| 133 |
-
yield (magic_loader_html("images", hero_name),
|
| 134 |
-
f"{title} — illustrating on Modal (FLUX)…",
|
| 135 |
-
None, _keep, story, "", json.dumps(trace, indent=2), _no, _keep)
|
| 136 |
-
|
| 137 |
-
doodle_bytes = _doodle_to_png_bytes(doodle_image)
|
| 138 |
-
img_bytes, engine = None, "sketch"
|
| 139 |
-
t_images = time.perf_counter()
|
| 140 |
-
for kind, payload in _with_heartbeat(
|
| 141 |
-
lambda: image_svc.generate_book_pages(
|
| 142 |
-
char_desc, scenes, doodle=doodle_bytes, seed=BASE_SEED, tiny=tiny_mode
|
| 143 |
-
),
|
| 144 |
-
lambda s: (magic_loader_html("images", hero_name),
|
| 145 |
-
f"{title} — illustrating… {s}s (voice recording in parallel)",
|
| 146 |
-
None, _keep, story, "", json.dumps(trace, indent=2), _no, _keep),
|
| 147 |
-
):
|
| 148 |
-
if kind == "hb":
|
| 149 |
-
yield payload
|
| 150 |
-
else:
|
| 151 |
-
img_bytes, engine = payload
|
| 152 |
-
trace["images_sec"] = round(time.perf_counter() - t_images, 2)
|
| 153 |
-
trace["engine"] = engine
|
| 154 |
-
if engine != "flux":
|
| 155 |
-
logger.warning("Image gen fell back to local sketch — Modal FLUX did not run.")
|
| 156 |
-
|
| 157 |
-
book_html = build_book_html(img_bytes, page_texts, title, engine)
|
| 158 |
-
|
| 159 |
-
# ---- 4) Collect the parallel VOICE result (usually already finished) ----
|
| 160 |
-
while voice_thread.is_alive(): # only loops if voice somehow outran images
|
| 161 |
-
voice_thread.join(timeout=4)
|
| 162 |
-
if voice_thread.is_alive():
|
| 163 |
-
yield (book_html, f"{title} — finishing narration…",
|
| 164 |
-
None, _keep, story, "", json.dumps(trace, indent=2), _no, _keep)
|
| 165 |
-
|
| 166 |
-
audio_path = None
|
| 167 |
-
trace["tts_sec"] = round(time.perf_counter() - t_voice, 2)
|
| 168 |
-
if voice_box.get("bytes"):
|
| 169 |
-
try:
|
| 170 |
-
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
|
| 171 |
-
tmp.write(voice_box["bytes"])
|
| 172 |
-
audio_path = tmp.name
|
| 173 |
-
except Exception as e:
|
| 174 |
-
logger.warning(f"writing audio failed: {e}")
|
| 175 |
-
elif "err" in voice_box:
|
| 176 |
-
logger.warning(f"TTS failed: {voice_box['err']}")
|
| 177 |
-
|
| 178 |
-
# ---- 4) PDFs ----
|
| 179 |
-
pdf_path = None
|
| 180 |
-
t_pdf = time.perf_counter()
|
| 181 |
-
try:
|
| 182 |
-
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
|
| 183 |
-
pdf_path = export_pdf(img_bytes, page_texts, title, tmp.name)
|
| 184 |
-
except Exception as e:
|
| 185 |
-
logger.warning(f"PDF failed: {e}")
|
| 186 |
-
trace["pdf_sec"] = round(time.perf_counter() - t_pdf, 2)
|
| 187 |
-
|
| 188 |
-
# ---- 5) COLORING BOOK ----
|
| 189 |
-
coloring_html = ""
|
| 190 |
-
coloring_pdf_path = None
|
| 191 |
-
if make_coloring:
|
| 192 |
-
t_coloring = time.perf_counter()
|
| 193 |
-
outlines, coloring_engine = None, "failed"
|
| 194 |
-
for kind, payload in _with_heartbeat(
|
| 195 |
-
lambda: image_svc.generate_coloring_pages(
|
| 196 |
-
char_desc,
|
| 197 |
-
scenes,
|
| 198 |
-
doodle=doodle_bytes,
|
| 199 |
-
source_color_imgs=img_bytes,
|
| 200 |
-
seed=BASE_SEED,
|
| 201 |
-
tiny=tiny_mode,
|
| 202 |
-
),
|
| 203 |
-
lambda s: (
|
| 204 |
-
book_html,
|
| 205 |
-
f"{title} — building coloring book… {s}s",
|
| 206 |
-
audio_path,
|
| 207 |
-
_keep,
|
| 208 |
-
story,
|
| 209 |
-
"",
|
| 210 |
-
json.dumps(trace, indent=2),
|
| 211 |
-
_no,
|
| 212 |
-
_keep,
|
| 213 |
-
),
|
| 214 |
-
):
|
| 215 |
-
if kind == "hb":
|
| 216 |
-
yield payload
|
| 217 |
-
else:
|
| 218 |
-
outlines, coloring_engine = payload
|
| 219 |
-
try:
|
| 220 |
-
coloring_html = build_coloring_html(outlines, page_texts, title)
|
| 221 |
-
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
|
| 222 |
-
coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
|
| 223 |
-
trace["coloring_book"] = True
|
| 224 |
-
trace["coloring_engine"] = coloring_engine
|
| 225 |
-
except Exception as e:
|
| 226 |
-
logger.warning(f"Coloring book failed: {e}")
|
| 227 |
-
trace["coloring_sec"] = round(time.perf_counter() - t_coloring, 2)
|
| 228 |
-
|
| 229 |
-
trace["completed"] = True
|
| 230 |
-
trace["total_sec"] = round(time.perf_counter() - t_total, 2)
|
| 231 |
-
engine_label = "FLUX (Modal)" if engine == "flux" else "local sketch fallback"
|
| 232 |
-
# download buttons stay visible (fixed under the status); just attach the files
|
| 233 |
-
pdf_update = gr.update(value=pdf_path) if pdf_path else _keep
|
| 234 |
-
coloring_pdf_update = gr.update(value=coloring_pdf_path) if coloring_pdf_path else _keep
|
| 235 |
-
coloring_display_update = (gr.update(visible=True, value=coloring_html) if coloring_html
|
| 236 |
-
else gr.update(visible=False))
|
| 237 |
-
|
| 238 |
-
yield (
|
| 239 |
-
book_html,
|
| 240 |
-
f"Complete: {title} — {len(img_bytes)} pages · {engine_label} · voice: {voice} · total {trace['total_sec']}s",
|
| 241 |
-
audio_path,
|
| 242 |
-
pdf_update,
|
| 243 |
-
story,
|
| 244 |
-
f"Pages: {len(img_bytes)} | Seed: {BASE_SEED} | "
|
| 245 |
-
f"Mode: {'Tiny' if tiny_mode else 'Standard'} | Engine: {engine} | "
|
| 246 |
-
f"Story {trace.get('story_sec', 0)}s | Images {trace.get('images_sec', 0)}s | "
|
| 247 |
-
f"PDF {trace.get('pdf_sec', 0)}s | Coloring {trace.get('coloring_sec', 0)}s",
|
| 248 |
-
json.dumps(trace, indent=2),
|
| 249 |
-
coloring_display_update,
|
| 250 |
-
coloring_pdf_update,
|
| 251 |
-
)
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
if __name__ == "__main__":
|
| 255 |
-
import os
|
| 256 |
-
|
| 257 |
-
demo = create_layout(create_book_fn=create_book)
|
| 258 |
-
# Queue so a long (multi-minute) Modal generation doesn't make the whole app
|
| 259 |
-
# unresponsive: allow several concurrent sessions and never time a job out.
|
| 260 |
-
demo.queue(default_concurrency_limit=8, max_size=64)
|
| 261 |
-
# NOTE: when server_port is set, Gradio does NOT auto-pick a free port — it
|
| 262 |
-
# raises OSError and exits if the port is busy. start_app.bat frees the port
|
| 263 |
-
# before launching; if you run this directly, make sure 7880 is free first.
|
| 264 |
-
port = int(os.environ.get("DOODLEBOOK_PORT", "7880"))
|
| 265 |
-
# Bind all interfaces so both 127.0.0.1 and localhost (and LAN/phone) reach it.
|
| 266 |
-
try:
|
| 267 |
-
demo.launch(
|
| 268 |
-
server_name="0.0.0.0",
|
| 269 |
-
server_port=port,
|
| 270 |
-
inbrowser=False,
|
| 271 |
-
show_error=True,
|
| 272 |
-
max_threads=40,
|
| 273 |
-
# PDFs are written to the system temp dir; Gradio won't serve files
|
| 274 |
-
# outside its allowed paths, so the DownloadButton links 404'd and the
|
| 275 |
-
# button looked broken. Allow the temp dir so downloads actually work.
|
| 276 |
-
allowed_paths=[tempfile.gettempdir()],
|
| 277 |
-
)
|
| 278 |
-
except OSError as e:
|
| 279 |
-
logger.error(
|
| 280 |
-
f"Could not bind port {port}: {e}\n"
|
| 281 |
-
f" Something is already using it — likely a leftover DoodleBook "
|
| 282 |
-
f"instance (app.py on 7870, an old run_modal.py, or test_final.py).\n"
|
| 283 |
-
f" Fix: close the other window, or run: "
|
| 284 |
-
f"netstat -ano | findstr :{port} then taskkill /f /pid <PID>\n"
|
| 285 |
-
f" Then relaunch with start_app.bat (it frees the port automatically)."
|
| 286 |
-
)
|
| 287 |
-
raise SystemExit(1)
|
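The `_with_heartbeat` helper in the deleted run_modal.py keeps a streaming generator alive while a blocking call runs in a thread. A stripped-down sketch of the same pattern, without Gradio or Modal, is given here. The names `with_heartbeat` and `slow_job` are hypothetical, and the short `poll` interval is only there to keep the demo fast.

```python
import threading
import time


def with_heartbeat(blocking_fn, frame_fn, poll=0.05):
    """Run blocking_fn in a thread; yield ("hb", frame) every `poll` seconds,
    then a final ("done", result). Re-raises whatever blocking_fn raised."""
    box = {}

    def _run():
        try:
            box["val"] = blocking_fn()
        except BaseException as exc:  # stored, then re-raised in the consumer
            box["err"] = exc

    worker = threading.Thread(target=_run, daemon=True)
    worker.start()
    start = time.monotonic()
    while worker.is_alive():
        worker.join(timeout=poll)
        if worker.is_alive():
            yield ("hb", frame_fn(time.monotonic() - start))
    if "err" in box:
        raise box["err"]
    yield ("done", box["val"])


def slow_job():
    time.sleep(0.2)  # stand-in for a multi-minute remote call
    return "pages"


events = list(with_heartbeat(slow_job, lambda s: f"working {s:.2f}s"))
```

A caller forwards every `"hb"` frame to the UI and unpacks the payload of the single `"done"` event, as the loops over `_with_heartbeat` in `create_book` do.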
services/story.py
DELETED
|
@@ -1,71 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Story generation service — calls modal_story_gen for MiniCPM5-1B inference.
|
| 3 |
-
|
| 4 |
-
C2 Compliance: 3-layer JSON parser + template fallback.
|
| 5 |
-
"""
|
| 6 |
-
|
| 7 |
-
from config import STORY_MODEL, GENERATION_PARAMS
|
| 8 |
-
import os
|
| 9 |
-
import logging
|
| 10 |
-
|
| 11 |
-
logger = logging.getLogger(__name__)
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
def generate_story(hero_name: str, theme: str, age: int = None) -> dict:
|
| 15 |
-
"""
|
| 16 |
-
Generate a 6-page children's story.
|
| 17 |
-
|
| 18 |
-
Story runs LOCALLY by default: the local generator is instant and
|
| 19 |
-
theme-accurate, whereas the deployed MiniCPM5-1B was slower (T4 cold start)
|
| 20 |
-
and lower quality (the 1B model parroted the few-shot example). Set
|
| 21 |
-
DOODLEBOOK_STORY_MODAL=1 to route the story to the Modal MiniCPM model.
|
| 22 |
-
|
| 23 |
-
Args:
|
| 24 |
-
hero_name: Main character name
|
| 25 |
-
theme: Story theme (e.g., "brave adventure")
|
| 26 |
-
age: Target age (default from config)
|
| 27 |
-
|
| 28 |
-
Returns:
|
| 29 |
-
dict with keys: title, character_description, pages[{page, text, scene}]
|
| 30 |
-
"""
|
| 31 |
-
if age is None:
|
| 32 |
-
age = GENERATION_PARAMS.target_age
|
| 33 |
-
|
| 34 |
-
# 1) Real model on Modal (MiniCPM) — opt-in only
|
| 35 |
-
if os.environ.get("DOODLEBOOK_STORY_MODAL", "0") == "1":
|
| 36 |
-
try:
|
| 37 |
-
import modal
|
| 38 |
-
fn = modal.Function.from_name("doodlebook-story", "generate_story")
|
| 39 |
-
story = fn.remote(hero_name, theme, age)
|
| 40 |
-
if story and story.get("pages"):
|
| 41 |
-
logger.info("Story generated via Modal MiniCPM")
|
| 42 |
-
return story
|
| 43 |
-
except Exception as e:
|
| 44 |
-
logger.info(f"Modal story unavailable ({e}); using local generator")
|
| 45 |
-
|
| 46 |
-
# 2) Rich local generator (theme-accurate, varied) — DEFAULT
|
| 47 |
-
try:
|
| 48 |
-
from modal_workers.modal_story_gen import generate_story_local
|
| 49 |
-
return generate_story_local(hero_name, theme, age)
|
| 50 |
-
except Exception as e:
|
| 51 |
-
logger.error(f"Local story generation failed: {e}")
|
| 52 |
-
return _fallback_story(hero_name, theme, age)
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
def _fallback_story(hero_name: str, theme: str, age: int) -> dict:
|
| 56 |
-
"""
|
| 57 |
-
Deterministic template fallback (C2 Layer 3).
|
| 58 |
-
NEVER crashes - always returns valid 6-page book.
|
| 59 |
-
"""
|
| 60 |
-
return {
|
| 61 |
-
"title": f"{hero_name}'s {theme.title()}",
|
| 62 |
-
"character_description": f"A friendly character named {hero_name}, drawn in crayon style with bright colors, suitable for age {age}",
|
| 63 |
-
"pages": [
|
| 64 |
-
{"page": 1, "text": f"Once upon a time, there was a character named {hero_name}.", "scene": "Character introduction"},
|
| 65 |
-
{"page": 2, "text": f"{hero_name} loved going on adventures.", "scene": "Adventure begins"},
|
| 66 |
-
{"page": 3, "text": f"One day, {hero_name} discovered something magical.", "scene": "Discovery moment"},
|
| 67 |
-
{"page": 4, "text": f"With courage, {hero_name} faced the challenge.", "scene": "Challenge scene"},
|
| 68 |
-
{"page": 5, "text": f"Friends helped {hero_name} succeed.", "scene": "Teamwork scene"},
|
| 69 |
-
{"page": 6, "text": f"And they all lived happily ever after. The end.", "scene": "Happy ending"}
|
| 70 |
-
]
|
| 71 |
-
}
|
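The deleted `generate_story` tries an opt-in remote model, then a local generator, then a deterministic template. A minimal, generic sketch of that layered fallback follows. All names here (`first_success`, `remote_story`, `local_story`, `template_story`) are hypothetical stand-ins and not the PR's functions.

```python
import logging

logger = logging.getLogger("fallback_demo")


def first_success(producers, is_valid, last_resort):
    """Try each (name, producer) in order; return the first valid result,
    otherwise the value of last_resort()."""
    for name, producer in producers:
        try:
            result = producer()
        except Exception as exc:
            logger.info("%s failed (%s); trying next layer", name, exc)
            continue
        if is_valid(result):
            return result
        logger.info("%s returned an unusable result; trying next layer", name)
    return last_resort()


def remote_story():
    raise ConnectionError("remote model unavailable")  # simulated outage


def local_story():
    return {"title": "Demo", "pages": [{"page": 1, "text": "Once upon a time."}]}


def template_story():
    return {"title": "Template", "pages": [{"page": 1, "text": "The end."}]}


story = first_success(
    [("remote", remote_story), ("local", local_story)],
    is_valid=lambda s: bool(s and s.get("pages")),
    last_resort=template_story,
)
```

Keeping the last layer free of I/O and external imports is what lets it hold the "never crashes" guarantee stated in the `_fallback_story` docstring.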
services/trace.py
DELETED
|
@@ -1,113 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Trace logging service — publishes generation metadata to HF Dataset (Open Trace badge).
|
| 3 |
-
|
| 4 |
-
Logs prompts, seeds, and LoRA version for reproducibility.
|
| 5 |
-
"""
|
| 6 |
-
|
| 7 |
-
from typing import Optional
|
| 8 |
-
import json
|
| 9 |
-
import logging
|
| 10 |
-
import os
|
| 11 |
-
from datetime import datetime
|
| 12 |
-
|
| 13 |
-
logger = logging.getLogger(__name__)
|
| 14 |
-
|
| 15 |
-
TRACE_DATASET = "build-small-hackathon/doodlebook-traces"
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
def log_trace(
|
| 19 |
-
hero_name: str,
|
| 20 |
-
theme: str,
|
| 21 |
-
story: dict,
|
| 22 |
-
seed: int,
|
| 23 |
-
lora_version: Optional[str] = None,
|
| 24 |
-
tiny_mode: bool = False,
|
| 25 |
-
character_description: str = "",
|
| 26 |
-
art_style: str = "crayon drawing, children's book"
|
| 27 |
-
) -> str:
|
| 28 |
-
"""
|
| 29 |
-
Log generation trace to HuggingFace Dataset.
|
| 30 |
-
|
| 31 |
-
Creates a row in the trace dataset with all generation parameters
|
| 32 |
-
for reproducibility (Open Trace badge).
|
| 33 |
-
|
| 34 |
-
Args:
|
| 35 |
-
hero_name: Character name used
|
| 36 |
-
theme: Story theme
|
| 37 |
-
story: Generated story dict
|
| 38 |
-
seed: Seed used for generation
|
| 39 |
-
lora_version: LoRA model version (if used)
|
| 40 |
-
tiny_mode: Whether Tiny Mode was used
|
| 41 |
-
character_description: Character description used
|
| 42 |
-
art_style: Art style used
|
| 43 |
-
|
| 44 |
-
Returns:
|
| 45 |
-
Dataset URL
|
| 46 |
-
"""
|
| 47 |
-
trace = {
|
| 48 |
-
"timestamp": datetime.now().isoformat(),
|
| 49 |
-
"hero_name": hero_name,
|
| 50 |
-
"theme": theme,
|
| 51 |
-
"title": story.get("title", ""),
|
| 52 |
-
"character_description": character_description or story.get("character_description", ""),
|
| 53 |
-
"art_style": art_style,
|
| 54 |
-
"seed": seed,
|
| 55 |
-
"lora_version": lora_version or "none",
|
| 56 |
-
"tiny_mode": tiny_mode,
|
| 57 |
-
"num_pages": len(story.get("pages", [])),
|
| 58 |
-
"pages": story.get("pages", []),
|
| 59 |
-
"models": {
|
| 60 |
-
"image": "black-forest-labs/FLUX.2-klein-4B",
|
| 61 |
-
"story": "openbmb/MiniCPM5-1B",
|
| 62 |
-
"tts": "openbmb/VoxCPM2"
|
| 63 |
-
}
|
| 64 |
-
}
|
| 65 |
-
|
| 66 |
-
# Save trace locally
|
| 67 |
-
trace_dir = "traces"
|
| 68 |
-
os.makedirs(trace_dir, exist_ok=True)
|
| 69 |
-
trace_file = os.path.join(trace_dir, f"trace_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json")
|
| 70 |
-
|
| 71 |
-
with open(trace_file, "w", encoding="utf-8") as f:
|
| 72 |
-
json.dump(trace, f, indent=2, ensure_ascii=False)
|
| 73 |
-
|
| 74 |
-
logger.info(f"Trace saved: {trace_file}")
|
| 75 |
-
|
| 76 |
-
# Try to upload to HF Dataset
|
| 77 |
-
try:
|
| 78 |
-
return _upload_to_hf_dataset(trace)
|
| 79 |
-
except Exception as e:
|
| 80 |
-
logger.warning(f"HF Dataset upload failed: {e}. Trace saved locally.")
|
| 81 |
-
return f"Local: {trace_file}"
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
def _upload_to_hf_dataset(trace: dict) -> str:
|
| 85 |
-
"""Upload trace to HuggingFace Dataset."""
|
| 86 |
-
try:
|
| 87 |
-
from huggingface_hub import HfApi
|
| 88 |
-
|
| 89 |
-
api = HfApi()
|
| 90 |
-
|
| 91 |
-
# Check if dataset exists, create if not
|
| 92 |
-
try:
|
| 93 |
-
api.dataset_info(TRACE_DATASET)
|
| 94 |
-
except Exception:
|
| 95 |
-
api.create_repo(TRACE_DATASET, repo_type="dataset", exist_ok=True)
|
| 96 |
-
|
| 97 |
-
# Upload trace as JSON
|
| 98 |
-
trace_json = json.dumps(trace, indent=2, ensure_ascii=False)
|
| 99 |
-
filename = f"trace_{trace['timestamp'].replace(':', '-').replace('.', '-')}.json"
|
| 100 |
-
|
| 101 |
-
api.upload_file(
|
| 102 |
-
path_or_fileobj=trace_json.encode(),
|
| 103 |
-
path_in_repo=filename,
|
| 104 |
-
repo_id=TRACE_DATASET,
|
| 105 |
-
repo_type="dataset",
|
| 106 |
-
commit_message=f"Log trace for {trace['hero_name']}"
|
| 107 |
-
)
|
| 108 |
-
|
| 109 |
-
return f"https://huggingface.co/datasets/{TRACE_DATASET}/blob/main/{filename}"
|
| 110 |
-
|
| 111 |
-
except ImportError:
|
| 112 |
-
logger.warning("huggingface_hub not installed")
|
| 113 |
-
return "Trace saved locally (no HF upload)"
|
services/tts.py
DELETED
|
@@ -1,50 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
TTS service — calls modal_tts for VoxCPM2 narration.
|
| 3 |
-
|
| 4 |
-
C5 Compliance: Fallback chain
|
| 5 |
-
- Primary: VoxCPM2 (2B, Apache 2.0)
|
| 6 |
-
- Fallback 1: Kokoro-82M (ultra-lightweight)
|
| 7 |
-
- Fallback 2: MeloTTS (MIT license)
|
| 8 |
-
"""
|
| 9 |
-
|
| 10 |
-
from config import TTS_MODEL, GENERATION_PARAMS
|
| 11 |
-
import logging
|
| 12 |
-
|
| 13 |
-
logger = logging.getLogger(__name__)
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
def speak_book(text: str, voice: str = "kid") -> bytes:
|
| 17 |
-
"""
|
| 18 |
-
Generate narration audio via VoxCPM2 on Modal.
|
| 19 |
-
|
| 20 |
-
C5 Compliance: Falls back to local TTS if Modal fails.
|
| 21 |
-
|
| 22 |
-
Args:
|
| 23 |
-
text: Full text to narrate (title + all pages)
|
| 24 |
-
voice: Voice style (warm, friendly, etc.)
|
| 25 |
-
|
| 26 |
-
Returns:
|
| 27 |
-
WAV audio bytes
|
| 28 |
-
"""
|
| 29 |
-
try:
|
| 30 |
-
# Try Modal (real VoxCPM2) — looks up the DEPLOYED function on Modal cloud
|
| 31 |
-
import modal
|
| 32 |
-
fn = modal.Function.from_name("doodlebook-tts", "speak_book")
|
| 33 |
-
return fn.remote(text, voice)
|
| 34 |
-
except Exception as e:
|
| 35 |
-
logger.warning(f"Modal TTS unavailable: {e}, using local fallback")
|
| 36 |
-
return speak_book_local(text, voice)
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
def speak_book_local(text: str, voice: str = "warm") -> bytes:
|
| 40 |
-
"""
|
| 41 |
-
Local TTS for testing (no Modal required).
|
| 42 |
-
Uses MeloTTS or returns silent WAV placeholder.
|
| 43 |
-
"""
|
| 44 |
-
try:
|
| 45 |
-
from modal_workers.modal_tts import speak_book_local as local_tts
|
| 46 |
-
return local_tts(text, voice)
|
| 47 |
-
except Exception as e:
|
| 48 |
-
logger.warning(f"Local TTS failed: {e}, returning placeholder")
|
| 49 |
-
from modal_workers.modal_tts import _generate_silent_wav
|
| 50 |
-
return _generate_silent_wav(duration_seconds=5)
|
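The last-resort branch of the deleted `speak_book_local` returns a silent WAV placeholder via `_generate_silent_wav`, whose body is not shown in this diff. One way such a helper could be written with only the standard library is sketched here. The name `silent_wav`, the mono 16-bit format, and the 16 kHz default are assumptions, not the PR's implementation.

```python
import io
import wave


def silent_wav(duration_seconds=5, sample_rate=16000):
    """Return the bytes of a mono 16-bit PCM WAV file containing only silence."""
    n_frames = int(duration_seconds * sample_rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero amplitude = silence
    return buf.getvalue()
```

A placeholder like this keeps the audio output valid, but as the requirements.txt comment in this PR notes, it can also hide the fact that the real TTS never ran, so the fallback is worth logging.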
start_app.bat
DELETED
|
@@ -1,33 +0,0 @@
|
|
| 1 |
-
@echo off
|
| 2 |
-
REM ============================================================
|
| 3 |
-
REM DoodleBook launcher — SINGLE instance, FIXED port.
|
| 4 |
-
REM Double-click this file to start. Close this window to quit.
|
| 5 |
-
REM
|
| 6 |
-
REM Why this matters: Gradio is given an explicit port. If that
|
| 7 |
-
REM port is still held (a killed instance's socket in TIME_WAIT,
|
| 8 |
-
REM an orphaned python, or app.py/test_final.py running too),
|
| 9 |
-
REM Gradio CRASHES on startup instead of picking another port.
|
| 10 |
-
REM The old loop then relaunched into the same crash forever and
|
| 11 |
-
REM the browser showed "Connection lost. Attempting reconnection".
|
| 12 |
-
REM So: free the port FIRST, then launch.
|
| 13 |
-
REM ============================================================
|
| 14 |
-
cd /d "%~dp0"
|
| 15 |
-
set PYTHONUTF8=1
|
| 16 |
-
set PYTHONIOENCODING=utf-8
|
| 17 |
-
set DOODLEBOOK_PORT=7880
|
| 18 |
-
|
| 19 |
-
:loop
|
| 20 |
-
echo.
|
| 21 |
-
echo === Freeing port %DOODLEBOOK_PORT% (killing any old/stray instance) ===
|
| 22 |
-
for /f "tokens=5" %%a in ('netstat -ano ^| findstr ":%DOODLEBOOK_PORT% " ^| findstr LISTENING') do (
|
| 23 |
-
echo killing PID %%a holding port %DOODLEBOOK_PORT%
|
| 24 |
-
taskkill /f /pid %%a >nul 2>&1
|
| 25 |
-
)
|
| 26 |
-
|
| 27 |
-
echo === Starting DoodleBook on http://127.0.0.1:%DOODLEBOOK_PORT%/ ===
|
| 28 |
-
echo === Open EXACTLY that URL in your browser (not 7860/7870) ===
|
| 29 |
-
python run_modal.py
|
| 30 |
-
echo.
|
| 31 |
-
echo === Server stopped (exit code %errorlevel%). Restarting in 3 seconds... (close this window to quit) ===
|
| 32 |
-
timeout /t 3 /nobreak >nul
|
| 33 |
-
goto loop
|
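The deleted launcher frees a fixed port before starting because Gradio exits when an explicit `server_port` is busy. A small Python sketch of the check itself (not the process-killing part) follows. `port_is_free` is a hypothetical helper, and the result is only a snapshot: another process can still take the port between the check and the launch.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

A launcher could call `port_is_free(7880)` and print a clear message before handing the port to `demo.launch`, rather than relying on the `OSError` raised at startup.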
ui/layout.py
CHANGED
|
@@ -618,11 +618,6 @@ def create_layout(load_sample_fn=None, create_book_fn=None):
|
|
| 618 |
value=False,
|
| 619 |
elem_classes=["tiny-toggle"],
|
| 620 |
)
|
| 621 |
-
tiny_mode = gr.Checkbox(
|
| 622 |
-
label="Tiny Mode — faster, runs on small GPUs",
|
| 623 |
-
value=False,
|
| 624 |
-
elem_classes=["tiny-toggle"],
|
| 625 |
-
)
|
| 626 |
make_btn = gr.Button(
|
| 627 |
"Make my book!",
|
| 628 |
variant="primary",
|
|
@@ -714,7 +709,7 @@ FLUX is the printer. **Tiny Titan.**
|
|
| 714 |
if create_book_fn:
|
| 715 |
make_btn.click(
|
| 716 |
fn=create_book_fn,
|
| 717 |
-
inputs=[doodle, char_name, theme, hero_name,
|
| 718 |
outputs=[book_display, status, audio_narration, pdf_download,
|
| 719 |
story_info, image_info, trace_info,
|
| 720 |
coloring_display, coloring_pdf_download],
|
|
|
|
| 618 |
value=False,
|
| 619 |
elem_classes=["tiny-toggle"],
|
| 620 |
)
|
| 621 |
make_btn = gr.Button(
|
| 622 |
"Make my book!",
|
| 623 |
variant="primary",
|
|
|
|
| 709 |
if create_book_fn:
|
| 710 |
make_btn.click(
|
| 711 |
fn=create_book_fn,
|
| 712 |
+
inputs=[doodle, char_name, theme, hero_name, voice, make_coloring],
|
| 713 |
outputs=[book_display, status, audio_narration, pdf_download,
|
| 714 |
story_info, image_info, trace_info,
|
| 715 |
coloring_display, coloring_pdf_download],
|
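Removing `tiny_mode` from the UI changes how many values `make_btn.click` passes to the handler, since Gradio hands the `inputs` components to `fn` positionally and in order. A minimal sketch of a check that the handler accepts that count is given here. `accepts_n_positional` is a hypothetical helper, and the `create_book` signature shown is an assumption about the new app.py, which this hunk does not display.

```python
import inspect


def accepts_n_positional(fn, n):
    """Return True if fn can be called with exactly n positional arguments."""
    try:
        inspect.signature(fn).bind(*([None] * n))
    except TypeError:
        return False
    return True


# Assumed handler shape matching the six inputs wired in the new click call:
# doodle, char_name, theme, hero_name, voice, make_coloring.
def create_book(doodle_image, character_name, theme, hero_name,
                voice="kid", make_coloring=False):
    return None
```

Running such a check once at import time would catch a stale parameter (for example a leftover `tiny_mode`) before a user clicks the button.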