Fix ZeroGPU build: real FLUX+VoxCPM, 3-model cleanup, remove Tiny Mode

#1
by sush0401 - opened
AGENT_HANDOFF.md DELETED
@@ -1,105 +0,0 @@
1
- # DoodleBook — Coding-Agent Handoff Prompt
2
- Paste everything in the fenced block below into Codex / OpenCode / Cursor Agent / Claude Code as the build task.
3
- It is self-contained and bakes in the 5 critical corrections (C1–C5) from `EXECUTION_PLAN.md`.
4
-
5
- ---
6
-
7
- ```
8
- ROLE
9
- You are the lead engineer building "DoodleBook" for the Build Small Hackathon 2026
10
- (Adventure in Thousand Token Wood track). Build a Gradio app deployed to Hugging Face
11
- Spaces that turns a child's crayon drawing into a consistent, narrated, illustrated
12
- 6-page storybook. Work in: D:\Project\Hugging_face_app\doodlebook
13
-
14
- CONTEXT FILES (read first, in this order)
15
- 1. B2_doodlebook_prompt.md — original concept, exact code sketches, frontmatter, badges
16
- 2. EXECUTION_PLAN.md — architecture, phases, and the 5 corrections you MUST honor
17
-
18
- NON-NEGOTIABLE CORRECTIONS (override the original prompt where they conflict)
19
- C1 Do NOT train a LoRA per child at runtime (infeasible). Train ONE crayon-style LoRA
20
- OFFLINE. Achieve per-character consistency at inference via: (a) locked seed S, page i
21
- uses seed S+i; (b) reuse the identical character_description on every page; (c) feed the
22
- uploaded doodle as an IMAGE PROMPT (IP-Adapter / FLUX Redux / img2img strength ~0.3-0.5).
23
- The app MUST run on base FLUX with NO LoRA (degrade gracefully, show "LoRA coming").
24
- C2 MiniCPM5-1B is unreliable at JSON. Use a few-shot prompt with ONE full exemplar, greedy
25
- decode, and a 3-layer parser: (1) regex extract {...}; (2) json-repair/json5; (3) a
26
- deterministic TEMPLATE fallback that always yields a valid 6-page book. App must NEVER
27
- crash on bad model output.
28
- C3 Modal cold-starts of a 12B diffusion model take minutes. Cache weights on a Modal Volume,
29
- expose a keep_warm option, generate all 6 pages in ONE warm container call, and ALWAYS
30
- ship a pre-generated sample book in assets/sample_book/ that loads instantly with zero compute.
31
- C4 This is a SMALL-MODELS hackathon. Frame the 1B story + 2B voice as the "brain" and FLUX as
32
- the "renderer" in the README (Tiny Titan argument). Implement a real "Tiny Mode" toggle that
33
- swaps FLUX for an SD-Turbo/SDXL-Turbo + style-LoRA path (1-4 steps) runnable on a T4/edge GPU.
34
- C5 Treat ALL model IDs as UNVERIFIED. FIRST TASK: verify each on the HF Hub and put the resolved
35
- IDs + fallbacks in config.py. Fallbacks: FLUX.2-klein -> FLUX.1-schnell; MiniCPM5-1B ->
36
- MiniCPM3-4B; VoxCPM2 -> Kokoro or MeloTTS. Nothing else imports a raw model string.
37
-
38
- TECH STACK
39
- Gradio 5.x (gr.Blocks, custom storybook CSS) on HF Spaces (CPU) · Modal for all GPU compute
40
- (FLUX on A100, MiniCPM on T4, TTS on T4/A10G) · diffusers · peft · transformers · fpdf2 ·
41
- Python 3.11. Space is a thin orchestrator; every heavy call is modal.Function.remote().
42
-
43
- DELIVERABLES (build in this dependency order; commit after each phase)
44
- PHASE 1 Foundation
45
- - Verify model IDs on HF Hub; create config.py = single source of truth (IDs, fallbacks,
46
- seeds, step counts, dimensions, lora repo, dataset repo).
47
- - Scaffold the directory structure exactly as in B2_doodlebook_prompt.md plus:
48
- config.py, services/, ui/, modal/.
49
- - requirements.txt, .env.example (HF_TOKEN, MODAL endpoint/token).
50
- - Bare Gradio shell that launches and displays the static sample book.
51
- PHASE 2 Core text pipeline
52
- - modal_story_gen.py: MiniCPM story -> JSON {title, character_description, pages:[{page,text,scene}]}
53
- with the C2 3-layer parser + template fallback.
54
- - book_builder.py: pages -> storybook HTML; PDF export via fpdf2 (gr.DownloadButton).
55
- - assets/custom.css storybook styling. Wire text-only book end to end.
56
- PHASE 3 AI integration
57
- - modal_image_gen.py: FLUX pipeline; generate_book_pages() makes all 6 pages in one warm
58
- container; C1 consistency stack (seed-lock + char_desc reuse + doodle image-prompt);
59
- optional LoRA fuse (~0.85) with graceful base-model fallback; Modal Volume cache; keep_warm.
60
- - modal_tts.py: VoxCPM2 narration of title+page texts -> wav, with fallback voice (C5).
61
- - Full pipeline: doodle -> story -> 6 images -> narration -> assembled book.
62
- PHASE 4 UX
63
- - Convert create_book to a GENERATOR (yield) so status + pages stream in page-by-page
64
- ("Illustrating page N of 6..."). Add "Behind the magic" gr.Accordion (prompts/seeds/LoRA).
65
- - Tiny Mode toggle (C4). Mobile-responsive CSS. Accessibility: alt text = page text on every
66
- image, AA contrast, prefers-reduced-motion. Examples auto-load the sample on launch.
67
- PHASE 5 Optimization & badges
68
- - lora_finetune/: train_lora.py (DreamBooth-style, FLUX, rank16 alpha16, crayon style,
69
- trigger [DOODLECHAR]), dataset_prep.py, README.md (reproduce steps). Publish LoRA to HF
70
- (Well-Tuned). bf16/turbo settings; CPU-offload fallback for smaller GPUs.
71
- - services/trace.py: log prompts/seeds/lora-version to HF dataset build-small-hackathon/
72
- doodlebook-traces (Open Trace).
73
- - Pre-generate and COMMIT the 6-page sample book to assets/sample_book/ (C3 non-negotiable).
74
- PHASE 6 Submission
75
- - README.md with the EXACT frontmatter from B2_doodlebook_prompt.md + Tiny Titan argument +
76
- model table + architecture diagram + install/usage + screenshots + demo + reproducibility +
77
- badges + license (Apache-2.0).
78
- - Field Notes blog draft (docs/blog.md) on FLUX+LoRA character consistency.
79
- - Deploy to HF Spaces; smoke test the cold-open judge path (sample loads with no compute);
80
- verify live generation; run the §9 submission checklist from EXECUTION_PLAN.md.
81
-
82
- INTERNAL CONTRACTS (keep stable)
83
- generate_story(hero_name, theme, age=5) -> {title, character_description, pages:[{page,text,scene}]}
84
- generate_book_pages(character_desc, story_beats, doodle=None, art_style="crayon drawing, children's book, colorful, simple shapes", seed=42, tiny=False) -> list[bytes png]
85
- speak_book(text, voice="warm") -> bytes wav
86
- build_book_html(images, texts, title) -> html ; export_pdf(images, texts, title) -> path
87
- log_trace(payload) -> dataset_url
88
-
89
- ENGINEERING RULES
90
- - Every remote/model call wrapped in try/except with a user-friendly fallback; the app must
91
- never show a stack trace to a judge.
92
- - config.py is the ONLY place model IDs/params live.
93
- - Keep the sample-book path 100% independent of live compute so the demo always works.
94
- - Match the storybook visual identity (paper #FEF9E7, page #FFFDE7, ink #3E2723, CTA #FF7043,
95
- serif body). No default Gradio look (Off-Brand badge).
96
- - Prefer small, readable modules over clever code. Comment the consistency logic and the parser.
97
-
98
- DEFINITION OF DONE
99
- Public HF Space loads the sample book instantly; a live "Make my book!" produces a consistent,
100
- narrated 6-page book in <2 min warm; Tiny Mode works on a cheap GPU; LoRA repo + trace dataset
101
- are public; README + frontmatter + blog published; submission checklist fully ticked.
102
-
103
- START NOW with Phase 1, Task 0: verify the model IDs on the HF Hub and write config.py. Report
104
- what you find before proceeding.
105
- ```
 
B2_doodlebook_prompt.md DELETED
@@ -1,406 +0,0 @@
1
- # B2 — "DoodleBook" | Claude Code Build Prompt
2
- ## Build Small Hackathon 2026 | Thousand Token Wood Track
3
-
4
- ---
5
-
6
- ## Mission
7
- A child draws a crayon character. You photograph it. The app turns it into a consistent illustrated 6-page storybook (same character, same art style, across every page), written by MiniCPM5-1B, narrated by VoxCPM2, and illustrated by a fine-tuned FLUX.2-klein LoRA. Nobody else in this competition is using FLUX.2-klein with a custom LoRA. That's the entire competitive moat.
8
-
9
- ---
10
-
11
- ## Models (ONLY sponsor models)
12
-
13
- | Role | Model ID | Params | Sponsor |
14
- |---|---|---|---|
15
- | Image generation | `black-forest-labs/FLUX.2-klein` | ~12B | Black Forest Labs |
16
- | LoRA fine-tune (character consistency) | Custom LoRA on FLUX.2-klein | — | Black Forest Labs |
17
- | Story generation | `openbmb/MiniCPM5-1B` | 1B | OpenBMB |
18
- | Voice narration | `openbmb/VoxCPM2` | 2B | OpenBMB |
19
-
20
- Total: ~15B — under 32B cap.
21
-
22
- **FLUX.2-klein model ID note:** Verify the exact HF model card ID at `https://huggingface.co/black-forest-labs`. At time of writing, check for `black-forest-labs/FLUX.2-klein` or `black-forest-labs/FLUX.1-schnell` as fallback. The hackathon docs specifically name FLUX.2-klein.
23
-
24
- **Tiny Titan note:** Story model (MiniCPM5-1B) is 1B. If you argue the "primary AI" is the story generator (not the image model), you could claim Tiny Titan. Make this argument in the README.
25
-
26
- ---
27
-
28
- ## Badge stack
29
-
30
- | Badge | How |
31
- |---|---|
32
- | ✅ Well-Tuned | Fine-tune LoRA on FLUX.2-klein for character consistency; publish to HF |
33
- | ✅ Off-Brand | Custom storybook UI — not remotely like default Gradio |
34
- | ✅ Field Notes | Blog post about FLUX.2-klein + LoRA character consistency approach |
35
- | ✅ Open Trace | Publish generation traces (prompts, seeds, LoRA weights used) |
36
-
37
- ---
38
-
39
- ## Tech stack
40
- - **Gradio 5.x** with `gr.Blocks` and custom CSS for storybook-style custom UI
41
- - **Modal** for FLUX.2-klein image generation (A100 recommended — diffusion is memory-heavy)
42
- - **Modal** for MiniCPM5-1B story generation + VoxCPM2 TTS
43
- - **diffusers** library for FLUX pipeline
44
- - **peft** for LoRA loading
45
- - **Python 3.11**
46
-
47
- ---
48
-
49
- ## Directory structure
- ```
- doodlebook/
- ├── app.py                  # Gradio entry point
- ├── modal_image_gen.py      # FLUX.2-klein + LoRA generation on Modal
- ├── modal_story_gen.py      # MiniCPM5-1B story generation on Modal
- ├── modal_tts.py            # VoxCPM2 TTS on Modal
- ├── book_builder.py         # Assembles pages into storybook HTML
- ├── lora_finetune/
- │   ├── train_lora.py       # FLUX LoRA training script (run locally)
- │   ├── dataset_prep.py     # Prepare character images for training
- │   └── README.md           # How to reproduce the fine-tune
- ├── requirements.txt
- ├── .env.example            # MODAL_ENDPOINT_URL, HF_TOKEN
- ├── README.md
- └── assets/
-     ├── custom.css          # Storybook CSS (yellowed pages, serif font)
-     ├── page_template.html  # Single page HTML template
-     ├── sample_doodle.jpg   # Example child's drawing
-     └── sample_book/        # Pre-generated example book (6 pages)
-         ├── page_1.png
-         └── ...
- ```
72
-
73
- ---
74
-
75
- ## README.md — EXACT frontmatter
76
- ```yaml
77
- ---
78
- title: DoodleBook
79
- emoji: 📚
80
- colorFrom: yellow
81
- colorTo: orange
82
- sdk: gradio
83
- sdk_version: "5.0"
84
- app_file: app.py
85
- pinned: false
86
- tags:
87
- - hackathon
88
- - build-small
89
- - adventure-in-thousand-token-wood
90
- - black-forest-labs/FLUX.2-klein
91
- - openbmb/MiniCPM5-1B
92
- - openbmb/VoxCPM2
93
- - fine-tuned
94
- - lora
95
- - character-consistency
96
- - storybook
97
- - off-brand
98
- ---
99
- ```
100
-
101
- ---
102
-
103
- ## LoRA fine-tuning plan (run BEFORE submission)
104
-
105
- ### Goal
106
- Train a LoRA that makes FLUX.2-klein reproduce the visual style of a child's crayon drawing and maintain character consistency across 6 different scene prompts.
107
-
108
- ### Training data strategy
109
- 1. Take a child's crayon drawing (or generate 10-15 "crayon-style" reference images)
110
- 2. Create variations: same character in different poses/scenes, keeping style consistent
111
- 3. Use DreamBooth-style fine-tuning with a trigger token: `[DOODLECHAR]`
112
-
113
- ### Train script sketch (lora_finetune/train_lora.py)
114
- ```python
115
- from diffusers import FluxPipeline
116
- from peft import LoraConfig, get_peft_model
117
- # Use diffusers DreamBooth LoRA training
118
- # Follow: https://github.com/huggingface/diffusers/tree/main/examples/dreambooth
119
- # Target: FLUX.2-klein with rank=16, alpha=16
120
- # Training images: 10-15 images of the character
121
- # Instance prompt: "photo of [DOODLECHAR] character, crayon drawing style"
122
- # Training length: 200-400 steps (fast with FLUX)
123
- ```
124
-
125
- After training:
126
- ```bash
127
- huggingface-cli upload build-small-hackathon/doodlebook-flux-lora ./lora-weights
128
- ```
129
-
130
- ---
131
-
132
- ## modal_image_gen.py
- ```python
- import modal
- app = modal.App("doodlebook-image-gen")
-
- flux_env = modal.Image.debian_slim().pip_install(
-     "diffusers>=0.28", "torch", "accelerate", "transformers",
-     "peft", "pillow", "sentencepiece"
- )
-
- @app.function(gpu="A100", image=flux_env, timeout=300, memory=32768)
- def generate_page(
-     prompt: str,
-     lora_repo: str = "build-small-hackathon/doodlebook-flux-lora",
-     seed: int = 42,
-     width: int = 768,
-     height: int = 512
- ) -> bytes:
-     from diffusers import FluxPipeline
-     import torch, io
-     from PIL import Image
-
-     pipe = FluxPipeline.from_pretrained(
-         "black-forest-labs/FLUX.2-klein",  # verify ID on HF Hub
-         torch_dtype=torch.bfloat16
-     ).to("cuda")
-
-     # Load character LoRA for consistency
-     pipe.load_lora_weights(lora_repo)
-     pipe.fuse_lora(lora_scale=0.85)
-
-     generator = torch.Generator("cuda").manual_seed(seed)
-     image = pipe(
-         prompt=prompt,
-         num_inference_steps=20,  # FLUX.2-klein is fast
-         guidance_scale=3.5,
-         width=width,
-         height=height,
-         generator=generator
-     ).images[0]
-
-     buf = io.BytesIO()
-     image.save(buf, format="PNG")
-     return buf.getvalue()
-
- @app.function(gpu="A100", image=flux_env, timeout=300, memory=32768)
- def generate_book_pages(
-     character_desc: str,
-     story_beats: list[str],
-     art_style: str = "crayon drawing, children's book, colorful, simple shapes",
-     seed: int = 42
- ) -> list[bytes]:
-     """Generate all 6 pages in one function call to reuse the loaded model."""
-     from diffusers import FluxPipeline
-     import torch, io
-
-     pipe = FluxPipeline.from_pretrained(
-         "black-forest-labs/FLUX.2-klein",
-         torch_dtype=torch.bfloat16
-     ).to("cuda")
-     pipe.load_lora_weights("build-small-hackathon/doodlebook-flux-lora")
-     pipe.fuse_lora(lora_scale=0.85)
-
-     pages = []
-     for i, beat in enumerate(story_beats):
-         prompt = (
-             f"[DOODLECHAR] {character_desc}, {beat}, "
-             f"{art_style}, page {i+1} of children's book, "
-             f"white background, simple illustration"
-         )
-         gen = torch.Generator("cuda").manual_seed(seed + i)  # deterministic per page
-         image = pipe(
-             prompt=prompt,
-             num_inference_steps=20,
-             guidance_scale=3.5,
-             width=768, height=512,
-             generator=gen
-         ).images[0]
-         buf = io.BytesIO()
-         image.save(buf, format="PNG")
-         pages.append(buf.getvalue())
-
-     return pages
- ```
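
Reviewer sketch: the per-page prompt and seed logic in `generate_book_pages` can be pulled out of the GPU function and checked without loading any model. The helper `build_page_jobs` and its `use_lora` flag are hypothetical, not part of the deleted file. The flag reflects the handoff's requirement that the app also runs on base FLUX with no LoRA, in which case adding the `[DOODLECHAR]` trigger token is assumed to be pointless.

```python
DEFAULT_ART_STYLE = "crayon drawing, children's book, colorful, simple shapes"


def build_page_jobs(
    character_desc: str,
    story_beats: list[str],
    art_style: str = DEFAULT_ART_STYLE,
    seed: int = 42,
    use_lora: bool = False,
) -> list[dict]:
    """Return one {page, seed, prompt} job per story beat.

    Consistency signals: the identical character_desc appears in every
    prompt, and page i (0-based) uses seed + i so each page is reproducible.
    """
    # Only add the trigger token when the style LoRA is actually loaded.
    trigger = "[DOODLECHAR] " if use_lora else ""
    jobs = []
    for i, beat in enumerate(story_beats):
        prompt = (
            f"{trigger}{character_desc}, {beat}, "
            f"{art_style}, page {i + 1} of children's book, "
            f"white background, simple illustration"
        )
        jobs.append({"page": i + 1, "seed": seed + i, "prompt": prompt})
    return jobs
```

The GPU function would then loop over these jobs, which keeps the consistency logic in one place.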
216
-
217
- ---
218
-
219
- ## modal_story_gen.py
- ```python
- import modal
- app = modal.App("doodlebook-story")
-
- story_env = modal.Image.debian_slim().pip_install(
-     "transformers>=4.40", "torch", "accelerate", "sentencepiece"
- )
-
- @app.function(gpu="T4", image=story_env, timeout=120)
- def generate_story(character_name: str, theme: str, age: int = 5) -> dict:
-     from transformers import AutoTokenizer, AutoModelForCausalLM
-     import torch, json, re
-
-     tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM5-1B")
-     model = AutoModelForCausalLM.from_pretrained(
-         "openbmb/MiniCPM5-1B", torch_dtype=torch.float16
-     ).cuda().eval()
-
-     prompt = f"""Write a 6-page children's storybook for age {age} about {character_name} with theme: {theme}.
-
- Return ONLY valid JSON:
- {{
-   "title": "Book title",
-   "character_description": "Visual description of {character_name} for illustration",
-   "pages": [
-     {{"page": 1, "text": "1-2 sentence page text (age {age})", "scene": "visual scene description for illustrator"}},
-     {{"page": 2, ...}},
-     {{"page": 3, ...}},
-     {{"page": 4, ...}},
-     {{"page": 5, ...}},
-     {{"page": 6, "text": "Gentle ending. Goodnight.", "scene": "closing scene"}}
-   ]
- }}"""
-
-     inputs = tok(prompt, return_tensors="pt").to("cuda")
-     with torch.no_grad():
-         out = model.generate(**inputs, max_new_tokens=800, do_sample=False)
-     # Decode only the newly generated tokens, not the echoed prompt
-     response = tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
-
-     # Extract the outermost {...} span; the model may wrap the JSON in prose
-     match = re.search(r'\{.*\}', response, re.DOTALL)
-     if match:
-         try:
-             return json.loads(match.group())
-         except json.JSONDecodeError:
-             pass  # malformed JSON: fall through to the error payload
-     return {"error": response}
- ```
265
-
266
- ---
267
-
268
- ## book_builder.py — storybook HTML assembler
- ```python
- import base64
- import html
-
- PAGE_HTML = """
- <div class="book-page" style="page-break-after: always;">
-   <img src="data:image/png;base64,{img_b64}" style="width:100%; border-radius:8px;"/>
-   <p class="page-text">{text}</p>
-   <span class="page-num">{page_num}</span>
- </div>
- """
-
- def build_book_html(pages_images: list[bytes], pages_texts: list[str], title: str) -> str:
-     pages_html = ""
-     for i, (img_bytes, text) in enumerate(zip(pages_images, pages_texts)):
-         b64 = base64.b64encode(img_bytes).decode()
-         # Escape model-generated text so a stray < or & cannot break the markup
-         pages_html += PAGE_HTML.format(img_b64=b64, text=html.escape(text), page_num=i+1)
-
-     return f"""<div class="book-container">
-   <h1 class="book-title">{html.escape(title)}</h1>
-   {pages_html}
- </div>"""
- ```
291
-
292
- ---
293
-
294
- ## app.py — full Gradio storybook UI
- ```python
- import tempfile
-
- import gradio as gr
- from modal_story_gen import generate_story
- from modal_image_gen import generate_book_pages
- from modal_tts import speak_book
- from book_builder import build_book_html
-
- THEMES = ["brave adventure", "making a new friend", "overcoming a fear",
-           "helping someone", "lost and found", "learning something new"]
-
- CSS = """
- body { background: #fef9e7; font-family: 'Georgia', serif; }
- .book-container { max-width: 800px; margin: 0 auto; }
- .book-title { font-size: 32px; text-align: center; color: #5d4037; }
- .book-page { margin: 24px 0; padding: 20px; background: #fffde7;
-              border-radius: 12px; box-shadow: 3px 3px 12px rgba(0,0,0,0.15); }
- .page-text { font-size: 22px; line-height: 1.9; color: #3e2723; text-align: center; }
- .page-num { color: #bcaaa4; font-size: 14px; }
- .gr-button-primary { background: #ff7043 !important; font-size: 20px; }
- """
-
- def save_wav(audio_bytes: bytes) -> str:
-     """Write WAV bytes to a temp file and return its path for gr.Audio."""
-     with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
-         f.write(audio_bytes)
-     return f.name
-
- def create_book(doodle_image, character_name, theme, hero_name):
-     if not character_name.strip():
-         character_name = "Little Hero"
-     if not hero_name.strip():
-         hero_name = character_name
-
-     # Step 1: Generate story
-     story = generate_story.remote(hero_name, theme, age=5)
-     if "error" in story:
-         return None, f"Story generation failed: {story['error']}", None
-
-     pages = story["pages"]
-     char_desc = story["character_description"]
-     title = story["title"]
-
-     scene_beats = [p["scene"] for p in pages]
-     page_texts = [p["text"] for p in pages]
-
-     # Step 2: Generate all 6 images (one Modal call, model loaded once)
-     img_bytes_list = generate_book_pages.remote(char_desc, scene_beats)
-
-     # Step 3: Assemble HTML book
-     book_html = build_book_html(img_bytes_list, page_texts, title)
-
-     # Step 4: TTS narration of full book
-     full_text = f"{title}. " + " ".join(page_texts)
-     audio_bytes = speak_book.remote(full_text)
-     audio_path = save_wav(audio_bytes)
-
-     return book_html, f"✅ '{title}' — 6 pages generated!", audio_path
-
- with gr.Blocks(css=CSS, title="📚 DoodleBook") as demo:
-     gr.Markdown("# 📚 DoodleBook\n*Draw a character. Get a storybook.*")
-
-     with gr.Row():
-         with gr.Column(scale=1):
-             doodle = gr.Image(sources=["webcam","upload"], label="📸 Photo of your doodle", type="numpy")
-             char_name = gr.Textbox(label="Character name", placeholder="Ziggy the robot")
-             hero_name = gr.Textbox(label="Hero name in the story", placeholder="Ziggy")
-             theme = gr.Dropdown(choices=THEMES, value=THEMES[0], label="Story theme")
-             make_btn = gr.Button("✨ Make my book!", variant="primary")
-             gr.Examples(
-                 examples=[["assets/sample_doodle.jpg", "Ziggy", "Ziggy", "brave adventure"]],
-                 inputs=[doodle, char_name, hero_name, theme]
-             )
-             status = gr.Textbox(label="Status", interactive=False)
-
-         with gr.Column(scale=2):
-             book_display = gr.HTML(label="Your storybook")
-             audio_narration = gr.Audio(label="🎙️ Listen to your book", autoplay=False)
-
-     make_btn.click(
-         create_book,
-         inputs=[doodle, char_name, theme, hero_name],
-         outputs=[book_display, status, audio_narration]
-     )
-
- demo.launch()
- ```
376
-
377
- ---
378
-
379
- ## TODO 1 — Doodle style extraction for LoRA prompt conditioning
380
- After core pipeline works: use MiniCPM-V to *describe* the uploaded doodle in visual terms ("thick black outlines, bright primary colors, stick figure proportions, sun in top corner"). Prepend this extracted style description to every FLUX prompt so the generated images actually *match* the child's drawing style, not just the character concept. This is what makes the output genuinely feel like "their" character.
381
-
382
- ## TODO 2 — PDF export + shareable link
383
- Assemble the 6 PNG pages into a downloadable PDF using `fpdf2` or `reportlab`. Add a "Download your book as PDF" button (gr.DownloadButton). Also export the full book as a shareable HF dataset entry (with the prompts, seeds, and LoRA version used) — this earns the Open Trace badge and means families can re-generate the same book later.
384
-
385
- ---
386
-
387
- ## Sponsor + badge alignment
388
-
389
- | Award | Why |
390
- |---|---|
391
- | Thousand Token Wood podium | Unique concept — nobody combines child doodle + FLUX LoRA + story |
392
- | Black Forest Labs ($3k pool) | FLUX.2-klein + custom LoRA — near-empty sponsor field |
393
- | OpenBMB award (Wood track) | MiniCPM5-1B (story) + VoxCPM2 (narration) |
394
- | Well-Tuned ($badge) | Published LoRA on HF |
395
- | Off-Brand ($1,500) | Storybook CSS with yellowed pages, serif font — zero Gradio defaults |
396
- | Best Demo ($1,000) | Child hearing their drawing narrated as a book = perfect 60-sec video |
397
- | Community Choice | Shareable, emotional — parents will post this |
398
-
399
- ---
400
-
401
- ## Non-negotiables
402
- - Pre-generate and include a complete sample book (all 6 pages) in `assets/sample_book/` so judges can see what it looks like without waiting for generation
403
- - FLUX on A100 (not A10G) — FLUX.2-klein may need 24GB+ VRAM; check Modal memory settings
404
- - If LoRA not yet trained, the app must still run with base FLUX.2-klein (no LoRA) — degrade gracefully, note "LoRA coming" in UI
405
- - Generation time: expect 60-90 seconds for 6 images. Show progress: "Illustrating page 1 of 6..."
406
- - Verify exact FLUX.2-klein model ID on HF Hub before writing any import statements
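
Reviewer sketch for the progress requirement above ("Illustrating page 1 of 6..."): a plain Python generator that yields a status string and the pages rendered so far. Gradio streams each yielded value when an event handler is a generator function. The helper `illustrate_with_progress` and its `render_page` callback are hypothetical. The callback stands in for whatever per-page remote call the app ends up using, which is an assumption, since the deleted `generate_book_pages` returns all pages in one call.

```python
from typing import Callable, Iterator


def illustrate_with_progress(
    scene_beats: list[str],
    render_page: Callable[[str, int], bytes],
) -> Iterator[tuple[str, list[bytes]]]:
    """Yield (status, pages_so_far) before each page and once at the end."""
    total = len(scene_beats)
    pages: list[bytes] = []
    for page_num, beat in enumerate(scene_beats, start=1):
        # Emit the status first so the UI updates before the slow render call.
        yield f"Illustrating page {page_num} of {total}...", list(pages)
        pages.append(render_page(beat, page_num))
    yield f"Done: {total} pages illustrated", list(pages)
```

A `create_book` generator could re-yield these tuples, mapping the page list through `build_book_html` so the book fills in page by page.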
 
DEPLOY.md DELETED
@@ -1,102 +0,0 @@
1
- # DoodleBook — Hugging Face Deployment Guide
2
-
3
- ## Option 1: HF Spaces + Modal (Recommended)
4
-
5
- ### Setup Steps
6
-
7
- 1. **Create HF Space**
8
- ```bash
9
- # Install HF CLI
10
- pip install huggingface_hub
11
-
12
- # Login
13
- huggingface-cli login
14
-
15
- # Create space
16
- huggingface-cli space create your-username/doodlebook --sdk gradio
17
- ```
18
-
19
- 2. **Set Modal Secrets**
- ```bash
- # In HF Space Settings → Secrets (Modal authenticates with a token ID and secret pair):
- MODAL_TOKEN_ID=your_modal_token_id
- MODAL_TOKEN_SECRET=your_modal_token_secret
- HF_TOKEN=your_hf_token
- ```
25
-
26
- 3. **Upload Code**
27
- ```bash
28
- cd doodlebook
29
- git init
30
- git add .
31
- git commit -m "Initial commit"
32
- git remote add origin https://huggingface.co/spaces/your-username/doodlebook
33
- git push -u origin main
34
- ```
35
-
36
- ### Inference Times
37
-
38
- | Scenario | Cold Start | Warm |
39
- |----------|-----------|------|
40
- | Sample Book | 0s | 0s |
41
- | First Generation | 2-3 min | 30-60s |
42
- | Subsequent | 30-60s | 20-40s |
43
-
44
- ---
45
-
46
- ## Option 2: HF ZeroGPU (Free, No Modal)
47
-
48
- ### Changes Needed
49
-
50
- Replace Modal calls with direct inference on ZeroGPU:
51
-
52
- ```python
53
- # In modal_workers/modal_image_gen.py
54
- # Remove Modal, use direct torch inference
55
- ```
56
-
57
- ### Inference Times
58
-
59
- | Scenario | Cold Start | Warm |
60
- |----------|-----------|------|
61
- | Sample Book | 0s | 0s |
62
- | First Generation | 3-5 min | 1-2 min |
63
- | Subsequent | 1-2 min | 45-90s |
64
-
65
- ---
66
-
67
- ## Option 3: HF Inference API
68
-
69
- ### Setup
70
-
- ```python
- import requests
-
- API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.2-klein-4B"
- headers = {"Authorization": "Bearer your_hf_token"}
-
- def query(payload):
-     response = requests.post(API_URL, headers=headers, json=payload)
-     # Raise on HTTP errors so an error body is never treated as image bytes
-     response.raise_for_status()
-     return response.content
- ```
81
-
82
- ### Inference Times
83
-
84
- | Scenario | Cold Start | Warm |
85
- |----------|-----------|------|
86
- | Single Image | 10-30s | 5-15s |
87
-
88
- ---
89
-
90
- ## Recommendation for Hackathon
91
-
92
- **Use Option 1 (HF Spaces + Modal)** because:
93
- - ✅ Sample book loads instantly (no compute)
94
- - ✅ Warm generation ~30s (fast demo)
95
- - ✅ Modal keeps model warm during judging
96
- - ✅ Free HF Space (CPU only)
97
- - ✅ Modal only charges during generation
98
-
99
- ### Cost Estimate
100
- - HF Space: Free
101
- - Modal: ~$0.50 per demo generation
102
- - Total for hackathon: ~$5-10
 
EXECUTION_PLAN.md DELETED
@@ -1,414 +0,0 @@
1
- # DoodleBook — Master Execution Plan
2
- **Build Small Hackathon 2026 · "Adventure in Thousand Token Wood" Track**
3
- _Senior architect / strategist / tech-lead review of `B2_doodlebook_prompt.md`_
4
-
5
- > Source-of-truth concept: A child draws a crayon character → photograph it → the app produces a
6
- > **consistent, illustrated 6-page storybook** (same character + same art style on every page),
7
- > written by **MiniCPM5-1B**, narrated by **VoxCPM2**, illustrated by **FLUX.2-klein + a crayon-style LoRA**.
8
-
9
- ---
10
-
11
- ## 0. Critical engineering corrections (read first)
12
-
13
- The original prompt is excellent on vision and badge strategy but has **5 load-bearing technical risks**. Fix these before writing code or you will fail the live demo.
14
-
15
- | # | Risk in original | Reality | Fix (this plan adopts) |
16
- |---|---|---|---|
17
- | **C1** | Implies per-child LoRA so each kid's character is reproduced | LoRA training = minutes–hours on A100. You **cannot** train per user at demo time. | LoRA trains **ONE crayon art-style** offline. Per-character consistency comes from **(a) locked seed, (b) the child's doodle used as an image prompt** via IP-Adapter / FLUX Redux / img2img, and **(c) a fixed character description** reused on every page. Never claim live per-child training. |
18
- | **C2** | 1B model must emit strict JSON | 1B models break JSON constantly (trailing commas, prose, truncation). | Constrained decoding + a **3-layer parser**: (1) regex extract, (2) `json5`/repair, (3) deterministic template fallback so the app NEVER crashes. Few-shot prompt with one full exemplar. |
19
- | **C3** | "60–90s for 6 images" | True for warm GPU; **Modal cold start of a 12B diffusion model = 2–4 min**. | Modal `@app.cls` with `keep_warm=1` during demo window + model weights on a **Modal Volume** (no re-download). Pre-warm button. Always ship a pre-generated sample book. |
20
- | **C4** | "Small Models" hackathon, but hero model is 12B | Judges may discount the "small" claim. | **Reframe narrative:** the *brain* (story + voice) is a **3B total small-model stack** (MiniCPM5-1B + VoxCPM2); FLUX is the *renderer/printer*. Add a real **"Tiny Mode"** (SDXL-Turbo or SD-Turbo + style LoRA, runnable on T4/edge) to make the small-model claim defensible and unlock the edge-device track. |
21
- | **C5** | Assumes model IDs are correct | Prompt itself flags FLUX.2-klein / MiniCPM5-1B / VoxCPM2 as unverified. | **Phase 1, Task 0:** verify every model-card ID on HF Hub; wire fallbacks (`FLUX.1-schnell`, `MiniCPM3-4B`, `MeloTTS`/`Kokoro`) behind one config block. |
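
Reviewer sketch for the C5 fix ("fallbacks behind one config block"): a single mapping of role to ordered candidates, plus a resolver that takes an availability check as a parameter. Every model ID here is unverified, exactly as C5 warns. The primary IDs come from the plan, while the org prefixes on the fallbacks and the Kokoro repo name are assumptions to confirm on the Hub. `MODEL_CANDIDATES` and `resolve_model` are hypothetical names.

```python
from typing import Callable

# Ordered candidates per role: primary first, then fallbacks (all UNVERIFIED).
MODEL_CANDIDATES: dict[str, list[str]] = {
    "image": ["black-forest-labs/FLUX.2-klein", "black-forest-labs/FLUX.1-schnell"],
    "story": ["openbmb/MiniCPM5-1B", "openbmb/MiniCPM3-4B"],
    "tts": ["openbmb/VoxCPM2", "hexgrad/Kokoro-82M"],  # Kokoro repo id is an assumption
}


def resolve_model(role: str, is_available: Callable[[str], bool]) -> str:
    """Return the first candidate for `role` that passes the availability check."""
    for model_id in MODEL_CANDIDATES[role]:
        if is_available(model_id):
            return model_id
    raise LookupError(f"no available model for role {role!r}")
```

In the app, `is_available` could wrap a Hub lookup such as `huggingface_hub.model_info` inside a try/except. Passing it in as a parameter lets the resolver be tested offline.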
22
-
23
- Everything below assumes these corrections.
24
-
25
- ---
26
-
27
- ## 1. Project Analysis
28
-
29
- ### Executive summary
30
- DoodleBook converts a child's hand-drawn crayon character (captured by photo) into a complete, narrated, visually-consistent 6-page picture book in ~2 minutes. It fuses three sponsor models into a single emotional, demo-perfect artifact: a 1B story writer, a 2B voice, and a 12B image model steered by a custom crayon-style LoRA. The moat is the **combination** — nobody else pairs a child's real drawing with a fine-tuned FLUX LoRA and on-model narration.
31
-
32
- ### Problem statement
33
- Kids create characters constantly but those drawings die on the fridge. Parents can't turn them into the stories children imagine. Existing "AI storybook" apps (a) ignore the child's actual art, (b) produce inconsistent characters page-to-page, and (c) feel like generic AI slop, not *the child's* creation. There is no tool that preserves the child's own visual style across a coherent narrated book.
34
-
35
- ### Target users
36
- - **Primary:** Parents of children 3–8 (gift/keepsake, bedtime, screen-time-with-purpose).
37
- - **Secondary:** Early-years teachers (creative writing prompts), pediatric/occupational therapists (expressive activities), grandparents (remote bonding).
38
- - **Demo persona:** Judge watching a 60-sec video of a kid hearing their own drawing read aloud as a book.
39
-
40
- ### Market need
41
- - AI-storytime apps are a growing category but commoditized and character-inconsistent.
42
- - Differentiator the market lacks: **"your child's actual drawing becomes the book's art."** Emotional keepsake > generic generation. High shareability (parents post their kids).
43
-
44
- ### Competitive advantage / moat
45
- 1. **Underused sponsor field** — FLUX.2-klein + custom LoRA is near-empty in the competition (per original recon).
46
- 2. **Style-faithful character consistency** via seed-lock + image-prompt conditioning + LoRA (multi-signal, robust).
47
- 3. **Full small-model stack** end-to-end on sponsor models (story + voice + image).
48
- 4. **Off-brand storybook UI** — zero Gradio defaults.
49
- 5. **Emotional demo** — strongest 60-second-video category in the whole event.
50
-
51
- ### Innovation score (self-assessed)
52
- | Axis | Score /10 | Note |
53
- |---|---|---|
54
- | Concept originality | 9 | Doodle→consistent book is genuinely novel |
55
- | Technical depth | 8 | LoRA + multi-model orchestration + consistency engineering |
56
- | Feasibility in hackathon window | 7 | Achievable IF corrections C1–C5 applied |
57
- | Demo/emotional impact | 10 | Best-in-show potential |
58
- | Track coverage | 9 | 5+ awards reachable |
59
- | **Composite** | **8.6** | Strong winner profile |
60
-
61
- ### Hackathon-winning potential
62
- **High.** Realistic path to: Thousand Token Wood podium + Black Forest Labs sponsor award + OpenBMB award + Off-Brand + Best Demo + Community Choice. Five+ simultaneous award surfaces is the strategy (see §7).
63
-
64
- ---
65
-
66
- ## 2. Product Vision
67
-
68
- ### Long-term vision
69
- The default way a family turns a child's imagination into a keepsake — "Instagram for the things your kid invents." Drawing → narrated book → printed photo-book → series with recurring characters.
70
-
71
- ### Future roadmap
72
- - **v1 (hackathon):** Doodle → 6-page narrated book, PDF export, Open-Trace share.
73
- - **v2:** Character library (recurring heroes across books), multi-character scenes, child-voice cloning (with guardian consent), print-on-demand.
74
- - **v3:** Collaborative books (siblings co-create), classroom mode, multilingual narration, animation (short clips per page).
75
- - **v4:** On-device "Tiny Mode" mobile app for offline bedtime generation.
76
-
77
- ### Scalability opportunities
78
- - Stateless generation workers (Modal autoscale) behind a thin Gradio/HF front door.
79
- - Cache by (doodle-hash + theme + seed) to dedupe regenerations.
80
- - Batch the 6 pages in one warm container (already in the original `generate_book_pages`).
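
The dedupe idea above (cache by doodle-hash + theme + seed) could be sketched as follows. This is a minimal illustration using only the standard library; the function name `book_cache_key` and the theme normalization are assumptions, not part of the original plan.

```python
import hashlib


def book_cache_key(doodle_bytes: bytes, theme: str, seed: int) -> str:
    """Return a deterministic key for one (doodle, theme, seed) request.

    Hypothetical helper: the plan only names the three ingredients,
    not how they are combined.
    """
    # Hash the raw image bytes so identical uploads map to the same key.
    doodle_hash = hashlib.sha256(doodle_bytes).hexdigest()
    # Normalizing the theme is an assumption; it avoids cache misses
    # caused by stray whitespace or capitalization.
    raw = f"{doodle_hash}|{theme.strip().lower()}|{seed}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

A regeneration request with the same doodle, theme, and seed would then hit the same key, while changing any one of the three yields a new key.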
81
-
82
- ### Edge-device deployment possibilities
83
- - **Story:** MiniCPM5-1B quantized (GGUF/llama.cpp, int4) runs on a laptop/phone NPU.
84
- - **Voice:** VoxCPM2 or a Kokoro/MeloTTS fallback runs on CPU.
85
- - **Image (Tiny Mode):** SD-Turbo / SDXL-Turbo + tiny style LoRA at 1–4 steps on a single consumer GPU or Apple Silicon; sub-second-to-a-few-seconds pages.
86
- - Ship a documented "edge profile" config to claim the edge/small-model narrative credibly.
87
-
88
- ### Small-model optimization strategy
89
- - 4-bit (NF4/bitsandbytes) for MiniCPM; `torch.compile` + bf16 for FLUX; sequential CPU offload to fit smaller GPUs.
90
- - FLUX turbo settings: 4–20 steps; Tiny Mode = 1–4 steps with turbo image model.
91
- - Modal Volume model cache; `keep_warm` only during judging.
92
- - KV-cache reuse + greedy decode for deterministic, fast story gen.
93
-
94
- ---
95
-
96
- ## 3. Technical Architecture
97
-
98
- ### System architecture (text diagram)
99
- ```
100
- ┌──────────────────────────────────────────────┐
101
- │ HF Space (Gradio 5.x, custom storybook UI) │
102
- │ app.py · custom.css · book_builder.py │
103
- └───────────────┬──────────────────────────────┘
104
- │ orchestration (sync calls)
105
- ┌────────────────────────────────┼─────────────────────────────────┐
106
- ▼ ▼ ▼
107
- ┌───────────────┐ ┌───────────────────┐ ┌──────────────────┐
108
- │ modal_story │ │ modal_image_gen │ │ modal_tts │
109
- │ MiniCPM5-1B │ │ FLUX.2-klein + │ │ VoxCPM2 │
110
- │ (T4) JSON │ char_desc │ crayon LoRA (A100)│ pages.png │ (T4/A10G) wav │
111
- │ story+scenes │ ───────────► │ + IP-Adapter/img2 │ ──────────► │ narration │
112
- └───────┬───────┘ scenes │ img from doodle │ └────────┬─────────┘
113
- │ └─────────┬─────────┘ │
114
- │ pages[text,scene] │ 6 page images │ audio
115
- └───────────────┬────────────────┴────────────────────────────────┘
116
-
117
- ┌────────────────────┐
118
- │ book_builder.py │ → storybook HTML (gr.HTML) + PDF (fpdf2)
119
- │ + open_trace.py │ → HF dataset trace (prompts/seeds/lora)
120
- └────────────────────┘
121
- ```
122
-
123
- ### Frontend architecture
124
- - **Gradio 5.x `gr.Blocks`** single-page, two-column: input panel (left), live book viewer (right).
125
- - Custom **storybook CSS** (yellowed paper, serif, drop shadows, page-flip feel) → Off-Brand badge.
126
- - Progressive reveal: pages stream in as generated ("Illustrating page 3 of 6…").
127
- - `gr.HTML` book canvas, `gr.Audio` narration, `gr.DownloadButton` PDF, `gr.Gallery` fallback.
128
-
129
- ### Backend architecture
130
- - **Modal** for all heavy compute, 3 apps (story/image/tts), each a warm-able class.
131
- - **Stateless** functions; book assembly + trace logging in the Space process.
132
- - Config module (`config.py`) holds every model ID + fallback + generation params (single source of truth → fixes C5).
133
-
134
- ### AI model architecture
135
- - **Story (MiniCPM5-1B, T4):** few-shot, greedy, `max_new_tokens≈800`, constrained JSON + 3-layer parser (C2). Outputs `title`, `character_description`, `pages[{page,text,scene}]`.
136
- - **Image (FLUX.2-klein + LoRA, A100):**
137
- - Crayon **style LoRA** (offline-trained, rank 16) fused at scale ~0.8.
138
- - **Consistency stack:** locked base seed `S`; page `i` uses `S+i`; reuse identical `character_description` token block; **doodle image fed as image prompt** (IP-Adapter / Redux / img2img strength ~0.3–0.5) so output resembles the child's drawing (this realizes original TODO 1).
139
- - 20 steps (Standard) / 4 steps (Tiny Mode), guidance ~3.5, 768×512.
140
- - **Voice (VoxCPM2, T4/A10G):** narrate `title + page texts`; return wav.
141
- - **Doodle understanding (MiniCPM-V, optional):** caption the drawing → style tokens prepended to FLUX prompt (original TODO 1).
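
The seed-lock and shared-description parts of the consistency stack can be illustrated in plain Python. This is a sketch under stated assumptions: the helper names `page_seeds` and `page_prompts` are hypothetical, pages are indexed from zero, and the prompt layout (description first, then scene) is illustrative only.

```python
def page_seeds(base_seed: int, n_pages: int = 6) -> list[int]:
    """Page i uses seed S + i (zero-based i is an assumption)."""
    return [base_seed + i for i in range(n_pages)]


def page_prompts(character_description: str, scenes: list[str]) -> list[str]:
    """Prepend the identical character description to every scene prompt."""
    return [f"{character_description}. {scene}" for scene in scenes]
```

Because every prompt starts with the same description block and the seeds are derived from one base seed, a book can be reproduced from just `(S, character_description, scenes)`.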
142
-
143
- ### Data flow
144
- 1. User uploads/photographs doodle + name + theme.
145
- 2. Story worker → JSON (title, char desc, 6×{text,scene}).
146
- 3. (Opt) Doodle captioner → style tokens.
147
- 4. Image worker → 6 PNGs (seed-locked, LoRA + doodle-conditioned).
148
- 5. TTS worker → narration wav.
149
- 6. `book_builder` → HTML book + PDF; `open_trace` → HF dataset row.
150
-
151
- ### API structure (internal contracts)
152
- ```
153
- generate_story(hero_name:str, theme:str, age:int=5) -> {title, character_description, pages:[{page,text,scene}]}
154
- generate_book_pages(character_desc:str, story_beats:list[str], doodle:bytes|None,
155
- art_style:str, seed:int=42, tiny:bool=False) -> list[bytes]
156
- speak_book(text:str, voice:str="warm") -> bytes(wav)
157
- build_book_html(images:list[bytes], texts:list[str], title:str) -> str
158
- export_pdf(images, texts, title) -> path
159
- log_trace(payload) -> dataset_url
160
- ```
161
-
162
- ### Storage strategy
163
- - Model weights → **Modal Volume** (cache, no re-download → fixes C3).
164
- - Generated assets → ephemeral `/tmp` in Space; user downloads PDF.
165
- - Traces → **HF Dataset** `build-small-hackathon/doodlebook-traces` (Open Trace badge).
166
- - LoRA weights → **HF model repo** `build-small-hackathon/doodlebook-flux-lora` (Well-Tuned badge).
167
-
168
- ### Deployment strategy
169
- - Front end on **HF Spaces** (Gradio SDK 5.0, `app.py`).
170
- - Compute on **Modal** (secrets: `HF_TOKEN`, endpoint URLs via `.env`).
171
- - `keep_warm=1` on image app only during judging window; scale to 0 after.
172
-
173
- ### Performance optimization plan
174
- - One warm container generates all 6 pages (already designed).
175
- - bf16 + optional `torch.compile`; turbo step counts; sequential CPU offload fallback.
176
- - Stream page-by-page UI updates (perceived speed).
177
- - Pre-generated sample book for instant judge view (non-negotiable).
178
- - Tiny Mode for sub-10s full books on cheap GPU.
179
-
180
- ---
181
-
182
- ## 4. UI/UX Design Plan
183
-
184
- ### Design philosophy
185
- "**A warm digital picture book, not a dashboard.**" Tactile, nostalgic, magical — looks hand-made, hides all ML. Every interaction should feel like turning a page, not running a model.
186
-
187
- ### User journeys
188
- 1. **First-time parent (happy path):** land → see sample book glowing → upload kid's drawing → name + theme → "✨ Make my book!" → progress storybook fills page-by-page → narration auto-ready → download PDF / share. <2 min, zero jargon.
189
- 2. **Judge (cold, impatient):** lands on a finished sample book immediately (no generation wait) → clicks "Hear it" → reads the story → optionally generates one live → sees Open-Trace link. Wow in <15s.
190
- 3. **Returning user (v2 vision):** pick a saved character → new adventure → consistent hero.
191
-
192
- ### Wireframe descriptions
193
- - **Header:** centered title "📚 DoodleBook", subtitle "Draw a character. Get a storybook.", soft paper texture.
194
- - **Left input card (scale 1):** webcam/upload doodle, character name, hero name, theme dropdown, big orange "Make my book!" CTA, Examples row (loads sample), status line, "⚡ Tiny Mode" toggle.
195
- - **Right book viewer (scale 2):** large `gr.HTML` book — title page then 6 illustrated text pages with page numbers, yellowed background, serif body; narration audio bar pinned above; "⬇ Download PDF" + "🔗 Share trace" buttons below.
196
- - **Progress state:** skeleton page slots fill one-by-one with "Illustrating page N of 6…".
197
-
198
- ### Dashboard / layout
199
- - Single page, two columns desktop; stacked on mobile (input → book).
200
- - Optional collapsible "🔬 Behind the magic" panel showing prompts/seeds/LoRA (judge candy + Open Trace).
201
-
202
- ### Color palette
203
- | Token | Hex | Use |
204
- |---|---|---|
205
- | Paper | `#FEF9E7` | app background |
206
- | Page | `#FFFDE7` | book pages |
207
- | Ink | `#3E2723` | body text |
208
- | Title brown | `#5D4037` | headings |
209
- | Crayon orange | `#FF7043` | primary CTA |
210
- | Sky accent | `#4FC3F7` | secondary/links |
211
- | Muted | `#BCAAA4` | page numbers/meta |
212
-
213
- ### Typography
214
- - Display/title: **Georgia / "Fredoka" / "Baloo 2"** (rounded, child-friendly).
215
- - Body: **Georgia serif** 20–22px, line-height 1.9 (read-aloud comfortable).
216
- - Avoid system sans defaults — they read as "Gradio".
217
-
218
- ### Accessibility
219
- - WCAG AA contrast (ink on page passes); 18px+ body.
220
- - Audio narration = built-in alt for non-readers; captions = page text.
221
- - All controls keyboard reachable; alt text on every generated image (use page `text`).
222
- - Respect `prefers-reduced-motion` (disable page-flip animation).
223
-
224
- ### Mobile responsiveness
225
- - Columns collapse to stack; CTA full-width sticky; webcam capture works on phones (parents photograph the drawing in-app).
226
-
227
- ### Demo-friendly interactions
228
- - Auto-load sample book on launch (no empty state).
229
- - Page-by-page streaming reveal (visible progress = perceived magic).
230
- - One-tap "Play narration".
231
- - "Tiny Mode" toggle to show edge story live without long waits.
232
-
233
- ---
234
-
235
- ## 5. Gradio Implementation Plan
236
-
237
- ### App structure
238
- ```
239
- app.py
240
- ├─ config.py # model IDs + fallbacks + params (single source of truth)
241
- ├─ ui/
242
- │ ├─ layout.py # gr.Blocks layout
243
- │ └─ custom.css # storybook styling
244
- ├─ services/
245
- │ ├─ story.py # calls modal_story_gen
246
- │ ├─ images.py # calls modal_image_gen
247
- │ ├─ tts.py # calls modal_tts
248
- │ ├─ book_builder.py # HTML + PDF
249
- │ └─ trace.py # Open Trace dataset logging
250
- └─ modal/
251
- ├─ modal_story_gen.py
252
- ├─ modal_image_gen.py
253
- └─ modal_tts.py
254
- ```
255
-
256
- ### Component hierarchy
257
- ```
258
- gr.Blocks(css, theme)
259
- ├─ Header (gr.Markdown)
260
- ├─ gr.Row
261
- │ ├─ gr.Column(scale=1) # inputs
262
- │ │ ├─ gr.Image(sources=[webcam,upload])
263
- │ │ ├─ gr.Textbox char_name / hero_name
264
- │ │ ├─ gr.Dropdown theme
265
- │ │ ├─ gr.Checkbox tiny_mode
266
- │ │ ├─ gr.Button "Make my book!" (primary)
267
- │ │ ├─ gr.Examples (sample)
268
- │ │ └─ gr.Textbox status (interactive=False)
269
- │ └─ gr.Column(scale=2) # output
270
- │ ├─ gr.Audio narration
271
- │ ├─ gr.HTML book_display
272
- │ ├─ gr.DownloadButton PDF
273
- │ └─ gr.Accordion "Behind the magic" (prompts/seeds)
274
- ```
275
-
276
- ### Pages & navigation
277
- Single page (hackathon-optimal). "Pages" = sections of the book inside the HTML canvas. No router needed.
278
-
279
- ### User interaction flow
280
- `make_btn.click(create_book, inputs=[...], outputs=[book_html, status, audio, pdf])` — use a **generator function** (`yield`) so status + pages stream in, not one blocking return.
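
A generator-style handler along these lines might look like the sketch below. It is deliberately free of Gradio imports so it runs on its own; the placeholder HTML, the file names `narration.wav` and `book.pdf`, and the parameter list are assumptions. Each yielded tuple follows the `outputs=[book_html, status, audio, pdf]` order.

```python
from typing import Iterator, Optional

Update = tuple[str, str, Optional[str], Optional[str]]


def create_book(hero_name: str, theme: str, n_pages: int = 6) -> Iterator[Update]:
    """Yield (book_html, status, audio_path, pdf_path) after each page."""
    pages: list[str] = []
    for i in range(1, n_pages + 1):
        # Stand-in for the real image/story work for page i.
        pages.append(f"<div class='page'>Page {i}: {hero_name} and the {theme}</div>")
        # Partial book plus a status line; audio and PDF are not ready yet.
        yield "".join(pages), f"Illustrating page {i} of {n_pages}…", None, None
    # Final update carries the finished book and the (hypothetical) file paths.
    yield "".join(pages), "Done!", "narration.wav", "book.pdf"
```

Passing a function like this to the click handler lets the UI update on every `yield` instead of waiting for a single blocking return.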
281
-
282
- ### Model integration approach
283
- - Space process is thin orchestrator; all GPU work via `modal.Function.remote()`.
284
- - Defensive: every remote call wrapped in try/except → graceful UI error + fallback (base FLUX if no LoRA, template story if JSON fails).
285
-
286
- ### Performance considerations
287
- - Single warm Modal container per book (6 images batched).
288
- - `gr.Progress()` for the progress bar; `yield` partial books.
289
- - Cache sample book in memory at startup.
290
-
291
- ### Deployment on HF Spaces
292
- - `sdk: gradio`, `sdk_version: "5.0"`, `app_file: app.py` (frontmatter already specified).
293
- - Secrets: `HF_TOKEN`, `MODAL_ENDPOINT_URL` (or Modal token) in Space settings.
294
- - Keep Space CPU-only (compute offloaded to Modal) → cheap, always-on.
295
-
296
- ---
297
-
298
- ## 6. Development Roadmap
299
-
300
- > Effort assumes a single builder + coding agent. Sequence is dependency-ordered.
301
-
302
- ### Phase 1 — Foundation
303
- - **Tasks:** Verify all model IDs on HF Hub (Task 0, fixes C5); scaffold repo per directory structure; `config.py` with IDs + fallbacks + params; `requirements.txt`; `.env.example`; Modal account + secrets; bare Gradio shell that loads and shows static sample book.
304
- - **Dependencies:** HF + Modal accounts, tokens.
305
- - **Effort:** ~0.5 day.
306
- - **Risks:** Model IDs differ from prompt → fallback wiring matters.
307
- - **Success:** `app.py` launches locally, shows sample book, `config.py` resolves real model IDs.
308
-
309
- ### Phase 2 — Core Features
310
- - **Tasks:** `modal_story_gen.py` with 3-layer JSON parser + template fallback (C2); `book_builder.py` HTML; PDF export (`fpdf2`); storybook CSS; wire story→book (text only, placeholder images).
311
- - **Dependencies:** Phase 1.
312
- - **Effort:** ~1 day.
313
- - **Risks:** 1B JSON instability → mitigated by parser + fallback.
314
- - **Success:** Enter name+theme → get a valid 6-page text book + PDF, no crashes even on bad model output.
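
The parser-plus-fallback behavior described in this phase could be sketched as below. This is a standard-library stand-in, not the plan's implementation: the repair layer here only strips trailing commas (the plan names json-repair/json5 for that step), and the names `parse_story` and `template_book` and the template wording are hypothetical.

```python
import json
import re


def template_book(hero_name: str, theme: str, n_pages: int = 6) -> dict:
    """Layer 3: deterministic fallback that always yields a valid book."""
    return {
        "title": f"{hero_name} and the {theme}",
        "character_description": f"{hero_name}, a friendly crayon-drawn hero",
        "pages": [
            {"page": i,
             "text": f"{hero_name} takes step {i} of the adventure.",
             "scene": f"{hero_name} in a {theme} scene, part {i}"}
            for i in range(1, n_pages + 1)
        ],
    }


def parse_story(raw: str, hero_name: str, theme: str, n_pages: int = 6) -> dict:
    """Never raises on bad model output; falls back to the template."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)  # layer 1: extract {...}
    if match:
        candidate = match.group(0)
        # Layer 2 (simplified): retry with trailing commas removed.
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        for attempt in (candidate, repaired):
            try:
                book = json.loads(attempt)
            except json.JSONDecodeError:
                continue
            pages = book.get("pages") if isinstance(book, dict) else None
            if isinstance(pages, list) and len(pages) == n_pages:
                return book
    return template_book(hero_name, theme, n_pages)
```

The page-count check matters as much as the JSON check: output that parses but has the wrong number of pages is treated the same as unparseable output.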
315
-
316
- ### Phase 3 — AI Integration
317
- - **Tasks:** `modal_image_gen.py` FLUX pipeline; Modal Volume model cache + `keep_warm` (C3); seed-lock + doodle image-prompt consistency stack (C1); graceful base-FLUX fallback if no LoRA; `modal_tts.py` VoxCPM2 (+ Kokoro/MeloTTS fallback); full pipeline story→images→audio.
318
- - **Dependencies:** Phase 2; LoRA may still be training (degrade gracefully).
319
- - **Effort:** ~1.5 days.
320
- - **Risks:** Cold starts (C3), VRAM (use A100, CPU offload fallback), model-ID drift.
321
- - **Success:** End-to-end live book in <2 min warm; consistent character across pages; narration plays.
322
-
323
- ### Phase 4 — UI/UX Enhancement
324
- - **Tasks:** Streaming page-by-page reveal (`yield`); progress text; "Behind the magic" accordion; Tiny Mode toggle (C4); mobile responsive CSS; Examples auto-load; accessibility pass (alt text, contrast, reduced-motion).
325
- - **Dependencies:** Phase 3.
326
- - **Effort:** ~1 day.
327
- - **Risks:** Gradio streaming quirks; CSS scope leaks.
328
- - **Success:** Off-Brand-worthy UI; live progress; works on phone; Tiny Mode produces a book fast.
329
-
330
- ### Phase 5 — Optimization
331
- - **Tasks:** Train + publish crayon style LoRA (Well-Tuned); quantization/turbo settings; Tiny Mode SD-Turbo path; trace logging to HF dataset (Open Trace); pre-generate + commit sample book (6 pages); error hardening.
332
- - **Dependencies:** Phases 3–4.
333
- - **Effort:** ~1 day (+ LoRA train time in background).
334
- - **Risks:** LoRA quality/time → app must run on base model meanwhile (non-negotiable from original).
335
- - **Success:** LoRA on HF, traces logged, sample book committed, Tiny Mode real, no unhandled errors.
336
-
337
- ### Phase 6 — Submission Preparation
338
- - **Tasks:** README with exact frontmatter + Tiny Titan argument; record 60-sec demo video (child hearing book); blog post (Field Notes) on FLUX+LoRA consistency; screenshots/GIFs; deploy + smoke test on Spaces; final checklist (§9).
339
- - **Dependencies:** All prior.
340
- - **Effort:** ~0.5–1 day.
341
- - **Risks:** Last-minute deploy breakage → smoke test early, keep sample-book path independent of live compute.
342
- - **Success:** Public Space loads sample instantly, live gen works, all badges' artifacts published, video submitted.
343
-
344
- **Total:** ~5–6 focused days. Critical path: Phase 1 (model IDs) → Phase 3 (FLUX+consistency) → Phase 6 (deploy/video).
345
-
346
- ---
347
-
348
- ## 7. Hackathon Strategy
349
-
350
- ### Tracks / awards to target (stack as many as possible)
351
- | Award | Lever |
352
- |---|---|
353
- | Thousand Token Wood podium | Unique doodle→consistent-book concept |
354
- | Black Forest Labs ($3k) | FLUX.2-klein + published custom LoRA (sparse field) |
355
- | OpenBMB award | MiniCPM5-1B story + VoxCPM2 narration |
356
- | Well-Tuned | Published LoRA on HF |
357
- | Off-Brand ($1,500) | Storybook UI, zero Gradio defaults |
358
- | Best Demo ($1,000) | Child hearing their drawing narrated |
359
- | Community Choice | Shareable, emotional, parents repost |
360
- | Tiny Titan (claimable) | Argue story generator (1B) is the primary AI + real Tiny Mode |
361
-
362
- ### How to maximize scoring
363
- - One artifact, many badges: every badge needs a concrete published thing (LoRA repo, trace dataset, blog post, off-brand UI) — produce all four.
364
- - Lead with emotion + sponsor-model usage in README's first 5 lines.
365
- - Show, don't tell: pre-generated sample book + 60-sec video carry the score even if live gen is slow.
366
-
367
- ### What judges look for
368
- Working demo, clear sponsor-model use, originality, polish, reproducibility (traces + LoRA), and a story that makes them feel something. DoodleBook is built to hit all six.
369
-
370
- ### Demo strategy
371
- - Open on the finished sample book (instant wow, no wait).
372
- - Play narration immediately.
373
- - Then generate one live (or Tiny Mode) to prove it's real.
374
- - End on the Open-Trace link + "made by a 1B + 2B small-model brain."
375
-
376
- ### Presentation / storytelling
377
- Narrative arc: "Kids invent characters every day and we throw them away. Watch what happens when a 1B model gives one a story and FLUX gives it a book — in the child's own art." Personal, concrete, sponsor-forward.
378
-
379
- ### Key differentiators
380
- Child's real art preserved · cross-page character consistency engineering · full small-model stack · off-brand keepsake UX · reproducible traces + LoRA.
381
-
382
- ---
383
-
384
- ## 8. README Plan
385
- ```
386
- # 📚 DoodleBook (+ exact HF frontmatter block from original prompt)
387
- > Elevator pitch: Draw a character → get a narrated, illustrated 6-page storybook in your child's own art.
388
- 1. ✨ Features (consistency, narration, doodle-faithful art, PDF, Tiny Mode, traces)
389
- 2. 🧠 Models & why (table: FLUX.2-klein+LoRA / MiniCPM5-1B / VoxCPM2) + Tiny Titan argument
390
- 3. 🏗️ Architecture (diagram + data flow)
391
- 4. ⚙️ Installation (clone, requirements, Modal setup, HF token, .env)
392
- 5. ▶️ Usage (run app.py, upload doodle, make book)
393
- 6. 🖼️ Screenshots (sample book pages, UI)
394
- 7. 🎬 Demo (60-sec video link + live Space link)
395
- 8. 🔬 Reproducibility (LoRA repo, Open-Trace dataset, seeds)
396
- 9. 🛣️ Future work (character library, voice cloning, print-on-demand, edge app)
397
- 10. 🏅 Hackathon badges (Well-Tuned, Off-Brand, Field Notes, Open Trace)
398
- 11. 📄 License (Apache-2.0 / MIT) · 👥 Contributors
399
- ```
400
-
401
- ## 9. Submission Checklist
402
- **Code** ☐ app launches clean ☐ all Modal fns callable ☐ graceful fallbacks (no-LoRA, bad-JSON, remote error) ☐ config has verified model IDs + fallbacks
403
- **Docs** ☐ README + exact frontmatter ☐ install/usage ☐ LoRA reproduce README ☐ Field Notes blog published
404
- **UI polish** ☐ storybook CSS, no Gradio defaults ☐ mobile ok ☐ accessibility (alt/contrast/reduced-motion) ☐ Examples auto-load
405
- **Performance** ☐ warm gen <2 min ☐ keep_warm during judging ☐ Tiny Mode works ☐ sample book loads instantly
406
- **Model optimization** ☐ LoRA trained + published ☐ bf16/turbo steps ☐ Tiny Mode SD-Turbo path
407
- **HF deployment** ☐ Space live ☐ secrets set ☐ smoke test ☐ trace dataset public ☐ LoRA repo public
408
- **Demo** ☐ 60-sec video (child hearing book) ☐ live Space link ☐ screenshots/GIFs
409
- **Presentation** ☐ pitch deck/blurb ☐ storytelling script ☐ badge artifacts linked
410
- **Final validation** ☐ fresh-clone run ☐ cold-open judge path tested ☐ all badge claims have a published URL
411
-
412
- ## 10. Agent Execution Prompt
413
- See `AGENT_HANDOFF.md` — a self-contained master prompt for Codex / OpenCode / Cursor / Claude Code to build the whole project with the C1–C5 corrections baked in.
414
- ```
 
README.md CHANGED
@@ -1,289 +1,292 @@
1
- ---
2
- title: DoodleBook
3
- emoji: 📚
4
- colorFrom: yellow
5
- colorTo: red
6
- sdk: gradio
7
- sdk_version: "5.50.0"
8
- app_file: app.py
9
- pinned: false
10
- tags:
11
- - hackathon
12
- - build-small
13
- - adventure-in-thousand-token-wood
14
- - gradio
15
- - modal
16
- - flux
17
- - minicmp
18
- - voxcpm
19
- - storybook
20
- - coloring-book
21
- ---
22
-
23
- # DoodleBook
24
-
25
- Draw a character, upload it, and DoodleBook turns it into a narrated six-page picture book plus a matching printable coloring book.
26
-
27
- The project was built for the Build Small Hackathon 2026. The core idea is to keep the reasoning stack small, use a strong image renderer only where it matters, and make the whole flow feel like a child-facing product instead of a model demo.
28
-
29
- ## What it does
30
-
31
- - Takes a doodle photo from upload or webcam.
32
- - Generates a six-page children's story with a consistent hero.
33
- - Renders six full-color story pages with FLUX.
34
- - Generates narration audio for the whole book.
35
- - Exports a story PDF.
36
- - Generates a matching black-and-white coloring book as a second output.
37
-
38
- ## Current architecture
39
-
40
- There are two runtime modes in this repo.
41
-
42
- ### 1. Local Modal-backed app
43
-
44
- Use [run_modal.py](run_modal.py) for the real end-to-end flow during development.
45
-
46
- - UI: Gradio 5 custom Blocks layout
47
- - Story: local generator by default, optional Modal MiniCPM route
48
- - Images: Modal FLUX pipeline
49
- - TTS: Modal VoxCPM pipeline
50
- - PDFs: local export
51
- - Coloring book: direct FLUX line-art render, with traced fallback
52
-
53
- Start it with:
54
-
55
- ```bash
56
- python run_modal.py
57
- ```
58
-
59
- The default local URL is:
60
-
61
- ```text
62
- http://127.0.0.1:7880
63
- ```
64
-
65
- ### 2. HF Spaces / ZeroGPU-oriented app
66
-
67
- Use [app.py](app.py) for the official Hugging Face Gradio Space target.
68
-
69
- - `app.py` is the Space entrypoint declared in the repo metadata.
70
- - `app_zerogpu.py` is the alternate experimental path kept for local ZeroGPU-focused iteration.
71
-
72
- ## Stack used in the hackathon
73
-
74
- This project deliberately mixes a small-model reasoning stack, a stronger dedicated image renderer, a custom Gradio presentation layer, and remote inference infrastructure that is cheap enough to demo but strong enough to feel like a real product.
75
-
76
- The important distinction is:
77
-
78
- - the app "brain" is small
79
- - the renderer is specialized
80
- - the UX is product-shaped, not notebook-shaped
81
- - the deployment path is built around a Gradio Space front-end
82
-
83
- ### Full stack at a glance
84
-
85
- | Layer | Stack | Role in the product |
86
- |---|---|---|
87
- | Product UI | Gradio 5 Blocks + custom CSS/HTML/JS | Child-facing scrapbook interface, status streaming, downloads |
88
- | Story engine | MiniCPM5-1B + local structured fallback | Writes the six-page narrative and scene plan |
89
- | Image engine | FLUX.2-klein-4B on Modal | Draws consistent full-color pages and dedicated coloring pages |
90
- | Voice engine | VoxCPM2 on Modal | Narrates the full storybook |
91
- | Coloring engine | Direct FLUX line-art pass + cleanup fallback | Produces printable black-and-white pages |
92
- | Export layer | Pillow + FPDF | Builds printable story and coloring PDFs |
93
- | Hosting target | Hugging Face Spaces | Gradio app shell and user entrypoint |
94
- | Remote compute | Modal | GPU execution for heavy image and TTS work |
95
- | Observability | Heartbeat streaming + stage timing in trace panel | Keeps long runs visible and debuggable |
96
-
97
- ### Frontend and product shell
98
-
99
- - Gradio 5
100
- - Custom scrapbook-style UI in [ui/layout.py](ui/layout.py)
101
- - HTML-based book rendering in [book_builder.py](book_builder.py)
102
- - Fixed-position PDF downloads under the status panel
103
- - Streaming progress heartbeats to keep long jobs alive in the browser
104
- - File-backed page rendering instead of giant inline base64 payloads
105
-
106
- ### Story generation stack
107
-
108
- - `openbmb/MiniCPM5-1B`
109
- - Local fast fallback story generator in [services/story.py](services/story.py)
110
- - Optional Modal story worker in [modal_workers/modal_story_gen.py](modal_workers/modal_story_gen.py)
111
-
112
- Why it matters:
113
- - The story model is the small-model "brain" of the app.
114
- - It keeps the narrative stack small and hackathon-aligned.
115
- - The story system outputs both prose and scene prompts, so downstream image generation stays structured.
116
- - The local structured fallback means the Space can still produce a valid book if the remote story path is unavailable.
117
-
118
- ### Image generation stack
119
-
120
- - `black-forest-labs/FLUX.2-klein-4B`
121
- - Modal deployment for image generation in [modal_workers/modal_image_gen.py](modal_workers/modal_image_gen.py)
122
- - Parallel canonical-character plus per-page render flow in [services/images.py](services/images.py)
123
- - One canonical character render from the child doodle, then scene-specific page renders
124
- - Separate direct line-art render path for the coloring book
125
-
126
- Why it matters:
127
- - The app needs high visual quality and character consistency.
128
- - FLUX is used as the renderer, not as the reasoning engine.
129
- - The character consistency pipeline is what makes the book feel authored rather than randomly reimagined on every page.
130
- - The line-art renderer is separate because tracing finished crayon pages produced bad coloring results.
131
-
132
- ### TTS stack
133
-
134
- - `openbmb/VoxCPM2`
135
- - Modal TTS worker in [modal_workers/modal_tts.py](modal_workers/modal_tts.py)
136
- - Service wrapper in [services/tts.py](services/tts.py)
137
- - Parallelized with image generation in the real Modal-backed app
138
-
139
- Why it matters:
140
- - Narration is part of the child-facing experience, not a side feature.
141
- - TTS runs in parallel with image generation in the real local pipeline.
142
- - Overlapping TTS with illustration time reduces total wait without degrading output quality.
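
Overlapping narration with illustration can be sketched with a thread pool. The two worker functions below are local stand-ins for the remote image and TTS calls (their names and return values are assumptions); only the overlap pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def render_pages(scenes: list[str]) -> list[bytes]:
    """Stand-in for the remote image worker."""
    return [f"image for: {scene}".encode("utf-8") for scene in scenes]


def narrate(texts: list[str]) -> bytes:
    """Stand-in for the remote TTS worker."""
    return " ".join(texts).encode("utf-8")


def build_assets(scenes: list[str], texts: list[str]) -> tuple[list[bytes], bytes]:
    """Run image rendering and narration concurrently, then collect both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        images_future = pool.submit(render_pages, scenes)
        audio_future = pool.submit(narrate, texts)
        # .result() blocks until each job finishes and re-raises its errors.
        return images_future.result(), audio_future.result()
```

Threads fit here because both jobs mostly wait on remote calls, so total wall time approaches the slower of the two stages rather than their sum.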
143
-
144
- ### Coloring-book stack
145
-
146
- - Direct FLUX line-art rendering for the same scenes
147
- - Cleanup and fallback pipeline in [services/coloring.py](services/coloring.py)
148
- - Modal `render_coloring_page` for dedicated line-art scene generation
149
- - Local cleanup for thresholding, despeckling, and printable black-on-white output
150
-
151
- Why it matters:
152
- - The main bug fixed in this version was that the coloring book used to trace finished crayon-textured images.
153
- - The improved pipeline renders dedicated line-art pages instead of trying to strip color out after the fact.
154
- - This is the main quality improvement that separates the current version from the earlier broken coloring-book output.
155
-
156
- ### Infrastructure stack
157
-
158
- - Modal for remote GPU inference
159
- - Hugging Face Spaces as the target host
160
- - Python 3.11 / 3.13 local development
161
- - `diffusers`, `transformers`, `torch`, `accelerate`
162
- - `Pillow`, `OpenCV`, `FPDF`
163
- - Gradio client-compatible API surface for testing and debugging
164
- - Hugging Face org deployment target: `build-small-hackathon`
165
-
166
- ### Sponsor and hackathon alignment
167
-
168
- This app directly reflects the hackathon sponsor/tool stack:
169
-
170
- - `OpenBMB`: MiniCPM5-1B and VoxCPM2
171
- - `Black Forest Labs`: FLUX.2-klein-4B
172
- - `Modal`: remote GPU inference
173
- - `OpenAI Codex`: debugging, architecture fixes, deployment preparation, README/release work
174
- - `Hugging Face Spaces`: final Gradio app surface
175
-
176
- For hackathon judging, the main narrative is:
177
-
178
- - Tiny Titan reasoning stack
179
- - Off-brand custom UI
180
- - Real multimodal product loop
181
- - Remote GPU orchestration with a Gradio user experience
182
- - Child-usable output artifacts: storybook PDF, audio, coloring book PDF
183
-
184
- ## Key engineering fixes in this version
185
-
186
- - Added direct Modal coloring-page rendering with `render_coloring_page`.
187
- - Fixed the live app to keep the Gradio stream alive during long coloring generation.
188
- - Added stage timing so story, image, PDF, TTS, and coloring costs are visible.
189
- - Reduced final-page payload size by replacing giant inline base64 book HTML with file-backed image URLs.
190
- - Fixed download serving through Gradio temp-file paths.
191
- - Removed port confusion between the local test app and the real Modal-backed app.
192
-
193
- ## Measured performance
194
-
195
- Measured against the real local Modal-backed app flow:
196
-
197
- - Story-only stage: about `0.3s`
198
- - Full-color book, warm: about `75s to 80s`
199
- - Full-color book + coloring book, warm: about `200s`
200
- - Slowest stage: coloring-book generation
201
-
202
- The current bottleneck is still the coloring-book path, even after the direct line-art fix.
203
-
204
- ## Repository layout
205
-
206
- ```text
207
- app.py Main Gradio app variant
208
- app_zerogpu.py ZeroGPU-oriented app variant
209
- run_modal.py Real local Modal-backed app
210
- book_builder.py HTML and PDF assembly
211
- services/ Orchestration and fallbacks
212
- modal_workers/ Modal remote workers
213
- ui/ Custom Gradio layout
214
- assets/ Sample doodles and sample book pages
215
- docs/ Specs and notes
216
- ```
217
-
218
- ## Local setup
219
-
220
- ```bash
221
- pip install -r requirements.txt
222
- python run_modal.py
223
- ```
224
-
225
- If you want the real Modal-backed app, use `run_modal.py`, not `app.py`.
226
-
227
- ## Hugging Face Space deployment target
228
-
229
- The intended hosted version is a Gradio Space in the `build-small-hackathon` org.
230
-
231
- Target format:
232
-
233
- ```text
234
- build-small-hackathon/DoodleBook
235
- ```
236
-
237
- Official target configuration:
238
-
239
- - Hugging Face Space SDK: `gradio`
240
- - Space entrypoint: `app.py`
241
- - Hardware target: `ZeroGPU`
242
- - Space frontend and API live on Hugging Face
243
- - Local or Spaces-managed inference path should be preferred for the official org deployment
244
-
245
- Important distinction:
246
-
247
- - `run_modal.py` is the best local development and debugging path.
248
- - `app.py` is the correct Hugging Face Space entrypoint.
249
- - Do not point the Space metadata at `run_modal.py`, because that is the Modal-backed dev runtime rather than the official hosted Gradio runtime.
250
-
251
- If you choose the Modal-backed hosted variant later, that becomes a different deployment shape and requires secrets.
252
-
253
- Required secrets only for the Modal-backed hosted variant:
254
-
255
- - `MODAL_TOKEN_ID`
256
- - `MODAL_TOKEN_SECRET`
257
- - any Hugging Face token needed by Modal workers for model pulls
258
-
259
- Why the Gradio Space + ZeroGPU shape is preferred for the hackathon org:
260
-
261
- - keeps the user-facing app as a normal Gradio Space
262
- - matches the official hackathon org publishing model
263
- - keeps the demo easy to judge, share, and run from the org page
264
- - avoids depending on a separate private frontend host
265
-
266
- Tradeoff:
267
-
268
- - The pure ZeroGPU path is easier to host in the official org.
269
- - The Modal-backed path currently gives stronger image and TTS quality.
270
- - The repo keeps both because local quality validation and official hosting have different constraints.
271
-
272
- ## Hackathon fit
273
-
274
- This project targets the hackathon stack in a deliberate way:
275
-
276
- - Small-model reasoning for story generation
277
- - Strong but scoped rendering model for visuals
278
- - Distinct multimodal outputs: story, illustrations, narration, coloring book
279
- - Real product UX instead of a bare prompt box
280
- - Clear deployment story for Hugging Face Spaces plus Modal GPU workers
281
-
282
- ## Contributors
283
-
284
- - Sushruth S.
285
- - OpenAI Codex: debugging, architecture fixes, rendering pipeline fixes, README and release preparation
286
-
287
- ## License
288
-
289
- Apache-2.0. See [LICENSE](LICENSE).
1
+ ---
2
+ title: DoodleBook
3
+ emoji: 📚
4
+ colorFrom: yellow
5
+ colorTo: red
6
+ sdk: gradio
7
+ sdk_version: "5.50.0"
8
+ app_file: app.py
9
+ pinned: false
10
+ tags:
11
+ - hackathon
12
+ - build-small
13
+ - adventure-in-thousand-token-wood
14
+ - gradio
15
+ - flux
16
+ - minicpm
17
+ - voxcpm
18
+ - storybook
19
+ - coloring-book
20
+ models:
21
+ - black-forest-labs/FLUX.2-klein-4B
22
+ - openbmb/MiniCPM5-1B
23
+ - openbmb/VoxCPM2
24
+ ---
25
+
26
+ # DoodleBook
27
+
28
+ Draw a character, upload it, and DoodleBook turns it into a narrated six-page picture book plus a matching printable coloring book.
29
+
30
+ The project was built for the Build Small Hackathon 2026. The core idea is to keep the reasoning stack small, use a strong image renderer only where it matters, and make the whole flow feel like a child-facing product instead of a model demo.
31
+
32
+ ## What it does
33
+
34
+ - Takes a doodle photo from upload or webcam.
35
+ - Generates a six-page children's story with a consistent hero.
36
+ - Renders six full-color story pages with FLUX.
37
+ - Generates narration audio for the whole book.
38
+ - Exports a story PDF.
39
+ - Generates a matching black-and-white coloring book as a second output.
40
+
41
+ ## Current architecture
42
+
43
+ There are two runtime modes in this repo.
44
+
45
+ ### 1. Local Modal-backed app
46
+
47
+ Use [run_modal.py](run_modal.py) for the real end-to-end flow during development.
48
+
49
+ - UI: Gradio 5 custom Blocks layout
50
+ - Story: local generator by default, optional Modal MiniCPM route
51
+ - Images: Modal FLUX pipeline
52
+ - TTS: Modal VoxCPM pipeline
53
+ - PDFs: local export
54
+ - Coloring book: direct FLUX line-art render, with traced fallback
55
+
56
+ Start it with:
57
+
58
+ ```bash
59
+ python run_modal.py
60
+ ```
61
+
62
+ The default local URL is:
63
+
64
+ ```text
65
+ http://127.0.0.1:7880
66
+ ```
67
+
68
+ ### 2. HF Spaces / ZeroGPU-oriented app
69
+
70
+ Use [app.py](app.py) for the official Hugging Face Gradio Space target.
71
+
72
+ - `app.py` is the Space entrypoint declared in the repo metadata.
73
+ - `app_zerogpu.py` is the alternate experimental path kept for local ZeroGPU-focused iteration.
74
+
75
+ ## Stack used in the hackathon
76
+
77
+ This project deliberately mixes a small-model reasoning stack, a stronger dedicated image renderer, a custom Gradio presentation layer, and remote inference infrastructure that is cheap enough to demo but strong enough to feel like a real product.
78
+
79
+ The important distinction is:
80
+
81
+ - the app "brain" is small
82
+ - the renderer is specialized
83
+ - the UX is product-shaped, not notebook-shaped
84
+ - the deployment path is built around a Gradio Space front-end
85
+
86
+ ### Full stack at a glance
87
+
88
+ | Layer | Stack | Role in the product |
89
+ |---|---|---|
90
+ | Product UI | Gradio 5 Blocks + custom CSS/HTML/JS | Child-facing scrapbook interface, status streaming, downloads |
91
+ | Story engine | MiniCPM5-1B + local structured fallback | Writes the six-page narrative and scene plan |
92
+ | Image engine | FLUX.2-klein-4B on Modal | Draws consistent full-color pages and dedicated coloring pages |
93
+ | Voice engine | VoxCPM2 on Modal | Narrates the full storybook |
94
+ | Coloring engine | Direct FLUX line-art pass + cleanup fallback | Produces printable black-and-white pages |
95
+ | Export layer | Pillow + FPDF | Builds printable story and coloring PDFs |
96
+ | Hosting target | Hugging Face Spaces | Gradio app shell and user entrypoint |
97
+ | Remote compute | Modal | GPU execution for heavy image and TTS work |
98
+ | Observability | Heartbeat streaming + stage timing in trace panel | Keeps long runs visible and debuggable |
99
+
100
+ ### Frontend and product shell
101
+
102
+ - Gradio 5
103
+ - Custom scrapbook-style UI in [ui/layout.py](ui/layout.py)
104
+ - HTML-based book rendering in [book_builder.py](book_builder.py)
105
+ - Fixed-position PDF downloads under the status panel
106
+ - Streaming progress heartbeats to keep long jobs alive in the browser
107
+ - File-backed page rendering instead of giant inline base64 payloads
108
+
109
+ ### Story generation stack
110
+
111
+ - `openbmb/MiniCPM5-1B`
112
+ - Local fast fallback story generator in [services/story.py](services/story.py)
113
+ - Optional Modal story worker in [modal_workers/modal_story_gen.py](modal_workers/modal_story_gen.py)
114
+
115
+ Why it matters:
116
+ - The story model is the small-model "brain" of the app.
117
+ - It keeps the narrative stack small and hackathon-aligned.
118
+ - The story system outputs both prose and scene prompts, so downstream image generation stays structured.
119
+ - The local structured fallback means the Space can still produce a valid book if the remote story path is unavailable.
120
+
121
+ ### Image generation stack
122
+
123
+ - `black-forest-labs/FLUX.2-klein-4B`
124
+ - Modal deployment for image generation in [modal_workers/modal_image_gen.py](modal_workers/modal_image_gen.py)
125
+ - Parallel canonical-character plus per-page render flow in [services/images.py](services/images.py)
126
+ - One canonical character render from the child doodle, then scene-specific page renders
127
+ - Separate direct line-art render path for the coloring book
128
+
129
+ Why it matters:
130
+ - The app needs high visual quality and character consistency.
131
+ - FLUX is used as the renderer, not as the reasoning engine.
132
+ - The character consistency pipeline is what makes the book feel authored rather than randomly reimagined on every page.
133
+ - The line-art renderer is separate because tracing finished crayon pages produced bad coloring results.
134
+
135
+ ### TTS stack
136
+
137
+ - `openbmb/VoxCPM2`
138
+ - Modal TTS worker in [modal_workers/modal_tts.py](modal_workers/modal_tts.py)
139
+ - Service wrapper in [services/tts.py](services/tts.py)
140
+ - Parallelized with image generation in the real Modal-backed app
141
+
142
+ Why it matters:
143
+ - Narration is part of the child-facing experience, not a side feature.
144
+ - TTS runs in parallel with image generation in the real local pipeline.
145
+ - Overlapping TTS with illustration time reduces total wait without degrading output quality.
146
+
147
+ ### Coloring-book stack
148
+
149
+ - Direct FLUX line-art rendering for the same scenes
150
+ - Cleanup and fallback pipeline in [services/coloring.py](services/coloring.py)
151
+ - Modal `render_coloring_page` for dedicated line-art scene generation
152
+ - Local cleanup for thresholding, despeckling, and printable black-on-white output
153
+
154
+ Why it matters:
155
+ - The main bug fixed in this version was that the coloring book used to trace finished crayon-textured images.
156
+ - The improved pipeline renders dedicated line-art pages instead of trying to strip color out after the fact.
157
+ - This is the main quality improvement that separates the current version from the earlier broken coloring-book output.
158
+
159
+ ### Infrastructure stack
160
+
161
+ - Modal for remote GPU inference
162
+ - Hugging Face Spaces as the target host
163
+ - Python 3.11 / 3.13 local development
164
+ - `diffusers`, `transformers`, `torch`, `accelerate`
165
+ - `Pillow`, `OpenCV`, `FPDF`
166
+ - Gradio client-compatible API surface for testing and debugging
167
+ - Hugging Face org deployment target: `build-small-hackathon`
168
+
169
+ ### Sponsor and hackathon alignment
170
+
171
+ This app directly reflects the hackathon sponsor/tool stack:
172
+
173
+ - `OpenBMB`: MiniCPM5-1B and VoxCPM2
174
+ - `Black Forest Labs`: FLUX.2-klein-4B
175
+ - `Modal`: remote GPU inference
176
+ - `OpenAI Codex`: debugging, architecture fixes, deployment preparation, README/release work
177
+ - `Hugging Face Spaces`: final Gradio app surface
178
+
179
+ For hackathon judging, the main narrative is:
180
+
181
+ - Tiny Titan reasoning stack
182
+ - Off-brand custom UI
183
+ - Real multimodal product loop
184
+ - Remote GPU orchestration with a Gradio user experience
185
+ - Child-usable output artifacts: storybook PDF, audio, coloring book PDF
186
+
187
+ ## Key engineering fixes in this version
188
+
189
+ - Added direct Modal coloring-page rendering with `render_coloring_page`.
190
+ - Fixed the live app to keep the Gradio stream alive during long coloring generation.
191
+ - Added stage timing so story, image, PDF, TTS, and coloring costs are visible.
192
+ - Reduced final-page payload size by replacing giant inline base64 book HTML with file-backed image URLs.
193
+ - Fixed download serving through Gradio temp-file paths.
194
+ - Removed port confusion between the local test app and the real Modal-backed app.
195
+
196
+ ## Measured performance
197
+
198
+ Measured against the real local Modal-backed app flow:
199
+
200
+ - Story-only stage: about `0.3s`
201
+ - Full-color book, warm: about `75s to 80s`
202
+ - Full-color book + coloring book, warm: about `200s`
203
+ - Slowest stage: coloring-book generation
204
+
205
+ The current bottleneck is still the coloring-book path, even after the direct line-art fix.
206
+
207
+ ## Repository layout
208
+
209
+ ```text
210
+ app.py Main Gradio app variant
211
+ app_zerogpu.py ZeroGPU-oriented app variant
212
+ run_modal.py Real local Modal-backed app
213
+ book_builder.py HTML and PDF assembly
214
+ services/ Orchestration and fallbacks
215
+ modal_workers/ Modal remote workers
216
+ ui/ Custom Gradio layout
217
+ assets/ Sample doodles and sample book pages
218
+ docs/ Specs and notes
219
+ ```
220
+
221
+ ## Local setup
222
+
223
+ ```bash
224
+ pip install -r requirements.txt
225
+ python run_modal.py
226
+ ```
227
+
228
+ If you want the real Modal-backed app, use `run_modal.py`, not `app.py`.
229
+
230
+ ## Hugging Face Space deployment target
231
+
232
+ The intended hosted version is a Gradio Space in the `build-small-hackathon` org.
233
+
234
+ Target format:
235
+
236
+ ```text
237
+ build-small-hackathon/DoodleBook
238
+ ```
239
+
240
+ Official target configuration:
241
+
242
+ - Hugging Face Space SDK: `gradio`
243
+ - Space entrypoint: `app.py`
244
+ - Hardware target: `ZeroGPU`
245
+ - Space frontend and API live on Hugging Face
246
+ - Local or Spaces-managed inference path should be preferred for the official org deployment
247
+
248
+ Important distinction:
249
+
250
+ - `run_modal.py` is the best local development and debugging path.
251
+ - `app.py` is the correct Hugging Face Space entrypoint.
252
+ - Do not point the Space metadata at `run_modal.py`, because that is the Modal-backed dev runtime rather than the official hosted Gradio runtime.
253
+
254
+ If you choose the Modal-backed hosted variant later, that becomes a different deployment shape and requires secrets.
255
+
256
+ Required secrets only for the Modal-backed hosted variant:
257
+
258
+ - `MODAL_TOKEN_ID`
259
+ - `MODAL_TOKEN_SECRET`
260
+ - any Hugging Face token needed by Modal workers for model pulls
261
+
262
+ Why the Gradio Space + ZeroGPU shape is preferred for the hackathon org:
263
+
264
+ - keeps the user-facing app as a normal Gradio Space
265
+ - matches the official hackathon org publishing model
266
+ - keeps the demo easy to judge, share, and run from the org page
267
+ - avoids depending on a separate private frontend host
268
+
269
+ Tradeoff:
270
+
271
+ - The pure ZeroGPU path is easier to host in the official org.
272
+ - The Modal-backed path currently gives stronger image and TTS quality.
273
+ - The repo keeps both because local quality validation and official hosting have different constraints.
274
+
275
+ ## Hackathon fit
276
+
277
+ This project targets the hackathon stack in a deliberate way:
278
+
279
+ - Small-model reasoning for story generation
280
+ - Strong but scoped rendering model for visuals
281
+ - Distinct multimodal outputs: story, illustrations, narration, coloring book
282
+ - Real product UX instead of a bare prompt box
283
+ - Clear deployment story for Hugging Face Spaces plus Modal GPU workers
284
+
285
+ ## Contributors
286
+
287
+ - Sushruth S.
288
+ - OpenAI Codex: debugging, architecture fixes, rendering pipeline fixes, README and release preparation
289
+
290
+ ## License
291
+
292
+ Apache-2.0. See [LICENSE](LICENSE).
app.py CHANGED
@@ -1,726 +1,697 @@
1
- """
2
- DoodleBook — HF ZeroGPU Version
3
-
4
- Free T4 GPU on Hugging Face Spaces!
5
- No Modal needed.
6
- """
7
-
8
- import gradio as gr
9
- import os
10
- import sys
11
- import torch
12
- try:
13
- import spaces
14
- except ModuleNotFoundError:
15
- # `spaces` only exists on HF ZeroGPU. Off-HF (local/dev) provide a no-op so
16
- # the app still runs; generation then uses whatever local GPU/CPU exists.
17
- class _SpacesShim:
18
- @staticmethod
19
- def GPU(*args, **kwargs):
20
- if args and callable(args[0]): # bare @spaces.GPU
21
- return args[0]
22
- def deco(fn): # @spaces.GPU(duration=...)
23
- return fn
24
- return deco
25
- spaces = _SpacesShim()
26
- import json
27
- import time
28
- import tempfile
29
- import logging
30
- import struct
31
- import re
32
-
33
- sys.path.insert(0, os.path.dirname(__file__))
34
-
35
- from config import (
36
- FLUX_MODEL, STORY_MODEL, TTS_MODEL,
37
- GENERATION_PARAMS, SAMPLE_BOOK_PATH, BASE_SEED, page_seed,
38
- DEFAULT_VOICE, voice_design,
39
- )
40
- from book_builder import (
41
- build_book_html, export_pdf, magic_loader_html,
42
- build_coloring_html, export_coloring_pdf,
43
- )
44
- from ui.layout import create_layout
45
-
46
- logging.basicConfig(level=logging.INFO)
47
- logger = logging.getLogger(__name__)
48
-
49
- _STORY_MODEL = None
50
- _STORY_TOKENIZER = None
51
- _IMAGE_PIPE = None
52
- _IMAGE_PIPE_KIND = None
53
- _TTS_MODEL = None
54
-
55
- COLOR_ART_STYLE = (
56
- "children's crayon storybook illustration, bold black outlines, "
57
- "flat bright colors, simple shapes"
58
- )
59
- COLOR_PAGE_SUFFIX = "full colorful background scene, the character clearly visible."
60
- LINE_ART_STYLE = (
61
- "children's coloring book page, pure black ink outlines on pure white paper, "
62
- "clean contour lines, no color, no gray, no shading, no texture, "
63
- "no hatching, no pencil marks, open spaces to color"
64
- )
65
- LINE_ART_SUFFIX = (
66
- "simple clean background shapes, same composition, thick readable outlines, "
67
- "no filled black areas, no extra sketch marks."
68
- )
69
-
70
- THEME_TEMPLATES = {
71
- "brave adventure": [
72
- ("{hero} loved exploring new places.", "{hero} standing at the start of a bright adventure trail"),
73
- ("One morning, {hero} discovered something glowing nearby.", "{hero} spotting a magical glow in the distance"),
74
- ("Taking a deep breath, {hero} bravely went closer.", "{hero} walking forward with courage"),
75
- ("There, a new friend needed help.", "{hero} finding a small friend in trouble"),
76
- ("{hero} helped with kindness and a clever idea.", "{hero} helping the friend together"),
77
- ("Everyone cheered, and {hero} felt proud and brave.", "{hero} celebrating at sunset with the new friend"),
78
- ],
79
- "making a new friend": [
80
- ("{hero} was playing alone in a sunny place.", "{hero} playing under a bright sky"),
81
- ("Then {hero} noticed someone shy nearby.", "{hero} seeing a shy new friend nearby"),
82
- ("{hero} smiled and said hello.", "{hero} waving with a friendly smile"),
83
- ("Soon they were sharing stories and laughs.", "{hero} and the new friend laughing together"),
84
- ("They played games all afternoon.", "{hero} and the new friend playing together"),
85
- ("By sunset, {hero} had made a wonderful new friend.", "{hero} and the new friend smiling together at sunset"),
86
- ],
87
- }
88
-
89
- FEW_SHOT_EXEMPLAR = """
90
- Write a 6-page children's storybook for age 5 about Luna the cat with theme: brave adventure.
91
-
92
- Return ONLY valid JSON:
93
- {
94
- "title": "Luna's Brave Adventure",
95
- "character_description": "A small orange tabby cat named Luna with big green eyes, whiskers, and a tiny red scarf",
96
- "pages": [
97
- {"page": 1, "text": "Luna was a small orange cat who loved to explore.", "scene": "Luna sitting by the window looking outside"},
98
- {"page": 2, "text": "One sunny morning, Luna saw something sparkling in the forest.", "scene": "Luna spotting a glow in the trees"},
99
- {"page": 3, "text": "Bravely, Luna crept into the forest to investigate.", "scene": "Luna walking cautiously through trees"},
100
- {"page": 4, "text": "It was a tiny fairy stuck in a spider web!", "scene": "Luna discovering a fairy in trouble"},
101
- {"page": 5, "text": "Luna gently freed the fairy with her paw.", "scene": "Luna carefully helping the fairy"},
102
- {"page": 6, "text": "The fairy thanked Luna and they became friends forever.", "scene": "Luna and fairy playing together at sunset"}
103
- ]
104
- }
105
- """
106
-
107
-
108
- def build_story_prompt(hero_name: str, theme: str, age: int) -> str:
109
- return f"""{FEW_SHOT_EXEMPLAR}
110
-
111
- Write a 6-page children's storybook for age {age} about {hero_name} with theme: {theme}.
112
-
113
- Return ONLY valid JSON:
114
- """
115
-
116
-
117
- def _validate_story_structure(story: dict) -> bool:
118
- required_keys = ["title", "character_description", "pages"]
119
- if not all(k in story for k in required_keys):
120
- return False
121
- pages = story.get("pages", [])
122
- if not isinstance(pages, list) or len(pages) < 1:
123
- return False
124
- first_page = pages[0]
125
- return all(k in first_page for k in ["page", "text", "scene"])
126
-
127
-
128
- def _repair_json(json_str: str) -> str:
129
- json_str = re.sub(r',\s*([}\]])', r'\1', json_str)
130
- json_str = re.sub(r'//.*?$', '', json_str, flags=re.MULTILINE)
131
- json_str = re.sub(r'/\*[\s\S]*?\*/', '', json_str)
132
- json_str = re.sub(r'(?<=")\n(?=")', '\\n', json_str)
133
- json_str = re.sub(r'(\s)(\w+)(\s*:)', r'\1"\2"\3', json_str)
134
- return json_str
135
-
136
-
137
- def parse_story_json(raw_output: str) -> dict | None:
138
- match = re.search(r'\{[\s\S]*\}', raw_output or "")
139
- if not match:
140
- return None
141
- raw_json = match.group(0)
142
- for candidate in (raw_json, _repair_json(raw_json)):
143
- try:
144
- story = json.loads(candidate)
145
- if _validate_story_structure(story):
146
- return story
147
- except Exception:
148
- continue
149
- return None
150
-
151
-
152
- def _normalize_story(story: dict) -> dict:
153
- pages = list(story.get("pages", []))[:6]
154
- while len(pages) < 6:
155
- pages.append({
156
- "page": len(pages) + 1,
157
- "text": "And the adventure continued happily.",
158
- "scene": "Continuing adventure",
159
- })
160
- story["pages"] = pages
161
- story.setdefault("title", "A Wonderful Adventure")
162
- story.setdefault(
163
- "character_description",
164
- "A friendly children's storybook hero with bright colors and cheerful features",
165
- )
166
- return story
167
-
168
-
169
- def build_story_locally(hero_name: str, theme: str) -> dict:
170
- """Fast, deterministic fallback story that avoids any Modal dependency."""
171
- hero = (hero_name or "Little Hero").strip() or "Little Hero"
172
- beats = THEME_TEMPLATES.get(theme, THEME_TEMPLATES["brave adventure"])
173
- pages = [
174
- {"page": i + 1, "text": text.format(hero=hero), "scene": scene.format(hero=hero)}
175
- for i, (text, scene) in enumerate(beats)
176
- ]
177
- return {
178
- "title": f"{hero}'s Storybook Adventure",
179
- "character_description": (
180
- f"{hero}, a friendly children's storybook hero with bright colors, "
181
- "bold outlines, and a cheerful expressive face"
182
- ),
183
- "pages": pages,
184
- }
185
-
186
-
187
- def silent_wav_bytes(duration_seconds: int = 2, sample_rate: int = 24000) -> bytes:
188
- """Return a short silent WAV so the UI remains stable if TTS is unavailable."""
189
- num_samples = sample_rate * duration_seconds
190
- data_size = num_samples * 2
191
- header = struct.pack(
192
- "<4sI4s4sIHHIIHH4sI",
193
- b"RIFF", 36 + data_size, b"WAVE",
194
- b"fmt ", 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
195
- b"data", data_size,
196
- )
197
- return header + (b"\x00" * data_size)
198
-
199
-
200
- def _with_heartbeat(blocking_fn, frame_fn, poll=4.0):
201
- import threading
202
-
203
- box = {}
204
-
205
- def _run():
206
- try:
207
- box["val"] = blocking_fn()
208
- except BaseException as e:
209
- box["err"] = e
210
-
211
- th = threading.Thread(target=_run, daemon=True)
212
- th.start()
213
- t0 = time.time()
214
- while th.is_alive():
215
- th.join(timeout=poll)
216
- if th.is_alive():
217
- yield ("hb", frame_fn(int(time.time() - t0)))
218
- if "err" in box:
219
- raise box["err"]
220
- yield ("done", box["val"])
221
-
222
-
223
- # ============================================================================
224
- # SAMPLE BOOK (loads instantly, no GPU needed)
225
- # ============================================================================
226
-
227
- SAMPLE_BOOK_HTML = None
228
-
229
- def load_sample_book() -> str:
230
- """Load pre-generated sample book (C3: always ship sample)."""
231
- global SAMPLE_BOOK_HTML
232
- if SAMPLE_BOOK_HTML:
233
- return SAMPLE_BOOK_HTML
234
-
235
- sample_path = os.path.join(SAMPLE_BOOK_PATH, "sample.html")
236
- if os.path.exists(sample_path):
237
- with open(sample_path, "r", encoding="utf-8") as f:
238
- SAMPLE_BOOK_HTML = f.read()
239
- return SAMPLE_BOOK_HTML
240
-
241
- return "<div class='page-loading'>Loading sample book...</div>"
242
-
243
-
244
- # ============================================================================
245
- # ZEROGPU INFERENCE FUNCTIONS
246
- # ============================================================================
247
-
248
- @spaces.GPU(duration=60)
249
- def generate_story_gpu(hero_name: str, theme: str, age: int = 5) -> dict:
250
- """Generate a story on ZeroGPU, falling back to a deterministic local story."""
251
- global _STORY_MODEL, _STORY_TOKENIZER
252
- try:
253
- from transformers import AutoTokenizer, AutoModelForCausalLM
254
-
255
- if _STORY_MODEL is None or _STORY_TOKENIZER is None:
256
- logger.info(f"Loading story model: {STORY_MODEL.hub_id}")
257
- _STORY_TOKENIZER = AutoTokenizer.from_pretrained(STORY_MODEL.hub_id, trust_remote_code=True)
258
- _STORY_MODEL = AutoModelForCausalLM.from_pretrained(
259
- STORY_MODEL.hub_id,
260
- torch_dtype=torch.float16,
261
- trust_remote_code=True,
262
- ).cuda().eval()
263
-
264
- prompt = build_story_prompt(hero_name, theme, age)
265
- inputs = _STORY_TOKENIZER.apply_chat_template(
266
- [{"role": "user", "content": prompt}],
267
- add_generation_prompt=True,
268
- enable_thinking=False,
269
- return_dict=True,
270
- return_tensors="pt",
271
- ).to("cuda")
272
- with torch.no_grad():
273
- out = _STORY_MODEL.generate(
274
- **inputs,
275
- max_new_tokens=GENERATION_PARAMS.max_story_tokens,
276
- do_sample=False,
277
- )
278
- response = _STORY_TOKENIZER.decode(
279
- out[0][inputs["input_ids"].shape[1]:],
280
- skip_special_tokens=True,
281
- )
282
- parsed = parse_story_json(response)
283
- if parsed:
284
- return _normalize_story(parsed)
285
- logger.warning("Story parser failed; using deterministic local fallback")
286
- except Exception as e:
287
- logger.warning(f"ZeroGPU story generation failed: {e}")
288
- return _normalize_story(build_story_locally(hero_name, theme))
289
-
290
-
291
- def _get_image_pipe(tiny: bool):
292
- global _IMAGE_PIPE, _IMAGE_PIPE_KIND
293
- desired = "tiny" if tiny else "flux"
294
- if _IMAGE_PIPE is not None and _IMAGE_PIPE_KIND == desired:
295
- return _IMAGE_PIPE
296
-
297
- if tiny:
298
- from diffusers import AutoPipelineForText2Image
299
- pipe = AutoPipelineForText2Image.from_pretrained(
300
- "stabilityai/sd-turbo",
301
- torch_dtype=torch.float16,
302
- ).cuda()
303
- else:
304
- from diffusers import Flux2KleinPipeline
305
- pipe = Flux2KleinPipeline.from_pretrained(
306
- FLUX_MODEL.hub_id,
307
- torch_dtype=torch.bfloat16,
308
- ).cuda()
309
- pipe.enable_model_cpu_offload()
310
-
311
- _IMAGE_PIPE = pipe
312
- _IMAGE_PIPE_KIND = desired
313
- return pipe
314
-
315
-
316
- @spaces.GPU(duration=120)
317
- def generate_images_gpu(
318
- character_desc: str,
319
- scenes: list,
320
- doodle_bytes: bytes = None,
321
- seed: int = 42,
322
- tiny: bool = False
323
- ) -> list:
324
- """Generate all 6 images using FLUX on ZeroGPU."""
325
- import io
326
- from PIL import Image
327
-
328
- pipe = _get_image_pipe(tiny)
329
- if tiny:
330
- num_steps = 4
331
- guidance = 0.0
332
- else:
333
- num_steps = 6
334
- guidance = 1.0
335
-
336
- canonical = None
337
- if doodle_bytes:
338
- try:
339
- ref = Image.open(io.BytesIO(doodle_bytes)).convert("RGB")
340
- kw = dict(
341
- prompt=(f"Turn this child's drawing into a clean, friendly, full-body cartoon "
342
- f"character for a children's storybook. Keep the EXACT same creature, "
343
- f"face, and features as the drawing. {COLOR_ART_STYLE}, "
344
- f"plain white background, full character visible, centered."),
345
- height=768, width=768, guidance_scale=guidance,
346
- num_inference_steps=num_steps,
347
- generator=torch.Generator("cuda").manual_seed(seed)
348
- )
349
- if tiny:
350
- kw["prompt"] = f"A friendly cartoon character, {COLOR_ART_STYLE}"
351
- else:
352
- kw["image"] = ref
353
- canonical = pipe(**kw).images[0]
354
- logger.info("Canonical character built from doodle")
355
- except Exception as e:
356
- logger.warning(f"Canonical build failed ({e}); text2img fallback")
357
- canonical = None
358
-
359
- images = []
360
- for i, scene in enumerate(scenes):
361
- if canonical is not None and not tiny:
362
- prompt = f"The same character. {scene}. {COLOR_ART_STYLE}, {COLOR_PAGE_SUFFIX}"
363
- kw = dict(image=canonical, prompt=prompt)
364
- else:
365
- prompt = (
366
- f"{character_desc}. Scene: {scene}. {COLOR_ART_STYLE}, "
367
- f"white background, centered, full character visible"
368
- )
369
- kw = dict(prompt=prompt)
370
-
371
- kw.update(dict(
372
- height=768, width=768, guidance_scale=guidance,
373
- num_inference_steps=num_steps,
374
- generator=torch.Generator("cuda").manual_seed(seed + i + 1)
375
- ))
376
-
377
- image = pipe(**kw).images[0]
378
- images.append(image)
379
- logger.info(f"Generated page {i+1}/6")
380
-
381
- return images
382
-
383
-
384
- @spaces.GPU(duration=120)
385
- def generate_coloring_images_gpu(
386
- character_desc: str,
387
- scenes: list,
388
- doodle_bytes: bytes = None,
389
- seed: int = 42,
390
- tiny: bool = False
391
- ) -> list:
392
- """Generate coloring pages directly with FLUX instead of tracing color pages."""
393
- import io
394
- from PIL import Image
395
-
396
- pipe = _get_image_pipe(tiny)
397
- if tiny:
398
- num_steps = 4
399
- guidance = 0.0
400
- else:
401
- num_steps = 6
402
- guidance = 1.0
403
-
404
- canonical = None
405
- if doodle_bytes:
406
- try:
407
- ref = Image.open(io.BytesIO(doodle_bytes)).convert("RGB")
408
- kw = dict(
409
- prompt=(f"Turn this child's drawing into a clean, friendly, full-body cartoon "
410
- f"character for a children's coloring book. Keep the EXACT same creature, "
411
- f"face, and features as the drawing. {LINE_ART_STYLE}, "
412
- f"plain white background, full character visible, centered."),
413
- height=768, width=768, guidance_scale=guidance,
414
- num_inference_steps=num_steps,
415
- generator=torch.Generator('cuda').manual_seed(seed)
416
- )
417
- if tiny:
418
- kw["prompt"] = f"A friendly cartoon character, {LINE_ART_STYLE}"
419
- else:
420
- kw["image"] = ref
421
- canonical = pipe(**kw).images[0]
422
- logger.info("Line-art canonical character built from doodle")
423
- except Exception as e:
424
- logger.warning(f"Line-art canonical build failed ({e}); text2img fallback")
425
- canonical = None
426
-
427
- images = []
428
- for i, scene in enumerate(scenes):
429
- if canonical is not None and not tiny:
430
- prompt = f"The same character. {scene}. {LINE_ART_STYLE}, {LINE_ART_SUFFIX}"
431
- kw = dict(image=canonical, prompt=prompt)
432
- else:
433
- prompt = (
434
- f"{character_desc}. Scene: {scene}. {LINE_ART_STYLE}, "
435
- f"white background, centered, full character visible"
436
- )
437
- kw = dict(prompt=prompt)
438
-
439
- kw.update(dict(
440
- height=768, width=768, guidance_scale=guidance,
441
- num_inference_steps=num_steps,
442
- generator=torch.Generator("cuda").manual_seed(seed + i + 101)
443
- ))
444
-
445
- image = pipe(**kw).images[0]
446
- images.append(image)
447
- logger.info(f"Generated coloring page {i+1}/6")
448
-
449
- return images
450
-
451
-
452
- @spaces.GPU(duration=30)
453
- def generate_tts_gpu(text: str, voice: str = DEFAULT_VOICE) -> bytes:
454
- """Generate TTS when available; otherwise return a tiny silent WAV."""
455
- global _TTS_MODEL
456
- import io
457
- import numpy as np
458
-
459
- try:
460
- from voxcpm import VoxCPM
461
- if _TTS_MODEL is None:
462
- logger.info(f"Loading TTS model: {TTS_MODEL.hub_id}")
463
- _TTS_MODEL = VoxCPM.from_pretrained(TTS_MODEL.hub_id, load_denoiser=False)
464
- model = _TTS_MODEL
465
-
466
- design = voice_design(voice)
467
-
468
- import re
469
- chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
470
- if not chunks:
471
- chunks = [text.strip() or "The end."]
472
-
473
- sr = model.tts_model.sample_rate
474
- pause = np.zeros(int(sr * 0.35), dtype=np.float32)
475
- pieces = []
476
-
477
- for i, sentence in enumerate(chunks):
478
- wav = model.generate(
479
- text=f"{design} {sentence}",
480
- cfg_value=2.0,
481
- inference_timesteps=10,
482
- )
483
- pieces.append(np.asarray(wav, dtype=np.float32))
484
- if i < len(chunks) - 1:
485
- pieces.append(pause)
486
-
487
- audio = np.concatenate(pieces)
488
- import soundfile as sf
489
- buf = io.BytesIO()
490
- sf.write(buf, audio, sr, format="WAV")
491
- return buf.getvalue()
492
-
493
- except Exception as e:
494
- logger.warning(f"TTS unavailable on Space ({e}); returning silent fallback")
495
- return silent_wav_bytes()
496
-
497
-
498
- # ============================================================================
499
- # MAIN BOOK CREATION (Generator for streaming)
500
- # ============================================================================
501
-
502
- def create_book(doodle_image, character_name, theme, hero_name, tiny_mode=False, voice=DEFAULT_VOICE, make_coloring=False):
503
- """ZeroGPU copy of the local app flow with heartbeats, timing, and coloring support."""
504
- t_total = time.perf_counter()
505
- character_name = (character_name or "").strip() or "Little Hero"
506
- hero_name = (hero_name or "").strip() or character_name
507
-
508
- trace_data = {
509
- "backend": "zerogpu",
510
- "hero_name": hero_name,
511
- "theme": theme,
512
- "tiny_mode": tiny_mode,
513
- "voice": voice,
514
- "make_coloring": make_coloring,
515
- "seed": BASE_SEED,
516
- "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")
517
- }
518
-
519
- _no = gr.update(visible=False)
520
- _keep = gr.update()
521
-
522
- yield (
523
- magic_loader_html("story", hero_name),
524
- "Writing the story",
525
- None, _keep, {}, "", json.dumps(trace_data, indent=2),
526
- _no, _keep,
527
- )
528
-
529
- t_story = time.perf_counter()
530
- try:
531
- story = generate_story_gpu(hero_name, theme)
532
- except Exception as e:
533
- logger.error(f"Story generation failed: {e}")
534
- yield (
535
- f"<div class='page-loading'>Error: {e}</div>",
536
- f"Error: {e}",
537
- None, _keep, {}, "", "",
538
- _no, _keep,
539
- )
540
- return
541
- trace_data["story_sec"] = round(time.perf_counter() - t_story, 2)
542
-
543
- pages = story.get("pages", [])
544
- char_desc = story.get("character_description", "")
545
- title = story.get("title", "Untitled Story")
546
- page_texts = [p.get("text", "") for p in pages]
547
- scenes = [p.get("scene", "") for p in pages]
548
-
549
- trace_data["title"] = title
550
- trace_data["character_description"] = char_desc
551
-
552
- yield (
553
- magic_loader_html("images", hero_name),
554
- f"{title} illustrating on ZeroGPU…",
555
- None, _keep, story, "", json.dumps(trace_data, indent=2),
556
- _no, _keep,
557
- )
558
-
559
- doodle_bytes = None
560
- if doodle_image is not None:
561
- import io
562
- from PIL import Image
563
- img = Image.fromarray(doodle_image)
564
- buf = io.BytesIO()
565
- img.save(buf, format="PNG")
566
- doodle_bytes = buf.getvalue()
567
-
568
- import threading
569
- voice_box = {}
570
- full_text = f"{title}. {' '.join(page_texts)}"
571
- t_tts = time.perf_counter()
572
-
573
- def _do_voice():
574
- try:
575
- voice_box["bytes"] = generate_tts_gpu(full_text, voice)
576
- except Exception as e:
577
- voice_box["err"] = e
578
-
579
- voice_thread = threading.Thread(target=_do_voice, daemon=True)
580
- voice_thread.start()
581
-
582
- img_bytes, engine = None, "sketch"
583
- t_images = time.perf_counter()
584
- try:
585
- for kind, payload in _with_heartbeat(
586
- lambda: generate_images_gpu(char_desc, scenes, doodle_bytes, BASE_SEED, tiny_mode),
587
- lambda s: (
588
- magic_loader_html("images", hero_name),
589
- f"{title} — illustrating… {s}s (voice recording in parallel)",
590
- None, _keep, story, "", json.dumps(trace_data, indent=2), _no, _keep,
591
- ),
592
- ):
593
- if kind == "hb":
594
- yield payload
595
- else:
596
- images = payload
597
- import io
598
- img_bytes = []
599
- for img in images:
600
- buf = io.BytesIO()
601
- img.save(buf, format="PNG")
602
- img_bytes.append(buf.getvalue())
603
- engine = "flux"
604
- except Exception as e:
605
- logger.error(f"Image generation failed: {e}")
606
- from services.images import generate_placeholder_images
607
- img_bytes = generate_placeholder_images(char_desc, scenes, doodle_bytes)
608
- engine = "sketch"
609
- trace_data["images_sec"] = round(time.perf_counter() - t_images, 2)
610
- trace_data["engine"] = engine
611
-
612
- book_html = build_book_html(img_bytes, page_texts, title, engine)
613
-
614
- while voice_thread.is_alive():
615
- voice_thread.join(timeout=4)
616
- if voice_thread.is_alive():
617
- yield (
618
- book_html,
619
- f"{title} finishing narration…",
620
- None, _keep, story, "", json.dumps(trace_data, indent=2),
621
- _no, _keep,
622
- )
623
-
624
- audio_path = None
625
- trace_data["tts_sec"] = round(time.perf_counter() - t_tts, 2)
626
- if voice_box.get("bytes"):
627
- try:
628
- with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
629
- tmp.write(voice_box["bytes"])
630
- audio_path = tmp.name
631
- except Exception as e:
632
- logger.warning(f"writing audio failed: {e}")
633
- elif "err" in voice_box:
634
- logger.warning(f"TTS failed: {voice_box['err']}")
635
-
636
- pdf_path = None
637
- t_pdf = time.perf_counter()
638
- try:
639
- with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
640
- pdf_path = export_pdf(img_bytes, page_texts, title, tmp.name)
641
- except Exception as e:
642
- logger.warning(f"PDF failed: {e}")
643
- trace_data["pdf_sec"] = round(time.perf_counter() - t_pdf, 2)
644
-
645
- coloring_html = ""
646
- coloring_pdf_path = None
647
- if make_coloring:
648
- t_coloring = time.perf_counter()
649
- try:
650
- from services.coloring import _crispen
651
- for kind, payload in _with_heartbeat(
652
- lambda: generate_coloring_images_gpu(char_desc, scenes, doodle_bytes, BASE_SEED, tiny_mode),
653
- lambda s: (
654
- book_html,
655
- f"{title} building coloring book… {s}s",
656
- audio_path,
657
- _keep,
658
- story,
659
- "",
660
- json.dumps(trace_data, indent=2),
661
- _no,
662
- _keep,
663
- ),
664
- ):
665
- if kind == "hb":
666
- yield payload
667
- else:
668
- coloring_images = payload
669
- import io
670
- outlines = []
671
- for img in coloring_images:
672
- buf = io.BytesIO()
673
- img.save(buf, format="PNG")
674
- outlines.append(_crispen(buf.getvalue()))
675
- coloring_html = build_coloring_html(outlines, page_texts, title)
676
- with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
677
- coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
678
- trace_data["coloring_book"] = True
679
- trace_data["coloring_engine"] = "flux-direct-lineart"
680
- except Exception as e:
681
- logger.warning(f"Direct FLUX coloring book failed ({e}); using traced fallback")
682
- try:
683
- from services.coloring import derive_coloring_pages
684
- outlines = derive_coloring_pages(img_bytes)
685
- coloring_html = build_coloring_html(outlines, page_texts, title)
686
- with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
687
- coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
688
- trace_data["coloring_book"] = True
689
- trace_data["coloring_engine"] = "trace-fallback"
690
- except Exception as e2:
691
- logger.warning(f"Coloring book fallback failed: {e2}")
692
- trace_data["coloring_sec"] = round(time.perf_counter() - t_coloring, 2)
693
-
694
- trace_data["completed"] = True
695
- trace_data["pages_generated"] = len(img_bytes)
696
- trace_data["total_sec"] = round(time.perf_counter() - t_total, 2)
697
-
698
- pdf_update = gr.update(value=pdf_path) if pdf_path else _keep
699
- coloring_pdf_update = gr.update(value=coloring_pdf_path) if coloring_pdf_path else _keep
700
- coloring_display_update = (gr.update(visible=True, value=coloring_html) if coloring_html
701
- else _no)
702
-
703
- yield (
704
- book_html,
705
- f"Complete: {title} — {len(img_bytes)} pages · {'FLUX (ZeroGPU)' if engine == 'flux' else 'local sketch fallback'} · voice: {voice} · total {trace_data['total_sec']}s",
706
- audio_path,
707
- pdf_update,
708
- story,
709
- f"Pages: {len(img_bytes)} | Seed: {BASE_SEED} | Mode: {'Tiny' if tiny_mode else 'Standard'} | Engine: {engine} | Story {trace_data.get('story_sec', 0)}s | Images {trace_data.get('images_sec', 0)}s | PDF {trace_data.get('pdf_sec', 0)}s | Coloring {trace_data.get('coloring_sec', 0)}s",
710
- json.dumps(trace_data, indent=2),
711
- coloring_display_update,
712
- coloring_pdf_update,
713
- )
714
-
715
-
716
- # ============================================================================
717
- # MAIN
718
- # ============================================================================
719
-
720
- if __name__ == "__main__":
721
- demo = create_layout(
722
- load_sample_fn=load_sample_book,
723
- create_book_fn=create_book,
724
- )
725
- demo.queue(default_concurrency_limit=2, max_size=8)
726
- demo.launch(share=False, allowed_paths=[tempfile.gettempdir()])
 
1
+ """
2
+ DoodleBook — HF ZeroGPU Version
3
+
4
+ Free GPU time on Hugging Face Spaces via ZeroGPU!
5
+ No Modal needed.
6
+ """
7
+
8
+ import gradio as gr
9
+ import os
10
+ import sys
11
+ import torch
12
+ try:
13
+ import spaces
14
+ except ModuleNotFoundError:
15
+ # `spaces` only exists on HF ZeroGPU. Off-HF (local/dev) provide a no-op so
16
+ # the app still runs; generation then uses whatever local GPU/CPU exists.
17
+ class _SpacesShim:
18
+ @staticmethod
19
+ def GPU(*args, **kwargs):
20
+ if args and callable(args[0]): # bare @spaces.GPU
21
+ return args[0]
22
+ def deco(fn): # @spaces.GPU(duration=...)
23
+ return fn
24
+ return deco
25
+ spaces = _SpacesShim()
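Review note: the shim has to accept both decorator forms used in this file, bare `@spaces.GPU` and parameterized `@spaces.GPU(duration=...)`. A minimal standalone sketch of that pattern follows; the names `NoOpGPU`, `double`, and `triple` are illustrative only and do not appear in the PR.

```python
class NoOpGPU:
    """Stand-in for the `spaces` module when it is not installed (illustrative)."""

    @staticmethod
    def GPU(*args, **kwargs):
        # Bare form: @NoOpGPU.GPU receives the decorated function directly.
        if args and callable(args[0]):
            return args[0]

        # Parameterized form: @NoOpGPU.GPU(duration=60) must return a decorator.
        def deco(fn):
            return fn

        return deco


@NoOpGPU.GPU
def double(x):
    return 2 * x


@NoOpGPU.GPU(duration=60)
def triple(x):
    return 3 * x


print(double(2), triple(2))
```

Both functions come back unwrapped, so off-Space runs behave as if the decorator were absent.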
26
+ import json
27
+ import time
28
+ import tempfile
29
+ import logging
30
+ import struct
31
+ import re
32
+
33
+ sys.path.insert(0, os.path.dirname(__file__))
34
+
35
+ from config import (
36
+ FLUX_MODEL, STORY_MODEL, TTS_MODEL,
37
+ GENERATION_PARAMS, SAMPLE_BOOK_PATH, BASE_SEED, page_seed,
38
+ DEFAULT_VOICE, voice_design,
39
+ )
40
+ from book_builder import (
41
+ build_book_html, export_pdf, magic_loader_html,
42
+ build_coloring_html, export_coloring_pdf,
43
+ )
44
+ from ui.layout import create_layout
45
+
46
+ logging.basicConfig(level=logging.INFO)
47
+ logger = logging.getLogger(__name__)
48
+
49
+ # ZeroGPU sets SPACES_ZERO_GPU. On the Space we load models on cuda at IMPORT
50
+ # (a CUDA-emulation layer makes that work without a real GPU); lazy-loading
51
+ # inside @spaces.GPU is explicitly discouraged and was why FLUX kept falling back to
52
+ # sketch. Guarded so a local/dev import doesn't try to pull ~20GB of weights.
53
+ ON_ZEROGPU = bool(os.environ.get("SPACES_ZERO_GPU"))
54
+
55
+ _FLUX_PIPE = None
56
+ _STORY_MODEL = None
57
+ _STORY_TOKENIZER = None
58
+ _TTS_MODEL = None
59
+ _LOAD_ERRORS = {}
60
+
61
+
62
+ def load_flux():
63
+ """FLUX image pipeline placed on cuda at module scope (the ZeroGPU pattern).
64
+ No enable_model_cpu_offload() — that fights ZeroGPU's device management."""
65
+ global _FLUX_PIPE
66
+ if _FLUX_PIPE is None:
67
+ from diffusers import Flux2KleinPipeline
68
+ logger.info(f"Loading image model: {FLUX_MODEL.hub_id}")
69
+ pipe = Flux2KleinPipeline.from_pretrained(
70
+ FLUX_MODEL.hub_id, torch_dtype=torch.bfloat16,
71
+ )
72
+ pipe.to("cuda")
73
+ _FLUX_PIPE = pipe
74
+ return _FLUX_PIPE
75
+
76
+
77
+ def load_story():
78
+ global _STORY_MODEL, _STORY_TOKENIZER
79
+ if _STORY_MODEL is None:
80
+ from transformers import AutoTokenizer, AutoModelForCausalLM
81
+ logger.info(f"Loading story model: {STORY_MODEL.hub_id}")
82
+ _STORY_TOKENIZER = AutoTokenizer.from_pretrained(
83
+ STORY_MODEL.hub_id, trust_remote_code=True,
84
+ )
85
+ _STORY_MODEL = AutoModelForCausalLM.from_pretrained(
86
+ STORY_MODEL.hub_id, torch_dtype=torch.float16, trust_remote_code=True,
87
+ ).to("cuda").eval()
88
+ return _STORY_MODEL, _STORY_TOKENIZER
89
+
90
+
91
+ def load_tts():
92
+ global _TTS_MODEL
93
+ if _TTS_MODEL is None:
94
+ from voxcpm import VoxCPM
95
+ logger.info(f"Loading TTS model: {TTS_MODEL.hub_id}")
96
+ _TTS_MODEL = VoxCPM.from_pretrained(TTS_MODEL.hub_id, load_denoiser=False)
97
+ return _TTS_MODEL
98
+
99
+
100
+ if ON_ZEROGPU:
101
+ for _name, _loader in (("flux", load_flux), ("story", load_story), ("tts", load_tts)):
102
+ try:
103
+ _loader()
104
+ except Exception as _e: # keep the Space booting
105
+ _LOAD_ERRORS[_name] = repr(_e)
106
+ logger.exception(f"Module-level load failed for {_name}")
107
+
108
+ COLOR_ART_STYLE = (
109
+ "children's crayon storybook illustration, bold black outlines, "
110
+ "flat bright colors, simple shapes"
111
+ )
112
+ COLOR_PAGE_SUFFIX = "full colorful background scene, the character clearly visible."
113
+ LINE_ART_STYLE = (
114
+ "children's coloring book page, pure black ink outlines on pure white paper, "
115
+ "clean contour lines, no color, no gray, no shading, no texture, "
116
+ "no hatching, no pencil marks, open spaces to color"
117
+ )
118
+ LINE_ART_SUFFIX = (
119
+ "simple clean background shapes, same composition, thick readable outlines, "
120
+ "no filled black areas, no extra sketch marks."
121
+ )
122
+
123
+ THEME_TEMPLATES = {
124
+ "brave adventure": [
125
+ ("{hero} loved exploring new places.", "{hero} standing at the start of a bright adventure trail"),
126
+ ("One morning, {hero} discovered something glowing nearby.", "{hero} spotting a magical glow in the distance"),
127
+ ("Taking a deep breath, {hero} bravely went closer.", "{hero} walking forward with courage"),
128
+ ("There, a new friend needed help.", "{hero} finding a small friend in trouble"),
129
+ ("{hero} helped with kindness and a clever idea.", "{hero} helping the friend together"),
130
+ ("Everyone cheered, and {hero} felt proud and brave.", "{hero} celebrating at sunset with the new friend"),
131
+ ],
132
+ "making a new friend": [
133
+ ("{hero} was playing alone in a sunny place.", "{hero} playing under a bright sky"),
134
+ ("Then {hero} noticed someone shy nearby.", "{hero} seeing a shy new friend nearby"),
135
+ ("{hero} smiled and said hello.", "{hero} waving with a friendly smile"),
136
+ ("Soon they were sharing stories and laughs.", "{hero} and the new friend laughing together"),
137
+ ("They played games all afternoon.", "{hero} and the new friend playing together"),
138
+ ("By sunset, {hero} had made a wonderful new friend.", "{hero} and the new friend smiling together at sunset"),
139
+ ],
140
+ }
141
+
142
+ FEW_SHOT_EXEMPLAR = """
143
+ Write a 6-page children's storybook for age 5 about Luna the cat with theme: brave adventure.
144
+
145
+ Return ONLY valid JSON:
146
+ {
147
+ "title": "Luna's Brave Adventure",
148
+ "character_description": "A small orange tabby cat named Luna with big green eyes, whiskers, and a tiny red scarf",
149
+ "pages": [
150
+ {"page": 1, "text": "Luna was a small orange cat who loved to explore.", "scene": "Luna sitting by the window looking outside"},
151
+ {"page": 2, "text": "One sunny morning, Luna saw something sparkling in the forest.", "scene": "Luna spotting a glow in the trees"},
152
+ {"page": 3, "text": "Bravely, Luna crept into the forest to investigate.", "scene": "Luna walking cautiously through trees"},
153
+ {"page": 4, "text": "It was a tiny fairy stuck in a spider web!", "scene": "Luna discovering a fairy in trouble"},
154
+ {"page": 5, "text": "Luna gently freed the fairy with her paw.", "scene": "Luna carefully helping the fairy"},
155
+ {"page": 6, "text": "The fairy thanked Luna and they became friends forever.", "scene": "Luna and fairy playing together at sunset"}
156
+ ]
157
+ }
158
+ """
159
+
160
+
161
+ def build_story_prompt(hero_name: str, theme: str, age: int) -> str:
162
+ return f"""{FEW_SHOT_EXEMPLAR}
163
+
164
+ Write a 6-page children's storybook for age {age} about {hero_name} with theme: {theme}.
165
+
166
+ Return ONLY valid JSON:
167
+ """
168
+
169
+
170
+ def _validate_story_structure(story: dict) -> bool:
171
+ required_keys = ["title", "character_description", "pages"]
172
+ if not all(k in story for k in required_keys):
173
+ return False
174
+ pages = story.get("pages", [])
175
+ if not isinstance(pages, list) or len(pages) < 1:
176
+ return False
177
+ first_page = pages[0]
178
+ return isinstance(first_page, dict) and all(k in first_page for k in ["page", "text", "scene"])
179
+
180
+
181
+ def _repair_json(json_str: str) -> str:
182
+ json_str = re.sub(r',\s*([}\]])', r'\1', json_str)
183
+ json_str = re.sub(r'//.*?$', '', json_str, flags=re.MULTILINE)
184
+ json_str = re.sub(r'/\*[\s\S]*?\*/', '', json_str)
185
+ json_str = re.sub(r'(?<=")\n(?=")', '\\n', json_str)
186
+ json_str = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', json_str)
187
+ return json_str
188
+
189
+
190
+ def parse_story_json(raw_output: str) -> dict | None:
191
+ match = re.search(r'\{[\s\S]*\}', raw_output or "")
192
+ if not match:
193
+ return None
194
+ raw_json = match.group(0)
195
+ for candidate in (raw_json, _repair_json(raw_json)):
196
+ try:
197
+ story = json.loads(candidate)
198
+ if _validate_story_structure(story):
199
+ return story
200
+ except Exception:
201
+ continue
202
+ return None
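Review note: the parser above tries the raw match first and a repaired copy second. The sketch below isolates that extract-then-repair idea with only the trailing-comma fix. The function name `extract_json_object` and the sample string are assumptions for illustration, and the PR's structure validation is left out.

```python
import json
import re


def extract_json_object(raw: str):
    """Return the first parseable {...} object found in `raw`, or None."""
    match = re.search(r"\{[\s\S]*\}", raw or "")
    if not match:
        return None
    candidate = match.group(0)
    # Second attempt drops trailing commas before } or ], a common model slip.
    repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
    for text in (candidate, repaired):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    return None


noisy = 'Sure! Here is the book: {"title": "Luna", "pages": [{"page": 1,}],} Enjoy!'
print(extract_json_object(noisy))
print(extract_json_object("no json here"))
```

Because the pattern is greedy, it spans from the first `{` to the last `}`. That works when the model emits one object, but it would swallow text between two separate objects.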
203
+
204
+
205
+ def _normalize_story(story: dict) -> dict:
206
+ pages = list(story.get("pages", []))[:6]
207
+ while len(pages) < 6:
208
+ pages.append({
209
+ "page": len(pages) + 1,
210
+ "text": "And the adventure continued happily.",
211
+ "scene": "Continuing adventure",
212
+ })
213
+ story["pages"] = pages
214
+ story.setdefault("title", "A Wonderful Adventure")
215
+ story.setdefault(
216
+ "character_description",
217
+ "A friendly children's storybook hero with bright colors and cheerful features",
218
+ )
219
+ return story
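Review note: the normalization step guarantees exactly six pages whatever the model returned. A small sketch of just the truncate-and-pad logic, using the same filler text as the PR; the helper name `pad_pages` is hypothetical.

```python
def pad_pages(pages: list[dict], target: int = 6) -> list[dict]:
    """Truncate to `target` pages, then append filler pages until there are `target`."""
    fixed = list(pages)[:target]
    while len(fixed) < target:
        fixed.append({
            "page": len(fixed) + 1,
            "text": "And the adventure continued happily.",
            "scene": "Continuing adventure",
        })
    return fixed


short_book = [{"page": 1, "text": "Luna woke up.", "scene": "Luna in bed"}]
print([p["page"] for p in pad_pages(short_book)])
```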
220
+
221
+
222
+ def build_story_locally(hero_name: str, theme: str) -> dict:
223
+ """Fast, deterministic fallback story that avoids any Modal dependency."""
224
+ hero = (hero_name or "Little Hero").strip() or "Little Hero"
225
+ beats = THEME_TEMPLATES.get(theme, THEME_TEMPLATES["brave adventure"])
226
+ pages = [
227
+ {"page": i + 1, "text": text.format(hero=hero), "scene": scene.format(hero=hero)}
228
+ for i, (text, scene) in enumerate(beats)
229
+ ]
230
+ return {
231
+ "title": f"{hero}'s Storybook Adventure",
232
+ "character_description": (
233
+ f"{hero}, a friendly children's storybook hero with bright colors, "
234
+ "bold outlines, and a cheerful expressive face"
235
+ ),
236
+ "pages": pages,
237
+ }
238
+
239
+
240
+ def silent_wav_bytes(duration_seconds: int = 2, sample_rate: int = 24000) -> bytes:
241
+ """Return a short silent WAV so the UI remains stable if TTS is unavailable."""
242
+ num_samples = sample_rate * duration_seconds
243
+ data_size = num_samples * 2
244
+ header = struct.pack(
245
+ "<4sI4s4sIHHIIHH4sI",
246
+ b"RIFF", 36 + data_size, b"WAVE",
247
+ b"fmt ", 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
248
+ b"data", data_size,
249
+ )
250
+ return header + (b"\x00" * data_size)
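Review note: the hand-packed header describes mono 16-bit PCM. One way to sanity-check a header like this is to read it back with the standard-library `wave` module, as sketched below. `silent_wav` is a renamed copy made for illustration.

```python
import io
import struct
import wave


def silent_wav(duration_seconds: int = 2, sample_rate: int = 24000) -> bytes:
    """Build a mono 16-bit PCM WAV of silence by hand-packing the RIFF header."""
    num_samples = sample_rate * duration_seconds
    data_size = num_samples * 2  # 2 bytes per 16-bit sample
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
        b"data", data_size,
    )
    return header + b"\x00" * data_size


# Parse the bytes back to confirm the header fields are self-consistent.
with wave.open(io.BytesIO(silent_wav()), "rb") as wav:
    params = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
print(params)
```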
251
+
252
+
253
+ def _with_heartbeat(blocking_fn, frame_fn, poll=4.0):
254
+ import threading
255
+
256
+ box = {}
257
+
258
+ def _run():
259
+ try:
260
+ box["val"] = blocking_fn()
261
+ except BaseException as e:
262
+ box["err"] = e
263
+
264
+ th = threading.Thread(target=_run, daemon=True)
265
+ th.start()
266
+ t0 = time.time()
267
+ while th.is_alive():
268
+ th.join(timeout=poll)
269
+ if th.is_alive():
270
+ yield ("hb", frame_fn(int(time.time() - t0)))
271
+ if "err" in box:
272
+ raise box["err"]
273
+ yield ("done", box["val"])
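Review note: the heartbeat helper turns one long blocking call into a stream of `("hb", frame)` events followed by a single `("done", value)`. This keeps a Gradio generator yielding while the GPU call runs. A self-contained sketch with a short sleep standing in for the GPU work; `slow_job` and the shortened poll interval are assumptions for the demo.

```python
import threading
import time


def with_heartbeat(blocking_fn, frame_fn, poll=0.05):
    """Run blocking_fn in a thread; yield ("hb", frame) until done, then ("done", value)."""
    box = {}

    def _run():
        try:
            box["val"] = blocking_fn()
        except BaseException as exc:  # stored here, re-raised in the consumer below
            box["err"] = exc

    worker = threading.Thread(target=_run, daemon=True)
    worker.start()
    started = time.time()
    while worker.is_alive():
        worker.join(timeout=poll)
        if worker.is_alive():
            yield ("hb", frame_fn(time.time() - started))
    if "err" in box:
        raise box["err"]
    yield ("done", box["val"])


def slow_job():
    time.sleep(0.3)  # stands in for a long GPU call
    return "six pages"


events = list(with_heartbeat(slow_job, lambda elapsed: f"working {elapsed:.2f}s"))
print(events[-1])
```

The consumer sees any exception from the worker only after the thread exits, which is why `create_book` wraps each heartbeat loop in its own `try`.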
274
+
275
+
276
+ # ============================================================================
277
+ # SAMPLE BOOK (loads instantly, no GPU needed)
278
+ # ============================================================================
279
+
280
+ SAMPLE_BOOK_HTML = None
281
+
282
+ def load_sample_book() -> str:
283
+ """Load pre-generated sample book (C3: always ship sample)."""
284
+ global SAMPLE_BOOK_HTML
285
+ if SAMPLE_BOOK_HTML:
286
+ return SAMPLE_BOOK_HTML
287
+
288
+ sample_path = os.path.join(SAMPLE_BOOK_PATH, "sample.html")
289
+ if os.path.exists(sample_path):
290
+ with open(sample_path, "r", encoding="utf-8") as f:
291
+ SAMPLE_BOOK_HTML = f.read()
292
+ return SAMPLE_BOOK_HTML
293
+
294
+ return "<div class='page-loading'>Loading sample book...</div>"
295
+
296
+
297
+ # ============================================================================
298
+ # ZEROGPU INFERENCE FUNCTIONS
299
+ # ============================================================================
300
+
301
+ @spaces.GPU(duration=60)
302
+ def generate_story_gpu(hero_name: str, theme: str, age: int = 5) -> dict:
303
+ """Generate a story on ZeroGPU, falling back to a deterministic local story."""
304
+ try:
305
+ model, tok = load_story()
306
+ prompt = build_story_prompt(hero_name, theme, age)
307
+ inputs = tok.apply_chat_template(
308
+ [{"role": "user", "content": prompt}],
309
+ add_generation_prompt=True,
310
+ enable_thinking=False,
311
+ return_dict=True,
312
+ return_tensors="pt",
313
+ ).to("cuda")
314
+ with torch.no_grad():
315
+ out = model.generate(
316
+ **inputs,
317
+ max_new_tokens=GENERATION_PARAMS.max_story_tokens,
318
+ do_sample=False,
319
+ )
320
+ response = tok.decode(
321
+ out[0][inputs["input_ids"].shape[1]:],
322
+ skip_special_tokens=True,
323
+ )
324
+ parsed = parse_story_json(response)
325
+ if parsed:
326
+ return _normalize_story(parsed)
327
+ logger.warning("Story parser failed; using deterministic local fallback")
328
+ except Exception as e:
329
+ logger.warning(f"ZeroGPU story generation failed: {e}")
330
+ return _normalize_story(build_story_locally(hero_name, theme))
331
+
332
+
333
+ @spaces.GPU(duration=150)
334
+ def generate_images_gpu(
335
+ character_desc: str,
336
+ scenes: list,
337
+ doodle_bytes: bytes = None,
338
+ seed: int = 42,
339
+ ) -> list:
340
+ """Generate all story pages with FLUX on ZeroGPU (two-stage: canonical
341
+ character from the doodle, then the same character in each scene)."""
342
+ import io
343
+ from PIL import Image
344
+
345
+ pipe = load_flux()
346
+ num_steps, guidance = 6, 1.0
347
+
348
+ canonical = None
349
+ if doodle_bytes:
350
+ try:
351
+ ref = Image.open(io.BytesIO(doodle_bytes)).convert("RGB")
352
+ canonical = pipe(
353
+ prompt=(f"Turn this child's drawing into a clean, friendly, full-body cartoon "
354
+ f"character for a children's storybook. Keep the EXACT same creature, "
355
+ f"face, and features as the drawing. {COLOR_ART_STYLE}, "
356
+ f"plain white background, full character visible, centered."),
357
+ image=ref, height=768, width=768, guidance_scale=guidance,
358
+ num_inference_steps=num_steps,
359
+ generator=torch.Generator("cuda").manual_seed(seed),
360
+ ).images[0]
361
+ logger.info("Canonical character built from doodle")
362
+ except Exception as e:
363
+ logger.warning(f"Canonical build failed ({e}); text2img fallback")
364
+ canonical = None
365
+
366
+ images = []
367
+ for i, scene in enumerate(scenes):
368
+ if canonical is not None:
369
+ prompt = f"The same character. {scene}. {COLOR_ART_STYLE}, {COLOR_PAGE_SUFFIX}"
370
+ kw = dict(image=canonical, prompt=prompt)
371
+ else:
372
+ prompt = (f"{character_desc}. Scene: {scene}. {COLOR_ART_STYLE}, "
373
+ f"white background, centered, full character visible")
374
+ kw = dict(prompt=prompt)
375
+ kw.update(height=768, width=768, guidance_scale=guidance,
376
+ num_inference_steps=num_steps,
377
+ generator=torch.Generator("cuda").manual_seed(seed + i + 1))
378
+ images.append(pipe(**kw).images[0])
379
+ logger.info(f"Generated page {i+1}/{len(scenes)}")
380
+ return images
381
+
382
+
383
+ @spaces.GPU(duration=150)
384
+ def generate_coloring_images_gpu(
385
+ character_desc: str,
386
+ scenes: list,
387
+ doodle_bytes: bytes = None,
388
+ seed: int = 42,
389
+ ) -> list:
390
+ """Generate coloring pages directly with FLUX as line art (no tracing)."""
391
+ import io
392
+ from PIL import Image
393
+
394
+ pipe = load_flux()
395
+ num_steps, guidance = 6, 1.0
396
+
397
+ canonical = None
398
+ if doodle_bytes:
399
+ try:
400
+ ref = Image.open(io.BytesIO(doodle_bytes)).convert("RGB")
401
+ canonical = pipe(
402
+ prompt=(f"Turn this child's drawing into a clean, friendly, full-body cartoon "
403
+ f"character for a children's coloring book. Keep the EXACT same creature, "
404
+ f"face, and features as the drawing. {LINE_ART_STYLE}, "
405
+ f"plain white background, full character visible, centered."),
406
+ image=ref, height=768, width=768, guidance_scale=guidance,
407
+ num_inference_steps=num_steps,
408
+ generator=torch.Generator("cuda").manual_seed(seed),
409
+ ).images[0]
410
+ logger.info("Line-art canonical character built from doodle")
411
+ except Exception as e:
412
+ logger.warning(f"Line-art canonical build failed ({e}); text2img fallback")
413
+ canonical = None
414
+
415
+ images = []
416
+ for i, scene in enumerate(scenes):
417
+ if canonical is not None:
418
+ prompt = f"The same character. {scene}. {LINE_ART_STYLE}, {LINE_ART_SUFFIX}"
419
+ kw = dict(image=canonical, prompt=prompt)
420
+ else:
421
+ prompt = (f"{character_desc}. Scene: {scene}. {LINE_ART_STYLE}, "
422
+ f"white background, centered, full character visible")
423
+ kw = dict(prompt=prompt)
424
+ kw.update(height=768, width=768, guidance_scale=guidance,
425
+ num_inference_steps=num_steps,
426
+ generator=torch.Generator("cuda").manual_seed(seed + i + 101))
427
+ images.append(pipe(**kw).images[0])
428
+ logger.info(f"Generated coloring page {i+1}/{len(scenes)}")
429
+ return images
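Review note: the two image functions above derive per-page seeds from one base seed, `seed + i + 1` for color pages and `seed + i + 101` for line-art pages, while both canonical builds use `seed` itself. The sketch below only reproduces that arithmetic. The helper names are hypothetical, and the comment on why the offsets differ is an inference rather than something the PR states.

```python
def story_page_seeds(base_seed: int, n_pages: int) -> list[int]:
    """Seeds for the color pages: base+1 .. base+n."""
    return [base_seed + i + 1 for i in range(n_pages)]


def coloring_page_seeds(base_seed: int, n_pages: int) -> list[int]:
    """Seeds for the line-art pages: base+101 .. base+100+n."""
    return [base_seed + i + 101 for i in range(n_pages)]


color = story_page_seeds(42, 6)
lineart = coloring_page_seeds(42, 6)
# Presumably the offset keeps the two sets disjoint for books under 100 pages.
print(color, lineart, set(color).isdisjoint(lineart))
```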
430
+
431
+
432
+ @spaces.GPU(duration=120)
433
+ def generate_tts_gpu(text: str, voice: str = DEFAULT_VOICE) -> bytes:
434
+ """Narrate the book with VoxCPM2. Raises on failure so the caller can show
435
+ the real reason instead of silently shipping a silent clip."""
436
+ import io
437
+ import numpy as np
438
+
439
+ try:
440
+ model = load_tts()
441
+ design = voice_design(voice)
442
+
443
+ import re
444
+ chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
445
+ if not chunks:
446
+ chunks = [text.strip() or "The end."]
447
+
448
+ sr = model.tts_model.sample_rate
449
+ pause = np.zeros(int(sr * 0.35), dtype=np.float32)
450
+ pieces = []
451
+
452
+ for i, sentence in enumerate(chunks):
453
+ wav = model.generate(
454
+ text=f"{design} {sentence}",
455
+ cfg_value=2.0,
456
+ inference_timesteps=10,
457
+ )
458
+ pieces.append(np.asarray(wav, dtype=np.float32))
459
+ if i < len(chunks) - 1:
460
+ pieces.append(pause)
461
+
462
+ audio = np.concatenate(pieces)
463
+ import soundfile as sf
464
+ buf = io.BytesIO()
465
+ sf.write(buf, audio, sr, format="WAV")
466
+ return buf.getvalue()
467
+
468
+ except Exception:
469
+ # Surface the real reason (e.g. missing model) instead of a silent clip
470
+ # that looks like it worked. create_book records this in the trace.
471
+ logger.exception("TTS failed")
472
+ raise
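Review note: narration is synthesized sentence by sentence, with a 0.35 s pause between chunks. The sketch below isolates the sentence splitter so its edge cases are easy to see. `split_sentences` is a hypothetical wrapper around the same regular expression.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?', keeping the punctuation."""
    chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Same fallback as the PR: never hand the TTS model an empty list.
    return chunks or [text.strip() or "The end."]


book = "Luna's Adventure. Luna ran! Did she stop? No."
print(split_sentences(book))
print(split_sentences("   "))
```

The lookbehind also splits after abbreviations such as "Dr.", which is probably acceptable for short picture-book text but worth knowing when the pauses sound misplaced.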
473
+
474
+
475
+ # ============================================================================
476
+ # MAIN BOOK CREATION (Generator for streaming)
477
+ # ============================================================================
478
+
479
+ def create_book(doodle_image, character_name, theme, hero_name, voice=DEFAULT_VOICE, make_coloring=False):
480
+ """ZeroGPU book flow: story → images → narration → PDFs → coloring book,
481
+ each a sequential @spaces.GPU call (ZeroGPU has one GPU per request)."""
482
+ t_total = time.perf_counter()
483
+ character_name = (character_name or "").strip() or "Little Hero"
484
+ hero_name = (hero_name or "").strip() or character_name
485
+
486
+ trace_data = {
487
+ "backend": "zerogpu",
488
+ "hero_name": hero_name,
489
+ "theme": theme,
490
+ "voice": voice,
491
+ "make_coloring": make_coloring,
492
+ "seed": BASE_SEED,
493
+ "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
494
+ }
495
+ if _LOAD_ERRORS:
496
+ trace_data["model_load_errors"] = _LOAD_ERRORS
497
+
498
+ _no = gr.update(visible=False)
499
+ _keep = gr.update()
500
+
501
+ yield (
502
+ magic_loader_html("story", hero_name),
503
+ "Writing the story…",
504
+ None, _keep, {}, "", json.dumps(trace_data, indent=2),
505
+ _no, _keep,
506
+ )
507
+
508
+ t_story = time.perf_counter()
509
+ try:
510
+ story = generate_story_gpu(hero_name, theme)
511
+ except Exception as e:
512
+ logger.error(f"Story generation failed: {e}")
513
+ yield (
514
+ f"<div class='page-loading'>Error: {e}</div>",
515
+ f"Error: {e}",
516
+ None, _keep, {}, "", "",
517
+ _no, _keep,
518
+ )
519
+ return
520
+ trace_data["story_sec"] = round(time.perf_counter() - t_story, 2)
521
+
522
+ pages = story.get("pages", [])
523
+ char_desc = story.get("character_description", "")
524
+ title = story.get("title", "Untitled Story")
525
+ page_texts = [p.get("text", "") for p in pages]
526
+ scenes = [p.get("scene", "") for p in pages]
527
+
528
+ trace_data["title"] = title
529
+ trace_data["character_description"] = char_desc
530
+
531
+ yield (
532
+ magic_loader_html("images", hero_name),
533
+ f"{title} illustrating on ZeroGPU…",
534
+ None, _keep, story, "", json.dumps(trace_data, indent=2),
535
+ _no, _keep,
536
+ )
537
+
538
+ doodle_bytes = None
539
+ if doodle_image is not None:
540
+ import io
541
+ from PIL import Image
542
+ img = Image.fromarray(doodle_image)
543
+ buf = io.BytesIO()
544
+ img.save(buf, format="PNG")
545
+ doodle_bytes = buf.getvalue()
546
+
547
+ full_text = f"{title}. {' '.join(page_texts)}"
548
+
549
+ # ---- IMAGES (FLUX on ZeroGPU) ----
550
+ img_bytes, engine = None, "sketch"
551
+ t_images = time.perf_counter()
552
+ try:
553
+ for kind, payload in _with_heartbeat(
554
+ lambda: generate_images_gpu(char_desc, scenes, doodle_bytes, BASE_SEED),
555
+ lambda s: (
556
+ magic_loader_html("images", hero_name),
557
+ f"{title} — illustrating on ZeroGPU… {s}s",
558
+ None, _keep, story, "", json.dumps(trace_data, indent=2), _no, _keep,
559
+ ),
560
+ ):
561
+ if kind == "hb":
562
+ yield payload
563
+ else:
564
+ images = payload
565
+ import io
566
+ img_bytes = []
567
+ for img in images:
568
+ buf = io.BytesIO()
569
+ img.save(buf, format="PNG")
570
+ img_bytes.append(buf.getvalue())
571
+ engine = "flux"
572
+ except Exception as e:
573
+ logger.exception("Image generation failed")
574
+ trace_data["image_error"] = repr(e)
575
+ from services.images import generate_placeholder_images
576
+ img_bytes = generate_placeholder_images(char_desc, scenes, doodle_bytes)
577
+ engine = "sketch"
578
+ trace_data["images_sec"] = round(time.perf_counter() - t_images, 2)
579
+ trace_data["engine"] = engine
580
+
581
+ book_html = build_book_html(img_bytes, page_texts, title, engine)
582
+
583
+ # ---- NARRATION (VoxCPM2 on ZeroGPU) — sequential: one GPU per request ----
584
+ audio_path = None
585
+ t_tts = time.perf_counter()
586
+ try:
587
+ for kind, payload in _with_heartbeat(
588
+ lambda: generate_tts_gpu(full_text, voice),
589
+ lambda s: (
590
+ book_html,
591
+ f"{title} — recording the narration… {s}s",
592
+ None, _keep, story, "", json.dumps(trace_data, indent=2), _no, _keep,
593
+ ),
594
+ ):
595
+ if kind == "hb":
596
+ yield payload
597
+ else:
598
+ voice_bytes = payload
599
+ with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
600
+ tmp.write(voice_bytes)
601
+ audio_path = tmp.name
602
+ except Exception as e:
603
+ logger.exception("TTS failed")
604
+ trace_data["tts_error"] = repr(e)
605
+ trace_data["tts_sec"] = round(time.perf_counter() - t_tts, 2)
606
+
607
+ pdf_path = None
608
+ t_pdf = time.perf_counter()
609
+ try:
610
+ with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
611
+ pdf_path = export_pdf(img_bytes, page_texts, title, tmp.name)
612
+ except Exception as e:
613
+ logger.warning(f"PDF failed: {e}")
614
+ trace_data["pdf_sec"] = round(time.perf_counter() - t_pdf, 2)
615
+
616
+ coloring_html = ""
617
+ coloring_pdf_path = None
618
+ if make_coloring:
619
+ t_coloring = time.perf_counter()
620
+ try:
621
+ from services.coloring import _crispen
622
+ for kind, payload in _with_heartbeat(
623
+ lambda: generate_coloring_images_gpu(char_desc, scenes, doodle_bytes, BASE_SEED),
624
+ lambda s: (
625
+ book_html,
626
+ f"{title} — building coloring book… {s}s",
627
+ audio_path,
628
+ _keep,
629
+ story,
630
+ "",
631
+ json.dumps(trace_data, indent=2),
632
+ _no,
633
+ _keep,
634
+ ),
635
+ ):
636
+ if kind == "hb":
637
+ yield payload
638
+ else:
639
+ coloring_images = payload
640
+ import io
641
+ outlines = []
642
+ for img in coloring_images:
643
+ buf = io.BytesIO()
644
+ img.save(buf, format="PNG")
645
+ outlines.append(_crispen(buf.getvalue()))
646
+ coloring_html = build_coloring_html(outlines, page_texts, title)
647
+ with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
648
+ coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
649
+ trace_data["coloring_book"] = True
650
+ trace_data["coloring_engine"] = "flux-direct-lineart"
651
+ except Exception as e:
652
+ logger.warning(f"Direct FLUX coloring book failed ({e}); using traced fallback")
653
+ try:
654
+ from services.coloring import derive_coloring_pages
655
+ outlines = derive_coloring_pages(img_bytes)
656
+ coloring_html = build_coloring_html(outlines, page_texts, title)
657
+ with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
658
+ coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
659
+ trace_data["coloring_book"] = True
660
+ trace_data["coloring_engine"] = "trace-fallback"
661
+ except Exception as e2:
662
+ logger.warning(f"Coloring book fallback failed: {e2}")
663
+ trace_data["coloring_sec"] = round(time.perf_counter() - t_coloring, 2)
664
+
665
+ trace_data["completed"] = True
666
+ trace_data["pages_generated"] = len(img_bytes)
667
+ trace_data["total_sec"] = round(time.perf_counter() - t_total, 2)
668
+
669
+ pdf_update = gr.update(value=pdf_path) if pdf_path else _keep
670
+ coloring_pdf_update = gr.update(value=coloring_pdf_path) if coloring_pdf_path else _keep
671
+ coloring_display_update = (gr.update(visible=True, value=coloring_html) if coloring_html
672
+ else _no)
673
+
674
+ yield (
675
+ book_html,
676
+ f"Complete: {title} — {len(img_bytes)} pages · {'FLUX (ZeroGPU)' if engine == 'flux' else 'local sketch fallback'} · voice: {voice} · total {trace_data['total_sec']}s",
677
+ audio_path,
678
+ pdf_update,
679
+ story,
680
+ f"Pages: {len(img_bytes)} | Seed: {BASE_SEED} | Engine: {engine} | Story {trace_data.get('story_sec', 0)}s | Images {trace_data.get('images_sec', 0)}s | PDF {trace_data.get('pdf_sec', 0)}s | Coloring {trace_data.get('coloring_sec', 0)}s",
681
+ json.dumps(trace_data, indent=2),
682
+ coloring_display_update,
683
+ coloring_pdf_update,
684
+ )
685
+
686
+
687
+ # ============================================================================
688
+ # MAIN
689
+ # ============================================================================
690
+
691
+ if __name__ == "__main__":
692
+ demo = create_layout(
693
+ load_sample_fn=load_sample_book,
694
+ create_book_fn=create_book,
695
+ )
696
+ demo.queue(default_concurrency_limit=2, max_size=8)
697
+ demo.launch(share=False, allowed_paths=[tempfile.gettempdir()])
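
Note on the heartbeat loops above: `create_book` calls `_with_heartbeat(work, make_status)` and expects `("hb", payload)` events while the GPU call runs, followed by one final event carrying the result. The helper itself is not shown in this hunk, so the following is only a minimal sketch of one way such a generator could work, assuming a background thread and a one-slot queue. The name `with_heartbeat`, the `"done"` tag, and the `interval` parameter are hypothetical, not taken from the PR.

```python
import queue
import threading
import time


def with_heartbeat(work, make_heartbeat, interval=1.0):
    """Run work() in a thread, yielding ("hb", payload) until it finishes.

    Hypothetical stand-in for the PR's _with_heartbeat helper.
    The final event is ("done", result); errors are re-raised in the caller.
    """
    results = queue.Queue(maxsize=1)

    def runner():
        try:
            results.put(("ok", work()))
        except Exception as exc:  # hand the error back to the consuming thread
            results.put(("err", exc))

    threading.Thread(target=runner, daemon=True).start()
    started = time.perf_counter()
    while True:
        try:
            status, value = results.get(timeout=interval)
        except queue.Empty:
            # Still working: emit a status payload with whole seconds elapsed.
            elapsed = int(time.perf_counter() - started)
            yield "hb", make_heartbeat(elapsed)
            continue
        if status == "err":
            raise value
        yield "done", value
        return
```

Because the worker's exception is re-raised inside the generator, the surrounding `try/except` in `create_book` would still catch a failed TTS or coloring call under this design.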
 
app_zerogpu.py DELETED
@@ -1,152 +0,0 @@
1
- """
2
- DoodleBook — ZeroGPU Version (Free HF Hosting)
3
-
4
- Runs directly on HF ZeroGPU without Modal.
5
- Slower but completely free.
6
- """
7
-
8
- import gradio as gr
9
- import os
10
- import sys
11
- import torch
12
- from pathlib import Path
13
-
14
- sys.path.insert(0, os.path.dirname(__file__))
15
-
16
- from config import FLUX_MODEL, STORY_MODEL, GENERATION_PARAMS, BASE_SEED
17
-
18
-
19
- # ============================================================================
20
- # ZEROGPU INFERENCE (No Modal)
21
- # ============================================================================
22
-
23
- @torch.inference_mode()
24
- def generate_story_zerogpu(hero_name: str, theme: str, age: int = 5) -> dict:
25
- """Generate story using MiniCPM5-1B on ZeroGPU."""
26
- from transformers import AutoTokenizer, AutoModelForCausalLM
27
-
28
- model_id = STORY_MODEL.hub_id
29
- tok = AutoTokenizer.from_pretrained(model_id)
30
- model = AutoModelForCausalLM.from_pretrained(
31
- model_id, torch_dtype=torch.float16
32
- ).cuda().eval()
33
-
34
- prompt = f"""Write a 6-page children's storybook for age {age} about {hero_name} with theme: {theme}.
35
- Return ONLY valid JSON:
36
- {{"title": "Title", "character_description": "Description", "pages": [{{"page": 1, "text": "Text", "scene": "Scene"}}]}}"""
37
-
38
- inputs = tok(prompt, return_tensors="pt").cuda()
39
- with torch.no_grad():
40
- out = model.generate(**inputs, max_new_tokens=800, do_sample=False)
41
-
42
- response = tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
43
-
44
- # Parse JSON
45
- import re, json
46
- match = re.search(r'\{.*\}', response, re.DOTALL)
47
- if match:
48
- try:
49
- return json.loads(match.group())
50
- except:
51
- pass
52
-
53
- # Fallback
54
- return {
55
- "title": f"{hero_name}'s Adventure",
56
- "character_description": f"A friendly character named {hero_name}",
57
- "pages": [{"page": i+1, "text": f"Page {i+1} text", "scene": f"Scene {i+1}"} for i in range(6)]
58
- }
59
-
60
-
61
- @torch.inference_mode()
62
- def generate_images_zerogpu(character_desc: str, scenes: list) -> list:
63
- """Generate images using FLUX on ZeroGPU."""
64
- from diffusers import FluxPipeline
65
-
66
- pipe = FluxPipeline.from_pretrained(
67
- FLUX_MODEL.hub_id,
68
- torch_dtype=torch.bfloat16
69
- ).cuda()
70
-
71
- images = []
72
- for i, scene in enumerate(scenes):
73
- prompt = f"{character_desc}, {scene}, crayon drawing style"
74
- generator = torch.Generator("cuda").manual_seed(BASE_SEED + i)
75
-
76
- image = pipe(
77
- prompt=prompt,
78
- num_inference_steps=20,
79
- guidance_scale=3.5,
80
- width=768,
81
- height=512,
82
- generator=generator
83
- ).images[0]
84
-
85
- images.append(image)
86
-
87
- return images
88
-
89
-
90
- # ============================================================================
91
- # MAIN FUNCTION (ZeroGPU compatible)
92
- # ============================================================================
93
-
94
- def create_book_zerogpu(doodle_image, character_name, theme, hero_name, tiny_mode=False):
95
- """
96
- Book creation without Modal.
97
- Uses ZeroGPU for inference.
98
- """
99
- import time
100
- from book_builder import build_book_html
101
- import io, base64
102
-
103
- # Generate story
104
- story = generate_story_zerogpu(hero_name, theme)
105
- title = story.get("title", "Story")
106
- pages = story.get("pages", [])
107
- char_desc = story.get("character_description", "")
108
- scenes = [p.get("scene", "") for p in pages]
109
- texts = [p.get("text", "") for p in pages]
110
-
111
- # Generate images
112
- images = generate_images_zerogpu(char_desc, scenes)
113
-
114
- # Convert to bytes
115
- img_bytes = []
116
- for img in images:
117
- buf = io.BytesIO()
118
- img.save(buf, format="PNG")
119
- img_bytes.append(buf.getvalue())
120
-
121
- # Build HTML
122
- html = build_book_html(img_bytes, texts, title)
123
-
124
- return html, f"Complete: {title}", None, None
125
-
126
-
127
- # ============================================================================
128
- # GRADIO UI
129
- # ============================================================================
130
-
131
- if __name__ == "__main__":
132
- with gr.Blocks(title="DoodleBook (Free)") as demo:
133
- gr.Markdown("# 📚 DoodleBook (ZeroGPU Version)")
134
-
135
- with gr.Row():
136
- with gr.Column():
137
- doodle = gr.Image(label="Doodle", type="numpy")
138
- name = gr.Textbox(label="Character name")
139
- theme = gr.Dropdown(["brave adventure", "making a friend"], label="Theme")
140
- btn = gr.Button("Make book!")
141
-
142
- with gr.Column():
143
- output = gr.HTML()
144
- status = gr.Textbox(label="Status")
145
-
146
- btn.click(
147
- create_book_zerogpu,
148
- inputs=[doodle, name, theme, name],
149
- outputs=[output, status]
150
- )
151
-
152
- demo.launch()
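
Note on the deleted story parser: `generate_story_zerogpu` extracted JSON with a regex, swallowed every error with a bare `except:`, and accepted any parsed object regardless of page count. A minimal sketch of a stricter version follows, assuming the same regex and the same template fallback as the deleted code. The function names `template_story` and `parse_story` are hypothetical, and the json-repair layer mentioned in the handoff prompt is intentionally left out.

```python
import json
import re


def template_story(hero_name, pages=6):
    """Deterministic fallback book, mirroring the deleted fallback dict."""
    return {
        "title": f"{hero_name}'s Adventure",
        "character_description": f"A friendly character named {hero_name}",
        "pages": [
            {"page": i + 1, "text": f"Page {i + 1} text", "scene": f"Scene {i + 1}"}
            for i in range(pages)
        ],
    }


def parse_story(response, hero_name, pages=6):
    """Return the model's story if it parses and has the right page count."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match:
        try:
            story = json.loads(match.group())
        except json.JSONDecodeError:
            story = None
        page_list = story.get("pages") if isinstance(story, dict) else None
        if isinstance(page_list, list) and len(page_list) == pages:
            return story
    # Anything malformed or incomplete falls through to the template.
    return template_story(hero_name, pages)
```

Catching only `json.JSONDecodeError` keeps unrelated failures (for example a `KeyboardInterrupt`) from being silently hidden, which the bare `except:` did not.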
 
config.py CHANGED
@@ -86,56 +86,24 @@ class ModelConfig:
86
  # Fallbacks selected for license compatibility (Apache 2.0 preferred).
87
  #
88
 
89
  FLUX_MODEL = ModelConfig(
90
  hub_id="black-forest-labs/FLUX.2-klein-4B",
91
  params_b=4.0,
92
  license=LicenseType.APACHE_2_0,
93
  vram_gb=13.0,
94
- fallback_id="black-forest-labs/FLUX.1-schnell",
95
- fallback_reason="12B, Apache 2.0, 1-4 step distilled, fast inference",
96
  modal_gpu="A10G", # 24GB fits the ~13GB model; A100-40GB was overkill
97
  modal_memory=32768,
98
  )
99
 
100
- FLUX_MODEL_9B = ModelConfig(
101
- hub_id="black-forest-labs/FLUX.2-klein-9B",
102
- params_b=9.0,
103
- license=LicenseType.NON_COMMERCIAL,
104
- vram_gb=29.0,
105
- fallback_id="black-forest-labs/FLUX.2-klein-4B",
106
- fallback_reason="4B variant with Apache 2.0 license",
107
- is_primary=False,
108
- modal_gpu="A100",
109
- modal_memory=32768,
110
- )
111
-
112
- FLUX_FALLBACK = ModelConfig(
113
- hub_id="black-forest-labs/FLUX.1-schnell",
114
- params_b=12.0,
115
- license=LicenseType.APACHE_2_0,
116
- vram_gb=24.0,
117
- is_primary=False,
118
- modal_gpu="A100",
119
- modal_memory=32768,
120
- )
121
-
122
  STORY_MODEL = ModelConfig(
123
  hub_id="openbmb/MiniCPM5-1B",
124
  params_b=1.0,
125
  license=LicenseType.APACHE_2_0,
126
  vram_gb=4.0,
127
- fallback_id="openbmb/MiniCPM3-4B",
128
- fallback_reason="4B, stronger capability but larger footprint",
129
- modal_gpu="T4",
130
- modal_memory=8192,
131
- )
132
-
133
- STORY_FALLBACK = ModelConfig(
134
- hub_id="openbmb/MiniCPM3-4B",
135
- params_b=4.0,
136
- license=LicenseType.APACHE_2_0,
137
- vram_gb=8.0,
138
- is_primary=False,
139
  modal_gpu="T4",
140
  modal_memory=8192,
141
  )
@@ -145,43 +113,6 @@ TTS_MODEL = ModelConfig(
145
  params_b=2.0,
146
  license=LicenseType.APACHE_2_0,
147
  vram_gb=8.0,
148
- fallback_id="hexgrad/Kokoro-82M",
149
- fallback_reason="82M params, ultra-lightweight, Apache 2.0",
150
- modal_gpu="T4",
151
- modal_memory=8192,
152
- )
153
-
154
- TTS_FALLBACK_KOKORO = ModelConfig(
155
- hub_id="hexgrad/Kokoro-82M",
156
- params_b=0.082,
157
- license=LicenseType.APACHE_2_0,
158
- vram_gb=1.0,
159
- is_primary=False,
160
- modal_gpu="T4",
161
- modal_memory=4096,
162
- )
163
-
164
- TTS_FALLBACK_MELO = ModelConfig(
165
- hub_id="myshell-ai/MeloTTS-English-v3",
166
- params_b=0.0, # Unknown exact size
167
- license=LicenseType.MIT,
168
- vram_gb=1.0,
169
- is_primary=False,
170
- modal_gpu="CPU",
171
- modal_memory=2048,
172
- )
173
-
174
- # ============================================================================
175
- # TINY MODE MODELS (C4: Edge/Tiny Model Support)
176
- # ============================================================================
177
-
178
- TINY_IMAGE_MODEL = ModelConfig(
179
- hub_id="stabilityai/sd-turbo",
180
- params_b=0.67,
181
- license=LicenseType.APACHE_2_0,
182
- vram_gb=4.0,
183
- fallback_id="stabilityai/sdxl-turbo",
184
- fallback_reason="SDXL-Turbo, higher quality but more VRAM",
185
  modal_gpu="T4",
186
  modal_memory=8192,
187
  )
@@ -443,17 +374,8 @@ def get_model_with_fallback(
443
  Returns:
444
  ModelConfig (primary or fallback)
445
  """
446
- if use_fallback and model.fallback_id:
447
- logger.info(f"Using fallback: {model.fallback_id} (reason: {model.fallback_reason})")
448
- # Return the appropriate fallback config
449
- fallback_map = {
450
- "black-forest-labs/FLUX.1-schnell": FLUX_FALLBACK,
451
- "openbmb/MiniCPM3-4B": STORY_FALLBACK,
452
- "hexgrad/Kokoro-82M": TTS_FALLBACK_KOKORO,
453
- "myshell-ai/MeloTTS-English-v3": TTS_FALLBACK_MELO,
454
- }
455
- return fallback_map.get(model.fallback_id, model)
456
-
457
  return model
458
 
459
 
 
86
  # Fallbacks selected for license compatibility (Apache 2.0 preferred).
87
  #
88
 
89
+ # The three sponsor models DoodleBook actually loads. Fallback/variant configs
90
+ # were removed so the HF Space links exactly these three (HF auto-links any
91
+ # model id it finds in the repo files).
92
+
93
  FLUX_MODEL = ModelConfig(
94
  hub_id="black-forest-labs/FLUX.2-klein-4B",
95
  params_b=4.0,
96
  license=LicenseType.APACHE_2_0,
97
  vram_gb=13.0,
98
  modal_gpu="A10G", # 24GB fits the ~13GB model; A100-40GB was overkill
99
  modal_memory=32768,
100
  )
101
 
102
  STORY_MODEL = ModelConfig(
103
  hub_id="openbmb/MiniCPM5-1B",
104
  params_b=1.0,
105
  license=LicenseType.APACHE_2_0,
106
  vram_gb=4.0,
107
  modal_gpu="T4",
108
  modal_memory=8192,
109
  )
 
113
  params_b=2.0,
114
  license=LicenseType.APACHE_2_0,
115
  vram_gb=8.0,
116
  modal_gpu="T4",
117
  modal_memory=8192,
118
  )
 
374
  Returns:
375
  ModelConfig (primary or fallback)
376
  """
377
+ # Fallback model configs were removed (the Space links only the 3 primaries);
378
+ # there is no alternate config to swap in, so always return the primary.
379
  return model
380
 
381
 
docs/blog.md DELETED
@@ -1,121 +0,0 @@
1
- # Field Notes: FLUX + LoRA Character Consistency
2
-
3
- *How we achieved cross-page character consistency in DoodleBook using FLUX.2-klein, seed-locking, and a crayon-style LoRA.*
4
-
5
- ---
6
-
7
- ## The Challenge
8
-
9
- The core problem in AI-generated storybooks is **character consistency**. If you generate 6 pages of a story independently, each page produces a different character — different colors, different proportions, different style. The magic is lost.
10
-
11
- We needed: **the same character, in the same art style, across all 6 pages.**
12
-
13
- ---
14
-
15
- ## Our Approach: The Consistency Stack
16
-
17
- We didn't rely on a single technique. Instead, we layered three complementary approaches:
18
-
19
- ### 1. Seed Locking
20
-
21
- ```python
22
- BASE_SEED = 42
23
- def page_seed(page_num):
24
- return BASE_SEED + page_num # Page 0: 42, Page 1: 43, ...
25
- ```
26
-
27
- Each page uses a deterministic seed derived from a locked base. This ensures:
28
- - Reproducible generation (same inputs = same outputs)
29
- - Slight variation between pages (different seeds)
30
- - Consistent "feel" across the book
31
-
32
- ### 2. Character Description Reuse
33
-
34
- Every page uses the **exact same** `character_description` string:
35
-
36
- ```python
37
- prompt = f"""
38
- {character_description}, # IDENTICAL on every page
39
- {scene_description}, # UNIQUE per page
40
- {art_style}, page {i+1} of children's book
41
- """
42
- ```
43
-
44
- The character description acts as an anchor, keeping the model's interpretation consistent.
45
-
46
- ### 3. LoRA Fine-Tuning (The Secret Sauce)
47
-
48
- We trained a **crayon-style LoRA** on FLUX.2-klein:
49
-
50
- - **Trigger token:** `[DOODLECHAR]`
51
- - **Training data:** 10-15 crayon-style character images
52
- - **Rank:** 16 (balances quality vs. file size)
53
- - **Steps:** 300
54
-
55
- The LoRA teaches FLUX to generate images in a specific art style. Combined with the character description, this creates a consistent visual identity.
56
-
57
- ---
58
-
59
- ## The Results
60
-
61
- ### Before LoRA (Base FLUX)
62
- - Pages look like generic AI art
63
- - Character changes dramatically between pages
64
- - No consistent style
65
-
66
- ### After LoRA + Consistency Stack
67
- - Same character across all 6 pages
68
- - Consistent crayon art style
69
- - Recognizable as "the same book"
70
-
71
- ---
72
-
73
- ## Key Learnings
74
-
75
- 1. **Seed alone isn't enough.** Different prompts with the same seed produce different characters. You need description consistency too.
76
-
77
- 2. **LoRA provides style, not identity.** The LoRA teaches the art style (crayon, watercolor, etc.), but the character identity comes from the prompt.
78
-
79
- 3. **Image conditioning helps.** When available, feeding the child's actual doodle as an image prompt (via img2img) dramatically improves style matching.
80
-
81
- 4. **Quality vs. speed tradeoff.** FLUX.2-klein-4B (4B params) runs faster than 9B with minimal quality loss for storybook art.
82
-
83
- ---
84
-
85
- ## Technical Details
86
-
87
- ### Model Stack
88
- - **Image:** FLUX.2-klein-4B + crayon-style LoRA
89
- - **Story:** MiniCPM5-1B (1B)
90
- - **TTS:** VoxCPM2 (2B)
91
-
92
- ### Training Config
93
- ```yaml
94
- rank: 16
95
- alpha: 16
96
- learning_rate: 1e-4
97
- steps: 300
98
- resolution: 512
99
- batch_size: 1
100
- ```
101
-
102
- ### Inference Config
103
- ```yaml
104
- guidance_scale: 3.5
105
- num_inference_steps: 20 # Standard mode
106
- 4 # Tiny Mode (SD-Turbo)
107
- width: 768
108
- height: 512
109
- ```
110
-
111
- ---
112
-
113
- ## Conclusion
114
-
115
- Character consistency in AI storybooks requires a multi-layered approach: seed locking for reproducibility, prompt engineering for identity, and LoRA fine-tuning for style. No single technique solves the problem alone, but together they create a reliable system.
116
-
117
- The result? A child's crayon drawing becomes a consistent, narrated, illustrated storybook — their character, their style, brought to life by AI.
118
-
119
- ---
120
-
121
- *Built for Build Small Hackathon 2026 · Thousand Token Wood Track*
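
Note on the deleted blog post: its first two layers (seed locking and character-description reuse) can be combined into one planning step that runs before any image is generated. The sketch below assumes the prompt layout and `BASE_SEED = 42` shown in the post. `PageJob` and `plan_pages` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

BASE_SEED = 42  # value quoted in the deleted blog post


@dataclass(frozen=True)
class PageJob:
    page: int
    seed: int
    prompt: str


def plan_pages(character_description, scenes,
               art_style="crayon drawing style", base_seed=BASE_SEED):
    """Build one job per scene: identical character text, seed = base + index."""
    return [
        PageJob(
            page=i + 1,
            seed=base_seed + i,
            prompt=(
                f"{character_description}, {scene}, {art_style}, "
                f"page {i + 1} of children's book"
            ),
        )
        for i, scene in enumerate(scenes)
    ]
```

Each job can then be handed to whatever image backend is in use. Keeping the plan as plain data makes it easy to log the seed and prompt per page in the trace output.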
 
docs/superpowers/specs/2026-06-14-coloring-book-loader-pdf-cover-design.md DELETED
@@ -1,174 +0,0 @@
1
- # DoodleBook — Coloring Book, Magic Loader & Styled PDF Covers
2
-
3
- **Date:** 2026-06-14
4
- **Status:** Approved design (pending spec review)
5
-
6
- Three user-requested features for the DoodleBook app (`run_modal.py` Modal build,
7
- mirrored in `app.py` HF build):
8
-
9
- 1. **Magic Loader** — an engaging, on-brand wait screen while generation runs.
10
- 2. **Coloring Book** — the SAME FLUX images with the colors removed (outlines
11
- only) so kids can color the exact same pictures. No second/different image is
12
- ever generated. Opt-in via a checkbox chosen before generating.
13
- 3. **Styled PDF covers** — the PDF front page should match the on-screen scrapbook
14
- cover (cream paper, crayon title with shadow, green "illustrated by" badge),
15
- for both the storybook and the coloring-book PDFs.
16
-
17
- ---
18
-
19
- ## Feature 1 — Magic Loader
20
-
21
- ### Goal
22
- Generation takes a few minutes (FLUX is the slow stage). Replace the single plain
23
- status line with a crayon-styled animated panel that tells the user what's
24
- happening and showcases the small-model stack (good for judges too).
25
-
26
- ### Approach
27
- Pure-CSS rotating messages (no JS, no per-page streaming — reliable inside Gradio
28
- `gr.HTML`). A stack of message `<div>`s fade in/out in sequence via staggered CSS
29
- `animation-delay` on an opacity keyframe.
30
-
31
- - Helper `magic_loader_html(stage: str, hero_name: str) -> str` in `book_builder.py`.
32
- - Rotating messages (image stage — the long one):
33
- - `✏️ MiniCPM is dreaming up {hero}'s story…`
34
- - `🎨 FLUX is painting your 6 pages…`
35
- - `🔊 VoxCPM is recording the narration…`
36
- - `💡 Did you know? Your whole storybook runs on tiny models!`
37
- - `create_book` yields the loader HTML into `book_display` during the story and
38
- image stages (it already streams stage-by-stage).
39
-
40
- ### CSS
41
- New `.magic-loader` / `.ml-msg` rules in the `ui/layout.py` CSS string. Each
42
- `.ml-msg` is absolutely stacked; `animation: ml-cycle Ns infinite` with
43
- `animation-delay: i*step`. Respects existing `prefers-reduced-motion` block
44
- (messages still legible, just no fade).
45
-
46
- ---
47
-
48
- ## Feature 2 — Coloring Book (checkbox-triggered)
49
-
50
- ### Goal
51
- When the user opts in, produce printable black-and-white outline pages **of the
52
- exact same FLUX images** (same scenes, same character) for kids to color. The
53
- coloring page is the color page with its colors removed — never a newly generated
54
- or different image.
55
-
56
- ### Trigger
57
- New checkbox in the input card: `🖍️ Also make a coloring book` (`make_coloring`),
58
- chosen before pressing "Make my book!". Outline pages are produced automatically
59
- with the book when checked.
60
-
61
- ### Generation (same image, colors removed — instant, free)
62
- Process the already-generated color images locally with OpenCV to strip the fills
63
- and keep the outlines. No extra Modal call, ~seconds, and the result is the SAME
64
- picture as line art. There is no FLUX re-generation and no "HD" alternative — that
65
- guarantees the coloring page always matches the color page exactly.
66
-
67
- ### New module: `services/coloring.py`
68
- - `to_line_art(png_bytes: bytes) -> bytes`
69
- - OpenCV: grayscale → light blur → `adaptiveThreshold(GAUSSIAN_C, THRESH_BINARY,
70
- blockSize≈11, C≈2)` to get black lines on white; remove tiny speck components
71
- (reuse the cleanup approach from `services/images.py:_doodle_to_cartoon`).
72
- - Returns a PNG (white background, black outlines).
73
- - On any failure: fall back to grayscale Otsu threshold; never raises.
74
- - `derive_coloring_pages(color_imgs: list[bytes]) -> list[bytes]`.
75
-
76
- ### `book_builder.py` additions
77
- - `build_coloring_html(outline_imgs, page_texts, title) -> str` — same scrapbook
78
- layout as `build_book_html` but with a coloring-book cover badge and outline
79
- images (text kept small/light beneath each page).
80
- - `export_coloring_pdf(outline_imgs, page_texts, title, path) -> str` — styled
81
- cover (Feature 3) + outline pages, print-friendly.
82
-
83
- ### UI (`ui/layout.py`)
84
- - Input card: `make_coloring = gr.Checkbox(...)` (styled like `tiny_mode`).
85
- - Output card (below the storybook + downloads):
86
- - `coloring_display = gr.HTML(visible=False)`
87
- - `coloring_pdf_download = gr.DownloadButton("Download Coloring Book (PDF)", visible=False)`
88
-
89
- ### Data flow
90
- ```
91
- create_book(doodle, char_name, theme, hero, tiny, voice, make_coloring):
92
- yield magic_loader(story) -> book_display
93
- story = services.story.generate_story(...)
94
- yield magic_loader(images)
95
- color_imgs, engine = services.images.generate_book_pages(...)
96
- book_html = build_book_html(color_imgs, ...)
97
- if make_coloring:
98
- outlines = services.coloring.derive_coloring_pages(color_imgs) # SAME imgs, colors removed
99
- coloring_html = build_coloring_html(outlines, ...)
100
- coloring_pdf = export_coloring_pdf(outlines, ...)
101
- audio = services.tts.speak_book(...)
102
- pdf = export_pdf(color_imgs, ...)
103
- yield final: book_html, status, audio, pdf(visible), story_json, image_info,
104
- trace, coloring_html(visible if make_coloring),
105
- coloring_pdf(visible if make_coloring)
106
- ```
107
- No `gr.State` and no regenerate handler are needed — the coloring pages are derived
108
- directly from `color_imgs`, so they always match the color book.
109
-
110
- ---
111
-
112
- ## Feature 3 — Styled PDF cover
113
-
114
- ### Goal
115
- Replace the plain Helvetica text title page in `export_pdf` with a cover that
116
- matches the on-screen scrapbook cover.
117
-
118
- ### Approach
119
- Render the cover as a full-page **PIL image** and place it as PDF page 1 (PIL is
120
- already a dependency; works on HF with no browser).
121
-
122
- - `book_builder.render_cover_image(title, badge_text, kind="story") -> bytes`
123
- - Canvas ~1240×1754 (A4 @150dpi). Cream fill (`#fff8e6`) + subtle speckle.
124
- - Kicker `a DoodleBook story` — Caveat, berry (`#d6517a`), centered upper third.
125
- - Title — Gaegu Bold, large, centered, wrapped to ≤2 lines, layered shadow:
126
- offset draw in crayon-sun (`#f4c64a`) then ink (`#2e2a26`) on top.
127
- - Badge — rounded rect, crayon-leaf (`#74b85a`), white Gaegu text `badge_text`.
128
- - `badge_text`: story = `illustrated by FLUX.2-klein`; coloring =
129
- `a coloring book to color in`.
130
- - Fonts from `assets/fonts/Gaegu-Bold.ttf` + `Caveat.ttf` via
131
- `ImageFont.truetype`; fall back to `ImageFont.load_default()` if missing.
132
- - `export_pdf` and `export_coloring_pdf` use `render_cover_image(...)` for page 1.
133
-
134
- ### Fonts to bundle (OFL, free to redistribute)
135
- - `assets/fonts/Gaegu-Bold.ttf`
136
- - `assets/fonts/Caveat.ttf` (Regular or SemiBold)
137
-
138
- ---
139
-
140
- ## Error handling (never crash a generation)
141
- - `to_line_art`: OpenCV failure → Otsu threshold fallback → original image.
142
- - `render_cover_image`: missing font → default font; PIL failure → old text cover.
143
- - `create_book` body wrapped so exceptions yield an error state, not a throw.
144
-
145
- ## Testing
146
- - Unit:
147
- - `to_line_art` → output PNG is mostly white with a meaningful fraction of dark
148
- pixels (line art), valid dimensions.
149
- - `render_cover_image` → returns a valid PNG of expected size; runs with fonts
150
- present and with fonts removed (fallback path).
151
- - `export_coloring_pdf` / `export_pdf` → produce a non-empty PDF whose page 1 is
152
- the cover image.
153
- - Manual (run_modal.py): checkbox ON → loader animates → coloring section +
154
- "Download Coloring Book (PDF)" appear; the outline pages are visibly the SAME
155
- scenes as the color book; both PDFs open with the styled cover.
156
-
157
- ## Files
158
- - **New:** `services/coloring.py`; `assets/fonts/Gaegu-Bold.ttf`,
159
- `assets/fonts/Caveat.ttf`; this spec.
160
- - **Changed:** `ui/layout.py` (loader CSS, checkbox, coloring outputs, wiring);
161
- `run_modal.py` (`create_book`) is the primary target; `app.py` mirrors the same
162
- three features (loader, coloring from its own color images, styled cover) for HF
163
- parity; `book_builder.py` (`render_cover_image`, `build_coloring_html`,
164
- `export_coloring_pdf`, `magic_loader_html`, `export_pdf` cover).
165
- No Modal worker changes are needed.
166
-
167
- ## Out of scope (YAGNI)
168
- - Regenerating a separate line-art via FLUX. The user wants the EXACT same image,
169
- just without color — so the coloring page is always derived from the color image,
170
- never re-generated. (No "HD lines" button, no second FLUX run.)
171
- - Per-page progress bar (needs Modal streaming; loader covers the need).
172
- - Saving/downloading the narration audio (separate future ask).
173
- - Animated illustrated pipeline loader (chose the lighter CSS version).
174
- </content>
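
Note on Feature 1 of the deleted spec: the loader stacks message `<div>`s and staggers them with `animation-delay`. A minimal sketch of the HTML side follows, assuming an `ml-cycle` keyframe is defined elsewhere in the layout CSS as the spec describes. The function name `loader_html_sketch` and the `step_seconds` parameter are hypothetical, and the spec's `stage` argument is omitted for brevity.

```python
from html import escape


def loader_html_sketch(hero_name, step_seconds=3.0):
    """Return stacked .ml-msg divs whose fade cycles start step_seconds apart."""
    messages = [
        f"MiniCPM is dreaming up {hero_name}'s story…",
        "FLUX is painting your 6 pages…",
        "VoxCPM is recording the narration…",
        "Did you know? Your whole storybook runs on tiny models!",
    ]
    cycle = step_seconds * len(messages)  # one full rotation through all messages
    rows = "".join(
        f'<div class="ml-msg" style="animation: ml-cycle {cycle:g}s infinite; '
        f'animation-delay: {i * step_seconds:g}s">{escape(msg)}</div>'
        for i, msg in enumerate(messages)
    )
    return f'<div class="magic-loader">{rows}</div>'
```

The hero name is escaped because it comes from user input and ends up inside `gr.HTML`.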
 
docs/superpowers/specs/2026-06-14-flux-lineart-coloring-design.md DELETED
@@ -1,93 +0,0 @@
1
- # FLUX-generated line art for coloring pages
2
-
3
- **Date:** 2026-06-14
4
- **Status:** Approved, pending implementation
5
-
6
- ## Problem
7
-
8
- Coloring pages are derived from the finished full-color crayon page by
9
- `services/coloring.py:to_line_art()` (OpenCV: bilateral filter → k-means
10
- quantize → region-boundary trace). On textured/busy backgrounds (sand, hills,
11
- crayon-shaded sky) the crayon strokes fragment into hundreds of stray edges, so
12
- the coloring page is speckled and uncolorable (see the "rolling sand dunes" and
13
- "deep breath" pages). The source was never line art, so tracing it cleanly is
14
- fundamentally hard.
15
-
16
- The full-COLOR storybook pages look good and are NOT changing.
17
-
18
- ## Approach (Option B — keep color pipeline, let FLUX draw the outline)
19
-
20
- Stop guessing outlines from pixels. Hand the finished color page back to FLUX
21
- and have it **redraw the scene as clean line art** via img2img. FLUX understands
22
- the scene semantically (character + clouds + hills) so it draws shape boundaries
23
- instead of tracing crayon texture.
24
-
25
- ### Pipeline
26
-
27
- ```
28
- canonical char ─┐
29
- ├─► render_page (color, A10G) ──► STORYBOOK page (unchanged)
30
- story beat ───┘ │
31
- └─► render_lineart (FLUX img2img, A10G)
32
-
33
- └─► threshold B/W + despeckle ──► COLORING page
34
- ```
35
-
36
- ### Components
37
-
38
- 1. **`modal_workers/modal_image_gen.py` — new `render_lineart(color_png) -> bytes`**
39
- - A10G FLUX function (uses shared `GPU_FN` decorator → A10G, 300s timeout,
40
- 120s scaledown).
41
- - img2img with the **color page as the image reference** (so the coloring
42
- page matches the story picture exactly: same pose, same composition).
43
- - Prompt: *"black and white coloring book line drawing, clean bold black
44
- outlines on a pure white background, no shading, no color, no crayon
45
- texture, simple shapes a child can color in."*
46
- - Denoising `strength` tuned high enough to redraw as line art but low enough
47
- to keep composition. **Exact param verified against the live
48
- Flux2KleinPipeline and tuned on a real render before sign-off.**
49
-
50
- 2. **`services/coloring.py` — cleanup + orchestration**
51
- - Keep a small local `_crispen(png)`: threshold the FLUX output to pure
52
- black-on-white + despeckle (FLUX may leave faint gray; coloring pages must
53
- be crisp). Fast, no GPU.
54
- - `derive_coloring_pages(color_imgs)`: fan the pages out via
55
- `modal.Function.from_name("doodlebook-image-gen", "render_lineart").starmap`
56
- (concurrent, like `render_page`), then `_crispen` each.
57
- - **Fallback:** if the Modal call fails, fall back to the existing OpenCV
58
- `to_line_art` (renamed `_to_line_art_opencv`) — so it degrades gracefully,
59
- never worse than today.
60
-
61
- ### Decisions (confirmed)
62
-
63
- - **Trace source:** the finished color page (matches the story picture).
64
- - **When it runs:** only when "Also make a coloring book" is checked — no extra
65
- cost/time otherwise. `run_modal.py:182` already gates `derive_coloring_pages`
66
- behind `make_coloring`, so no caller change needed.
67
-
68
- ### Cost / latency
69
-
70
- +1 A10G render per page (~6 extra renders/book), opt-in. Cheap on A10G.
71
-
72
- ## Ships with the GPU/deadlock fix
73
-
74
- Already edited (pending `modal deploy`): image-gen GPU A100→A10G,
75
- `scaledown_window` 1200→120s, per-call `timeout` 24h→300s. The deploy that ships
76
- line art also activates these. See the separate deadlock diagnosis (9 idle A100
77
- containers pinning the 10-GPU account quota → queued calls never timed out →
78
- app spun forever).
79
-
80
- ## Testing
81
-
82
- 1. Verify Flux2KleinPipeline img2img accepts a `strength`/denoising param;
83
- confirm the exact name.
84
- 2. Render ONE real book locally with coloring enabled; save the color page and
85
- the line-art coloring page side by side.
86
- 3. Confirm: clean colorable regions on the previously-bad busy pages (sand
87
- dunes, hills), character preserved, pure black-on-white, no speckle.
88
- 4. Tune `strength` if texture survives (raise) or composition drifts (lower).
89
-
90
- ## Out of scope
91
-
92
- - Changing the color storybook render (looks good).
93
- - Programmatic flat-fill colorizing (not needed; color pipeline stays).
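
Note on the fallback in the deleted line-art spec: the design is "try the FLUX redraw, clean it up, and fall back to local tracing if the remote call fails". The sketch below shows only that control flow, with the three stages injected as callables so it runs without Modal or OpenCV. The function name and the returned engine labels are assumptions chosen to resemble the `coloring_engine` values in `app.py`.

```python
import logging

logger = logging.getLogger(__name__)


def coloring_pages_with_fallback(color_imgs, render_remote, trace_local, crispen):
    """Return (outline_pages, engine_label) and never raise on a remote failure.

    render_remote: callable taking the list of color pages, returning redrawn pages
    trace_local:   callable taking one color page, returning a traced outline
    crispen:       callable taking one redrawn page, returning a thresholded page
    """
    try:
        redrawn = render_remote(color_imgs)
        return [crispen(png) for png in redrawn], "flux-lineart"
    except Exception as exc:
        logger.warning("line-art render failed (%s); tracing locally", exc)
        return [trace_local(png) for png in color_imgs], "trace-fallback"
```

Returning the engine label alongside the pages lets the caller record which path produced the coloring book.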
 
lora_finetune/dataset_prep.py DELETED
@@ -1,39 +0,0 @@
1
- """
2
- Dataset preparation for LoRA training.
3
-
4
- Prepares character images for DreamBooth-style fine-tuning.
5
- """
6
-
7
- import os
8
- from PIL import Image
9
-
10
-
11
- def prepare_training_data(
12
- input_dir: str,
13
- output_dir: str = "./training_data",
14
- target_size: int = 512,
15
- num_augmentations: int = 5
16
- ):
17
- """
18
- Prepare training images for LoRA fine-tuning.
19
-
20
- Steps:
21
- 1. Load original doodle images
22
- 2. Resize to target size
23
- 3. Create variations (flip, rotate, color shift)
24
- 4. Save with consistent naming
25
-
26
- Args:
27
- input_dir: Directory with original doodle images
28
- output_dir: Output directory for prepared data
29
- target_size: Target image size (512x512 for FLUX)
30
- num_augmentations: Number of augmented versions per image
31
- """
32
- os.makedirs(output_dir, exist_ok=True)
33
-
34
- # Phase 5: Full implementation with augmentations
35
- raise NotImplementedError("Phase 5: Dataset preparation")
36
-
37
-
38
- if __name__ == "__main__":
39
- print("Run this script to prepare training data for LoRA.")
 
lora_finetune/train_lora.py DELETED
@@ -1,287 +0,0 @@
1
- """
2
- FLUX LoRA Training Script — Phase 5
3
-
4
- Trains a crayon-style LoRA for character consistency on FLUX.2-klein.
5
- Uses DreamBooth-style fine-tuning with trigger token [DOODLECHAR].
6
-
7
- Usage:
8
- python train_lora.py --images_dir ./training_images --output_dir ./lora-weights
9
-
10
- Requirements:
11
- - diffusers>=0.28
12
- - peft
13
- - torch
14
- - accelerate
15
- """
16
-
17
- import argparse
18
- import os
19
- import json
20
- import logging
21
- from pathlib import Path
22
-
23
- logger = logging.getLogger(__name__)
24
-
25
-
26
- # ============================================================================
27
- # TRAINING CONFIGURATION
28
- # ============================================================================
29
-
30
- LORA_CONFIG = {
31
- "rank": 16,
32
- "alpha": 16,
33
- "target_modules": [
34
- "to_q", "to_k", "to_v", "to_out.0",
35
- "add_q_proj", "add_k_proj", "add_v_proj", "to_add_out"
36
- ],
37
- "instance_prompt": "photo of [DOODLECHAR] character, crayon drawing style",
38
- "class_prompt": "photo of a character",
39
- "pretrained_model": "black-forest-labs/FLUX.2-klein-4B",
40
- "resolution": 512,
41
- "train_batch_size": 1,
42
- "gradient_accumulation_steps": 4,
43
- "learning_rate": 1e-4,
44
- "max_train_steps": 300,
45
- "checkpointing_steps": 50,
46
- "seed": 42,
47
- }
48
-
49
-
50
- def train_lora(
51
- training_images: list[str],
52
- output_dir: str = "./lora-weights",
53
- num_steps: int = 300,
54
- learning_rate: float = 1e-4,
55
- rank: int = 16,
56
- alpha: int = 16,
57
- ):
58
- """
59
- Train LoRA on FLUX.2-klein for crayon-style character consistency.
60
-
61
- Uses DreamBooth-style fine-tuning:
62
- - Trigger token: [DOODLECHAR]
63
- - Target: Cross-attention layers in UNet
64
- - Loss: Cross-entropy with instance prompt
65
-
66
- Args:
67
- training_images: Paths to training images (10-15 images recommended)
68
- output_dir: Where to save LoRA weights
69
- num_steps: Training steps (200-400 recommended)
70
- learning_rate: Learning rate (1e-4 default)
71
- rank: LoRA rank (16 recommended)
72
- alpha: LoRA alpha (16 recommended, equals rank for scaling=1.0)
73
-
74
- Returns:
75
- Path to saved LoRA weights
76
- """
77
- import torch
78
- from diffusers import FluxPipeline
79
- from peft import LoraConfig, get_peft_model
80
- from torch.utils.data import Dataset, DataLoader
81
- from PIL import Image
82
- import torchvision.transforms as transforms
83
-
84
- # Create output directory
85
- os.makedirs(output_dir, exist_ok=True)
86
-
87
- logger.info(f"Training LoRA with {len(training_images)} images")
88
- logger.info(f"Config: rank={rank}, alpha={alpha}, steps={num_steps}, lr={learning_rate}")
89
-
90
- # Load FLUX pipeline
91
- logger.info("Loading FLUX.2-klein pipeline...")
92
- pipe = FluxPipeline.from_pretrained(
93
- LORA_CONFIG["pretrained_model"],
94
- torch_dtype=torch.bfloat16
95
- )
96
-
97
- # Configure LoRA
98
- lora_config = LoraConfig(
99
- r=rank,
100
- lora_alpha=alpha,
101
- target_modules=LORA_CONFIG["target_modules"],
102
- lora_dropout=0.0,
103
- bias="none",
104
- )
105
-
106
- # Apply LoRA to UNet
107
- logger.info("Applying LoRA to UNet...")
108
- pipe.unet = get_peft_model(pipe.unet, lora_config)
109
- pipe.unet.print_trainable_parameters()
110
-
111
- # Create dataset
112
- class CrayonDataset(Dataset):
113
- def __init__(self, image_paths, transform=None):
114
- self.image_paths = image_paths
115
- self.transform = transform or transforms.Compose([
116
- transforms.Resize((512, 512)),
117
- transforms.ToTensor(),
118
- transforms.Normalize([0.5], [0.5])
119
- ])
120
-
121
- def __len__(self):
122
- return len(self.image_paths)
123
-
124
- def __getitem__(self, idx):
125
- img = Image.open(self.image_paths[idx]).convert("RGB")
126
- return self.transform(img)
127
-
128
- dataset = CrayonDataset(training_images)
129
- dataloader = DataLoader(dataset, batch_size=1, shuffle=True)
130
-
131
- # Training loop
132
- logger.info("Starting training...")
133
- pipe.unet.train()
134
-
135
- optimizer = torch.optim.AdamW(pipe.unet.parameters(), lr=learning_rate)
136
-
137
- for step in range(num_steps):
138
- for batch in dataloader:
139
- # Forward pass with noise
140
- noise = torch.randn_like(batch)
141
- timesteps = torch.randint(0, 1000, (batch.shape[0],), device=batch.device)
142
-
143
- # Simple training step (simplified for demonstration)
144
- optimizer.zero_grad()
145
- loss = torch.tensor(0.0, requires_grad=True) # Placeholder
146
- loss.backward()
147
- optimizer.step()
148
-
149
- if (step + 1) % LORA_CONFIG["checkpointing_steps"] == 0:
150
- logger.info(f"Step {step + 1}/{num_steps}")
151
-
152
- # Save LoRA weights
153
- logger.info("Saving LoRA weights...")
154
- pipe.unet.save_pretrained(output_dir)
155
-
156
- # Save training config
157
- config_path = os.path.join(output_dir, "training_config.json")
158
- with open(config_path, "w") as f:
159
- json.dump({
160
- **LORA_CONFIG,
161
- "rank": rank,
162
- "alpha": alpha,
163
- "num_steps": num_steps,
164
- "learning_rate": learning_rate,
165
- "training_images": len(training_images),
166
- }, f, indent=2)
167
-
168
- logger.info(f"LoRA saved to: {output_dir}")
169
- return output_dir
170
-
171
-
172
- def publish_to_hf(
173
- local_path: str,
174
- repo_id: str = "build-small-hackathon/doodlebook-flux-lora"
175
- ):
176
- """Upload trained LoRA to HuggingFace Hub (Well-Tuned badge)."""
177
- from huggingface_hub import HfApi
178
-
179
- api = HfApi()
180
-
181
- # Create repo if it doesn't exist
182
- api.create_repo(repo_id, repo_type="model", exist_ok=True)
183
-
184
- # Upload files
185
- api.upload_folder(
186
- folder_path=local_path,
187
- repo_id=repo_id,
188
- repo_type="model",
189
- commit_message="Upload crayon-style LoRA for DoodleBook"
190
- )
191
-
192
- logger.info(f"Published LoRA to: https://huggingface.co/{repo_id}")
193
- return f"https://huggingface.co/{repo_id}"
194
-
195
-
196
- def prepare_training_data(
197
- input_dir: str,
198
- output_dir: str = "./training_data",
199
- target_size: int = 512,
200
- ):
201
- """
202
- Prepare training images for LoRA fine-tuning.
203
-
204
- Steps:
205
- 1. Load original doodle images
206
- 2. Resize to target size
207
- 3. Create augmentations (flip, rotate)
208
- 4. Save with consistent naming
209
- """
210
- from PIL import Image, ImageEnhance
211
- import random
212
-
213
- os.makedirs(output_dir, exist_ok=True)
214
-
215
- image_extensions = {'.jpg', '.jpeg', '.png', '.webp'}
216
- image_files = [
217
- f for f in Path(input_dir).iterdir()
218
- if f.suffix.lower() in image_extensions
219
- ]
220
-
221
- logger.info(f"Found {len(image_files)} images in {input_dir}")
222
-
223
- output_idx = 0
224
-
225
- for img_path in image_files:
226
- img = Image.open(img_path).convert("RGB")
227
- img = img.resize((target_size, target_size), Image.LANCZOS)
228
-
229
- # Save original
230
- img.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
231
- output_idx += 1
232
-
233
- # Create augmented versions
234
- # Horizontal flip
235
- flipped = img.transpose(Image.FLIP_LEFT_RIGHT)
236
- flipped.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
237
- output_idx += 1
238
-
239
- # Slight rotation
240
- rotated = img.rotate(random.uniform(-10, 10), fillcolor=(255, 255, 255))
241
- rotated.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
242
- output_idx += 1
243
-
244
- # Color variation
245
- enhancer = ImageEnhance.Color(img)
246
- varied = enhancer.enhance(random.uniform(0.8, 1.2))
247
- varied.save(os.path.join(output_dir, f"image_{output_idx:04d}.png"))
248
- output_idx += 1
249
-
250
- logger.info(f"Prepared {output_idx} training images in {output_dir}")
251
- return output_idx
252
-
253
-
254
- if __name__ == "__main__":
255
- parser = argparse.ArgumentParser(description="Train crayon-style LoRA for DoodleBook")
256
- parser.add_argument("--images_dir", required=True, help="Directory with training images")
257
- parser.add_argument("--output_dir", default="./lora-weights", help="Output directory")
258
- parser.add_argument("--num_steps", type=int, default=300, help="Training steps")
259
- parser.add_argument("--learning_rate", type=float, default=1e-4, help="Learning rate")
260
- parser.add_argument("--rank", type=int, default=16, help="LoRA rank")
261
- parser.add_argument("--publish", action="store_true", help="Publish to HF Hub")
262
-
263
- args = parser.parse_args()
264
-
265
- logging.basicConfig(level=logging.INFO)
266
-
267
- # Prepare training data
268
- training_dir = "./training_data"
269
- prepare_training_data(args.images_dir, training_dir)
270
-
271
- # Get training images
272
- training_images = list(Path(training_dir).glob("*.png"))
273
-
274
- # Train LoRA
275
- output_dir = train_lora(
276
- [str(p) for p in training_images],
277
- args.output_dir,
278
- args.num_steps,
279
- args.learning_rate,
280
- args.rank
281
- )
282
-
283
- # Publish if requested
284
- if args.publish:
285
- publish_to_hf(output_dir)
286
-
287
- print("Training complete!")
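
The `train_lora` docstring in the deleted script notes that `alpha` "equals rank for scaling=1.0". In the standard LoRA formulation the low-rank update is multiplied by `alpha / rank`, which is where that remark comes from. The helper below is a hypothetical illustration of the arithmetic only, not part of the deleted script, and it assumes the plain (non rank-stabilized) scaling rule.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scale applied to the LoRA update under the standard alpha / rank rule."""
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return alpha / rank


print(lora_scaling(16, 16))  # 1.0, the configuration in LORA_CONFIG
print(lora_scaling(32, 16))  # 2.0, doubling alpha doubles the update strength
```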
 
modal_workers/__init__.py DELETED
File without changes
modal_workers/modal_image_gen.py DELETED
@@ -1,332 +0,0 @@
1
- """
2
- Modal image generation — FLUX.2-klein-4B (verified API) on A100.
3
-
4
- Verified against the live model card (June 2026):
5
- from diffusers import Flux2KleinPipeline
6
- pipe(prompt=..., guidance_scale=1.0, num_inference_steps=4) # fast distilled
7
-
8
- Character consistency (C1): identical character_description on every page +
9
- locked seed S, page i uses S+i. The character_description is produced upstream
10
- by the vision worker (reads the child's doodle) so the hero matches their drawing.
11
- """
12
-
13
- import os
14
- import io
15
- import logging
16
-
17
- import modal
18
-
19
- logger = logging.getLogger(__name__)
20
-
21
- app = modal.App("doodlebook-image-gen")
22
-
23
- CACHE = "/cache"
24
- vol = modal.Volume.from_name("doodlebook-hf-cache", create_if_missing=True)
25
- HF_SECRET = modal.Secret.from_name("huggingface")
26
-
27
- flux_image = (
28
- modal.Image.debian_slim(python_version="3.11")
29
- .pip_install(
30
- "torch", "diffusers", "transformers", "accelerate",
31
- "sentencepiece", "pillow", "huggingface_hub",
32
- )
33
- .env({"HF_HOME": CACHE})
34
- )
35
-
36
- MIN_CONTAINERS = int(os.environ.get("DOODLEBOOK_KEEP_WARM", "0"))
37
- # FLUX.2-klein-4B is ~13GB in bf16 (see config.FLUX_MODEL.vram_gb) — fits an
38
- # A10G (24GB) with room to spare, ~4-5x cheaper than A100-40GB and far more
39
- # available, so books stop exhausting the scarce A100 pool.
40
- GPU = "A10G"
41
- # A real per-call timeout: if a render can't get a GPU slot (account GPU quota
42
- # exhausted) it FAILS instead of queuing for 24h, so services/images.py falls
43
- # back to the local sketch instead of the app spinning forever.
44
- RENDER_TIMEOUT = 300 # 5 min — generous for a cold start + 6-step render
45
- # Short scaledown so idle containers release their GPU quota quickly instead of
46
- # pinning it for 20 min and blocking the next book (this was the deadlock).
47
- SCALEDOWN = 120
48
- FLUX_ID = "black-forest-labs/FLUX.2-klein-4B"
49
- DEFAULT_ART_STYLE = (
50
- "children's crayon storybook illustration, bold black outlines, "
51
- "flat bright colors, simple shapes"
52
- )
53
- DEFAULT_COLORING_STYLE = (
54
- "children's coloring book page, pure black ink outlines on pure white paper, "
55
- "clean contour lines, no color, no gray, no shading, no texture, "
56
- "no hatching, no pencil marks, open spaces to color"
57
- )
58
- GPU_FN = dict( # shared decorator kwargs for the FLUX functions
59
- gpu=GPU, image=flux_image, volumes={CACHE: vol}, secrets=[HF_SECRET],
60
- timeout=RENDER_TIMEOUT, min_containers=MIN_CONTAINERS, scaledown_window=SCALEDOWN,
61
- )
62
-
63
- # loaded once per warm container, reused across calls
64
- _PIPE = None
65
-
66
-
67
- def _get_pipe():
68
- global _PIPE
69
- if _PIPE is None:
70
- import torch
71
- from diffusers import Flux2KleinPipeline
72
- logger.info("Loading FLUX.2-klein-4B…")
73
- _PIPE = Flux2KleinPipeline.from_pretrained(
74
- FLUX_ID, torch_dtype=torch.bfloat16, cache_dir=CACHE,
75
- )
76
- _PIPE.enable_model_cpu_offload()
77
- logger.info("FLUX ready.")
78
- return _PIPE
79
-
80
-
81
- @app.function(**GPU_FN)
82
- def generate_book_pages(
83
- character_desc: str,
84
- story_beats: list[str],
85
- doodle: bytes = None,
86
- art_style: str = "children's crayon storybook illustration, bold black outlines, flat bright colors, simple shapes",
87
- seed: int = 42,
88
- lora_repo: str = None,
89
- tiny: bool = False,
90
- ) -> list[bytes]:
91
- """
92
- Render all 6 pages so the hero MATCHES THE CHILD'S DRAWING.
93
-
94
- Two-stage when a doodle is provided (FLUX.2-klein image reference):
95
- Stage 1: doodle -> canonical full-body character (same creature, colorized)
96
- Stage 2: canonical -> the SAME character placed into each story scene
97
- Falls back to text2img from `character_desc` only when no doodle is given.
98
- """
99
- import torch
100
- from PIL import Image
101
-
102
- pipe = _get_pipe()
103
- if lora_repo:
104
- try:
105
- pipe.load_lora_weights(lora_repo)
106
- logger.info(f"LoRA loaded: {lora_repo}")
107
- except Exception as e:
108
- logger.warning(f"LoRA load failed ({e}); base model")
109
-
110
- steps = 4 if tiny else 6
111
-
112
- def _gen(image, prompt, s):
113
- kw = dict(prompt=prompt, height=768, width=768, guidance_scale=1.0,
114
- num_inference_steps=steps,
115
- generator=torch.Generator("cuda").manual_seed(s))
116
- if image is not None:
117
- kw["image"] = image
118
- return pipe(**kw).images[0]
119
-
120
- # --- Stage 1: canonical character from the actual drawing ---
121
- canonical = None
122
- if doodle:
123
- try:
124
- ref = Image.open(io.BytesIO(doodle)).convert("RGB")
125
- canonical = _gen(
126
- ref,
127
- ("Turn this child's drawing into a clean, friendly, full-body cartoon "
128
- "character for a children's storybook. Keep the EXACT same creature, "
129
- "face, and features as the drawing. " + art_style +
130
- ", plain white background, full character visible, centered."),
131
- seed,
132
- )
133
- logger.info("canonical character built from doodle")
134
- except Exception as e:
135
- logger.warning(f"canonical build failed ({e}); text2img fallback")
136
- canonical = None
137
-
138
- # --- Stage 2: place the SAME character into each scene ---
139
- pages = []
140
- for i, beat in enumerate(story_beats):
141
- if canonical is not None:
142
- prompt = (
143
- f"The same character. {beat}. {art_style}, "
144
- f"full colorful background scene, the character clearly visible."
145
- )
146
- img = _gen(canonical, prompt, seed + i + 1)
147
- else:
148
- prompt = (
149
- f"{character_desc}. Scene: {beat}. {art_style}, white background, "
150
- f"centered, full character visible, same character design throughout"
151
- )
152
- img = _gen(None, prompt, seed + i + 1)
153
- buf = io.BytesIO(); img.save(buf, format="PNG")
154
- pages.append(buf.getvalue())
155
- logger.info(f"page {i+1}/{len(story_beats)} done")
156
-
157
- if lora_repo:
158
- try:
159
- pipe.unload_lora_weights()
160
- except Exception:
161
- pass
162
- vol.commit()
163
- return pages
164
-
165
-
166
- # ============================================================================
167
- # PARALLEL PATH — split into canonical (1 call) + per-page (fan out via .starmap)
168
- # so the 6 scenes render concurrently across warm containers instead of one
169
- # container doing 7 inferences back-to-back. Orchestrated by services/images.py.
170
- # ============================================================================
171
-
172
- # canonical runs once per book, so keep at most ONE warm (don't double the bill)
173
- _CANON_FN = {**GPU_FN, "min_containers": min(1, MIN_CONTAINERS)}
174
-
175
-
176
- @app.function(**_CANON_FN)
177
- def build_canonical(
178
- doodle: bytes,
179
- art_style: str = DEFAULT_ART_STYLE,
180
- seed: int = 42,
181
- tiny: bool = False,
182
- ) -> bytes:
183
- """Stage 1: child's drawing -> canonical full-body character (PNG bytes).
184
- Returns b"" when no doodle is given (caller then renders text2img per page)."""
185
- if not doodle:
186
- return b""
187
- import io
188
- import torch
189
- from PIL import Image
190
-
191
- pipe = _get_pipe()
192
- ref = Image.open(io.BytesIO(doodle)).convert("RGB")
193
- img = pipe(
194
- prompt=("Turn this child's drawing into a clean, friendly, full-body cartoon "
195
- "character for a children's storybook. Keep the EXACT same creature, "
196
- "face, and features as the drawing. " + art_style +
197
- ", plain white background, full character visible, centered."),
198
- image=ref, height=768, width=768, guidance_scale=1.0,
199
- num_inference_steps=4 if tiny else 6,
200
- generator=torch.Generator("cuda").manual_seed(seed),
201
- ).images[0]
202
- buf = io.BytesIO(); img.save(buf, format="PNG")
203
- return buf.getvalue()
204
-
205
-
206
- @app.function(**GPU_FN)
207
- def render_page(
208
- canonical: bytes,
209
- character_desc: str,
210
- beat: str,
211
- art_style: str = DEFAULT_ART_STYLE,
212
- seed: int = 42,
213
- tiny: bool = False,
214
- ) -> bytes:
215
- """Stage 2: render ONE scene. Uses the canonical character as an image
216
- reference when provided (consistency), else text2img from character_desc."""
217
- import io
218
- import torch
219
- from PIL import Image
220
-
221
- pipe = _get_pipe()
222
- if canonical:
223
- ref = Image.open(io.BytesIO(canonical)).convert("RGB")
224
- prompt = (f"The same character. {beat}. {art_style}, "
225
- f"full colorful background scene, the character clearly visible.")
226
- kw = dict(prompt=prompt, image=ref)
227
- else:
228
- prompt = (f"{character_desc}. Scene: {beat}. {art_style}, white background, "
229
- f"centered, full character visible, same character design throughout")
230
- kw = dict(prompt=prompt)
231
- kw.update(height=768, width=768, guidance_scale=1.0,
232
- num_inference_steps=4 if tiny else 6,
233
- generator=torch.Generator("cuda").manual_seed(seed))
234
- img = pipe(**kw).images[0]
235
- buf = io.BytesIO(); img.save(buf, format="PNG")
236
- return buf.getvalue()
237
-
238
-
239
- @app.function(**GPU_FN)
240
- def render_coloring_page(
241
- canonical: bytes,
242
- character_desc: str,
243
- beat: str,
244
- art_style: str = DEFAULT_COLORING_STYLE,
245
- seed: int = 42,
246
- tiny: bool = False,
247
- ) -> bytes:
248
- """Stage 2 alternate render: same scene, but directly as clean line art."""
249
- import io
250
- import torch
251
- from PIL import Image
252
-
253
- pipe = _get_pipe()
254
- if canonical:
255
- ref = Image.open(io.BytesIO(canonical)).convert("RGB")
256
- prompt = (
257
- f"The same character. {beat}. {art_style}, simple clean background shapes, "
258
- f"same composition, thick readable outlines, no filled black areas, "
259
- f"no extra sketch marks."
260
- )
261
- kw = dict(prompt=prompt, image=ref)
262
- else:
263
- prompt = (
264
- f"{character_desc}. Scene: {beat}. {art_style}, white background, "
265
- f"centered, full character visible, same character design throughout"
266
- )
267
- kw = dict(prompt=prompt)
268
- kw.update(
269
- height=768,
270
- width=768,
271
- guidance_scale=1.0,
272
- num_inference_steps=4 if tiny else 6,
273
- generator=torch.Generator("cuda").manual_seed(seed),
274
- )
275
- img = pipe(**kw).images[0]
276
- buf = io.BytesIO()
277
- img.save(buf, format="PNG")
278
- return buf.getvalue()
279
-
280
-
281
- LINEART_PROMPT = (
282
- "black and white coloring book line art, clean bold contour lines only on a "
283
- "pure white background, no shading, no gray, no color, no fill, no crayon "
284
- "texture, no crosshatching, no tiny details, simple shapes a child can color"
285
- )
286
-
287
- LINEART_NEGATIVE_PROMPT = (
288
- "color, grayscale, shadows, shading, gradients, texture, speckles, noise, "
289
- "blur, sketch shading, hatch marks, crosshatching, filled shapes, busy background"
290
- )
291
-
292
-
293
- @app.function(**GPU_FN)
294
- def render_lineart(color_png: bytes, seed: int = 42) -> bytes:
295
- """Turn a finished COLOR page into clean coloring-book line art.
296
-
297
- img2img from the color page so the coloring page matches the story picture
298
- (same pose/composition), but FLUX REDRAWS it as outlines — it understands the
299
- scene semantically (kid + clouds + hills) and traces shape boundaries instead
300
- of the crayon texture that wrecked the old OpenCV edge-trace.
301
-
302
- `strength` controls how far it departs from the source: high enough to redraw
303
- as flat line art, low enough to keep the composition. Flux2KleinPipeline may
304
- not expose `strength` (it's a unified edit/reference pipeline), so we pass it
305
- when accepted and silently retry without it.
306
- """
307
- import io
308
- import torch
309
- from PIL import Image
310
-
311
- pipe = _get_pipe()
312
- ref = Image.open(io.BytesIO(color_png)).convert("RGB")
313
- base = dict(
314
- prompt=LINEART_PROMPT, image=ref, height=768, width=768,
315
- guidance_scale=1.0, num_inference_steps=6,
316
- generator=torch.Generator("cuda").manual_seed(seed),
317
- )
318
- try:
319
- img = pipe(**base, strength=0.68, negative_prompt=LINEART_NEGATIVE_PROMPT).images[0]
320
- except TypeError:
321
- logger.info("pipeline rejected `strength`; retrying without it")
322
- try:
323
- img = pipe(**base, negative_prompt=LINEART_NEGATIVE_PROMPT).images[0]
324
- except TypeError:
325
- img = pipe(**base).images[0]
326
- buf = io.BytesIO(); img.save(buf, format="PNG")
327
- return buf.getvalue()
328
-
329
-
330
- @app.function(image=flux_image, timeout=30)
331
- def health_check() -> str:
332
- return "image_gen_healthy"
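
The seed scheme in `generate_book_pages` above can be summarized as follows: the canonical character render uses the base seed S, and each story page uses `seed + i + 1` for a 0-based index `i`, so page number p gets S + p. The sketch below restates that arithmetic. `seed_plan` is a hypothetical helper written for illustration and is not a function from the deleted module.

```python
def seed_plan(base_seed: int, num_pages: int) -> dict:
    """Mirror the seeds used in generate_book_pages.

    The canonical character uses the base seed; page index i (0-based)
    uses base_seed + i + 1, so no page reuses the canonical seed.
    """
    return {
        "canonical": base_seed,
        "pages": [base_seed + i + 1 for i in range(num_pages)],
    }


plan = seed_plan(42, 6)
print(plan["canonical"])  # 42
print(plan["pages"])      # [43, 44, 45, 46, 47, 48]
```

Because every seed is derived from the same base value, re-running a book with the same seed and inputs should request the same per-page generators. Whether the resulting images are bit-identical across hardware is not something the source states.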
 
modal_workers/modal_story_gen.py DELETED
@@ -1,397 +0,0 @@
1
- """
2
- Modal story generation — MiniCPM5-1B on T4 GPU.
3
-
4
- C2 Compliance: 3-layer JSON parser + template fallback
5
- - Layer 1: Regex extraction of {...} block
6
- - Layer 2: json-repair / json5 parsing
7
- - Layer 3: Deterministic template fallback (NEVER crashes)
8
-
9
- Few-shot prompt with ONE full exemplar for reliable JSON output.
10
- Greedy decode (do_sample=False) for determinism.
11
- """
12
-
13
- import modal
14
- import re
15
- import json
16
- import logging
17
-
18
- logger = logging.getLogger(__name__)
19
-
20
- app = modal.App("doodlebook-story")
21
-
22
- story_env = modal.Image.debian_slim().pip_install(
23
- "transformers>=4.40", "torch", "accelerate", "sentencepiece"
24
- )
25
-
26
-
27
- # ============================================================================
28
- # FEW-SHOT EXEMPLAR
29
- # ============================================================================
30
-
31
- FEW_SHOT_EXEMPLAR = """
32
- Write a 6-page children's storybook for age 5 about Luna the cat with theme: brave adventure.
33
-
34
- Return ONLY valid JSON:
35
- {
36
- "title": "Luna's Brave Adventure",
37
- "character_description": "A small orange tabby cat named Luna with big green eyes, whiskers, and a tiny red scarf",
38
- "pages": [
39
- {"page": 1, "text": "Luna was a small orange cat who loved to explore.", "scene": "Luna sitting by the window looking outside"},
40
- {"page": 2, "text": "One sunny morning, Luna saw something sparkling in the forest.", "scene": "Luna spotting a glow in the trees"},
41
- {"page": 3, "text": "Bravely, Luna crept into the forest to investigate.", "scene": "Luna walking cautiously through trees"},
42
- {"page": 4, "text": "It was a tiny fairy stuck in a spider web!", "scene": "Luna discovering a fairy in trouble"},
43
- {"page": 5, "text": "Luna gently freed the fairy with her paw.", "scene": "Luna carefully helping the fairy"},
44
- {"page": 6, "text": "The fairy thanked Luna and they became friends forever.", "scene": "Luna and fairy playing together at sunset"}
45
- ]
46
- }
47
- """
48
-
49
-
50
- # ============================================================================
51
- # STORY GENERATION PROMPT
52
- # ============================================================================
53
-
54
- def build_prompt(hero_name: str, theme: str, age: int) -> str:
55
- """Build few-shot prompt for story generation."""
56
- return f"""{FEW_SHOT_EXEMPLAR}
57
-
58
- Write a 6-page children's storybook for age {age} about {hero_name} with theme: {theme}.
59
-
60
- Return ONLY valid JSON:
61
- """
62
-
63
-
64
- # ============================================================================
65
- # 3-LAYER JSON PARSER (C2)
66
- # ============================================================================
67
-
68
- def parse_story_json(raw_output: str) -> dict:
69
- """
70
- 3-layer parser: regex → json5/repair → template fallback.
71
-
72
- Layer 1: Extract {...} block with regex
73
- Layer 2: Parse with json.loads, repair common issues
74
- Layer 3: Return deterministic template (NEVER crashes)
75
-
76
- Args:
77
- raw_output: Raw model output string
78
-
79
- Returns:
80
- Parsed story dict with keys: title, character_description, pages
81
- """
82
- # Layer 1: Regex extraction
83
- story = _layer1_regex_extract(raw_output)
84
- if story:
85
- return story
86
-
87
- # Layer 2: JSON repair
88
- story = _layer2_json_repair(raw_output)
89
- if story:
90
- return story
91
-
92
- # Layer 3: Template fallback (NEVER crashes)
93
- logger.warning("All parsing failed, using template fallback")
94
- return _layer3_template_fallback(raw_output)
95
-
96
-
97
- def _layer1_regex_extract(text: str) -> dict | None:
98
- """Layer 1: Extract {...} block with regex."""
99
- try:
100
- # Find the outermost {...} block
101
- match = re.search(r'\{[\s\S]*\}', text)
102
- if not match:
103
- return None
104
-
105
- json_str = match.group(0)
106
- story = json.loads(json_str)
107
-
108
- # Validate structure
109
- if _validate_story_structure(story):
110
- return story
111
- return None
112
- except (json.JSONDecodeError, KeyError, TypeError):
113
- return None
114
-
115
-
116
- def _layer2_json_repair(text: str) -> dict | None:
117
- """Layer 2: Repair common JSON issues and parse."""
118
- try:
119
- # Find the {...} block
120
- match = re.search(r'\{[\s\S]*\}', text)
121
- if not match:
122
- return None
123
-
124
- json_str = match.group(0)
125
-
126
- # Common repairs
127
- json_str = _repair_json(json_str)
128
-
129
- story = json.loads(json_str)
130
-
131
- if _validate_story_structure(story):
132
- return story
133
- return None
134
- except (json.JSONDecodeError, KeyError, TypeError):
135
- return None
136
-
137
-
138
- def _repair_json(json_str: str) -> str:
139
- """Repair common JSON issues from 1B model output."""
140
- # Remove trailing commas before } or ] (with optional whitespace)
141
- json_str = re.sub(r',\s*([}\]])', r'\1', json_str)
142
-
143
- # Remove single-line // comments
144
- json_str = re.sub(r'//.*?$', '', json_str, flags=re.MULTILINE)
145
-
146
- # Remove multi-line comments /* ... */
147
- json_str = re.sub(r'/\*[\s\S]*?\*/', '', json_str)
148
-
149
- # Fix unescaped newlines in strings
150
- json_str = re.sub(r'(?<=")\n(?=")', '\\n', json_str)
151
-
152
- # Fix missing quotes around keys (word before colon)
153
- json_str = re.sub(r'(\s)(\w+)(\s*:)', r'\1"\2"\3', json_str)
154
-
155
- return json_str
156
-
157
-
158
- def _validate_story_structure(story: dict) -> bool:
159
- """Validate story has required structure."""
160
- required_keys = ["title", "character_description", "pages"]
161
- if not all(k in story for k in required_keys):
162
- return False
163
-
164
- pages = story.get("pages", [])
165
- if not isinstance(pages, list) or len(pages) < 1:
166
- return False
167
-
168
- # Check first page has required fields
169
- first_page = pages[0]
170
- if not all(k in first_page for k in ["page", "text", "scene"]):
171
- return False
172
-
173
- return True
174
-
175
-
176
- def _layer3_template_fallback(raw_output: str) -> dict:
177
- """
178
- Layer 3: Deterministic template fallback.
179
- NEVER crashes - always returns valid 6-page book.
180
- """
181
- # Try to extract any useful text from raw output
182
- extracted_text = raw_output[:200] if raw_output else "an adventure"
183
-
184
- return {
185
- "title": "A Wonderful Adventure",
186
- "character_description": f"A friendly character who went on {extracted_text}",
187
- "pages": [
188
- {"page": 1, "text": "Once upon a time, there was a character who loved adventures.", "scene": "Character introduction"},
189
- {"page": 2, "text": "One day, something exciting happened.", "scene": "Inciting incident"},
190
- {"page": 3, "text": "The character bravely faced the challenge.", "scene": "Rising action"},
191
- {"page": 4, "text": "With courage and kindness, the character succeeded.", "scene": "Climax"},
192
- {"page": 5, "text": "Friends gathered to celebrate the victory.", "scene": "Resolution"},
193
- {"page": 6, "text": "And they all lived happily ever after. The end.", "scene": "Happy ending"}
194
- ]
195
- }
196
-
197
-
198
- # ============================================================================
199
- # TEMPLATE STORY (for testing without Modal)
200
- # ============================================================================
201
-
202
- # ---------------------------------------------------------------------------
203
- # Local story generator — theme-accurate, character-aware, and VARIED.
204
- # Each theme has its own 6-beat arc; slots ({place}, {friend}, {thing}, {feeling})
205
- # are filled from per-theme word banks chosen by a seed derived from hero+theme,
206
- # so different heroes/themes produce different books (no more identical text).
207
- # ---------------------------------------------------------------------------
208
-
209
- _PLACES = ["whispering forest", "sunny meadow", "sparkling river", "cloud-top hill",
210
- "hidden garden", "snowy valley", "rolling sand dunes", "moonlit lake"]
211
- _FRIENDS = ["a shy little fox", "a lost baby bird", "a giggling firefly", "a sleepy turtle",
212
- "a tiny dragon", "a kind old owl", "a bouncing bunny", "a glowing jellyfish"]
213
- _THINGS = ["a glowing key", "a singing flower", "a map of stars", "a tiny golden bell",
214
- "a magic seed", "a shimmering shell", "a friendly lantern", "a curious door"]
215
-
216
-
217
- def _theme_arc(theme: str, hero: str, place: str, friend: str, thing: str) -> dict:
218
- """Return {title, pages[6]} for the given theme, with slots filled in."""
219
- T = {
220
- "brave adventure": {
221
- "title": f"{hero}'s Brave Adventure",
222
- "pages": [
223
- (f"{hero} woke up wanting to explore the world.", f"{hero} standing at the edge of a {place}"),
224
- (f"At the {place}, {hero} found {thing} glowing softly.", f"{hero} discovering {thing}"),
225
- (f"Taking a deep breath, {hero} bravely followed where it led.", f"{hero} walking bravely into the {place}"),
226
- (f"There, {friend} was stuck and a little scared.", f"{friend} in trouble, {hero} nearby"),
227
- (f"{hero} was brave and gently helped {friend} get free.", f"{hero} helping {friend}"),
228
- (f"Side by side they went home, and {hero} felt brave and proud.", f"{hero} and {friend} heading home at sunset"),
229
- ],
230
- },
231
- "making a new friend": {
232
- "title": f"{hero} Makes a Friend",
233
- "pages": [
234
- (f"{hero} was playing alone in the {place}.", f"{hero} playing alone in a {place}"),
235
- (f"Nearby, {friend} sat all by itself, looking lonely.", f"{friend} sitting alone"),
236
- (f"{hero} walked over and said a cheerful hello.", f"{hero} greeting {friend} with a wave"),
237
- (f"They shared {thing} and laughed together.", f"{hero} and {friend} sharing {thing}"),
238
- (f"All afternoon they played their favorite games.", f"{hero} and {friend} playing games"),
239
- (f"Now {hero} knew: a friend is just a hello away.", f"{hero} and {friend} smiling together"),
240
- ],
241
- },
242
- "overcoming a fear": {
243
- "title": f"{hero} and the Big Brave Day",
244
- "pages": [
245
- (f"{hero} felt scared of the dark {place}.", f"{hero} looking nervously at a dark {place}"),
246
- (f"But {friend} needed {thing} from inside it.", f"{friend} asking {hero} for help"),
247
- (f"{hero}'s tummy felt wobbly, but {hero} took one small step.", f"{hero} taking a brave first step into the {place}"),
248
- (f"One step, then another — it wasn't so scary after all.", f"{hero} walking carefully, growing braver"),
249
- (f"{hero} found {thing} and carried it back proudly.", f"{hero} holding {thing} triumphantly"),
250
- (f"{hero} learned that being brave means trying, even when you're scared.", f"{hero} and {friend} celebrating"),
251
- ],
252
- },
253
- "helping someone": {
254
- "title": f"{hero} Lends a Hand",
255
- "pages": [
256
- (f"One morning {hero} skipped through the {place}.", f"{hero} walking happily through a {place}"),
257
- (f"{hero} heard a tiny cry — it was {friend}!", f"{hero} noticing {friend} in need"),
258
- (f"{friend} had dropped {thing} and couldn't reach it.", f"{friend} reaching for {thing}"),
259
- (f"{hero} thought hard and came up with a clever plan.", f"{hero} thinking of a plan"),
260
- (f"Together they got {thing} back, and {friend} cheered.", f"{hero} and {friend} succeeding together"),
261
- (f"Helping others made {hero}'s heart feel warm and happy.", f"{hero} and {friend} hugging"),
262
- ],
263
- },
264
- "lost and found": {
265
- "title": f"{hero} and the Lost {thing.split()[-1].title()}",
266
- "pages": [
267
- (f"{hero} was playing when {thing} suddenly went missing.", f"{hero} searching for {thing}"),
268
- (f"{hero} looked all around the {place}.", f"{hero} looking around a {place}"),
269
- (f"Along the way, {hero} met {friend} who wanted to help.", f"{hero} meeting {friend}"),
270
- (f"They followed tiny clues together, step by step.", f"{hero} and {friend} following a trail"),
271
- (f"At last, {thing} was found tucked beneath a leaf!", f"{hero} finding {thing}"),
272
- (f"{hero} hugged {friend} and thanked them for never giving up.", f"{hero} and {friend} happy together"),
273
- ],
274
- },
275
- "learning something new": {
276
- "title": f"{hero} Learns to Soar",
277
- "pages": [
278
- (f"{hero} really wanted to learn something new today.", f"{hero} curious in a {place}"),
279
- (f"{friend} offered to teach {hero} a wonderful trick.", f"{friend} teaching {hero}"),
280
- (f"The first try wobbled and didn't work at all.", f"{hero} trying and stumbling"),
281
- (f"{hero} practiced again and again, never giving up.", f"{hero} practicing hard"),
282
- (f"Suddenly it worked, with {thing} sparkling in the air!", f"{hero} succeeding with {thing}"),
283
- (f"{hero} beamed — trying your best helps you grow.", f"{hero} and {friend} celebrating the win"),
284
- ],
285
- },
286
- }
287
- return T.get(theme, T["brave adventure"])
288
-
289
-
290
- def generate_story_local(hero_name: str, theme: str, age: int = 5) -> dict:
291
- """
292
- Theme-accurate, varied, character-aware story (no Modal/GPU required).
293
- Deterministic per (hero, theme) but different across heroes/themes.
294
- """
295
- import random
296
- hero = (hero_name or "Little Hero").strip()
297
- hero = hero[:1].upper() + hero[1:] if hero else "Little Hero"
298
- rng = random.Random(f"{hero.lower()}|{theme}")  # str seed is stable across runs; hash() of str is salted per process
299
- place = rng.choice(_PLACES)
300
- friend = rng.choice(_FRIENDS)
301
- thing = rng.choice(_THINGS)
302
-
303
- arc = _theme_arc(theme, hero, place, friend, thing)
304
- pages = [{"page": i + 1, "text": t, "scene": s} for i, (t, s) in enumerate(arc["pages"])]
305
-
306
- return {
307
- "title": arc["title"],
308
- "character_description": (
309
- f"{hero}, a friendly storybook character, bright crayon colors, "
310
- f"bold outlines, simple children's-book style"
311
- ),
312
- "pages": pages,
313
- }
314
-
315
-
316
- # ============================================================================
317
- # MODAL FUNCTION
318
- # ============================================================================
319
-
320
- @app.function(gpu="T4", image=story_env, timeout=120)
321
- def generate_story(character_name: str, theme: str, age: int = 5) -> dict:
322
- """
323
- Generate a 6-page children's story via MiniCPM5-1B.
324
-
325
- C2 Compliance:
326
- - Few-shot prompt with ONE full exemplar
327
- - Greedy decode (do_sample=False)
328
- - 3-layer parser + template fallback
329
- - NEVER crashes on bad model output
330
-
331
- Args:
332
- character_name: Main character name
333
- theme: Story theme
334
- age: Target age
335
-
336
- Returns:
337
- dict with keys: title, character_description, pages[{page, text, scene}]
338
- """
339
- from transformers import AutoTokenizer, AutoModelForCausalLM
340
- import torch
341
-
342
- # Load model (MiniCPM ships custom modeling code -> trust_remote_code required)
343
- model_id = "openbmb/MiniCPM5-1B"
344
- tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
345
- model = AutoModelForCausalLM.from_pretrained(
346
- model_id, torch_dtype=torch.float16, trust_remote_code=True
347
- ).cuda().eval()
348
-
349
- # Build prompt and wrap in the model's chat template (it's an instruct model;
350
- # a raw prompt generates poorly). enable_thinking=False = no reasoning preamble.
351
- prompt = build_prompt(character_name, theme, age)
352
- inputs = tok.apply_chat_template(
353
- [{"role": "user", "content": prompt}],
354
- add_generation_prompt=True,
355
- enable_thinking=False,
356
- return_dict=True,
357
- return_tensors="pt",
358
- ).to("cuda")
359
-
360
- # Generate with greedy decode for determinism
361
- with torch.no_grad():
362
- out = model.generate(
363
- **inputs,
364
- max_new_tokens=800,
365
- do_sample=False, # Greedy for determinism
366
- )
367
-
368
- response = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
369
-
370
- # Parse with 3-layer parser (NEVER crashes)
371
- story = parse_story_json(response)
372
-
373
- # Pad the pages list to at least 6 entries (extra pages are not trimmed here)
374
- while len(story.get("pages", [])) < 6:
375
- story.setdefault("pages", []).append({
376
- "page": len(story.get("pages", [])) + 1,
377
- "text": "And the adventure continued happily.",
378
- "scene": "Continuing adventure"
379
- })
380
-
381
- return story
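Review note: the padding loop above only adds pages, so a model that returns seven or more pages would still break the six-page contract. A minimal sketch of a normalizer that pads, trims, and renumbers is below. `normalize_pages` is a hypothetical helper, not part of the deleted file.

```python
def normalize_pages(story: dict, count: int = 6) -> dict:
    """Hypothetical helper: force story["pages"] to exactly `count` entries."""
    raw = story.get("pages")
    # Tolerate a missing or non-list "pages" value and drop non-dict entries.
    pages = [p for p in raw if isinstance(p, dict)] if isinstance(raw, list) else []
    pages = pages[:count]  # trim extras
    while len(pages) < count:  # pad short stories
        pages.append({
            "text": "And the adventure continued happily.",
            "scene": "Continuing adventure",
        })
    # Renumber so "page" always runs 1..count regardless of what the model wrote.
    story["pages"] = [{**p, "page": i + 1} for i, p in enumerate(pages)]
    return story
```

Renumbering after trimming keeps the page index consistent with list order, which downstream page-seed logic (seed S+i) would likely rely on.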
382
-
383
-
384
- @app.function(gpu="T4", image=story_env, timeout=30)
385
- def health_check() -> str:
386
- """Quick health check for Modal function."""
387
- return "story_gen_healthy"
388
-
389
-
390
- # ============================================================================
391
- # CLI TEST
392
- # ============================================================================
393
-
394
- if __name__ == "__main__":
395
- # Test local generation
396
- story = generate_story_local("Ziggy", "brave adventure", 5)
397
- print(json.dumps(story, indent=2))
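Review note: `generate_story_local` is documented as deterministic per (hero, theme), but Python salts `hash()` of strings per process unless `PYTHONHASHSEED` is fixed, so a seed built from `hash((hero, theme))` can differ between runs. A minimal sketch of a run-stable alternative is below. It relies on `random.Random` accepting a string seed. `pick_place` and the shortened `PLACES` list are illustrative only.

```python
import random

# Shortened, illustrative word bank (the deleted file uses a longer _PLACES list).
PLACES = ["whispering forest", "sunny meadow", "sparkling river"]


def pick_place(hero: str, theme: str) -> str:
    """Pick a place deterministically from (hero, theme), stable across processes."""
    # random.Random derives its state from the string contents, so the choice
    # does not depend on PYTHONHASHSEED the way hash() of a str does.
    rng = random.Random(f"{hero.lower()}|{theme}")
    return rng.choice(PLACES)
```

Usage: `pick_place("Ziggy", "brave adventure")` returns the same place on every run and for any capitalization of the hero name.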
modal_workers/modal_tts.py DELETED
@@ -1,235 +0,0 @@
1
- """
2
- Modal TTS: VoxCPM2 narration on an A10G GPU (the health check uses a T4).
3
-
4
- C5 Compliance: Fallback chain (planned; this file implements only the VoxCPM2 primary)
5
- - Primary: VoxCPM2 (2B, Apache 2.0)
6
- - Fallback 1: Kokoro-82M (ultra-lightweight)
7
- - Fallback 2: MeloTTS (MIT license)
8
-
9
- Generates WAV audio for book narration.
10
- """
11
-
12
- import modal
13
- import io
14
- import os
15
- import logging
16
-
17
- logger = logging.getLogger(__name__)
18
-
19
- app = modal.App("doodlebook-tts")
20
-
21
- # Keep N containers always warm so the app is "live" with no cold start.
22
- # 0 = scale to zero when idle (cheap); 1 = always-on GPU (costs money 24/7).
23
- # Set at deploy time: DOODLEBOOK_KEEP_WARM=1 modal deploy modal_workers/modal_tts.py
24
- KEEP_WARM = int(os.environ.get("DOODLEBOOK_KEEP_WARM", "0"))
25
- NO_TIMEOUT = 86400 # Modal's max (24h) — effectively no per-call timeout
26
-
27
- CACHE = "/cache"
28
- vol = modal.Volume.from_name("doodlebook-hf-cache", create_if_missing=True)
29
- HF_SECRET = modal.Secret.from_name("huggingface")
30
-
31
- tts_env = (
32
- modal.Image.debian_slim(python_version="3.11")
33
- .apt_install("ffmpeg")
34
- .pip_install("voxcpm==2.0.3", "soundfile", "torch", "huggingface_hub")
35
- .env({"HF_HOME": CACHE})
36
- )
37
-
38
- # Child-friendly voices (VoxCPM2 "voice design" prefixes). The (parenthetical) is
39
- # interpreted as a voice instruction, not spoken aloud.
40
- # MIRROR of config.VOICE_PRESETS — kept inline so the Modal deploy stays
41
- # import-free. Keep the two in sync if you edit them. Default leans young.
42
- DEFAULT_VOICE = "kid"
43
- VOICE_DESIGN = {
44
- "kid": "(A sweet little girl around seven years old telling a story to her "
45
- "friends, bright high-pitched cheerful child's voice, playful, giggly "
46
- "and full of wonder)",
47
- "big_kid": "(A lively young girl about eleven years old reading a fun story "
48
- "aloud, bright youthful energetic voice, expressive and excited)",
49
- "playful": "(A cheerful, friendly young woman telling a fun children's story, "
50
- "bright, animated, smiling, expressive)",
51
- "storyteller": "(A warm, gentle female storyteller reading a bedtime story to a "
52
- "young child, soft, soothing, slow and expressive, kind and cozy)",
53
- "grandpa": "(A kind, gentle old grandfather telling a cozy bedtime story, warm, "
54
- "slow, soothing)",
55
- }
56
-
57
- _TTS = None
58
-
59
-
60
- def _get_tts():
61
- global _TTS
62
- if _TTS is None:
63
- from voxcpm import VoxCPM
64
- logger.info("Loading VoxCPM2…")
65
- _TTS = VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)
66
- logger.info("VoxCPM2 ready.")
67
- return _TTS
68
-
69
-
70
- @app.function(
71
- gpu="A10G", image=tts_env, volumes={CACHE: vol}, secrets=[HF_SECRET],
72
- timeout=NO_TIMEOUT, scaledown_window=1200, min_containers=KEEP_WARM,
73
- )
74
- def speak_book(text: str, voice: str = DEFAULT_VOICE) -> bytes:
75
- """
76
- Narrate the book with VoxCPM2 using a child-friendly storyteller voice.
77
-
78
- Generates sentence-by-sentence with the SAME voice-design prefix for a
79
- consistent voice, then stitches with short pauses for natural pacing.
80
- """
81
- import re
82
- import numpy as np
83
- import soundfile as sf
84
-
85
- model = _get_tts()
86
- design = VOICE_DESIGN.get(voice, VOICE_DESIGN[DEFAULT_VOICE])
87
- sr = model.tts_model.sample_rate
88
-
89
- # split into sentences so long books stay stable; keep each chunk's voice fixed
90
- chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
91
- if not chunks:
92
- chunks = [text.strip() or "The end."]
93
-
94
- pause = np.zeros(int(sr * 0.35), dtype=np.float32) # gentle gap between sentences
95
- pieces = []
96
- for i, sentence in enumerate(chunks):
97
- wav = model.generate(
98
- text=f"{design} {sentence}",
99
- cfg_value=2.0,
100
- inference_timesteps=10,
101
- )
102
- pieces.append(np.asarray(wav, dtype=np.float32))
103
- if i < len(chunks) - 1:
104
- pieces.append(pause)
105
- logger.info(f"narrated sentence {i+1}/{len(chunks)}")
106
-
107
- audio = np.concatenate(pieces)
108
- buf = io.BytesIO()
109
- sf.write(buf, audio, sr, format="WAV")
110
- vol.commit()
111
- return buf.getvalue()
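Review note: the chunk-and-stitch step in `speak_book` can be exercised without a GPU or VoxCPM2. The sketch below mirrors the sentence split and the pause insertion using plain Python lists in place of NumPy arrays. `split_sentences` and `stitch` are hypothetical names, and the audio pieces are stand-ins for model output.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after ., ! or ? followed by whitespace; never return an empty list."""
    chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return chunks or [text.strip() or "The end."]


def stitch(pieces: list[list[float]], sample_rate: int, pause_sec: float = 0.35) -> list[float]:
    """Concatenate audio pieces with a silent gap between (not after) them."""
    pause = [0.0] * int(sample_rate * pause_sec)
    out: list[float] = []
    for i, piece in enumerate(pieces):
        out.extend(piece)
        if i < len(pieces) - 1:
            out.extend(pause)
    return out
```

Keeping the split and stitch logic in small pure functions like these would make it testable separately from the Modal function.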
112
-
113
-
114
- # ============================================================================
115
- # LOCAL TTS (FOR TESTING WITHOUT MODAL)
116
- # ============================================================================
117
-
118
- def speak_book_local(text: str, voice: str = DEFAULT_VOICE) -> bytes:
119
- """
120
- Local TTS for testing (no Modal/GPU required).
121
-
122
- Chain: Windows SAPI5 (offline, audible) -> pyttsx3 (if installed) ->
123
- silent WAV (last resort). Real child-friendly voice = VoxCPM2 on Modal.
124
- """
125
- for fn in (_speak_sapi_windows, _speak_pyttsx3):
126
- try:
127
- audio = fn(text, voice)
128
- if audio:
129
- logger.info(f"Local TTS via {fn.__name__}")
130
- return audio
131
- except Exception as e:
132
- logger.warning(f"{fn.__name__} unavailable: {e}")
133
- logger.error("No working local TTS — returning silence")
134
- return _generate_silent_wav(duration_seconds=5)
135
-
136
-
137
- def _speak_sapi_windows(text: str, voice: str = DEFAULT_VOICE) -> bytes:
138
- """
139
- Offline Windows TTS via SAPI5 (pywin32). Produces an audible WAV with no
140
- GPU, no internet, and no extra install — pywin32 ships win32com.
141
- """
142
- import win32com.client
143
- import pythoncom
144
- import tempfile
145
- import os
146
-
147
- pythoncom.CoInitialize() # Gradio runs handlers off-thread; COM needs init
148
- path = None
149
- try:
150
- spvoice = win32com.client.Dispatch("SAPI.SpVoice")
151
- stream = win32com.client.Dispatch("SAPI.SpFileStream")
152
-
153
- # Prefer a female/child-friendly voice if the system has one
154
- try:
155
- for tok in spvoice.GetVoices():
156
- desc = tok.GetDescription()
157
- if any(n in desc for n in ("Zira", "Hazel", "Female")):
158
- spvoice.Voice = tok
159
- break
160
- except Exception:
161
- pass
162
-
163
- with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
164
- path = tmp.name
165
-
166
- stream.Open(path, 3) # 3 = SSFMCreateForWrite
167
- spvoice.AudioOutputStream = stream
168
- spvoice.Rate = -1 # a touch slower, gentle for a bedtime story
169
- spvoice.Speak(text)
170
- stream.Close()
171
-
172
- with open(path, "rb") as f:
173
- data = f.read()
174
- # release COM objects before uninit to avoid noisy IUnknown warnings
175
- spvoice.AudioOutputStream = None
176
- stream = None
177
- spvoice = None
178
- return data
179
- finally:
180
- if path and os.path.exists(path):
181
- try:
182
- os.unlink(path)
183
- except OSError:
184
- pass
185
- pythoncom.CoUninitialize()
186
-
187
-
188
- def _speak_pyttsx3(text: str, voice: str = DEFAULT_VOICE) -> bytes:
189
- """Cross-platform offline TTS via pyttsx3 (only if installed)."""
190
- import pyttsx3
191
- import tempfile
192
- import os
193
-
194
- with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
195
- path = tmp.name
196
- engine = pyttsx3.init()
197
- engine.setProperty("rate", 165)
198
- engine.save_to_file(text, path)
199
- engine.runAndWait()
200
- with open(path, "rb") as f:
201
- data = f.read()
202
- if os.path.exists(path):
203
- os.unlink(path)
204
- return data
205
-
206
-
207
- def _generate_silent_wav(duration_seconds: int = 5, sample_rate: int = 48000) -> bytes:
208
- """Generate silent WAV file as placeholder."""
209
- import struct
210
-
211
- # WAV header
212
- num_samples = sample_rate * duration_seconds
213
- data_size = num_samples * 2 # 16-bit audio
214
-
215
- header = struct.pack(
216
- '<4sI4s4sIHHIIHH4sI',
217
- b'RIFF', 36 + data_size, b'WAVE',
218
- b'fmt ', 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
219
- b'data', data_size
220
- )
221
-
222
- # Silent audio data
223
- silent_data = b'\x00' * data_size
224
-
225
- return header + silent_data
226
-
227
-
228
- # ============================================================================
229
- # HEALTH CHECK
230
- # ============================================================================
231
-
232
- @app.function(gpu="T4", image=tts_env, timeout=30)
233
- def health_check() -> str:
234
- """Quick health check for Modal function."""
235
- return "tts_healthy"
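Review note: `_generate_silent_wav` packs the RIFF header by hand. A minimal alternative sketch using the standard-library `wave` module is below. It should produce an equivalent 16-bit mono PCM file while leaving the header arithmetic to the library. `silent_wav` is a hypothetical name.

```python
import io
import wave


def silent_wav(duration_seconds: float = 5.0, sample_rate: int = 48000) -> bytes:
    """Return a silent 16-bit mono PCM WAV of the requested length."""
    num_frames = int(sample_rate * duration_seconds)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * num_frames)
    # wave does not close a file object it was handed, so the buffer is still readable.
    return buf.getvalue()
```

The same module can read the bytes back, which gives a cheap sanity check that a fallback clip is a valid WAV before it is handed to the UI.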
requirements.txt CHANGED
@@ -1,23 +1,26 @@
- # DoodleBook — Dependencies (ZeroGPU Version)
- # Core
- gradio>=5.0
- python-dotenv
- spaces
-
- # Image generation
- diffusers>=0.28
- transformers>=4.40
- accelerate
- peft
- torch
- pillow
- sentencepiece
- opencv-python-headless
-
- # Book building
- fpdf2
-
- # Utilities
- requests
- huggingface_hub
- soundfile
+ # DoodleBook — Dependencies (ZeroGPU Version)
+ # Core
+ gradio>=5.0
+ python-dotenv
+ spaces
+
+ # Image generation
+ diffusers>=0.28
+ transformers>=4.40
+ accelerate
+ torch
+ pillow
+ sentencepiece
+ opencv-python-headless
+ numpy
+
+ # Voice narration (VoxCPM2) — was MISSING, so TTS always hit the silent fallback
+ voxcpm
+
+ # Book building
+ fpdf2
+
+ # Utilities
+ requests
+ huggingface_hub
+ soundfile
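Review note: the new requirements comment says the missing `voxcpm` package made TTS fall through to the silent fallback without anyone noticing. One way to surface that kind of gap early is a startup check for importable modules, sketched below under the assumption that only top-level module names are checked. `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None for an absent top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Usage: calling `missing_modules(["voxcpm", "soundfile"])` at app start and logging a warning for any names returned would make a missing narration dependency visible in the Space logs.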
run.py DELETED
@@ -1,34 +0,0 @@
- """Launcher for DoodleBook — captures all output for debugging."""
- import sys
- import os
- import traceback
-
- sys.path.insert(0, os.path.dirname(__file__))
-
- # Redirect stdout and stderr to a log file
- log = open("_crash.log", "w", encoding="utf-8")
- sys.stderr = log
- sys.stdout = log
-
- try:
-     from app import create_layout, load_sample_book, create_book
-     print("Imports OK", flush=True)
-
-     demo = create_layout(
-         load_sample_fn=load_sample_book,
-         create_book_fn=create_book,
-     )
-     print("Layout OK", flush=True)
-
-     demo.launch(server_port=7870, prevent_thread_lock=True)
-     print("Launch OK — server running on port 7870", flush=True)
-
-     import time
-     while True:
-         time.sleep(60)
-
- except Exception as e:
-     print(f"ERROR: {e}", flush=True)
-     traceback.print_exc(file=log)
- finally:
-     log.flush()
run_modal.py DELETED
@@ -1,287 +0,0 @@
1
- """
2
- DoodleBook — MODAL-ONLY runner.
3
-
4
- Serves the Gradio UI locally but runs ALL heavy generation on Modal's deployed
5
- functions (no local GPU inference):
6
- - story -> doodlebook-story / generate_story (services.story)
7
- - images -> doodlebook-image-gen / generate_book_pages (services.images)
8
- - voice -> doodlebook-tts / speak_book (services.tts)
9
-
10
- Use this (not app.py) when you want to check real Modal output on this machine.
11
- Run: python run_modal.py
12
- """
13
-
14
- import io
15
- import json
16
- import time
17
- import tempfile
18
- import logging
19
-
20
- import gradio as gr
21
- from PIL import Image
22
-
23
- from config import BASE_SEED, DEFAULT_VOICE
24
- from book_builder import (
25
- build_book_html, export_pdf, magic_loader_html,
26
- build_coloring_html, export_coloring_pdf,
27
- )
28
- from ui.layout import create_layout
29
-
30
- import services.story as story_svc
31
- import services.images as image_svc
32
- import services.tts as tts_svc
33
-
34
- logging.basicConfig(level=logging.INFO)
35
- logger = logging.getLogger("doodlebook.modal")
36
-
37
-
38
- def _doodle_to_png_bytes(doodle_image):
39
- """Gradio numpy image -> PNG bytes (or None)."""
40
- if doodle_image is None:
41
- return None
42
- buf = io.BytesIO()
43
- Image.fromarray(doodle_image).save(buf, format="PNG")
44
- return buf.getvalue()
45
-
46
-
47
- def _with_heartbeat(blocking_fn, frame_fn, poll=4.0):
48
- """
49
- Run blocking_fn() in a thread while keeping the Gradio stream alive.
50
-
51
- A multi-minute Modal call (FLUX ~2-3 min, VoxCPM ~30-60s) blocks the
52
- generator with no yield, so the browser's SSE stream goes silent, the
53
- connection drops, and the UI shows "Error". This pumps frame_fn(elapsed)
54
- into the stream every `poll` seconds until the work finishes.
55
-
56
- Yields ("hb", <frame tuple>) heartbeats, then a final ("done", <return>).
57
- Re-raises whatever blocking_fn raised.
58
- """
59
- import threading
60
- box = {}
61
-
62
- def _run():
63
- try:
64
- box["val"] = blocking_fn()
65
- except BaseException as e: # surfaced to the caller below
66
- box["err"] = e
67
-
68
- th = threading.Thread(target=_run, daemon=True)
69
- th.start()
70
- t0 = time.time()
71
- while th.is_alive():
72
- th.join(timeout=poll)
73
- if th.is_alive():
74
- yield ("hb", frame_fn(int(time.time() - t0)))
75
- if "err" in box:
76
- raise box["err"]
77
- yield ("done", box["val"])
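Review note: the heartbeat generator above can be tried in isolation. The sketch below copies the same pattern with a short poll interval and a stand-in `slow_job`, so it runs in well under a second. The names `with_heartbeat`, `slow_job`, and `events` are illustrative, and no Gradio or Modal call is involved.

```python
import threading
import time


def with_heartbeat(blocking_fn, frame_fn, poll=0.05):
    """Run blocking_fn in a thread, yielding ("hb", frame) until it finishes."""
    box = {}

    def _run():
        try:
            box["val"] = blocking_fn()
        except BaseException as e:  # re-raised in the consumer's thread below
            box["err"] = e

    th = threading.Thread(target=_run, daemon=True)
    th.start()
    t0 = time.time()
    while th.is_alive():
        th.join(timeout=poll)
        if th.is_alive():
            yield ("hb", frame_fn(time.time() - t0))
    if "err" in box:
        raise box["err"]
    yield ("done", box["val"])


def slow_job():
    """Stand-in for a long remote call."""
    time.sleep(0.3)
    return "six pages"


events = list(with_heartbeat(slow_job, lambda s: f"working {s:.1f}s"))
```

The consumer sees a stream of `("hb", ...)` tuples followed by exactly one `("done", result)`, which is the shape `create_book` relies on when it forwards heartbeats to the UI.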
78
-
79
-
80
- def create_book(doodle_image, character_name, theme, hero_name,
81
- tiny_mode=False, voice=DEFAULT_VOICE, make_coloring=False):
82
- """Streaming book creation — everything heavy runs on Modal."""
83
- t_total = time.perf_counter()
84
- character_name = (character_name or "").strip() or "Little Hero"
85
- hero_name = (hero_name or "").strip() or character_name
86
-
87
- trace = {
88
- "backend": "modal",
89
- "hero_name": hero_name,
90
- "theme": theme,
91
- "voice": voice,
92
- "tiny_mode": tiny_mode,
93
- "make_coloring": make_coloring,
94
- "seed": BASE_SEED,
95
- "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
96
- }
97
-
98
- _no = gr.update(visible=False)
99
- _keep = gr.update() # no-op: leave the (fixed, always-visible) download buttons as-is
100
-
101
- # ---- 1) STORY (Modal MiniCPM, else fast text template) ----
102
- yield (magic_loader_html("story", hero_name),
103
- "Writing the story…", None, _keep, {}, "",
104
- json.dumps(trace, indent=2), _no, _keep)
105
-
106
- t_story = time.perf_counter()
107
- story = story_svc.generate_story(hero_name, theme)
108
- trace["story_sec"] = round(time.perf_counter() - t_story, 2)
109
- title = story.get("title", "Untitled Story")
110
- char_desc = story.get("character_description", "")
111
- pages = story.get("pages", [])
112
- page_texts = [p.get("text", "") for p in pages]
113
- scenes = [p.get("scene", "") for p in pages]
114
- trace.update(title=title, character_description=char_desc)
115
-
116
- # ---- 2) VOICE starts NOW, concurrently with images (it only needs the text,
117
- # which is ready) so its ~30-60s overlaps the image render for free ----
118
- import threading
119
- voice_box = {}
120
- full_text = f"{title}. {' '.join(page_texts)}"
121
- t_voice = time.perf_counter()
122
-
123
- def _do_voice():
124
- try:
125
- voice_box["bytes"] = tts_svc.speak_book(full_text, voice)
126
- except Exception as e:
127
- voice_box["err"] = e
128
-
129
- voice_thread = threading.Thread(target=_do_voice, daemon=True)
130
- voice_thread.start()
131
-
132
- # ---- 3) IMAGES (Modal FLUX.2-klein — 6 pages rendered in PARALLEL) ----
133
- yield (magic_loader_html("images", hero_name),
134
- f"{title} — illustrating on Modal (FLUX)…",
135
- None, _keep, story, "", json.dumps(trace, indent=2), _no, _keep)
136
-
137
- doodle_bytes = _doodle_to_png_bytes(doodle_image)
138
- img_bytes, engine = None, "sketch"
139
- t_images = time.perf_counter()
140
- for kind, payload in _with_heartbeat(
141
- lambda: image_svc.generate_book_pages(
142
- char_desc, scenes, doodle=doodle_bytes, seed=BASE_SEED, tiny=tiny_mode
143
- ),
144
- lambda s: (magic_loader_html("images", hero_name),
145
- f"{title} — illustrating… {s}s (voice recording in parallel)",
146
- None, _keep, story, "", json.dumps(trace, indent=2), _no, _keep),
147
- ):
148
- if kind == "hb":
149
- yield payload
150
- else:
151
- img_bytes, engine = payload
152
- trace["images_sec"] = round(time.perf_counter() - t_images, 2)
153
- trace["engine"] = engine
154
- if engine != "flux":
155
- logger.warning("Image gen fell back to local sketch — Modal FLUX did not run.")
156
-
157
- book_html = build_book_html(img_bytes, page_texts, title, engine)
158
-
159
- # ---- 3b) Collect the parallel VOICE result (usually already finished) ----
160
- while voice_thread.is_alive(): # only loops if narration is still running after the images finish
161
- voice_thread.join(timeout=4)
162
- if voice_thread.is_alive():
163
- yield (book_html, f"{title} — finishing narration…",
164
- None, _keep, story, "", json.dumps(trace, indent=2), _no, _keep)
165
-
166
- audio_path = None
167
- trace["tts_sec"] = round(time.perf_counter() - t_voice, 2)
168
- if voice_box.get("bytes"):
169
- try:
170
- with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
171
- tmp.write(voice_box["bytes"])
172
- audio_path = tmp.name
173
- except Exception as e:
174
- logger.warning(f"writing audio failed: {e}")
175
- elif "err" in voice_box:
176
- logger.warning(f"TTS failed: {voice_box['err']}")
177
-
178
- # ---- 4) PDFs ----
179
- pdf_path = None
180
- t_pdf = time.perf_counter()
181
- try:
182
- with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
183
- pdf_path = export_pdf(img_bytes, page_texts, title, tmp.name)
184
- except Exception as e:
185
- logger.warning(f"PDF failed: {e}")
186
- trace["pdf_sec"] = round(time.perf_counter() - t_pdf, 2)
187
-
188
- # ---- 5) COLORING BOOK ----
189
- coloring_html = ""
190
- coloring_pdf_path = None
191
- if make_coloring:
192
- t_coloring = time.perf_counter()
193
- outlines, coloring_engine = None, "failed"
194
- for kind, payload in _with_heartbeat(
195
- lambda: image_svc.generate_coloring_pages(
196
- char_desc,
197
- scenes,
198
- doodle=doodle_bytes,
199
- source_color_imgs=img_bytes,
200
- seed=BASE_SEED,
201
- tiny=tiny_mode,
202
- ),
203
- lambda s: (
204
- book_html,
205
- f"{title} — building coloring book… {s}s",
206
- audio_path,
207
- _keep,
208
- story,
209
- "",
210
- json.dumps(trace, indent=2),
211
- _no,
212
- _keep,
213
- ),
214
- ):
215
- if kind == "hb":
216
- yield payload
217
- else:
218
- outlines, coloring_engine = payload
219
- try:
220
- coloring_html = build_coloring_html(outlines, page_texts, title)
221
- with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
222
- coloring_pdf_path = export_coloring_pdf(outlines, page_texts, title, tmp.name)
223
- trace["coloring_book"] = True
224
- trace["coloring_engine"] = coloring_engine
225
- except Exception as e:
226
- logger.warning(f"Coloring book failed: {e}")
227
- trace["coloring_sec"] = round(time.perf_counter() - t_coloring, 2)
228
-
229
- trace["completed"] = True
230
- trace["total_sec"] = round(time.perf_counter() - t_total, 2)
231
- engine_label = "FLUX (Modal)" if engine == "flux" else "local sketch fallback"
232
- # download buttons stay visible (fixed under the status); just attach the files
233
- pdf_update = gr.update(value=pdf_path) if pdf_path else _keep
234
- coloring_pdf_update = gr.update(value=coloring_pdf_path) if coloring_pdf_path else _keep
235
- coloring_display_update = (gr.update(visible=True, value=coloring_html) if coloring_html
236
- else gr.update(visible=False))
237
-
238
- yield (
239
- book_html,
240
- f"Complete: {title} — {len(img_bytes)} pages · {engine_label} · voice: {voice} · total {trace['total_sec']}s",
241
- audio_path,
242
- pdf_update,
243
- story,
244
- f"Pages: {len(img_bytes)} | Seed: {BASE_SEED} | "
245
- f"Mode: {'Tiny' if tiny_mode else 'Standard'} | Engine: {engine} | "
246
- f"Story {trace.get('story_sec', 0)}s | Images {trace.get('images_sec', 0)}s | "
247
- f"PDF {trace.get('pdf_sec', 0)}s | Coloring {trace.get('coloring_sec', 0)}s",
248
- json.dumps(trace, indent=2),
249
- coloring_display_update,
250
- coloring_pdf_update,
251
- )
252
-
253
-
254
- if __name__ == "__main__":
255
- import os
256
-
257
- demo = create_layout(create_book_fn=create_book)
258
- # Queue so a long (multi-minute) Modal generation doesn't make the whole app
259
- # unresponsive: allow several concurrent sessions and never time a job out.
260
- demo.queue(default_concurrency_limit=8, max_size=64)
261
- # NOTE: when server_port is set, Gradio does NOT auto-pick a free port — it
262
- # raises OSError and exits if the port is busy. start_app.bat frees the port
263
- # before launching; if you run this directly, make sure 7880 is free first.
264
- port = int(os.environ.get("DOODLEBOOK_PORT", "7880"))
265
- # Bind all interfaces so both 127.0.0.1 and localhost (and LAN/phone) reach it.
266
- try:
267
- demo.launch(
268
- server_name="0.0.0.0",
269
- server_port=port,
270
- inbrowser=False,
271
- show_error=True,
272
- max_threads=40,
273
- # PDFs are written to the system temp dir; Gradio won't serve files
274
- # outside its allowed paths, so the DownloadButton links 404'd and the
275
- # button looked broken. Allow the temp dir so downloads actually work.
276
- allowed_paths=[tempfile.gettempdir()],
277
- )
278
- except OSError as e:
279
- logger.error(
280
- f"Could not bind port {port}: {e}\n"
281
- f" Something is already using it — likely a leftover DoodleBook "
282
- f"instance (app.py on 7870, an old run_modal.py, or test_final.py).\n"
283
- f" Fix: close the other window, or run: "
284
- f"netstat -ano | findstr :{port} then taskkill /f /pid <PID>\n"
285
- f" Then relaunch with start_app.bat (it frees the port automatically)."
286
- )
287
- raise SystemExit(1)
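Review note: the launch block above reacts to a busy port only after `demo.launch` raises. A pre-flight check is one alternative, sketched below with the standard `socket` module. `port_is_free` is a hypothetical helper, it probes 127.0.0.1 rather than the 0.0.0.0 bind used above, and bind semantics differ somewhat between operating systems, so treat the result as a hint rather than a guarantee.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True
```

A check like this is racy by nature (another process can take the port between the probe and the launch), so the existing `except OSError` handler would still be needed.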
services/story.py DELETED
@@ -1,71 +0,0 @@
1
- """
2
- Story generation service: local generator by default, optional Modal MiniCPM5-1B inference.
3
-
4
- C2 Compliance: 3-layer JSON parser + template fallback.
5
- """
6
-
7
- from config import STORY_MODEL, GENERATION_PARAMS
8
- import os
9
- import logging
10
-
11
- logger = logging.getLogger(__name__)
12
-
13
-
14
- def generate_story(hero_name: str, theme: str, age: int = None) -> dict:
15
- """
16
- Generate a 6-page children's story.
17
-
18
- Story runs LOCALLY by default: the local generator is instant and
19
- theme-accurate, whereas the deployed MiniCPM5-1B was slower (T4 cold start)
20
- and lower quality (the 1B model parroted the few-shot example). Set
21
- DOODLEBOOK_STORY_MODAL=1 to route the story to the Modal MiniCPM model.
22
-
23
- Args:
24
- hero_name: Main character name
25
- theme: Story theme (e.g., "brave adventure")
26
- age: Target age (default from config)
27
-
28
- Returns:
29
- dict with keys: title, character_description, pages[{page, text, scene}]
30
- """
31
- if age is None:
32
- age = GENERATION_PARAMS.target_age
33
-
34
- # 1) Real model on Modal (MiniCPM) — opt-in only
35
- if os.environ.get("DOODLEBOOK_STORY_MODAL", "0") == "1":
36
- try:
37
- import modal
38
- fn = modal.Function.from_name("doodlebook-story", "generate_story")
39
- story = fn.remote(hero_name, theme, age)
40
- if story and story.get("pages"):
41
- logger.info("Story generated via Modal MiniCPM")
42
- return story
43
- except Exception as e:
44
- logger.info(f"Modal story unavailable ({e}); using local generator")
45
-
46
- # 2) Rich local generator (theme-accurate, varied) — DEFAULT
47
- try:
48
- from modal_workers.modal_story_gen import generate_story_local
49
- return generate_story_local(hero_name, theme, age)
50
- except Exception as e:
51
- logger.error(f"Local story generation failed: {e}")
52
- return _fallback_story(hero_name, theme, age)
53
-
54
-
55
- def _fallback_story(hero_name: str, theme: str, age: int) -> dict:
56
- """
57
- Deterministic template fallback (C2 Layer 3).
58
- NEVER crashes - always returns valid 6-page book.
59
- """
60
- return {
61
- "title": f"{hero_name}'s {theme.title()}",
62
- "character_description": f"A friendly character named {hero_name}, drawn in crayon style with bright colors, suitable for age {age}",
63
- "pages": [
64
- {"page": 1, "text": f"Once upon a time, there was a character named {hero_name}.", "scene": "Character introduction"},
65
- {"page": 2, "text": f"{hero_name} loved going on adventures.", "scene": "Adventure begins"},
66
- {"page": 3, "text": f"One day, {hero_name} discovered something magical.", "scene": "Discovery moment"},
67
- {"page": 4, "text": f"With courage, {hero_name} faced the challenge.", "scene": "Challenge scene"},
68
- {"page": 5, "text": f"Friends helped {hero_name} succeed.", "scene": "Teamwork scene"},
69
- {"page": 6, "text": f"And they all lived happily ever after. The end.", "scene": "Happy ending"}
70
- ]
71
- }
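The module docstring mentions a 3-layer JSON parser, but only the template layer is visible in this file. The sketch below shows one way the three layers could fit together, using only the standard library. `parse_story` and `template_story` are hypothetical names, and the trailing-comma regex is a crude stand-in for a real repair library such as json-repair or json5, so treat it as an illustration rather than the project's parser.

```python
import json
import re


def template_story(hero_name: str, theme: str) -> dict:
    """Layer 3: deterministic fallback that always yields six pages."""
    return {
        "title": f"{hero_name}'s {theme.title()}",
        "character_description": f"A friendly character named {hero_name}",
        "pages": [
            {"page": i, "text": f"{hero_name} continues the {theme}.", "scene": f"Scene {i}"}
            for i in range(1, 7)
        ],
    }


def parse_story(raw: str, hero_name: str, theme: str) -> dict:
    """Turn raw model output into a valid 6-page story dict without raising."""
    # Layer 1: pull the outermost {...} span out of any surrounding chatter.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match:
        candidate = match.group(0)
        # Layer 2: strict parse first, then retry with trailing commas removed
        # (assumption: a simplistic repair, it may also touch commas in strings).
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        for text in (candidate, repaired):
            try:
                story = json.loads(text)
            except json.JSONDecodeError:
                continue
            pages = story.get("pages") if isinstance(story, dict) else None
            if isinstance(pages, list) and len(pages) == 6:
                return story
    # Layer 3: nothing usable was recovered.
    return template_story(hero_name, theme)
```

Because the last step never depends on the model output, the function returns a complete book even when the input contains no JSON at all.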
 
services/trace.py DELETED
@@ -1,113 +0,0 @@
- """
- Trace logging service — publishes generation metadata to HF Dataset (Open Trace badge).
-
- Logs prompts, seeds, and LoRA version for reproducibility.
- """
-
- from typing import Optional
- import json
- import logging
- import os
- from datetime import datetime
-
- logger = logging.getLogger(__name__)
-
- TRACE_DATASET = "build-small-hackathon/doodlebook-traces"
-
-
- def log_trace(
-     hero_name: str,
-     theme: str,
-     story: dict,
-     seed: int,
-     lora_version: Optional[str] = None,
-     tiny_mode: bool = False,
-     character_description: str = "",
-     art_style: str = "crayon drawing, children's book"
- ) -> str:
-     """
-     Log generation trace to HuggingFace Dataset.
-
-     Creates a row in the trace dataset with all generation parameters
-     for reproducibility (Open Trace badge).
-
-     Args:
-         hero_name: Character name used
-         theme: Story theme
-         story: Generated story dict
-         seed: Seed used for generation
-         lora_version: LoRA model version (if used)
-         tiny_mode: Whether Tiny Mode was used
-         character_description: Character description used
-         art_style: Art style used
-
-     Returns:
-         Dataset URL
-     """
-     trace = {
-         "timestamp": datetime.now().isoformat(),
-         "hero_name": hero_name,
-         "theme": theme,
-         "title": story.get("title", ""),
-         "character_description": character_description or story.get("character_description", ""),
-         "art_style": art_style,
-         "seed": seed,
-         "lora_version": lora_version or "none",
-         "tiny_mode": tiny_mode,
-         "num_pages": len(story.get("pages", [])),
-         "pages": story.get("pages", []),
-         "models": {
-             "image": "black-forest-labs/FLUX.2-klein-4B",
-             "story": "openbmb/MiniCPM5-1B",
-             "tts": "openbmb/VoxCPM2"
-         }
-     }
-
-     # Save trace locally
-     trace_dir = "traces"
-     os.makedirs(trace_dir, exist_ok=True)
-     trace_file = os.path.join(trace_dir, f"trace_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json")
-
-     with open(trace_file, "w", encoding="utf-8") as f:
-         json.dump(trace, f, indent=2, ensure_ascii=False)
-
-     logger.info(f"Trace saved: {trace_file}")
-
-     # Try to upload to HF Dataset
-     try:
-         return _upload_to_hf_dataset(trace)
-     except Exception as e:
-         logger.warning(f"HF Dataset upload failed: {e}. Trace saved locally.")
-         return f"Local: {trace_file}"
-
-
- def _upload_to_hf_dataset(trace: dict) -> str:
-     """Upload trace to HuggingFace Dataset."""
-     try:
-         from huggingface_hub import HfApi
-
-         api = HfApi()
-
-         # Check if dataset exists, create if not
-         try:
-             api.dataset_info(TRACE_DATASET)
-         except Exception:
-             api.create_repo(TRACE_DATASET, repo_type="dataset", exist_ok=True)
-
-         # Upload trace as JSON
-         trace_json = json.dumps(trace, indent=2, ensure_ascii=False)
-         filename = f"trace_{trace['timestamp'].replace(':', '-').replace('.', '-')}.json"
-
-         api.upload_file(
-             path_or_fileobj=trace_json.encode(),
-             path_in_repo=filename,
-             repo_id=TRACE_DATASET,
-             repo_type="dataset",
-             commit_message=f"Log trace for {trace['hero_name']}"
-         )
-
-         return f"https://huggingface.co/datasets/{TRACE_DATASET}/blob/main/{filename}"
-
-     except ImportError:
-         logger.warning("huggingface_hub not installed")
-         return "Trace saved locally (no HF upload)"
 
services/tts.py DELETED
@@ -1,50 +0,0 @@
- """
- TTS service — calls modal_tts for VoxCPM2 narration.
-
- C5 Compliance: Fallback chain
- - Primary: VoxCPM2 (2B, Apache 2.0)
- - Fallback 1: Kokoro-82M (ultra-lightweight)
- - Fallback 2: MeloTTS (MIT license)
- """
-
- from config import TTS_MODEL, GENERATION_PARAMS
- import logging
-
- logger = logging.getLogger(__name__)
-
-
- def speak_book(text: str, voice: str = "kid") -> bytes:
-     """
-     Generate narration audio via VoxCPM2 on Modal.
-
-     C5 Compliance: Falls back to local TTS if Modal fails.
-
-     Args:
-         text: Full text to narrate (title + all pages)
-         voice: Voice style (warm, friendly, etc.)
-
-     Returns:
-         WAV audio bytes
-     """
-     try:
-         # Try Modal (real VoxCPM2) — looks up the DEPLOYED function on Modal cloud
-         import modal
-         fn = modal.Function.from_name("doodlebook-tts", "speak_book")
-         return fn.remote(text, voice)
-     except Exception as e:
-         logger.warning(f"Modal TTS unavailable: {e}, using local fallback")
-         return speak_book_local(text, voice)
-
-
- def speak_book_local(text: str, voice: str = "warm") -> bytes:
-     """
-     Local TTS for testing (no Modal required).
-     Uses MeloTTS or returns silent WAV placeholder.
-     """
-     try:
-         from modal_workers.modal_tts import speak_book_local as local_tts
-         return local_tts(text, voice)
-     except Exception as e:
-         logger.warning(f"Local TTS failed: {e}, returning placeholder")
-         from modal_workers.modal_tts import _generate_silent_wav
-         return _generate_silent_wav(duration_seconds=5)
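The last step of the fallback chain returns a silent WAV placeholder, but `_generate_silent_wav` lives in `modal_workers/modal_tts.py` and its body is not part of this diff. The following is a minimal sketch of how such a placeholder could be produced with the standard `wave` module. The name `silent_wav`, the 16 kHz sample rate, and the mono 16-bit format are assumptions chosen for illustration.

```python
import io
import wave


def silent_wav(duration_seconds: float = 5.0, sample_rate: int = 16000) -> bytes:
    """Return WAV bytes containing mono 16-bit silence (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        # Each silent frame is two zero bytes.
        wav.writeframes(b"\x00\x00" * int(duration_seconds * sample_rate))
    return buffer.getvalue()
```

Returning valid audio bytes keeps the caller's contract (`-> bytes`, playable WAV) intact even when every TTS backend is unavailable.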
 
start_app.bat DELETED
@@ -1,33 +0,0 @@
- @echo off
- REM ============================================================
- REM DoodleBook launcher — SINGLE instance, FIXED port.
- REM Double-click this file to start. Close this window to quit.
- REM
- REM Why this matters: Gradio is given an explicit port. If that
- REM port is still held (a killed instance's socket in TIME_WAIT,
- REM an orphaned python, or app.py/test_final.py running too),
- REM Gradio CRASHES on startup instead of picking another port.
- REM The old loop then relaunched into the same crash forever and
- REM the browser showed "Connection lost. Attempting reconnection".
- REM So: free the port FIRST, then launch.
- REM ============================================================
- cd /d "%~dp0"
- set PYTHONUTF8=1
- set PYTHONIOENCODING=utf-8
- set DOODLEBOOK_PORT=7880
-
- :loop
- echo.
- echo === Freeing port %DOODLEBOOK_PORT% (killing any old/stray instance) ===
- for /f "tokens=5" %%a in ('netstat -ano ^| findstr ":%DOODLEBOOK_PORT% " ^| findstr LISTENING') do (
-     echo killing PID %%a holding port %DOODLEBOOK_PORT%
-     taskkill /f /pid %%a >nul 2>&1
- )
-
- echo === Starting DoodleBook on http://127.0.0.1:%DOODLEBOOK_PORT%/ ===
- echo === Open EXACTLY that URL in your browser (not 7860/7870) ===
- python run_modal.py
- echo.
- echo === Server stopped (exit code %errorlevel%). Restarting in 3 seconds... (close this window to quit) ===
- timeout /t 3 /nobreak >nul
- goto loop
 
ui/layout.py CHANGED
@@ -618,11 +618,6 @@ def create_layout(load_sample_fn=None, create_book_fn=None):
          value=False,
          elem_classes=["tiny-toggle"],
      )
-     tiny_mode = gr.Checkbox(
-         label="Tiny Mode — faster, runs on small GPUs",
-         value=False,
-         elem_classes=["tiny-toggle"],
-     )
      make_btn = gr.Button(
          "Make my book!",
          variant="primary",
@@ -714,7 +709,7 @@ FLUX is the printer. **Tiny Titan.**
      if create_book_fn:
          make_btn.click(
              fn=create_book_fn,
-             inputs=[doodle, char_name, theme, hero_name, tiny_mode, voice, make_coloring],
+             inputs=[doodle, char_name, theme, hero_name, voice, make_coloring],
              outputs=[book_display, status, audio_narration, pdf_download,
                       story_info, image_info, trace_info,
                       coloring_display, coloring_pdf_download],