# WitnessBox — submission pack

Everything needed to submit to **Build Small** (HF × Gradio, models < 32B).
Track: 🍄 *An Adventure in Thousand Token Wood* · Primary target: **Best Use of Modal**.

---

## Status checklist
| # | Requirement | State |
|---|---|---|
| REQ-01 | Public app, models < 32B | ✅ MiniCPM4.1-8B (8.2B) + VoxCPM2 (2.3B) + Whisper-small (0.24B) ≈ 11B |
| REQ-02 | Gradio Space, public | ⏳ one command away — needs an HF write token (see below) |
| REQ-03 | Demo video (60–90s) | ⬜ you record — shotlist below; `scripts/demo_playthrough.py` is the dry-run |
| REQ-04 | Social post tagging sponsors | ⬜ you post — draft below |
| Modal | Genuine *platform* use | ✅ 3 GPU classes, scale-to-zero, keep-warm, parallel `.map()` pre-gen, Volume, snapshots — **proven live** |
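
A quick sanity check on the REQ-01 arithmetic. The only inputs are the parameter counts listed in the table (nothing is measured here); the sketch checks each model against the 32B cap and prints the combined size behind the "≈ 11B" figure:

```python
# Parameter counts in billions, copied from the REQ-01 row.
MODELS_B = {"MiniCPM4.1-8B": 8.2, "VoxCPM2": 2.3, "Whisper-small": 0.24}
LIMIT_B = 32

# Every individual model must stay under the cap.
assert all(params < LIMIT_B for params in MODELS_B.values())

total = sum(MODELS_B.values())
print(f"combined: {total:.2f}B")  # prints: combined: 10.74B
```

Even combined, the three models sit well under a single 32B budget.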

**The one action only you can take:** provide an HF token with **write** scope. I then run
`python3 scripts/deploy_space.py` and the Space goes live (code pushed, Modal secrets
set, `WITNESSBOX_BACKEND=modal`). Create a token at https://huggingface.co/settings/tokens,
then either run `! hf auth login` in the prompt or paste the token and I'll use `HF_TOKEN=…`.

---

## Social post (REQ-04) — draft

**X / short form**
> ⚖️ I built **WitnessBox**: cross-examine a hostile AI witness — and your *voice*
> is the weapon. Sound confident and the CFO clams up; sound hesitant and he gets
> cocky and *leaks*. Catch 3 contradictions and his voice literally **cracks**.
>
> All open models < 32B, served on @modal_labs:
> 🧠 MiniCPM4.1-8B · 🗣️ VoxCPM2 · 👂 Whisper — @OpenBMB on @huggingface, built with @Gradio.
>
> #BuildSmall  [Space link]  [video link]

**LinkedIn / long form**
> Most "interrogate the witness" games are text-and-logic. WitnessBox makes your
> **delivery** the input. A librosa pass reads your *perceived* confidence — pauses
> and pace, never a lie detector — and steers the witness in real time. He answers
> in a voice that escalates from composed to cracking.
>
> Three open models, all under 32B, ~11B combined: MiniCPM4.1-8B is the witness's
> mind, VoxCPM2 is his voice (the style tag *is* the game state), Whisper hears you.
> All of it runs on Modal: three right-sized GPUs behind scale-to-zero classes,
> kept warm during an examination, with the dramatic "voice-crack" beats fanned
> across containers via parallel `.map()` and the best take cached on a Volume.
>
> Built for the Build Small hackathon (@Hugging Face × @Gradio). Models by @OpenBMB.
> Try it: [Space link] · 90-second demo: [video link]
>
> #BuildSmall #Modal #Gradio #OpenSource #AI
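
The post's core mechanic, reading *perceived* confidence from pauses and pace, can be sketched as a small heuristic. This is a minimal sketch, not the app's actual librosa pass: it assumes you already have per-frame RMS loudness (for example a flattened `librosa.feature.rms` result; with librosa's default 512-sample hop at 16 kHz each frame covers 32 ms) plus the word count of the Whisper transcript. The function name `score_delivery` and every threshold are hypothetical starting points:

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    pause_ratio: float      # fraction of frames below the silence threshold
    longest_pause_s: float  # longest unbroken silent stretch, in seconds
    words_per_s: float      # pace: transcript words over clip duration
    label: str              # "CONFIDENT" or "HESITANT"


def score_delivery(rms, n_words, hop_s=0.032, silence=0.02,
                   max_pause_ratio=0.35, max_gap_s=0.8, min_wps=1.8):
    """Heuristic read of perceived confidence; not a lie detector.

    rms: per-frame loudness values for the clip.
    n_words: word count of the transcript for the same clip.
    All thresholds are hypothetical and would need tuning.
    """
    if not rms:
        raise ValueError("empty clip")
    silent = [v < silence for v in rms]
    pause_ratio = sum(silent) / len(silent)

    # Longest run of consecutive silent frames.
    longest = run = 0
    for is_silent in silent:
        run = run + 1 if is_silent else 0
        longest = max(longest, run)

    longest_pause_s = longest * hop_s
    words_per_s = n_words / (len(rms) * hop_s)
    hesitant = (pause_ratio > max_pause_ratio
                or longest_pause_s > max_gap_s
                or words_per_s < min_wps)
    return Delivery(pause_ratio, longest_pause_s, words_per_s,
                    "HESITANT" if hesitant else "CONFIDENT")


steady = score_delivery([0.1] * 100, n_words=8)
halting = score_delivery([0.1] * 30 + [0.0] * 40 + [0.1] * 30, n_words=8)
print(steady.label, halting.label)  # prints: CONFIDENT HESITANT
```

Both clips carry the same eight words; only the 1.28 s gap in the second one flips the label, which is the contrast the demo leans on.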

---

## Demo video shotlist (REQ-03) — ~80s

Record against the **live Space** (mic works in `modal` mode). `demo_playthrough.py`
is your scripted rehearsal — the three killer lines are in `SCRIPT` there.

| t | Shot | Notes |
|---|---|---|
| 0:00–0:08 | Title card + hook | "Cross-examine a hostile witness — with your voice." |
| 0:08–0:18 | Click **Call the witness** | Reid's composed opening line **plays** (instant, cached beat) |
| 0:18–0:34 | The mechanic, both ways | Ask **confidently** → he clams up (bar: CONFIDENT). Ask **hesitantly** → he overshares (bar: HESITANT). This is the moat — linger here. |
| 0:34–0:56 | Land the 3 contradictions | timeline → authorization → relationship. Show the **Contradiction Engine** verdict box firing each time. |
| 0:56–1:08 | **The break** | 3rd catch → Reid's voice **cracks** (best of 32 cached takes). Win banner. |
| 1:08–1:20 | Architecture card | "3 open models < 32B · Modal scale-to-zero · parallel `.map()` pre-gen · warm 5.3s reply / 8.6s voice." End on the Space URL. |
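
The idea that VoxCPM2's style tag *is* the game state can be sketched as a tiny state machine that drives the shotlist's arc: a composed opening, a tag that escalates with each contradiction, and a crack on the third catch. Only the three contradictions and the composed-to-cracking arc come from this pack; the intermediate tags (`defensive`, `rattled`), the `Examination` class and the `leak` / `clam_up` modes are hypothetical names for illustration:

```python
from dataclasses import dataclass, field

# The three contradictions, in the order the demo lands them.
CONTRADICTIONS = ("timeline", "authorization", "relationship")
# Hypothetical style tags, indexed by how many contradictions are caught.
STYLE_BY_CATCHES = ("composed", "defensive", "rattled", "cracking")


@dataclass
class Examination:
    caught: set = field(default_factory=set)

    def catch(self, contradiction):
        if contradiction not in CONTRADICTIONS:
            raise ValueError(f"unknown contradiction: {contradiction}")
        self.caught.add(contradiction)  # a repeat catch does not count twice

    @property
    def style_tag(self):
        # The tag sent with each TTS request doubles as the game state.
        return STYLE_BY_CATCHES[len(self.caught)]

    @property
    def won(self):
        return len(self.caught) == len(CONTRADICTIONS)

    @staticmethod
    def witness_mode(delivery_label):
        # Confident asks make him clam up; hesitant ones make him leak.
        return "leak" if delivery_label == "HESITANT" else "clam_up"


exam = Examination()
for c in CONTRADICTIONS:
    exam.catch(c)
print(exam.style_tag, exam.won)  # prints: cracking True
```

Storing catches in a set means re-landing the same contradiction cannot skip the witness straight to the break.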

**Tips:**
- **Warm the models first.** About 5 min before recording, redeploy with
  `WITNESSBOX_KEEP_WARM=1 modal deploy modal_app.py` (or take one throwaway turn;
  containers stay warm for 5 min) so no shot waits on a cold start.
- Record in a quiet room so the mic reads you cleanly.
- Do one confident and one hesitant ask back-to-back so the contrast is unmistakable.
- Let the voice-crack play fully; it's the payoff.
- Flip keep-warm back off afterward to stop idle spend.
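
A minimal sketch of how `WITNESSBOX_KEEP_WARM` could map onto Modal's autoscaling options. It assumes a recent Modal release in which `@app.cls` accepts `min_containers` and `scaledown_window` (older releases called these `keep_warm` and `container_idle_timeout`, so check your installed version). The helper name `warm_options` and the 300 s window, chosen to match the 5-minute warm period above, are illustrative and not taken from `modal_app.py`:

```python
import os


def warm_options(env=None):
    """Hypothetical helper: Modal autoscaling kwargs driven by an env var.

    WITNESSBOX_KEEP_WARM=1 pins one container per class (costs money while
    idle); anything else keeps scale-to-zero, so idle cost stays at $0.
    """
    env = os.environ if env is None else env
    keep_warm = env.get("WITNESSBOX_KEEP_WARM", "0") == "1"
    return {
        "min_containers": 1 if keep_warm else 0,
        # Seconds an idle container lingers before scaling down.
        "scaledown_window": 300,
    }


print(warm_options({}))
print(warm_options({"WITNESSBOX_KEEP_WARM": "1"}))
```

In `modal_app.py` the result would be spread into the decorator, for example `@app.cls(gpu="A10G", **warm_options())`. Decorator arguments are evaluated when `modal deploy` imports the module locally, which is why toggling keep-warm means redeploying.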

---

## Best-Use-of-Modal talking points (for the writeup / description)
- **Not just hosting — the runtime.** Three models on three right-sized GPUs
  (A100 + 2×A10G), each a scale-to-zero `@app.cls`; idle → `$0`.
- **Honest latency, costed warmth:** scale-to-zero is the default (`$0` idle). Opt into
  keep-warm (`WITNESSBOX_KEEP_WARM=1`) for a live session; a warm turn then takes ~5.3s
  for the text reply plus ~8.6s for the voice, measured on this deploy, and the text appears first.
- **Parallel `.map()`, verified live:** 36 takes fan out across containers at deploy time;
  workers write WAVs to the Volume and return only metadata. The break take that cracks
  most, by librosa pitch-instability score (70.3 vs a 61–69 field), is kept, so the
  dramatic beats play instantly.
- **Volume** persists the designed CFO reference voice, the model cache, and the
  chosen beats across cold starts.
- **Memory snapshots** cut CPU-side initialization time on cold starts.
- Cost/latency are **measured**, not fabricated.
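
The deploy-time best-of-N selection can be sketched locally. This is an illustration under stated assumptions, not the code in `modal_app.py`: a `ThreadPoolExecutor` stands in for Modal's `.map()`, `render_take` generates a synthetic pitch contour instead of calling VoxCPM2, the `takes/break_NN.wav` paths are hypothetical, and `pitch_instability` is a hypothetical score (spread of frame-to-frame pitch jumps in semitones), so its values are not on the same scale as the 70.3 quoted above:

```python
import math
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def pitch_instability(f0_hz):
    """Hypothetical score: spread of frame-to-frame pitch jumps, in semitones.

    Unvoiced frames (f0 <= 0) are skipped. Higher means a shakier voice.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 3:
        return 0.0
    jumps = [12 * math.log2(b / a) for a, b in zip(voiced, voiced[1:])]
    return statistics.pstdev(jumps)


def render_take(seed):
    """Stand-in for one remote worker (synthetic data, not real TTS).

    A real worker would synthesize a take, write the WAV to the Volume and
    return only this small metadata dict instead of the audio bytes.
    """
    rng = random.Random(seed)
    wobble = rng.uniform(0.5, 3.0)  # how much this take's pitch jitters
    f0 = [140 * 2 ** (rng.gauss(0, wobble) / 12) for _ in range(200)]
    return {"seed": seed,
            "wav": f"takes/break_{seed:02d}.wav",
            "score": pitch_instability(f0)}


def pick_best_take(n_takes):
    # Fan out like Modal's .map(); a local thread pool stands in here.
    with ThreadPoolExecutor() as pool:
        takes = list(pool.map(render_take, range(n_takes)))
    return max(takes, key=lambda t: t["score"]), takes


best, takes = pick_best_take(32)
print(best["wav"], round(best["score"], 2))
```

Returning a small dict rather than audio keeps the fan-out cheap: only the winner's path needs to be read back from the Volume when the break beat plays.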