# WitnessBox: submission pack

Everything needed to submit to **Build Small** (HF × Gradio, models < 32B).
Track: *An Adventure in Thousand Token Wood* · Primary target: **Best Use of Modal**.

---

## Status checklist

| # | Requirement | State |
|---|---|---|
| REQ-01 | Public app, models < 32B | ✅ MiniCPM4.1-8B (8.2B) + VoxCPM2 (2.3B) + Whisper-small (0.24B) ≈ 11B |
| REQ-02 | Gradio Space, public | ⏳ one command away; needs an HF write token (see below) |
| REQ-03 | Demo video (60–90s) | ⬜ you record (shotlist below); `scripts/demo_playthrough.py` is the dry-run |
| REQ-04 | Social post tagging sponsors | ⬜ you post (draft below) |
| Modal | Genuine *platform* use | ✅ 3 GPU classes, scale-to-zero, keep-warm, parallel `.map()` pre-gen, Volume, snapshots, all **proven live** |

**The one action only you can take:** paste a **write**-scoped HF token, then I run
`python3 scripts/deploy_space.py` and the Space is live (code pushed, Modal secrets set,
`WITNESSBOX_BACKEND=modal`). Get a token at https://huggingface.co/settings/tokens, then
either run `! hf auth login` in the prompt, or paste it and I'll use `HF_TOKEN=…`.

---

## Social post (REQ-04): draft

**X / short form**

> ⚖️ I built **WitnessBox**: cross-examine a hostile AI witness, and your *voice*
> is the weapon. Sound confident and the CFO clams up; sound hesitant and he gets
> cocky and *leaks*. Catch 3 contradictions and his voice literally **cracks**.
>
> All open models < 32B, served on @modal_labs:
> 🧠 MiniCPM4.1-8B · 🗣️ VoxCPM2 · 👂 Whisper. @OpenBMB on @huggingface, built with @Gradio.
>
> #BuildSmall [Space link] [video link]

**LinkedIn / long form**

> Most "interrogate the witness" games are text-and-logic. WitnessBox makes your
> **delivery** the input. A librosa pass reads your *perceived* confidence (pauses
> and pace, never a lie detector) and steers the witness in real time. He answers
> in a voice that escalates from composed to cracking.
>
> Three open models, all under 32B, ~11B combined: MiniCPM4.1-8B is the witness's
> mind, VoxCPM2 is his voice (the style tag *is* the game state), Whisper hears you.
> All of it runs on Modal: three right-sized GPUs behind scale-to-zero classes,
> kept warm during an examination, with the dramatic "voice-crack" beats fanned
> across containers via parallel `.map()` and the best take cached on a Volume.
>
> Built for the Build Small hackathon (@Hugging Face × @Gradio). Models by @OpenBMB.
> Try it: [Space link] · 90-second demo: [video link]
>
> #BuildSmall #Modal #Gradio #OpenSource #AI

---

## Demo video shotlist (REQ-03), ~80s

Record against the **live Space** (mic works in `modal` mode). `demo_playthrough.py`
is your scripted rehearsal; the three killer lines are in `SCRIPT` there.

| t | Shot | Notes |
|---|---|---|
| 0:00–0:08 | Title card + hook | "Cross-examine a hostile witness, with your voice." |
| 0:08–0:18 | Click **Call the witness** | Reid's composed opening line **plays** (instant, cached beat) |
| 0:18–0:34 | The mechanic, both ways | Ask **confidently** → he clams up (bar: CONFIDENT). Ask **hesitantly** → he overshares (bar: HESITANT). This is the moat; linger here. |
| 0:34–0:56 | Land the 3 contradictions | timeline → authorization → relationship. Show the **Contradiction Engine** verdict box firing each time. |
| 0:56–1:08 | **The break** | 3rd catch → Reid's voice **cracks** (best of 32 cached takes). Win banner. |
| 1:08–1:20 | Architecture card | "3 open models < 32B · Modal scale-to-zero · parallel `.map()` pre-gen · warm 5.3s reply / 8.6s voice." End on the Space URL. |

**Tips:** **warm the models first**: redeploy with
`WITNESSBOX_KEEP_WARM=1 modal deploy modal_app.py` ~5 min before recording (or take one
throwaway turn; they stay warm 5 min) so no shot waits on a cold start.
Quiet room for the mic; do one confident + one hesitant ask back-to-back so the
contrast is unmistakable; let the voice-crack play fully, since it's the payoff.
Flip keep-warm back off afterward to stop idle spend.

---

## Best-Use-of-Modal talking points (for the writeup / description)

- **Not just hosting: the runtime.** Three models on three right-sized GPUs
  (A100 + 2×A10G), each a scale-to-zero `@app.cls`; idle → `$0`.
- **Honest latency, costed warmth:** scale-to-zero by default (`$0` idle). Opt into
  keep-warm (`WITNESSBOX_KEEP_WARM=1`) for a live session and a turn is ~5.3s (reply)
  + ~8.6s (voice), measured this deploy; text lands first.
- **Parallel `.map()`, verified:** 36 takes fanned across containers; workers write
  WAVs to the Volume and return only metadata; the best-cracking break take (pitch
  instability 70.3 vs a 61–69 field) is kept. Dramatic beats then play instantly.
- **Parallel `.map()`** fans the 32 voice-crack takes across containers and keeps the
  one that cracks most (librosa pitch-instability score), all at deploy time.
- **Volume** persists the designed CFO reference voice, the model cache, and the
  chosen beats across cold starts.
- **Memory snapshots** trim CPU-side init.
- Cost/latency are **measured**, not fabricated.
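
The "fan out, return metadata, keep the take that cracks most" step from the parallel `.map()` points can be sketched roughly as below. This is a minimal illustration, not the project's code: the built-in `map` stands in for Modal's `.map()`, the pitch track is synthetic, and `TakeMeta`, `render_take`, `pitch_instability`, `best_take`, and the `/vol/takes/...` path layout are all hypothetical names. The score here (spread of voiced pitch frames) is an assumed definition; the real librosa-based score may differ.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass(frozen=True)
class TakeMeta:
    """Metadata a worker returns; the audio itself would stay on shared storage."""
    take_id: int
    wav_path: str  # hypothetical path layout, not the project's
    pitch_instability: float


def pitch_instability(f0_hz):
    """Assumed score: population std-dev of voiced pitch frames, in Hz.

    Unvoiced frames are passed as None and skipped.
    """
    voiced = [f for f in f0_hz if f is not None]
    return pstdev(voiced) if len(voiced) > 1 else 0.0


def render_take(take_id):
    """Stand-in for a remote worker: 'synthesize', score, return metadata only."""
    # Synthetic pitch track whose wobble depends on the take id. A real worker
    # would run TTS plus a pitch tracker and write the WAV to shared storage.
    wobble = (take_id * 7) % 5
    f0 = [110.0 + wobble * ((-1) ** i) * i for i in range(8)] + [None]
    path = f"/vol/takes/break_{take_id:02d}.wav"
    return TakeMeta(take_id, path, pitch_instability(f0))


def best_take(takes):
    """Keep the take with the highest instability score (first one wins ties)."""
    return max(takes, key=lambda t: t.pitch_instability)


# Built-in map() stands in for the parallel fan-out across containers.
takes = list(map(render_take, range(32)))
winner = best_take(takes)
print(winner.take_id, winner.wav_path, round(winner.pitch_instability, 1))
```

Returning only small metadata records keeps the fan-out cheap: the selection step never has to move audio between workers and the caller, it just compares scores and remembers a path.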