| # WitnessBox — submission pack | |
| Everything needed to submit to **Build Small** (HF × Gradio, models < 32B). | |
| Track: 🍄 *An Adventure in Thousand Token Wood* · Primary target: **Best Use of Modal**. | |
| --- | |
| ## Status checklist | |
| | # | Requirement | State | | |
| |---|---|---| | |
| | REQ-01 | Public app, models < 32B | ✅ MiniCPM4.1-8B (8.2B) + VoxCPM2 (2.3B) + Whisper-small (0.24B) ≈ 11B | | |
| | REQ-02 | Gradio Space, public | ⏳ one command away — needs an HF write token (see below) | | |
| | REQ-03 | Demo video (60–90s) | ⬜ you record — shotlist below; `scripts/demo_playthrough.py` is the dry-run | | |
| | REQ-04 | Social post tagging sponsors | ⬜ you post — draft below | | |
| | Modal | Genuine *platform* use | ✅ 3 GPU classes, scale-to-zero, keep-warm, parallel `.map()` pre-gen, Volume, snapshots — **proven live** | | |
| **The one action only you can take:** provide a **write**-scoped HF token. I then run | |
| `python3 scripts/deploy_space.py` and the Space is live: code pushed, Modal secrets | |
| set, `WITNESSBOX_BACKEND=modal`. Get a token at https://huggingface.co/settings/tokens, | |
| then either run `! hf auth login` in the prompt or paste the token and I'll use `HF_TOKEN=…`. | |
| --- | |
| ## Social post (REQ-04) — draft | |
| **X / short form** | |
| > ⚖️ I built **WitnessBox**: cross-examine a hostile AI witness — and your *voice* | |
| > is the weapon. Sound confident and the CFO clams up; sound hesitant and he gets | |
| > cocky and *leaks*. Catch 3 contradictions and his voice literally **cracks**. | |
| > | |
| > All open models < 32B, served on @modal_labs: | |
| > 🧠 MiniCPM4.1-8B · 🗣️ VoxCPM2 · 👂 Whisper — @OpenBMB on @huggingface, built with @Gradio. | |
| > | |
| > #BuildSmall [Space link] [video link] | |
| **LinkedIn / long form** | |
| > Most "interrogate the witness" games are text-and-logic. WitnessBox makes your | |
| > **delivery** the input. A librosa pass reads your *perceived* confidence — pauses | |
| > and pace, never a lie detector — and steers the witness in real time. He answers | |
| > in a voice that escalates from composed to cracking. | |
| > | |
| > Three open models, all under 32B, ~11B combined: MiniCPM4.1-8B is the witness's | |
| > mind, VoxCPM2 is his voice (the style tag *is* the game state), Whisper hears you. | |
| > All of it runs on Modal: three right-sized GPUs behind scale-to-zero classes, | |
| > kept warm during an examination, with the dramatic "voice-crack" beats fanned | |
| > across containers via parallel `.map()` and the best take cached on a Volume. | |
| > | |
| > Built for the Build Small hackathon (@Hugging Face × @Gradio). Models by @OpenBMB. | |
| > Try it: [Space link] · 90-second demo: [video link] | |
| > | |
| > #BuildSmall #Modal #Gradio #OpenSource #AI | |
| --- | |
| ## Demo video shotlist (REQ-03) — ~80s | |
| Record against the **live Space** (mic works in `modal` mode). `demo_playthrough.py` | |
| is your scripted rehearsal — the three killer lines are in `SCRIPT` there. | |
| | t | Shot | Notes | | |
| |---|---|---| | |
| | 0:00–0:08 | Title card + hook | "Cross-examine a hostile witness — with your voice." | | |
| | 0:08–0:18 | Click **Call the witness** | Reid's composed opening line **plays** (instant, cached beat) | | |
| | 0:18–0:34 | The mechanic, both ways | Ask **confidently** → he clams up (bar: CONFIDENT). Ask **hesitantly** → he overshares (bar: HESITANT). This is the moat — linger here. | | |
| | 0:34–0:56 | Land the 3 contradictions | timeline → authorization → relationship. Show the **Contradiction Engine** verdict box firing each time. | | |
| | 0:56–1:08 | **The break** | 3rd catch → Reid's voice **cracks** (best of 32 cached takes). Win banner. | | |
| | 1:08–1:20 | Architecture card | "3 open models < 32B · Modal scale-to-zero · parallel `.map()` pre-gen · warm 5.3s reply / 8.6s voice." End on the Space URL. | | |
| **Tips:** **warm the models first.** Redeploy with | |
| `WITNESSBOX_KEEP_WARM=1 modal deploy modal_app.py` ~5 min before recording (or take one | |
| throwaway turn; they stay warm 5 min) so no shot waits on a cold start. Use a quiet room | |
| for the mic. Do one confident and one hesitant ask back-to-back so the contrast is | |
| unmistakable. Let the voice-crack play fully; it's the payoff. Flip keep-warm back off | |
| afterward to stop idle spend. | |
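The keep-warm toggle in the tip above can be sketched as a small helper that turns the `WITNESSBOX_KEEP_WARM` environment variable into container-lifecycle settings. This is a minimal sketch, not the code in `modal_app.py`: the helper name `warm_kwargs` is hypothetical, and the keys `min_containers` and `scaledown_window` assume the parameter names used by recent Modal releases (older releases called them `keep_warm` and `container_idle_timeout`). The 300-second window mirrors the "stay warm 5 min" behaviour described above.

```python
import os


def warm_kwargs(env=None):
    """Map WITNESSBOX_KEEP_WARM to lifecycle settings (hypothetical helper)."""
    env = os.environ if env is None else env
    keep_warm = env.get("WITNESSBOX_KEEP_WARM", "0") == "1"
    return {
        # 0 lets the class scale to zero; 1 pins one container for a live session.
        "min_containers": 1 if keep_warm else 0,
        # Seconds an idle container lingers after its last request (5 min).
        "scaledown_window": 300,
    }


# Assumed usage in a Modal app: @app.cls(gpu="A10G", **warm_kwargs())
print(warm_kwargs({"WITNESSBOX_KEEP_WARM": "1"}))
```

Because the value is read when the app is deployed, flipping keep-warm on or off means redeploying, which matches the advice to redeploy before recording and switch it back afterward.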
| --- | |
| ## Best-Use-of-Modal talking points (for the writeup / description) | |
| - **Not just hosting — the runtime.** Three models on three right-sized GPUs | |
| (A100 + 2×A10G), each a scale-to-zero `@app.cls`; idle → `$0`. | |
| - **Honest latency, costed warmth:** scale-to-zero by default (`$0` idle). Opt into | |
| keep-warm (`WITNESSBOX_KEEP_WARM=1`) for a live session and a turn takes ~5.3s (reply) | |
| + ~8.6s (voice), measured on this deploy; the text lands first. | |
| - **Parallel `.map()`, verified:** at deploy time, 36 takes are fanned across containers; | |
| workers write WAVs to the Volume and return only metadata. Among the 32 voice-crack takes, | |
| the one that cracks most by librosa pitch-instability score (70.3 vs a 61–69 field) is | |
| kept. Dramatic beats then play instantly. | |
| - **Volume** persists the designed CFO reference voice, the model cache, and the | |
| chosen beats across cold starts. | |
| - **Memory snapshots** trim CPU-side init. | |
| - Cost/latency are **measured**, not fabricated. | |
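The fan-out-and-keep-the-best pattern from the parallel `.map()` point can be sketched locally. This is an illustration under stated assumptions, not the project's code: `ThreadPoolExecutor.map` stands in for Modal's remote `.map()`, a temporary directory stands in for the Volume, the pitch track is synthetic, and `pitch_instability` (standard deviation of the pitch track) is a hypothetical stand-in for the librosa-based score. The names `render_take` and `pick_best_take` are invented for the example.

```python
import random
import statistics
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Stand-in for the mounted Volume (hypothetical location).
VOLUME_DIR = Path(tempfile.mkdtemp(prefix="witnessbox_takes_"))


def pitch_instability(f0_hz):
    """Hypothetical score: spread of the pitch track in Hz."""
    return statistics.pstdev(f0_hz)


def render_take(seed):
    """Render one take, persist the audio, and return metadata only."""
    rng = random.Random(seed)
    # Placeholder for TTS output: a synthetic pitch track, not real audio.
    f0_hz = [120.0 + rng.uniform(-40.0, 40.0) for _ in range(200)]
    path = VOLUME_DIR / f"break_take_{seed:02d}.wav"
    path.write_bytes(b"")  # a real worker would write the WAV bytes here
    return {"seed": seed, "path": str(path), "score": pitch_instability(f0_hz)}


def pick_best_take(n_takes):
    # The pool's map plays the role of the remote fan-out across containers.
    with ThreadPoolExecutor(max_workers=8) as pool:
        takes = list(pool.map(render_take, range(n_takes)))
    # Only small metadata dicts came back; the audio stays on disk.
    return max(takes, key=lambda t: t["score"]), takes


best, takes = pick_best_take(n_takes=32)
print(f"kept take {best['seed']} of {len(takes)}")
```

Returning metadata instead of audio keeps the result payloads small, and the selection step only needs the score and the path of each take.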