WitnessBox — submission pack

Everything needed to submit to Build Small (HF × Gradio, models < 32B). Track: 🍄 An Adventure in Thousand Token Wood · Primary target: Best Use of Modal.


Status checklist

| # | Requirement | State |
|---|---|---|
| REQ-01 | Public app, models < 32B | ✅ MiniCPM4.1-8B (8.2B) + VoxCPM2 (2.3B) + Whisper-small (0.24B) ≈ 11B |
| REQ-02 | Gradio Space, public | ⏳ one command away; needs an HF write token (see below) |
| REQ-03 | Demo video (60–90s) | ⬜ you record (shotlist below); scripts/demo_playthrough.py is the dry run |
| REQ-04 | Social post tagging sponsors | ⬜ you post (draft below) |
| Modal | Genuine platform use | ✅ 3 GPU classes, scale-to-zero, keep-warm, parallel .map() pre-gen, Volume, snapshots, all proven live |

The one action only you can take: provide a write-scoped HF token. Create one at https://huggingface.co/settings/tokens, then either run ! hf auth login in the prompt or paste the token and I'll use HF_TOKEN=…. Once the token is in place, I run python3 scripts/deploy_space.py and the Space goes live (code pushed, Modal secrets set, WITNESSBOX_BACKEND=modal).


Social post (REQ-04) — draft

X / short form

⚖️ I built WitnessBox: cross-examine a hostile AI witness — and your voice is the weapon. Sound confident and the CFO clams up; sound hesitant and he gets cocky and leaks. Catch 3 contradictions and his voice literally cracks.

All open models < 32B, served on @modal_labs: 🧠 MiniCPM4.1-8B · 🗣️ VoxCPM2 · 👂 Whisper — @OpenBMB on @huggingface, built with @Gradio.

#BuildSmall [Space link] [video link]

LinkedIn / long form

Most "interrogate the witness" games are text-and-logic. WitnessBox makes your delivery the input. A librosa pass reads your perceived confidence — pauses and pace, never a lie detector — and steers the witness in real time. He answers in a voice that escalates from composed to cracking.

Three open models, all under 32B, ~11B combined: MiniCPM4.1-8B is the witness's mind, VoxCPM2 is his voice (the style tag is the game state), Whisper hears you. All of it runs on Modal: three right-sized GPUs behind scale-to-zero classes, kept warm during an examination, with the dramatic "voice-crack" beats fanned across containers via parallel .map() and the best take cached on a Volume.

Built for the Build Small hackathon (@Hugging Face × @Gradio). Models by @OpenBMB. Try it: [Space link] · 90-second demo: [video link]

#BuildSmall #Modal #Gradio #OpenSource #AI


Demo video shotlist (REQ-03) — ~80s

Record against the live Space (mic works in modal mode). demo_playthrough.py is your scripted rehearsal — the three killer lines are in SCRIPT there.

| t | Shot | Notes |
|---|---|---|
| 0:00–0:08 | Title card + hook | "Cross-examine a hostile witness, with your voice." |
| 0:08–0:18 | Click Call the witness | Reid's composed opening line plays (instant, cached beat) |
| 0:18–0:34 | The mechanic, both ways | Ask confidently → he clams up (bar: CONFIDENT). Ask hesitantly → he overshares (bar: HESITANT). This is the moat; linger here. |
| 0:34–0:56 | Land the 3 contradictions | timeline → authorization → relationship. Show the Contradiction Engine verdict box firing each time. |
| 0:56–1:08 | The break | 3rd catch → Reid's voice cracks (best of 32 cached takes). Win banner. |
| 1:08–1:20 | Architecture card | "3 open models < 32B · Modal scale-to-zero · parallel .map() pre-gen · warm 5.3s reply / 8.6s voice." End on the Space URL. |

Tips: warm the models first. Redeploy with WITNESSBOX_KEEP_WARM=1 modal deploy modal_app.py about 5 min before recording (or take one throwaway turn; they stay warm for 5 min) so no shot waits on a cold start. Record the mic in a quiet room. Do one confident + one hesitant ask back-to-back so the contrast is unmistakable, and let the voice-crack play fully, since it is the payoff. Flip keep-warm back off afterward to stop idle spend.
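
For reference, here is a minimal sketch of how a deploy script could read the keep-warm flag. It is not taken from modal_app.py: the function name and the dictionary keys (`warm_containers`, `idle_timeout_s`) are hypothetical placeholders, and only the WITNESSBOX_KEEP_WARM variable and the 5-minute warm window come from the notes above.

```python
import os
from typing import Mapping

WARM_WINDOW_S = 5 * 60  # containers stay up 5 min after the last request


def warm_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Translate the WITNESSBOX_KEEP_WARM flag into lifecycle settings.

    The keys are hypothetical names for illustration, not Modal parameters.
    """
    keep_warm = env.get("WITNESSBOX_KEEP_WARM", "0") == "1"
    return {
        "warm_containers": 1 if keep_warm else 0,  # 0 means scale to zero
        "idle_timeout_s": WARM_WINDOW_S,
    }


# Recording session: one container held warm per class.
print(warm_settings({"WITNESSBOX_KEEP_WARM": "1"}))
# Default: nothing held warm, so idle spend is zero.
print(warm_settings({}))
```

Defaulting the flag to off keeps the $0-idle behavior unless someone opts in for a session.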

Best-Use-of-Modal talking points (for the writeup / description)

  • Not just hosting — the runtime. Three models on three right-sized GPUs (A100 + 2×A10G), each a scale-to-zero @app.cls; idle → $0.
  • Honest latency, costed warmth: scale-to-zero by default ($0 idle). Opt into keep-warm (WITNESSBOX_KEEP_WARM=1) for a live session and a turn is ~5.3s (reply) + ~8.6s (voice), measured this deploy; text lands first.
  • Parallel .map() — verified: 36 takes fanned across containers; workers write WAVs to the Volume and return only metadata; the best-cracking break take (pitch instability 70.3 vs a 61–69 field) is kept. Dramatic beats then play instantly.
  • Selection happens at deploy time: the 32 voice-crack takes are scored with a librosa pitch-instability measure and the one that cracks most is kept.
  • Volume persists the designed CFO reference voice, the model cache, and the chosen beats across cold starts.
  • Memory snapshots trim CPU-side init.
  • Cost/latency are measured, not fabricated.
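
The fan-out-and-keep-the-best pattern in the bullets above can be sketched locally. This is a stand-in, not the deployed code: a thread pool's `map` plays the role of Modal's parallel .map(), the pitch track is synthetic, and the score is a simple spread of frame-to-frame pitch jumps rather than the librosa-based measure. `TakeMeta`, `render_take`, `pick_best_take`, and the `/beats/...` paths are hypothetical names.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TakeMeta:
    """Metadata a worker returns; the audio itself would stay on shared storage."""
    take_id: int
    wav_path: str       # hypothetical path on the shared volume
    instability: float  # higher means a more audible "crack"


def pitch_instability(pitch_hz: list[float]) -> float:
    """Stand-in score: spread of frame-to-frame pitch jumps, in Hz."""
    jumps = [b - a for a, b in zip(pitch_hz, pitch_hz[1:])]
    return statistics.pstdev(jumps)


def render_take(take_id: int) -> TakeMeta:
    """Pretend to synthesize one take; here we only fake a pitch track."""
    rng = random.Random(take_id)  # seeded so reruns give the same takes
    wobble = rng.uniform(5.0, 40.0)
    pitch = [120.0 + rng.gauss(0.0, wobble) for _ in range(200)]
    path = f"/beats/break_{take_id:02d}.wav"
    return TakeMeta(take_id, path, pitch_instability(pitch))


def pick_best_take(n_takes: int) -> TakeMeta:
    """Fan the takes out in parallel, then keep the most unstable one."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        takes = list(pool.map(render_take, range(n_takes)))
    return max(takes, key=lambda t: t.instability)


best = pick_best_take(32)
print(best.take_id, best.wav_path, round(best.instability, 1))
```

Returning only metadata keeps the result payload small; the caller needs the winning path and score, not 32 audio files.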