Spaces:

build-small-hackathon
/

WitnessBox

Sleeping

App Files Files Community

WitnessBox / SUBMISSION.md

Farseen0

Deploy WitnessBox

c519923 verified 16 days ago

preview code

Raw

History Blame Contribute Delete

5.18 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

WitnessBox — submission pack

Everything needed to submit to Build Small (HF × Gradio, models < 32B). Track: 🍄 An Adventure in Thousand Token Wood · Primary target: Best Use of Modal.

Status checklist

#	Requirement	State
REQ-01	Public app, models < 32B	✅ MiniCPM4.1-8B (8.2B) + VoxCPM2 (2.3B) + Whisper-small (0.24B) ≈ 11B
REQ-02	Gradio Space, public	⏳ one command away — needs an HF write token (see below)
REQ-03	Demo video (60–90s)	⬜ you record — shotlist below; `scripts/demo_playthrough.py` is the dry-run
REQ-04	Social post tagging sponsors	⬜ you post — draft below
Modal	Genuine platform use	✅ 3 GPU classes, scale-to-zero, keep-warm, parallel `.map()` pre-gen, Volume, snapshots — proven live

The one action only you can take: paste a write-scoped HF token, then I run python3 scripts/deploy_space.py and the Space is live (code pushed, Modal secrets set, WITNESSBOX_BACKEND=modal). Get a token at https://huggingface.co/settings/tokens — either ! hf auth login in the prompt, or paste it and I'll use HF_TOKEN=….

Social post (REQ-04) — draft

X / short form

⚖️ I built WitnessBox: cross-examine a hostile AI witness — and your voice is the weapon. Sound confident and the CFO clams up; sound hesitant and he gets cocky and leaks. Catch 3 contradictions and his voice literally cracks.

All open models < 32B, served on @modal_labs: 🧠 MiniCPM4.1-8B · 🗣️ VoxCPM2 · 👂 Whisper — @OpenBMB on @huggingface, built with @Gradio.

#BuildSmall [Space link] [video link]

LinkedIn / long form

Most "interrogate the witness" games are text-and-logic. WitnessBox makes your delivery the input. A librosa pass reads your perceived confidence — pauses and pace, never a lie detector — and steers the witness in real time. He answers in a voice that escalates from composed to cracking.

Three open models, all under 32B, ~11B combined: MiniCPM4.1-8B is the witness's mind, VoxCPM2 is his voice (the style tag is the game state), Whisper hears you. All of it runs on Modal: three right-sized GPUs behind scale-to-zero classes, kept warm during an examination, with the dramatic "voice-crack" beats fanned across containers via parallel .map() and the best take cached on a Volume.

Built for the Build Small hackathon (@Hugging Face × @Gradio). Models by @OpenBMB. Try it: [Space link] · 90-second demo: [video link]

#BuildSmall #Modal #Gradio #OpenSource #AI

Demo video shotlist (REQ-03) — ~80s

Record against the live Space (mic works in modal mode). demo_playthrough.py is your scripted rehearsal — the three killer lines are in SCRIPT there.

t	Shot	Notes
0:00–0:08	Title card + hook	"Cross-examine a hostile witness — with your voice."
0:08–0:18	Click Call the witness	Reid's composed opening line plays (instant, cached beat)
0:18–0:34	The mechanic, both ways	Ask confidently → he clams up (bar: CONFIDENT). Ask hesitantly → he overshares (bar: HESITANT). This is the moat — linger here.
0:34–0:56	Land the 3 contradictions	timeline → authorization → relationship. Show the Contradiction Engine verdict box firing each time.
0:56–1:08	The break	3rd catch → Reid's voice cracks (best of 32 cached takes). Win banner.
1:08–1:20	Architecture card	"3 open models < 32B · Modal scale-to-zero · parallel `.map()` pre-gen · warm 5.3s reply / 8.6s voice." End on the Space URL.

Tips: warm the models first — redeploy with WITNESSBOX_KEEP_WARM=1 modal deploy modal_app.py ~5 min before recording (or take one throwaway turn; they stay warm 5 min) so no shot waits on a cold start. Quiet room for the mic; do one confident

one hesitant ask back-to-back so the contrast is unmistakable; let the voice-crack play fully — it's the payoff. Flip keep-warm back off afterward to stop idle spend.

Best-Use-of-Modal talking points (for the writeup / description)

Not just hosting — the runtime. Three models on three right-sized GPUs (A100 + 2×A10G), each a scale-to-zero @app.cls; idle → $0.
Honest latency, costed warmth: scale-to-zero by default ($0 idle). Opt into keep-warm (WITNESSBOX_KEEP_WARM=1) for a live session and a turn is ~5.3s (reply)
- ~8.6s (voice), measured this deploy — text lands first.
Parallel .map() — verified: 36 takes fanned across containers; workers write WAVs to the Volume and return only metadata; the best-cracking break take (pitch instability 70.3 vs a 61–69 field) is kept. Dramatic beats then play instantly.
Parallel .map() fans the 32 voice-crack takes across containers and keeps the one that cracks most (librosa pitch-instability score), all at deploy time.
Volume persists the designed CFO reference voice, the model cache, and the chosen beats across cold starts.
Memory snapshots trim CPU-side init.
Cost/latency are measured, not fabricated.