
WitnessBox / SUBMISSION.md

Farseen0

Deploy WitnessBox

c519923 verified 16 days ago


	# WitnessBox — submission pack

	Everything needed to submit to Build Small (HF × Gradio, models < 32B).
	Track: 🍄 An Adventure in Thousand Token Wood · Primary target: Best Use of Modal.

	---

	## Status checklist
	| # | Requirement | State |
	|---|---|---|
	| REQ-01 | Public app, models < 32B | ✅ MiniCPM4.1-8B (8.2B) + VoxCPM2 (2.3B) + Whisper-small (0.24B) ≈ 11B |
	| REQ-02 | Gradio Space, public | ⏳ one command away; needs an HF write token (see below) |
	| REQ-03 | Demo video (60–90s) | ⬜ you record (shotlist below); `scripts/demo_playthrough.py` is the dry-run |
	| REQ-04 | Social post tagging sponsors | ⬜ you post (draft below) |
	| Modal | Genuine platform use | ✅ 3 GPU classes, scale-to-zero, keep-warm, parallel `.map()` pre-gen, Volume, snapshots; proven live |
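
The REQ-01 arithmetic can be checked in a few lines. This is a minimal sketch using the parameter counts from the table; whether the cap applies per model or to the combined total is not stated here, so the hypothetical `check_budget` helper reports both.

```python
# Parameter counts in billions, copied from the REQ-01 row.
MODEL_PARAMS_B = {
    "MiniCPM4.1-8B": 8.2,
    "VoxCPM2": 2.3,
    "Whisper-small": 0.24,
}
LIMIT_B = 32.0  # Build Small size cap, in billions of parameters


def check_budget(params_b, limit_b=LIMIT_B):
    """Return the combined total, the largest single model, and whether each fits."""
    total = sum(params_b.values())
    largest = max(params_b.values())
    return {
        "total_b": total,
        "largest_b": largest,
        "total_under_cap": total < limit_b,
        "largest_under_cap": largest < limit_b,
    }


report = check_budget(MODEL_PARAMS_B)
print(f"combined: {report['total_b']:.2f}B, largest: {report['largest_b']:.1f}B")
```

Both readings of the rule pass with room to spare, so the claim holds either way.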

	The one action only you can take: paste a write-scoped HF token. I then run
	`python3 scripts/deploy_space.py` and the Space is live (code pushed, Modal secrets
	set, `WITNESSBOX_BACKEND=modal`). Get a token at https://huggingface.co/settings/tokens,
	then either run `! hf auth login` in the prompt, or paste it and I'll use `HF_TOKEN=…`.
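
For orientation, here is a rough sketch of what a deploy script of this kind might do with `huggingface_hub`. It is not the contents of `scripts/deploy_space.py`. The function names, the `your-org/WitnessBox` Space id, and the choice of `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET` as the Modal secrets are assumptions for illustration. The plan is built as plain data first so it can be reviewed before any network call.

```python
def build_deploy_plan(space_id, hf_token, modal_token_id, modal_token_secret):
    """Describe the deploy as data; raises if the HF token is missing."""
    if not hf_token:
        raise ValueError("HF_TOKEN is empty; a write-scoped token is required")
    return {
        "repo": {"repo_id": space_id, "repo_type": "space", "space_sdk": "gradio"},
        # Assumed secret names: Modal's standard client credentials.
        "secrets": {
            "MODAL_TOKEN_ID": modal_token_id,
            "MODAL_TOKEN_SECRET": modal_token_secret,
        },
        "variables": {"WITNESSBOX_BACKEND": "modal"},
    }


def apply_plan(plan, hf_token, folder="."):
    """Push code and settings to the Space (performs network calls)."""
    from huggingface_hub import HfApi  # imported lazily so planning needs no install

    api = HfApi(token=hf_token)
    repo_id = plan["repo"]["repo_id"]
    api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="space")
    for key, value in plan["secrets"].items():
        api.add_space_secret(repo_id=repo_id, key=key, value=value)
    for key, value in plan["variables"].items():
        api.add_space_variable(repo_id=repo_id, key=key, value=value)


# Hypothetical values; real ones would come from the environment.
plan = build_deploy_plan("your-org/WitnessBox", "hf_dummy", "ak-dummy", "as-dummy")
print(plan["variables"])
```

Only `apply_plan` touches the network, so the plan can be printed and checked on a machine that has no token configured.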

	---

	## Social post (REQ-04) — draft

	X / short form
	> ⚖️ I built WitnessBox: cross-examine a hostile AI witness — and your voice
	> is the weapon. Sound confident and the CFO clams up; sound hesitant and he gets
	> cocky and leaks. Catch 3 contradictions and his voice literally cracks.
	>
	> All open models < 32B, served on @modal_labs:
	> 🧠 MiniCPM4.1-8B · 🗣️ VoxCPM2 · 👂 Whisper — @OpenBMB on @huggingface, built with @Gradio.
	>
	> #BuildSmall [Space link] [video link]

	LinkedIn / long form
	> Most "interrogate the witness" games are text-and-logic. WitnessBox makes your
	> delivery the input. A librosa pass reads your perceived confidence — pauses
	> and pace, never a lie detector — and steers the witness in real time. He answers
	> in a voice that escalates from composed to cracking.
	>
	> Three open models, all under 32B, ~11B combined: MiniCPM4.1-8B is the witness's
	> mind, VoxCPM2 is his voice (the style tag is the game state), Whisper hears you.
	> All of it runs on Modal: three right-sized GPUs behind scale-to-zero classes,
	> kept warm during an examination, with the dramatic "voice-crack" beats fanned
	> across containers via parallel `.map()` and the best take cached on a Volume.
	>
	> Built for the Build Small hackathon (@Hugging Face × @Gradio). Models by @OpenBMB.
	> Try it: [Space link] · 90-second demo: [video link]
	>
	> #BuildSmall #Modal #Gradio #OpenSource #AI

	---

	## Demo video shotlist (REQ-03) — ~80s

	Record against the live Space (mic works in `modal` mode). `scripts/demo_playthrough.py`
	is your scripted rehearsal; the three killer lines are in `SCRIPT` there.

	| t | Shot | Notes |
	|---|---|---|
	| 0:00–0:08 | Title card + hook | "Cross-examine a hostile witness, with your voice." |
	| 0:08–0:18 | Click Call the witness | Reid's composed opening line plays (instant, cached beat) |
	| 0:18–0:34 | The mechanic, both ways | Ask confidently → he clams up (bar: CONFIDENT). Ask hesitantly → he overshares (bar: HESITANT). This is the moat, so linger here. |
	| 0:34–0:56 | Land the 3 contradictions | timeline → authorization → relationship. Show the Contradiction Engine verdict box firing each time. |
	| 0:56–1:08 | The break | 3rd catch → Reid's voice cracks (best of 32 cached takes). Win banner. |
	| 1:08–1:20 | Architecture card | "3 open models < 32B · Modal scale-to-zero · parallel `.map()` pre-gen · warm 5.3s reply / 8.6s voice." End on the Space URL. |

	Tips: warm the models first so no shot waits on a cold start. Either redeploy with
	`WITNESSBOX_KEEP_WARM=1 modal deploy modal_app.py` about 5 min before recording, or
	take one throwaway turn (they stay warm for 5 min). Use a quiet room for the mic. Do
	one confident and one hesitant ask back-to-back so the contrast is unmistakable. Let
	the voice-crack play fully; it's the payoff. Flip keep-warm back off afterward to stop
	idle spend.
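
One way the keep-warm switch could map onto Modal's scaling options is sketched below. The helper name `cls_scaling_kwargs` is hypothetical, and this is not taken from `modal_app.py`. The `min_containers` and `scaledown_window` keyword names match recent Modal releases (older releases used different names), and the 300-second window is an assumption based on the 5-minute warm period mentioned above.

```python
import os


def cls_scaling_kwargs(env=None):
    """Translate WITNESSBOX_KEEP_WARM into scaling kwargs for a Modal class."""
    env = os.environ if env is None else env
    keep_warm = env.get("WITNESSBOX_KEEP_WARM", "0") == "1"
    return {
        # 0 means scale-to-zero when idle; 1 keeps one container resident.
        "min_containers": 1 if keep_warm else 0,
        # Seconds an idle container lingers after its last request (assumed 5 min).
        "scaledown_window": 300,
    }


# Intended use in a Modal app (not executed here, requires the modal package):
#   @app.cls(gpu="A10G", **cls_scaling_kwargs())
#   class Voice: ...
print(cls_scaling_kwargs({"WITNESSBOX_KEEP_WARM": "1"}))
```

Because the value is read at deploy time in this sketch, flipping the variable only takes effect after another `modal deploy`, which matches the redeploy step in the tips.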

	---

	## Best-Use-of-Modal talking points (for the writeup / description)
	- Not just hosting — the runtime. Three models on three right-sized GPUs
	(A100 + 2×A10G), each a scale-to-zero `@app.cls`; idle → `$0`.
	- Honest latency, costed warmth: scale-to-zero by default (`$0` idle). Opt into
	keep-warm (`WITNESSBOX_KEEP_WARM=1`) for a live session and a turn is ~5.3s (reply)
	+ ~8.6s (voice), measured on this deploy. The text reply lands first.
	- Parallel `.map()`, verified: 36 takes fanned across containers at deploy time;
	workers write WAVs to the Volume and return only metadata. Among the 32 voice-crack
	takes, the one that cracks most by librosa pitch-instability score (70.3 vs a 61–69
	field) is kept. Dramatic beats then play instantly.
	- Volume persists the designed CFO reference voice, the model cache, and the
	chosen beats across cold starts.
	- Memory snapshots trim CPU-side init.
	- Cost/latency are measured, not fabricated.
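
The fan-out-and-select pattern in the `.map()` bullets can be sketched locally. This is an illustration under stated assumptions, not the project's code. A thread pool stands in for Modal's `.map()`, the worker fakes a pitch contour instead of synthesizing audio, and the score is a simple standard deviation of that contour rather than the librosa-based metric. The names `generate_take` and `pick_best` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import pstdev


def generate_take(seed):
    """Stand-in worker: build a fake f0 contour and return metadata only."""
    rng = random.Random(seed)
    spread = 5.0 + (seed % 7)  # vary instability between takes
    f0 = [120.0 + rng.gauss(0.0, spread) for _ in range(200)]
    # A real worker would write the WAV to shared storage; here the path is only recorded.
    return {"take": seed, "path": f"takes/break_{seed:02d}.wav", "score": pstdev(f0)}


def pick_best(n_takes=32, workers=8):
    """Fan the takes out in parallel, then keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(generate_take, range(n_takes)))
    best = max(results, key=lambda r: r["score"])
    return best, results


best, results = pick_best()
print(f"kept take {best['take']} of {len(results)} at {best['path']}")
```

Returning only metadata keeps the results small, and the selection step then needs just the scores and paths.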