Farseen0

Tags: bare entries (validator uses naive line parser; comments break it)

3c2608c verified 13 days ago


	---
	title: WitnessBox
	emoji: ⚖️
	colorFrom: yellow
	colorTo: red
	sdk: gradio
	sdk_version: 5.50.0
	app_file: app.py
	pinned: false
	license: mit
	tags:
	- track:wood
	- sponsor:modal
	- sponsor:openbmb
	- achievement:offbrand
	- build-small-hackathon
	- gradio
	- minicpm
	- voxcpm
	- modal
	- voice
	- game
	---

	# ⚖️ WitnessBox — cross-examine a hostile AI witness with your voice

	> Interrogate Marcus Reid, CFO of Halcyon Dynamics. He reads how you deliver
	> — sound confident and he clams up; sound hesitant and he gets cocky and
	> overshares. Surface three contradictions and his voice cracks as he breaks.
	>
	> Track: 🍄 An Adventure in Thousand Token Wood · Targeting: Best Use of Modal + Best MiniCPM Build

	---

	## Why it's different
	A typical "interrogate a witness" game is text-and-logic. In WitnessBox, your
	vocal delivery is the input: a `librosa` pass reads your perceived confidence
	(pauses, pace, pitch steadiness) and steers the witness in real time, and the
	witness answers in a voice that escalates from composed to cracking. The moat
	is the audio loop, not the puzzle.

	> *The delivery meter is perceived delivery, never a lie detector.* It reads
	> how you sound (pauses, pace, pitch steadiness) — not whether anything is true.
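A minimal sketch of how a delivery stance could be derived from three features: the fraction of silent frames (pauses), speaking pace in words per second, and pitch steadiness as the coefficient of variation of F0. The function names and every threshold are hypothetical, not WitnessBox's actual values. The app itself uses `librosa` (for example `librosa.feature.rms` for frame energy and `librosa.pyin` for F0); this sketch uses plain Python so it runs anywhere.

```python
import math
from statistics import mean, pstdev


def pause_ratio(samples, sr, frame_ms=25, silence_rms=0.02):
    """Fraction of fixed-size frames whose RMS energy is below a silence threshold.

    The threshold is a hypothetical default; a real pass would tune it per mic.
    """
    frame = max(1, int(sr * frame_ms / 1000))
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    if not frames:
        return 0.0
    quiet = sum(
        1 for f in frames
        if math.sqrt(sum(x * x for x in f) / len(f)) < silence_rms
    )
    return quiet / len(frames)


def pitch_cv(f0_hz):
    """Coefficient of variation of voiced F0 values (lower = steadier pitch)."""
    voiced = [f for f in f0_hz if f > 0]  # 0 marks an unvoiced frame
    if len(voiced) < 2:
        return 0.0  # too few voiced frames: report no measurable wobble
    return pstdev(voiced) / mean(voiced)


def classify_stance(pauses, words_per_sec, f0_cv):
    """Majority vote over three cues; all thresholds are hypothetical."""
    hesitant_votes = (pauses > 0.35) + (words_per_sec < 1.8) + (f0_cv > 0.25)
    confident_votes = (pauses < 0.15) + (words_per_sec > 2.6) + (f0_cv < 0.12)
    if hesitant_votes >= 2:
        return "HESITANT"
    if confident_votes >= 2:
        return "CONFIDENT"
    return "NEUTRAL"
```

Pace can come from the ASR side, e.g. `len(transcript.split()) / clip_seconds`. None of these features says anything about truth; they only describe how the question sounded.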

	## How a turn works
	```
	you speak ─┬─► Whisper ASR ───────────────► your question
	           └─► librosa stance ─► CONFIDENT / NEUTRAL / HESITANT (steers the witness)
	your question ─► deterministic Contradiction Engine ─► catch? (reproducible verdict)
	persona + stance + tier + leak ─► MiniCPM4.1-8B ─► witness's line
	state ─► VoxCPM2 (voice style = game state) ─► audio (cached voice-crack on the win)
	```
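The diagram's deterministic Contradiction Engine step can be sketched as a cue matcher. In this sketch each planted lie needs both a topic cue and an evidence cue in the same question, lies are checked in a fixed order, and no model is consulted, so the same question always yields the same verdict. The regex cues and the `detect_catch` name are invented placeholders, not the game's real triggers.

```python
import re
from typing import Iterable, Optional

# Hypothetical cues: (topic pattern, evidence pattern) per planted lie.
CUES = {
    "timeline": (r"\b(when|date|march|before)\b", r"\b(emails?|calendar|logs?)\b"),
    "authorization": (r"\b(sign|signed|approv\w*|authori[sz]\w*)\b", r"\b(memo|signature|wire)\b"),
    "relationship": (r"\b(know\w*|met|friend\w*|relationship)\b", r"\b(vendor|contractor)\b"),
}


def detect_catch(question: str, already_caught: Iterable[str] = ()) -> Optional[str]:
    """Return the first uncaught lie whose cues all appear in the question, else None."""
    caught = set(already_caught)
    text = question.lower()
    for lie, patterns in CUES.items():  # dict order is fixed, so verdicts are reproducible
        if lie in caught:
            continue
        if all(re.search(p, text) for p in patterns):
            return lie
    return None
```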
	Hesitant delivery makes Reid leak a thread toward an uncaught lie. Confident
	delivery shuts him down. Catch all three (timeline · authorization · relationship)
	and he breaks; whiff too many and the bench excuses him — you lose.
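The win/lose rules above fit in a small state machine. This is a minimal sketch, not the game's actual code: the class and field names are illustrative, `MAX_WHIFFS = 5` is an assumed value (the README only says "too many"), "tier" is assumed to mean the number of lies caught so far, and a whiff is assumed to be a challenge that surfaces nothing new.

```python
from dataclasses import dataclass, field
from typing import Optional

LIES = ("timeline", "authorization", "relationship")  # the three planted lies
MAX_WHIFFS = 5  # assumed value; the README only says "too many"


@dataclass
class Examination:
    caught: set = field(default_factory=set)  # lies surfaced so far
    whiffs: int = 0                           # challenges that surfaced nothing new

    @property
    def tier(self) -> int:
        """Assumed mapping: pressure tier = number of lies caught (drives voice style)."""
        return len(self.caught)

    @property
    def outcome(self) -> Optional[str]:
        if self.caught == set(LIES):
            return "WIN"   # all three contradictions: Reid breaks
        if self.whiffs >= MAX_WHIFFS:
            return "LOSE"  # the bench excuses the witness
        return None        # examination continues

    def leak_target(self, stance: str) -> Optional[str]:
        """Hesitant delivery leaks a thread toward the first uncaught lie; other stances leak nothing."""
        if stance != "HESITANT":
            return None
        return next((lie for lie in LIES if lie not in self.caught), None)

    def record_turn(self, catch: Optional[str], attempted: bool = True) -> Optional[str]:
        """Apply one verdict from the contradiction engine and return the outcome, if any."""
        if self.outcome is not None:
            raise RuntimeError("examination is already over")
        if catch in LIES and catch not in self.caught:
            self.caught.add(catch)
        elif attempted:
            self.whiffs += 1  # a challenge that missed
        return self.outcome
```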

	## Models — all <32B, ~11B combined
	| Role | Model | Size (params) |
	|---|---|---|
	| Witness brain | `openbmb/MiniCPM4.1-8B` | 8.2B |
	| Witness voice | `openbmb/VoxCPM2` (style tag = game state) | 2.3B |
	| Player ASR | `openai/whisper-small` (deployed); `nvidia/nemotron-…-0.6b` is a one-image-swap upgrade (NeMo-only) | 0.24B |
	| Delivery stance | `librosa` (no model) | n/a |

	## ⚙️ Best Use of Modal
	Modal is the runtime for all three GPU models and for the beat pre-generator,
	used as a platform rather than just a host (the prize counts "inference… all"):

	1. GPU inference behind `@app.cls`, scale-to-zero. Three models on three
	right-sized GPUs (A100 + 2×A10G); idle → `$0` via `scaledown_window`.
	2. Opt-in keep-warm. `min_containers` defaults to `0` — genuinely `$0`
	between examinations — and flips to `1` (`WITNESSBOX_KEEP_WARM=1`) for a live
	demo so turns don't eat a cold start. Scale-to-zero is the default; warmth is
	a deliberate, costed choice, not an always-on bill.
	3. Parallel `.map()` pre-generates every scripted beat at deploy time, fanning
	the 32 voice-crack takes across containers at once and keeping the best.
	4. Volume persists the designed CFO reference voice + model cache + chosen beats.
	5. Right-sized GPUs: an A100 only for the 8B witness brain; the 2.3B voice
	model and the ASR ride cheaper A10Gs.
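A minimal sketch of how the five points above could map onto Modal's Python API, assuming current Modal (1.x) parameter names (`scaledown_window`, `min_containers`). The app, volume, and class names, the 300-second idle window, and the take-scoring rule are hypothetical, and the model bodies are placeholders rather than real MiniCPM or VoxCPM calls. The ASR class would follow the same A10G pattern and is omitted for brevity.

```python
import os

import modal

# Hypothetical names; the real modal_app.py may differ.
app = modal.App("witnessbox-sketch")
cache = modal.Volume.from_name("witnessbox-cache", create_if_missing=True)
image = modal.Image.debian_slim().pip_install("torch", "transformers")

# Opt-in keep-warm, read when the app is deployed: scale to zero unless WITNESSBOX_KEEP_WARM=1.
KEEP_WARM = 1 if os.environ.get("WITNESSBOX_KEEP_WARM") == "1" else 0


@app.cls(
    gpu="A100",                 # only the 8B witness brain gets an A100
    image=image,
    volumes={"/cache": cache},  # model cache + rendered beats persist here
    scaledown_window=300,       # idle seconds before scaling to zero (value assumed)
    min_containers=KEEP_WARM,
)
class WitnessBrain:
    @modal.enter()
    def load(self):
        # Placeholder: load openbmb/MiniCPM4.1-8B from /cache here.
        self.model = None

    @modal.method()
    def reply(self, system_prompt: str, question: str) -> str:
        # Placeholder: generate with the stance/tier-conditioned system prompt.
        return f"[witness reply to: {question}]"


@app.cls(gpu="A10G", image=image, volumes={"/cache": cache},
         scaledown_window=300, min_containers=KEEP_WARM)
class WitnessVoice:
    @modal.enter()
    def load(self):
        # Placeholder: load openbmb/VoxCPM2 and the designed CFO reference voice.
        self.tts = None

    @modal.method()
    def render_take(self, take: int) -> tuple[int, float]:
        # Placeholder: synthesize one voice-crack take, write it under /cache,
        # call cache.commit(), and return (take index, quality score).
        score = float(take % 7)  # hypothetical scoring rule
        return take, score


@app.local_entrypoint()
def pregenerate():
    # .map() fans the 32 takes out across containers in parallel.
    results = list(WitnessVoice().render_take.map(range(32)))
    best_take, best_score = max(results, key=lambda r: r[1])
    print(f"keeping take {best_take} (score {best_score})")
```

With this layout, `modal deploy` registers both classes, and `modal run` on the file invokes the single `local_entrypoint`, which is where a `.map()` fan-out like this one would run.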

	Measured (warm, this deploy). A live dynamic turn is `MiniCPM4.1-8B` → 5.3s
	for the witness's reply, then `VoxCPM2` → 8.6s for ~4.5s of 48 kHz speech
	(RTF ≈ 1.9) — the line lands as text first, the voice follows. The five
	scripted beats (intro · opening · the voice-crack · win · lose) are pre-rendered
	by the parallel `.map()` pass and served straight from the Volume, so every
	dramatic moment plays instantly off the per-turn path. Idle containers →
	`$0` via `scaledown_window`. (Container-seconds / $-per-match read live from the
	Modal dashboard, not fabricated.)
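The real-time factor quoted above is synthesis time divided by the duration of audio produced; an RTF above 1 means the voice renders slower than it plays, which is why the text lands first. This small helper reproduces the arithmetic from the measured numbers, assuming the brain and voice calls run back to back with no other overhead; the function names are illustrative.

```python
def real_time_factor(synth_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of audio produced (>1 = slower than real time)."""
    return synth_seconds / audio_seconds


def turn_timeline(llm_seconds: float, tts_seconds: float) -> dict:
    """When each part of a dynamic turn becomes available, assuming sequential calls."""
    return {"text_ready": llm_seconds, "voice_ready": llm_seconds + tts_seconds}


rtf = real_time_factor(8.6, 4.5)
timeline = turn_timeline(5.3, 8.6)
print(f"RTF ≈ {rtf:.1f}; text at {timeline['text_ready']:.1f}s, voice at {timeline['voice_ready']:.1f}s")
# prints: RTF ≈ 1.9; text at 5.3s, voice at 13.9s
```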

	## 🧠 Best MiniCPM Build
	The witness is a MiniCPM model. `openbmb/MiniCPM4.1-8B` runs the entire persona —
	it reads the delivery stance, decides what Reid admits or hides, and leaks a thread
	toward an uncaught lie when you sound unsure — and `openbmb/VoxCPM2` gives him the
	voice that cracks on the break. The 8B brain is the **core of the experience, not a
	bolt-on**: every line Reid speaks is MiniCPM under a stance- and tier-conditioned
	system prompt, so the drama lives or dies on how well a small model holds a character
	under pressure.
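A minimal sketch of a stance- and tier-conditioned system prompt in the spirit described above. The persona line comes from this README; the wording of every direction, the four-tier scale, and the `build_system_prompt` name are hypothetical. The resulting string would be sent as the system message alongside the player's transcribed question.

```python
from typing import Optional

STANCE_DIRECTIONS = {
    "CONFIDENT": "The examiner sounds sure of themselves. Clam up: short, guarded answers; volunteer nothing.",
    "NEUTRAL": "Answer only the question asked, politely and without elaboration.",
    "HESITANT": "The examiner sounds unsure. Get cocky and overshare a little.",
}

# Hypothetical escalation ladder, indexed by tier (e.g. lies caught so far).
TIER_DIRECTIONS = (
    "You are composed and polished.",
    "You are slightly defensive.",
    "You are rattled and your answers are getting clipped.",
    "You are breaking.",
)


def build_system_prompt(stance: str, tier: int, leak: Optional[str] = None) -> str:
    if stance not in STANCE_DIRECTIONS:
        raise ValueError(f"unknown stance: {stance!r}")
    tier = max(0, min(tier, len(TIER_DIRECTIONS) - 1))  # clamp to the known tiers
    parts = [
        "You are Marcus Reid, CFO of Halcyon Dynamics, under cross-examination. Stay in character.",
        TIER_DIRECTIONS[tier],
        STANCE_DIRECTIONS[stance],
    ]
    if leak is not None:
        # Only a hesitant examiner earns a leak toward an uncaught lie.
        parts.append(f"Without admitting anything, let slip one detail related to the {leak} question.")
    return "\n".join(parts)
```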

	## Run it
	Offline (no GPU, no Modal — boots anywhere):
	```bash
	pip install -r requirements.txt
	python app.py # WITNESSBOX_BACKEND defaults to "mock"; type your questions
	```
	The full game loop — stance, the catch engine, state, win/lose, audio autoplay —
	runs locally against a rule-based mock witness, so the end-to-end flow is provable
	without a single GPU.
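A minimal sketch of what a rule-based stand-in for the witness might look like: it keys a canned reply on the delivery stance and appends a leak when one is requested. The class name, its lines, and the seeding are hypothetical; the point is that the mock exposes a stance-aware `reply` with no model behind it, so the rest of the loop can run offline.

```python
import random
from typing import Optional


class MockWitness:
    """Rule-based stand-in for the MiniCPM witness (hypothetical canned lines)."""

    LINES = {
        "CONFIDENT": ["I don't recall.", "You'd have to ask the board."],
        "NEUTRAL": ["I signed off on the quarterly numbers, as I always do."],
        "HESITANT": ["Look, everyone knew how the numbers worked back then."],
    }

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)  # seeded so offline runs are reproducible

    def reply(self, question: str, stance: str, leak: Optional[str] = None) -> str:
        # The question text is unused in this sketch; only stance and leak drive the reply.
        line = self.rng.choice(self.LINES.get(stance, self.LINES["NEUTRAL"]))
        if leak:
            line += f" Not that the {leak} has anything to do with it."
        return line
```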

	Live (real models):
	```bash
	modal deploy modal_app.py # serves MiniCPM4.1-8B, VoxCPM2, Whisper ASR
	modal run modal_app.py # pre-generate the scripted beats (.map)
	WITNESSBOX_BACKEND=modal python app.py
	```
	On a Space, set `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET` as secrets. Lookups are
	lazy and fall back to mock if Modal is unreachable, so the Space always boots.
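The lazy lookup with a mock fallback described above can be sketched as follows. `select_backend`, `modal_lookup`, and the deployed app/class names (`"witnessbox"`, `"WitnessBrain"`) are hypothetical; `modal.Cls.from_name` is Modal's lookup for a deployed class. Because `from_name` returns a lazy handle, an unreachable Modal may only raise on the first remote call, so a real app would likely guard that call as well.

```python
import os
from typing import Callable, Mapping, Optional


def modal_lookup() -> object:
    """Resolve the deployed witness class; modal is imported only when actually needed."""
    import modal  # deferred so the mock path never needs modal installed
    return modal.Cls.from_name("witnessbox", "WitnessBrain")()  # hypothetical app/class names


def select_backend(make_mock: Callable[[], object],
                   lookup: Callable[[], object] = modal_lookup,
                   env: Optional[Mapping[str, str]] = None) -> object:
    """Pick the witness backend from WITNESSBOX_BACKEND (default "mock"), falling back on failure."""
    env = os.environ if env is None else env
    if env.get("WITNESSBOX_BACKEND", "mock") != "modal":
        return make_mock()
    try:
        return lookup()
    except Exception as exc:  # modal missing, no token, app not deployed, network down...
        print(f"Modal unavailable ({exc!r}); falling back to mock witness")
        return make_mock()
```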

	## Integrity
	Detection fires against three planted lies with concrete cues — reliable, not
	"magical." The model never grades itself. Cost/latency numbers are measured. No
	"only entry that…" claims about a moving field.