title: WitnessBox
emoji: ⚖️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: mit
tags:
- track:wood
- sponsor:modal
- sponsor:openbmb
- achievement:offbrand
- build-small-hackathon
- gradio
- minicpm
- voxcpm
- modal
- voice
- game
⚖️ WitnessBox — cross-examine a hostile AI witness with your voice
Interrogate Marcus Reid, CFO of Halcyon Dynamics. He reads how you deliver — sound confident and he clams up; sound hesitant and he gets cocky and overshares. Surface three contradictions and his voice cracks as he breaks.
Track: 🍄 An Adventure in Thousand Token Wood · Targeting: Best Use of Modal + Best MiniCPM Build
Why it's different
Most "interrogate a witness" builds are text-and-logic. In WitnessBox your vocal
delivery is the input: a librosa pass reads your perceived confidence (pauses,
pace, pitch steadiness) and steers the witness in real time, and the witness
answers back in a voice that escalates from composed to cracking. The moat is
the audio loop, not the puzzle.
The delivery meter measures perceived delivery; it is never a lie detector. It reads how you sound (pauses, pace, pitch steadiness), not whether anything is true.
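A minimal sketch of how two delivery features could map to the three stance labels. The feature names, the `Delivery` type, and every threshold here are assumptions for illustration, not the values the app ships with; the real pass derives its features with librosa.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    pause_ratio: float       # fraction of the clip that is silence, 0.0 to 1.0
    words_per_second: float  # speaking pace


# Hypothetical thresholds, chosen only to make the example concrete.
HESITANT_PAUSE_RATIO = 0.35
CONFIDENT_PAUSE_RATIO = 0.15
SLOW_PACE = 1.8
BRISK_PACE = 2.5


def classify_stance(d: Delivery) -> str:
    """Map perceived delivery to a stance label; it says nothing about truth."""
    if d.pause_ratio >= HESITANT_PAUSE_RATIO or d.words_per_second <= SLOW_PACE:
        return "HESITANT"
    if d.pause_ratio <= CONFIDENT_PAUSE_RATIO and d.words_per_second >= BRISK_PACE:
        return "CONFIDENT"
    return "NEUTRAL"
```

Anything that is neither clearly hesitant nor clearly confident falls through to NEUTRAL, so a borderline reading never swings the witness.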
How a turn works
you speak ─┬─► Whisper ASR ───────────────► your question
           └─► librosa stance ─► CONFIDENT / NEUTRAL / HESITANT (steers the witness)
your question ─► deterministic Contradiction Engine ─► catch? (reproducible verdict)
persona + stance + tier + leak ─► MiniCPM4.1-8B ─► witness's line
state ─► VoxCPM2 (voice style = game state) ─► audio (cached voice-crack on the win)
Hesitant delivery makes Reid leak a thread toward an uncaught lie. Confident delivery shuts him down. Catch all three (timeline · authorization · relationship) and he breaks; whiff too many and the bench excuses him — you lose.
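The win/lose rule above can be sketched as a small deterministic state machine. The three lie names come from this README; the cue words, the `Match` class, and the `MAX_WHIFFS` limit are hypothetical stand-ins, since the real cues and miss limit are not listed here.

```python
import re
from dataclasses import dataclass, field

LIES = ("timeline", "authorization", "relationship")

# Hypothetical cue words per planted lie (illustration only).
CUES = {
    "timeline": ("date", "when", "march"),
    "authorization": ("approved", "signed", "authorized"),
    "relationship": ("vendor", "brother", "friend"),
}
MAX_WHIFFS = 5  # assumed limit before the bench excuses the witness


@dataclass
class Match:
    caught: set = field(default_factory=set)
    whiffs: int = 0

    def ask(self, question: str) -> str:
        """Return a reproducible verdict: same question and state, same result."""
        words = set(re.findall(r"[a-z]+", question.lower()))
        for lie in LIES:
            if lie not in self.caught and words & set(CUES[lie]):
                self.caught.add(lie)
                return f"CATCH:{lie}"
        self.whiffs += 1
        return "MISS"

    @property
    def status(self) -> str:
        if len(self.caught) == len(LIES):
            return "WON"
        if self.whiffs >= MAX_WHIFFS:
            return "LOST"
        return "ONGOING"
```

Because the verdict comes from plain set intersection rather than from the language model, replaying the same questions always yields the same outcome.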
Models — all <32B, ~11B combined
| Role | Model | Size |
|---|---|---|
| Witness brain | openbmb/MiniCPM4.1-8B | 8.2B |
| Witness voice | openbmb/VoxCPM2 (style tag = game state) | 2.3B |
| Player ASR | openai/whisper-small (deployed); nvidia/nemotron-…-0.6b is a one-image-swap upgrade (NeMo-only) | 0.24B |
| Delivery stance | librosa (no model) | n/a |
⚙️ Best Use of Modal
Modal is the runtime for all three GPU models and the beat pre-generator — used as a platform, not just a host (the prize counts "inference… all"):
- GPU inference behind @app.cls, scale-to-zero. Three models on three right-sized GPUs (A100 + 2×A10G); idle → $0 via scaledown_window.
- Opt-in keep-warm. min_containers defaults to 0 (genuinely $0 between examinations) and flips to 1 (WITNESSBOX_KEEP_WARM=1) for a live demo so turns don't eat a cold start. Scale-to-zero is the default; warmth is a deliberate, costed choice, not an always-on bill.
- Parallel .map() pre-generates every scripted beat at deploy time, fanning the 32 voice-crack takes across containers at once and keeping the best.
- Volume persists the designed CFO reference voice + model cache + chosen beats.
- Right-sized GPUs: an A100 only for the 8B witness brain; the 2B voice and the ASR ride cheaper A10Gs.
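The opt-in keep-warm switch could look roughly like the helper below. This is a sketch under assumptions: the `container_settings` name and the 300-second window are hypothetical, and the commented decorator line shows only how such values might be passed to Modal, not the contents of modal_app.py.

```python
import os
from typing import Mapping, Optional


def container_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Scale-to-zero by default; one warm container only when explicitly requested."""
    env = os.environ if env is None else env
    keep_warm = env.get("WITNESSBOX_KEEP_WARM") == "1"
    return {
        "min_containers": 1 if keep_warm else 0,
        "scaledown_window": 300,  # assumed idle window in seconds
    }


# Assumed usage inside a Modal app (not executed here):
#   @app.cls(gpu="A100", **container_settings())
#   class WitnessBrain: ...
```

Keeping the decision in one function makes the cost trade-off explicit: anything other than `WITNESSBOX_KEEP_WARM=1` leaves the deployment at zero idle containers.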
Measured (warm, this deploy). A live dynamic turn is MiniCPM4.1-8B → 5.3s
for the witness's reply, then VoxCPM2 → 8.6s for ~4.5s of 48 kHz speech
(RTF ≈ 1.9) — the line lands as text first, the voice follows. The five
scripted beats (intro · opening · the voice-crack · win · lose) are pre-rendered
by the parallel .map() pass and served straight from the Volume, so every
dramatic moment plays instantly off the per-turn path. Idle containers →
$0 via scaledown_window. (Container-seconds / $-per-match read live from the
Modal dashboard, not fabricated.)
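The real-time factor quoted above follows directly from the measured numbers. The short calculation below reproduces it, and also derives a time-to-voice figure under the assumption that the two stages run strictly back to back with no extra overhead.

```python
llm_seconds = 5.3    # MiniCPM4.1-8B reply time (measured, from above)
tts_seconds = 8.6    # VoxCPM2 synthesis time (measured, from above)
audio_seconds = 4.5  # approximate length of the generated speech

# RTF above 1 means synthesis takes longer than the audio lasts.
rtf = tts_seconds / audio_seconds

# Text lands first; voice follows once synthesis finishes (assumes sequential stages).
time_to_text = llm_seconds
time_to_voice = llm_seconds + tts_seconds

print(f"RTF ~ {rtf:.1f}")
print(f"text after {time_to_text:.1f}s, voice after {time_to_voice:.1f}s")
```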
🧠 Best MiniCPM Build
The witness is a MiniCPM model. openbmb/MiniCPM4.1-8B runs the entire persona —
it reads the delivery stance, decides what Reid admits or hides, and leaks a thread
toward an uncaught lie when you sound unsure — and openbmb/VoxCPM2 gives him the
voice that cracks on the break. The 8B brain is the core of the experience, not a
bolt-on: every line Reid speaks is MiniCPM under a stance- and tier-conditioned
system prompt, so the drama lives or dies on how well a small model holds a character
under pressure.
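To make "stance- and tier-conditioned system prompt" concrete, here is a hypothetical prompt builder. The wording of each rule, the integer `tier`, and the `build_system_prompt` name are all assumptions for illustration; the actual prompt used by the app is not reproduced in this README.

```python
from typing import Optional

# Hypothetical stance instructions, paraphrasing the behaviour described above.
STANCE_RULES = {
    "CONFIDENT": "The examiner sounds sure of themselves. Clam up and answer briefly.",
    "NEUTRAL": "Answer evenly and volunteer nothing.",
    "HESITANT": "The examiner sounds unsure. Get cocky and overshare.",
}


def build_system_prompt(stance: str, tier: int, leak: Optional[str] = None) -> str:
    """Assemble a persona prompt from the current game state."""
    lines = [
        "You are Marcus Reid, CFO of Halcyon Dynamics, under cross-examination.",
        STANCE_RULES[stance],
        f"Pressure tier: {tier}. Higher tiers mean you sound less composed.",
    ]
    # Only a hesitant examiner earns a hint toward a lie that is still uncaught.
    if stance == "HESITANT" and leak is not None:
        lines.append(f"Let slip a small detail about the {leak}.")
    return "\n".join(lines)
```

For example, `build_system_prompt("HESITANT", 2, leak="timeline")` adds the leak line, while the same call with `"CONFIDENT"` omits it.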
Run it
Offline (no GPU, no Modal — boots anywhere):
pip install -r requirements.txt
python app.py # WITNESSBOX_BACKEND defaults to "mock"; type your questions
The full game loop — stance, the catch engine, state, win/lose, audio autoplay — runs locally against a rule-based mock witness, so the end-to-end flow is provable without a single GPU.
Live (real models):
modal deploy modal_app.py # serves MiniCPM4.1-8B, VoxCPM2, Whisper ASR
modal run modal_app.py # pre-generate the scripted beats (.map)
WITNESSBOX_BACKEND=modal python app.py
On a Space, set MODAL_TOKEN_ID / MODAL_TOKEN_SECRET as secrets. Lookups are
lazy and fall back to mock if Modal is unreachable, so the Space always boots.
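The lazy lookup with mock fallback can be sketched as below. The `LazyBackend` class, the `connect` callable, and the mock's canned reply are hypothetical; the sketch only illustrates the pattern of deferring the remote lookup to the first turn and degrading to the mock on any failure.

```python
import os
from typing import Callable, Mapping, Optional

Witness = Callable[[str], str]


def mock_witness(question: str) -> str:
    """Rule-based stand-in so the app boots without a GPU."""
    return "I don't recall."


class LazyBackend:
    def __init__(self, connect: Callable[[], Witness],
                 env: Optional[Mapping[str, str]] = None) -> None:
        self._connect = connect  # hypothetical function that looks up the remote model
        self._env = os.environ if env is None else env
        self._witness: Optional[Witness] = None

    def _resolve(self) -> Witness:
        if self._env.get("WITNESSBOX_BACKEND", "mock") != "modal":
            return mock_witness
        try:
            return self._connect()
        except Exception:
            # Remote backend unreachable: keep the app usable with the mock.
            return mock_witness

    def __call__(self, question: str) -> str:
        # Nothing is looked up until the first question arrives.
        if self._witness is None:
            self._witness = self._resolve()
        return self._witness(question)
```

One design note: this sketch caches whichever backend it resolves first, so a failed lookup stays on the mock for the rest of the session rather than retrying every turn.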
Integrity
Detection fires against three planted lies with concrete cues — reliable, not "magical." The model never grades itself. Cost/latency numbers are measured. No "only entry that…" claims about a moving field.