metadata
title: WitnessBox
emoji: ⚖️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: mit
tags:
  - track:wood
  - sponsor:modal
  - sponsor:openbmb
  - achievement:offbrand
  - build-small-hackathon
  - gradio
  - minicpm
  - voxcpm
  - modal
  - voice
  - game

⚖️ WitnessBox — cross-examine a hostile AI witness with your voice

Interrogate Marcus Reid, CFO of Halcyon Dynamics. He reads how you deliver — sound confident and he clams up; sound hesitant and he gets cocky and overshares. Surface three contradictions and his voice cracks as he breaks.

Track: 🍄 An Adventure in Thousand Token Wood · Targeting: Best Use of Modal + Best MiniCPM Build


Why it's different

Most "interrogate a witness" builds lean on text and logic. In WitnessBox your vocal delivery is the input: a librosa pass estimates your perceived confidence from cues such as pauses and pace and steers the witness in real time, and the witness answers in a voice that escalates from composed to cracking. The moat is the audio loop, not the puzzle.

The delivery meter measures perceived delivery; it is never a lie detector. It reads how you sound (pauses, pace, pitch steadiness), not whether anything is true.
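As a rough illustration of how a delivery meter like this could turn pauses and pace into a stance label, here is a minimal sketch. It assumes per-frame energy values are already available (for example from `librosa.feature.rms`) and that the word count comes from the ASR transcript. Every threshold and name below is hypothetical, not the project's actual tuning.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real values are not given in this README.
SILENCE_RMS = 0.01      # frames quieter than this count as a pause
HESITANT_PAUSE = 0.35   # pause ratio at or above this reads as hesitant
CONFIDENT_PAUSE = 0.15  # pause ratio at or below this can read as confident
HESITANT_WPS = 1.5      # words per second at or below this reads as hesitant
CONFIDENT_WPS = 2.5     # words per second needed to read as confident


@dataclass
class Delivery:
    pause_ratio: float       # fraction of frames that are near-silent
    words_per_second: float  # speaking pace


def measure(frame_rms, word_count, duration_s):
    """Reduce per-frame energy and a transcript word count to two delivery cues."""
    if not frame_rms or duration_s <= 0:
        return Delivery(pause_ratio=1.0, words_per_second=0.0)
    quiet = sum(1 for energy in frame_rms if energy < SILENCE_RMS)
    return Delivery(quiet / len(frame_rms), word_count / duration_s)


def stance(d):
    """Map delivery cues to a stance label. Says nothing about truthfulness."""
    if d.pause_ratio >= HESITANT_PAUSE or d.words_per_second <= HESITANT_WPS:
        return "HESITANT"
    if d.pause_ratio <= CONFIDENT_PAUSE and d.words_per_second >= CONFIDENT_WPS:
        return "CONFIDENT"
    return "NEUTRAL"
```

Keeping `measure` and `stance` separate means the thresholds can be retuned without touching the audio code, and the label stays a statement about delivery only.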

How a turn works

you speak ─┬─► Whisper ASR ───────────────► your question
           └─► librosa stance ─► CONFIDENT / NEUTRAL / HESITANT  (steers the witness)
your question ─► deterministic Contradiction Engine ─► catch?  (reproducible verdict)
persona + stance + tier + leak ─► MiniCPM4.1-8B ─► witness's line
state ─► VoxCPM2 (voice style = game state) ─► audio   (cached voice-crack on the win)

Hesitant delivery makes Reid leak a thread toward an uncaught lie. Confident delivery shuts him down. Catch all three (timeline · authorization · relationship) and he breaks; whiff too many and the bench excuses him — you lose.
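The catch and win/lose rules above can be sketched as a small deterministic state machine. This is only an illustration: the cue words, the miss budget, and all names are hypothetical, since the README does not list the real cues or say how many misses are allowed.

```python
from dataclasses import dataclass, field

# Hypothetical cue sets: a catch needs every cue for that lie in one question.
CUES = {
    "timeline": {"march", "wire"},
    "authorization": {"signed", "approval"},
    "relationship": {"vendor", "brother"},
}
MAX_WHIFFS = 5  # assumed miss budget before the bench excuses the witness


@dataclass
class Match:
    caught: set = field(default_factory=set)
    whiffs: int = 0

    def outcome(self):
        if len(self.caught) == len(CUES):
            return "WIN"
        if self.whiffs >= MAX_WHIFFS:
            return "LOSE"
        return "ONGOING"


def check(question, match):
    """Return the newly caught lie, or None. Same input, same verdict."""
    words = set(question.lower().replace("?", " ").replace(",", " ").split())
    for lie, cues in CUES.items():
        if lie not in match.caught and cues <= words:
            match.caught.add(lie)
            return lie
    match.whiffs += 1  # no cue set matched: count it as a miss
    return None


def leak_target(stance, match):
    """Hesitant delivery earns a hint toward the first uncaught lie."""
    if stance != "HESITANT":
        return None
    remaining = [lie for lie in CUES if lie not in match.caught]
    return remaining[0] if remaining else None
```

Because the verdict comes from set matching rather than from the language model, replaying the same questions always produces the same result.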

Models — all <32B, ~11B combined

| Role | Model | Size |
| --- | --- | --- |
| Witness brain | openbmb/MiniCPM4.1-8B | 8.2B |
| Witness voice | openbmb/VoxCPM2 (style tag = game state) | 2.3B |
| Player ASR | openai/whisper-small (deployed); nvidia/nemotron-…-0.6b is a one-image-swap upgrade (NeMo-only) | 0.24B |
| Delivery stance | librosa (no model) | n/a |

⚙️ Best Use of Modal

Modal is the runtime for all three GPU models and the beat pre-generator — used as a platform, not just a host (the prize counts "inference… all"):

  1. GPU inference behind @app.cls, scale-to-zero. Three models on three right-sized GPUs (A100 + 2×A10G); idle → $0 via scaledown_window.
  2. Opt-in keep-warm. min_containers defaults to 0 — genuinely $0 between examinations — and flips to 1 (WITNESSBOX_KEEP_WARM=1) for a live demo so turns don't eat a cold start. Scale-to-zero is the default; warmth is a deliberate, costed choice, not an always-on bill.
  3. Parallel .map() pre-generates every scripted beat at deploy time, fanning the 32 voice-crack takes across containers at once and keeping the best.
  4. Volume persists the designed CFO reference voice + model cache + chosen beats.
  5. Right-sized GPUs: an A100 only for the 8B witness brain; the 2.3B voice model and the ASR run on cheaper A10Gs.
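Points 2 and 3 can be sketched locally without Modal installed. In the sketch below, `scaling_kwargs` builds the options one might pass to `@app.cls(...)`, and `ThreadPoolExecutor.map` stands in for Modal's `.map()` fan-out. The scoring function, the `scaledown_window` value, and the function names are assumptions for illustration, not the project's code.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scaling_kwargs(env=os.environ):
    """Scaling options one might pass to Modal's @app.cls(...)."""
    keep_warm = env.get("WITNESSBOX_KEEP_WARM") == "1"
    return {
        "min_containers": 1 if keep_warm else 0,  # 0 means scale to zero
        "scaledown_window": 120,  # assumed idle seconds before shutdown
    }


def render_take(seed):
    """Stand-in for one remote voice-crack render; returns (seed, score)."""
    score = (seed * 37) % 101 / 100  # hypothetical quality score in [0, 1]
    return seed, score


def best_take(n_takes=32):
    """Render all takes concurrently and keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(render_take, range(n_takes)))
    return max(results, key=lambda result: result[1])
```

Reading the keep-warm flag in one place keeps scale-to-zero as the default and makes the warm setting an explicit, visible choice at deploy time.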

Measured (warm, this deploy). A live dynamic turn takes 5.3 s for MiniCPM4.1-8B to produce the witness's reply, then 8.6 s for VoxCPM2 to synthesize ~4.5 s of 48 kHz speech (RTF ≈ 1.9), so the line lands as text first and the voice follows. The five scripted beats (intro · opening · the voice-crack · win · lose) are pre-rendered by the parallel .map() pass and served straight from the Volume, so every dramatic moment plays instantly, off the per-turn path. Idle containers cost $0 via scaledown_window. Container-seconds and $-per-match are read live from the Modal dashboard, not fabricated.
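The real-time factor quoted here is synthesis time divided by audio length. The short calculation below reproduces it from the measured figures. The voice-ready time assumes synthesis starts only once the text reply is complete, which the README implies but does not state outright.

```python
def real_time_factor(synthesis_s, audio_s):
    """Seconds of compute per second of audio; above 1 is slower than real time."""
    return synthesis_s / audio_s


llm_s = 5.3    # MiniCPM4.1-8B reply time, from the measurement above
tts_s = 8.6    # VoxCPM2 synthesis time
audio_s = 4.5  # approximate length of the spoken line

rtf = real_time_factor(tts_s, audio_s)
text_ready_s = llm_s
voice_ready_s = llm_s + tts_s  # assumes synthesis waits for the final text

print(f"RTF ~ {rtf:.1f}, text at {text_ready_s:.1f}s, voice at {voice_ready_s:.1f}s")
```

An RTF near 1.9 means the voice cannot keep up with playback on the live path, which is why showing the text first and pre-rendering the scripted beats matters.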

🧠 Best MiniCPM Build

The witness is a MiniCPM model. openbmb/MiniCPM4.1-8B runs the entire persona — it reads the delivery stance, decides what Reid admits or hides, and leaks a thread toward an uncaught lie when you sound unsure — and openbmb/VoxCPM2 gives him the voice that cracks on the break. The 8B brain is the core of the experience, not a bolt-on: every line Reid speaks is MiniCPM under a stance- and tier-conditioned system prompt, so the drama lives or dies on how well a small model holds a character under pressure.
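A stance- and tier-conditioned system prompt could be assembled along these lines. The wording of every rule, the tier scale, and the function name are hypothetical; the project's actual prompts are not shown in this README.

```python
# Hypothetical wording throughout; the project's actual prompts are not shown here.
PERSONA = "You are Marcus Reid, CFO of Halcyon Dynamics, testifying under oath."

STANCE_RULES = {
    "CONFIDENT": "The examiner sounds sure of themselves. Give short, guarded answers.",
    "NEUTRAL": "Answer plainly and volunteer nothing.",
    "HESITANT": "The examiner sounds unsure. Get cocky and say more than you should.",
}

# Assumed mapping: tier = number of contradictions already caught (0 to 3).
TIER_RULES = [
    "You are composed.",
    "You are irritated.",
    "You are rattled.",
    "You break down.",
]


def build_system_prompt(stance, tier, leak=None):
    """Assemble persona + stance + tier (+ optional leak) into one system prompt."""
    tier = max(0, min(tier, len(TIER_RULES) - 1))  # clamp out-of-range tiers
    parts = [
        PERSONA,
        STANCE_RULES.get(stance, STANCE_RULES["NEUTRAL"]),
        TIER_RULES[tier],
    ]
    if leak and stance == "HESITANT":
        parts.append(f"Let slip one detail that touches the {leak} issue.")
    return " ".join(parts)
```

In this sketch the leak instruction is added only for hesitant delivery, so the model is never asked to hint at a lie when the examiner sounds confident.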

Run it

Offline (no GPU, no Modal — boots anywhere):

pip install -r requirements.txt
python app.py            # WITNESSBOX_BACKEND defaults to "mock"; type your questions

The full game loop — stance, the catch engine, state, win/lose, audio autoplay — runs locally against a rule-based mock witness, so the end-to-end flow is provable without a single GPU.

Live (real models):

modal deploy modal_app.py            # serves MiniCPM4.1-8B, VoxCPM2, Whisper ASR
modal run modal_app.py               # pre-generate the scripted beats (.map)
WITNESSBOX_BACKEND=modal python app.py

On a Space, set MODAL_TOKEN_ID / MODAL_TOKEN_SECRET as secrets. Lookups are lazy and fall back to mock if Modal is unreachable, so the Space always boots.
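The lazy lookup with a mock fallback might look like the sketch below. The Modal app name and class name passed to `modal.Cls.from_name` are placeholders, and `MockWitness`, `connect_modal`, and `get_witness` are hypothetical names rather than the project's own.

```python
import os


class MockWitness:
    """Offline stand-in so the game loop runs without a GPU."""

    name = "mock"

    def reply(self, question):
        return "I don't recall."  # placeholder rule-based answer


def connect_modal():
    """Hypothetical connector: imports Modal only when a live backend is requested."""
    import modal  # deferred so the app boots without the package installed

    # Assumed app and class names; the real ones are not given in this README.
    return modal.Cls.from_name("witnessbox", "Witness")()


def get_witness(env=os.environ, connect=connect_modal):
    """Return the live backend if requested and reachable, else the mock."""
    if env.get("WITNESSBOX_BACKEND", "mock") != "modal":
        return MockWitness()
    try:
        return connect()
    except Exception:  # missing package, bad token, network failure
        return MockWitness()
```

Some failures may surface only on the first remote call rather than at lookup time, so a real app would likely guard the call site the same way.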

Integrity

Detection fires against three planted lies with concrete cues — reliable, not "magical." The model never grades itself. Cost/latency numbers are measured. No "only entry that…" claims about a moving field.