---
title: WitnessBox
emoji: ⚖️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: mit
tags:
  - track:wood
  - sponsor:modal
  - sponsor:openbmb
  - achievement:offbrand
  - build-small-hackathon
  - gradio
  - minicpm
  - voxcpm
  - modal
  - voice
  - game
---

# ⚖️ WitnessBox: cross-examine a hostile AI witness with your *voice*

> Interrogate **Marcus Reid, CFO of Halcyon Dynamics**. He reads *how you deliver*:
> sound confident and he clams up; sound hesitant and he gets cocky and
> overshares. Surface **three contradictions** and his voice **cracks** as he breaks.
>
> **Track:** 🍄 An Adventure in Thousand Token Wood · **Targeting:** Best Use of Modal + Best MiniCPM Build

---

## Why it's different

Most "interrogate a witness" builds are text-and-logic puzzles. In WitnessBox,
**your vocal delivery is the input**: a `librosa` pass reads your *perceived*
confidence (pauses + pace) and steers the witness in real time, and the witness
answers back in a **voice that escalates** from composed to cracking. The moat is
the audio loop, not the puzzle.

> **The delivery meter is *perceived delivery*, never a lie detector.** It reads
> how you sound (pauses, pace, pitch steadiness), not whether anything is true.

## How a turn works

```
you speak ─┬─► Whisper ASR ───────────────► your question
           └─► librosa stance ─► CONFIDENT / NEUTRAL / HESITANT (steers the witness)
your question ─► deterministic Contradiction Engine ─► catch? (reproducible verdict)
persona + stance + tier + leak ─► MiniCPM4.1-8B ─► witness's line
state ─► VoxCPM2 (voice style = game state) ─► audio (cached voice-crack on the win)
```

Hesitant delivery makes Reid leak a thread toward an uncaught lie. Confident
delivery shuts him down. Catch all three (timeline · authorization · relationship)
and he breaks; whiff too many and the bench excuses him, and you lose.
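
The stance step in the diagram above can be sketched as a plain threshold rule over two delivery features. This is a minimal illustration, not the app's actual code: the function names, the `Delivery` type, and every threshold are assumptions, and pitch steadiness is left out for brevity. In the real app a `librosa` pass would presumably supply the features (for example frame energies from `librosa.feature.rms`); here they arrive as plain lists so the sketch runs without audio dependencies.

```python
from dataclasses import dataclass
from typing import Sequence

# All thresholds are illustrative assumptions, not WitnessBox's tuned values.
SILENCE_RMS = 0.02            # frames quieter than this count as a pause
HESITANT_PAUSE_RATIO = 0.35   # this much silence (or more) reads as hesitant
CONFIDENT_PAUSE_RATIO = 0.15  # this little silence (or less) can read as confident
SLOW_WPS = 1.8                # words per second at or below this reads as hesitant
BRISK_WPS = 2.5               # words per second at or above this can read as confident


@dataclass(frozen=True)
class Delivery:
    pause_ratio: float       # share of frames below the silence threshold
    words_per_second: float  # transcript word count divided by clip duration


def measure_delivery(frame_rms: Sequence[float], frame_seconds: float,
                     word_count: int) -> Delivery:
    """Turn per-frame energy and a transcript word count into delivery features."""
    if not frame_rms or frame_seconds <= 0:
        raise ValueError("need at least one frame and a positive frame length")
    silent = sum(1 for energy in frame_rms if energy < SILENCE_RMS)
    duration = len(frame_rms) * frame_seconds
    return Delivery(silent / len(frame_rms), word_count / duration)


def classify_stance(delivery: Delivery) -> str:
    """Map perceived delivery to CONFIDENT / NEUTRAL / HESITANT."""
    # Either long pauses or a slow pace is enough to sound hesitant.
    if (delivery.pause_ratio >= HESITANT_PAUSE_RATIO
            or delivery.words_per_second <= SLOW_WPS):
        return "HESITANT"
    # Confidence needs both few pauses and a brisk pace.
    if (delivery.pause_ratio <= CONFIDENT_PAUSE_RATIO
            and delivery.words_per_second >= BRISK_WPS):
        return "CONFIDENT"
    return "NEUTRAL"
```

Because the rule only looks at how the clip sounds, it says nothing about whether the question or the answer is true, which matches the "perceived delivery, never a lie detector" framing.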
## Models: all <32B, ~11B combined

| Role | Model | Size |
|---|---|---|
| Witness brain | `openbmb/MiniCPM4.1-8B` | 8.2B |
| Witness voice | `openbmb/VoxCPM2` (style tag = game state) | 2.3B |
| Player ASR | `openai/whisper-small` (deployed); `nvidia/nemotron-…-0.6b` is a one-image-swap upgrade (NeMo-only) | 0.24B |
| Delivery stance | `librosa` (no model) | n/a |

## ⚙️ Best Use of Modal

Modal is the **runtime** for all three GPU models and the beat pre-generator,
used as a *platform*, not just a host (the prize counts "inference… all"):

1. **GPU inference behind `@app.cls`, scale-to-zero.** Three models on three
   right-sized GPUs (A100 + 2×A10G); idle → `$0` via `scaledown_window`.
2. **Opt-in keep-warm.** `min_containers` defaults to `0` (genuinely `$0` between
   examinations) and flips to `1` (`WITNESSBOX_KEEP_WARM=1`) for a live demo so
   turns don't eat a cold start. Scale-to-zero is the default; warmth is a
   deliberate, costed choice, not an always-on bill.
3. **Parallel `.map()`** pre-generates every scripted beat at deploy time, fanning
   the **32 voice-crack takes across containers at once** and keeping the best.
4. **Volume** persists the designed CFO reference voice + model cache + chosen beats.
5. **Right-sized GPUs**: an A100 only for the 8B witness brain; the 2B voice and
   the ASR ride cheaper A10Gs.

**Measured (warm, this deploy).** A live dynamic turn is `MiniCPM4.1-8B` **→ 5.3s**
for the witness's reply, then `VoxCPM2` **→ 8.6s** for ~4.5s of 48 kHz speech
(RTF ≈ 1.9), so the line lands as **text first** and the voice follows. The five
**scripted beats** (intro · opening · the voice-crack · win · lose) are
pre-rendered by the parallel `.map()` pass and served straight from the Volume, so
every *dramatic* moment plays **instantly** off the per-turn path. Idle containers
→ `$0` via `scaledown_window`. (Container-seconds / $-per-match read live from the
Modal dashboard, not fabricated.)

## 🧠 Best MiniCPM Build

The witness *is* a MiniCPM model.
`openbmb/MiniCPM4.1-8B` runs the entire persona: it reads the
delivery stance, decides what Reid admits or hides, and leaks a thread toward an
uncaught lie when you sound unsure. `openbmb/VoxCPM2` gives him the voice that
cracks on the break. The 8B brain is the **core of the experience, not a
bolt-on**: every line Reid speaks is MiniCPM under a stance- and tier-conditioned
system prompt, so the drama lives or dies on how well a small model holds a
character under pressure.

## Run it

**Offline (no GPU, no Modal; boots anywhere):**

```bash
pip install -r requirements.txt
python app.py   # WITNESSBOX_BACKEND defaults to "mock"; type your questions
```

The full game loop (stance, the catch engine, state, win/lose, audio autoplay)
runs locally against a rule-based mock witness, so the end-to-end flow is provable
without a single GPU.

**Live (real models):**

```bash
modal deploy modal_app.py    # serves MiniCPM4.1-8B, VoxCPM2, Whisper ASR
modal run modal_app.py       # pre-generate the scripted beats (.map)
WITNESSBOX_BACKEND=modal python app.py
```

On a Space, set `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET` as secrets. Lookups are
lazy and fall back to mock if Modal is unreachable, so the Space always boots.

## Integrity

Detection fires against three **planted** lies with concrete cues: reliable, not
"magical." The model never grades itself. Cost/latency numbers are measured. No
"only entry that…" claims about a moving field.
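
To make "planted lies with concrete cues" tangible, here is a minimal sketch of how a deterministic catch check could work without asking the model to grade anything. Only the three lie names (timeline, authorization, relationship) come from this README; the cue words, the `PLANTED_LIES` table, and the `catch` and `has_won` helpers are hypothetical, invented purely for illustration.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical cue table. Each lie lists cue groups; a question catches the lie
# only if it contains at least one word from every group.
PLANTED_LIES: Dict[str, List[Set[str]]] = {
    "timeline": [{"when", "date", "timeline"}, {"wire", "transfer", "payment"}],
    "authorization": [{"approve", "approved", "authorize", "authorized", "signed"},
                      {"board", "audit", "committee"}],
    "relationship": [{"know", "knew", "met", "relationship"},
                     {"vendor", "supplier", "contractor"}],
}


def tokenize(question: str) -> Set[str]:
    """Lowercase the question and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", question.lower()))


def catch(question: str, already_caught: Set[str]) -> Optional[str]:
    """Return the first uncaught lie whose cue groups all appear, else None."""
    words = tokenize(question)
    for lie, cue_groups in PLANTED_LIES.items():
        if lie in already_caught:
            continue
        if all(words & group for group in cue_groups):
            return lie
    return None


def has_won(caught: Set[str]) -> bool:
    """The witness breaks once every planted lie has been caught."""
    return caught >= set(PLANTED_LIES)
```

The same question against the same state always yields the same verdict, which is the property the "reproducible verdict" step depends on; a production cue set would need to be far richer than this keyword match.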