Spaces:
Sleeping
Sleeping
| title: WitnessBox | |
| emoji: ⚖️ | |
| colorFrom: yellow | |
| colorTo: red | |
| sdk: gradio | |
| sdk_version: 5.50.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| tags: | |
| - track:wood | |
| - sponsor:modal | |
| - sponsor:openbmb | |
| - achievement:offbrand | |
| - build-small-hackathon | |
| - gradio | |
| - minicpm | |
| - voxcpm | |
| - modal | |
| - voice | |
| - game | |
| # ⚖️ WitnessBox — cross-examine a hostile AI witness with your *voice* | |
| > Interrogate **Marcus Reid, CFO of Halcyon Dynamics**. He reads *how you deliver* | |
| > — sound confident and he clams up; sound hesitant and he gets cocky and | |
| > overshares. Surface **three contradictions** and his voice **cracks** as he breaks. | |
| > | |
| > **Track:** 🍄 An Adventure in Thousand Token Wood · **Targeting:** Best Use of Modal + Best MiniCPM Build | |
| --- | |
| ## Why it's different | |
| Every other "interrogate a witness" build in this jam is text-and-logic. WitnessBox | |
| is the only one where **your vocal delivery is the input**: a `librosa` pass reads | |
| your *perceived* confidence (pauses + pace) and steers the witness in real time, | |
| and the witness answers back in a **voice that escalates** from composed to | |
| cracking. The moat is the audio loop, not the puzzle. | |
| > **The delivery meter is *perceived delivery*, never a lie detector.** It reads | |
| > how you sound (pauses, pace, pitch steadiness) — not whether anything is true. | |
| ## How a turn works | |
| ``` | |
| you speak ─┬─► Whisper ASR ───────────────► your question | |
| └─► librosa stance ─► CONFIDENT / NEUTRAL / HESITANT (steers the witness) | |
| your question ─► deterministic Contradiction Engine ─► catch? (reproducible verdict) | |
| persona + stance + tier + leak ─► MiniCPM4.1-8B ─► witness's line | |
| state ─► VoxCPM2 (voice style = game state) ─► audio (cached voice-crack on the win) | |
| ``` | |
| Hesitant delivery makes Reid leak a thread toward an uncaught lie. Confident | |
| delivery shuts him down. Catch all three (timeline · authorization · relationship) | |
| and he breaks; whiff too many and the bench excuses him — you lose. | |
| ## Models — all <32B, ~11B combined | |
| | Role | Model | Size | | |
| |---|---|---| | |
| | Witness brain | `openbmb/MiniCPM4.1-8B` | 8.2B | | |
| | Witness voice | `openbmb/VoxCPM2` (style tag = game state) | 2.3B | | |
| | Player ASR | `openai/whisper-small` (deployed) — `nvidia/nemotron-…-0.6b` is a one-image-swap upgrade (NeMo-only) | 0.24B | | |
| | Delivery stance | `librosa` (no model) | — | | |
| ## ⚙️ Best Use of Modal | |
| Modal is the **runtime** for all three GPU models and the beat pre-generator — | |
| used as a *platform*, not just a host (the prize counts "inference… all"): | |
| 1. **GPU inference behind `@app.cls`, scale-to-zero.** Three models on three | |
| right-sized GPUs (A100 + 2×A10G); idle → `$0` via `scaledown_window`. | |
| 2. **Opt-in keep-warm.** `min_containers` defaults to `0` — genuinely `$0` | |
| between examinations — and flips to `1` (`WITNESSBOX_KEEP_WARM=1`) for a live | |
| demo so turns don't eat a cold start. Scale-to-zero is the default; warmth is | |
| a deliberate, costed choice, not an always-on bill. | |
| 3. **Parallel `.map()`** pre-generates every scripted beat at deploy time, fanning | |
| the **32 voice-crack takes across containers at once** and keeping the best. | |
| 4. **Volume** persists the designed CFO reference voice + model cache + chosen beats. | |
| 5. **Right-sized GPUs** — an A100 only for the 8B witness brain; the 2B voice and | |
| the ASR ride cheaper A10Gs. | |
| **Measured (warm, this deploy).** A live dynamic turn is `MiniCPM4.1-8B` **→ 5.3s** | |
| for the witness's reply, then `VoxCPM2` **→ 8.6s** for ~4.5s of 48 kHz speech | |
| (RTF ≈ 1.9) — the line lands as **text first**, the voice follows. The five | |
| **scripted beats** (intro · opening · the voice-crack · win · lose) are pre-rendered | |
| by the parallel `.map()` pass and served straight from the Volume, so every | |
| *dramatic* moment plays **instantly** off the per-turn path. Idle containers → | |
| `$0` via `scaledown_window`. (Container-seconds / $-per-match read live from the | |
| Modal dashboard, not fabricated.) | |
| ## 🧠 Best MiniCPM Build | |
| The witness *is* a MiniCPM model. `openbmb/MiniCPM4.1-8B` runs the entire persona — | |
| it reads the delivery stance, decides what Reid admits or hides, and leaks a thread | |
| toward an uncaught lie when you sound unsure — and `openbmb/VoxCPM2` gives him the | |
| voice that cracks on the break. The 8B brain is the **core of the experience, not a | |
| bolt-on**: every line Reid speaks is MiniCPM under a stance- and tier-conditioned | |
| system prompt, so the drama lives or dies on how well a small model holds a character | |
| under pressure. | |
| ## Run it | |
| **Offline (no GPU, no Modal — boots anywhere):** | |
| ```bash | |
| pip install -r requirements.txt | |
| python app.py # WITNESSBOX_BACKEND defaults to "mock"; type your questions | |
| ``` | |
| The full game loop — stance, the catch engine, state, win/lose, audio autoplay — | |
| runs locally against a rule-based mock witness, so the end-to-end flow is provable | |
| without a single GPU. | |
| **Live (real models):** | |
| ```bash | |
| modal deploy modal_app.py # serves MiniCPM4.1-8B, VoxCPM2, Whisper ASR | |
| modal run modal_app.py # pre-generate the scripted beats (.map) | |
| WITNESSBOX_BACKEND=modal python app.py | |
| ``` | |
| On a Space, set `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET` as secrets. Lookups are | |
| lazy and fall back to mock if Modal is unreachable, so the Space always boots. | |
| ## Integrity | |
| Detection fires against three **planted** lies with concrete cues — reliable, not | |
| "magical." The model never grades itself. Cost/latency numbers are measured. No | |
| "only entry that…" claims about a moving field. | |