
@@ -1,105 +1,173 @@
----
-title: Judge-GPT
-emoji: ⚖️
-colorFrom: yellow
-colorTo: red
-sdk: gradio
-sdk_version: 6.17.3
-app_file: app.py
-pinned: false
-license: mit
-short_description: AI-native miniature trials under 32B.
----
-# Judge-GPT
-Judge-GPT is a cinematic Gradio Space for the Build Small Hackathon's Thousand Token Wood track. It runs two-minute AI-native miniature trials where small-model agents act as advocates, judge, jurors, clerk, and evidence auditor.
-The app is built to stay under the 32B named-model budget:
-- `openai/gpt-oss-20b` for primary legal reasoning.
-- `openbmb/AgentCPM-Explore` for clerk/stage/verdict style.
-- `nvidia/Nemotron-Orchestrator-8B` for juror and evidence-auditor review.
-Total named budget: 32B parameters.
-## What the app can do
-- Run cached trials for the Socrates and Barnaby demo cases without network search.
-- Run the Live Search Tribunal path, which builds a search packet from a user query and stops if live material is too weak to support a trial.
-- Add a hypothetical sidebar to shift the framing of a trial without editing cached case files.
-- Switch trial pacing between swift, measured, and ceremonial speeds.
-- Stage the courtroom with phase-specific visuals, agent puppets, evidence props, captions, and browser audio cues.
-- Show the Mind Layer as a compact JSON trace of agent turns and phase metadata.
-- Call a Modal streaming endpoint when `MODAL_TRIAL_URL` is configured. Endpoint or model failures stop the trial instead of substituting cached dialogue.
-- Retain decree and agent-trace export helpers in `sovereign_bench/export.py` for future UI restoration.
-## Limitations
-- Judge-GPT is not legal advice and should not be used for real legal decisions.
-- Live search snippets are not independently verified by the app.
-- Output quality depends on Modal GPU availability, token limits, and the configured Hugging Face models.
-- Model, Modal, or live retrieval failures stop the current trial rather than returning substitute courtroom dialogue.
-- Trial results are not persisted across sessions.
-- Export generation remains in the codebase, but the visible download UI is currently hidden.
-## Run locally
-```powershell
-python -m pip install -r requirements.txt
-python app.py
-```
-## Modal backend
-The Gradio app works locally without Modal. If `MODAL_TRIAL_URL` is set, the Space calls the Modal streaming endpoint and stops the trial if the endpoint is unavailable.
-The deployed Modal endpoint runs each role prompt through a GPU-backed vLLM class on H100 by default. Traces mark successful GPU calls with `runtime: modal-gpu-vllm`, `provider: modal-gpu-vllm`, and `gpu: H100`. If a GPU/model load fails, the trial stops; the app does not substitute provider or cached dialogue.
-```powershell
-python -m modal deploy modal_app.py
-```
-Keep the deployed endpoint URL as a Hugging Face Space variable named `MODAL_TRIAL_URL`.
-## Project targets
-Workspace connected to:
-- GitHub: `https://github.com/aliiqbal24/BuildSmallfinal.git`
-- Modal profile: `ali-j-iqbal24`
-- Hugging Face user: `AliIqbal05`
-## Secrets
-Credentials are not committed to this repo.
-- Local Hugging Face CLI auth is stored in the Hugging Face cache.
-- Modal auth is stored in the local Modal profile.
-- Modal has a secret named `huggingface` with `HF_TOKEN`.
-Use the Modal secret in functions like this:
-```python
-@app.function(secrets=[modal.Secret.from_name("huggingface")])
-def run_model():
-    token = os.getenv("HF_TOKEN")
-```
-## Developer guide
-- `app.py`: Gradio UI, CSS, JavaScript audio hooks, HTML renderers, and Modal/local streaming switch.
-- `sovereign_bench/engine.py`: trial phases, agent orchestration, verdict assembly, and trace construction.
-- `sovereign_bench/llm.py`: Hugging Face calls, strict model error handling, and prompt building.
-- `sovereign_bench/retrieval.py`: live search packet construction.
-- `sovereign_bench/models.py`: Pydantic schemas for cases, evidence, events, turns, votes, and verdicts.
-- `sovereign_bench/cases.py`: cached demo case packets.
-- `sovereign_bench/export.py`: dormant decree and trace writers.
-- `modal_app.py`: Modal deployment and GPU-backed streaming endpoint.
-- `tests/`: engine, case, and rendering regression coverage.
-## Verify Modal to Hugging Face
-```powershell
-python -m modal run modal_app.py
-```

+---
+title: Judge-GPT
+emoji: ⚖️
+colorFrom: yellow
+colorTo: red
+sdk: gradio
+sdk_version: 6.17.3
+app_file: app.py
+pinned: false
+license: mit
+short_description: AI-native miniature trials under 32B.
+tags:
+  - track:wood
+  - sponsor:openai
+  - sponsor:nvidia
+  - sponsor:modal
+  - achievement:offbrand
+  - achievement:fieldnotes
+---
+# Judge-GPT
+Judge-GPT is a cinematic Gradio courtroom for the Build Small Hackathon's Thousand Token Wood track. It turns a compact evidence packet into a two-minute AI-native trial: a clerk opens the docket, two lawyers argue opposite sides, Marcus Aurelius presides, six fixed-perspective jurors vote, and the court seals a verdict.
+The point is not legal advice. It is a small-model theater for structured disagreement: evidence is visible, roles are constrained, hidden reasoning is stripped, and every trial leaves a trace of which agent said what.
+## Submission Links
+- Hugging Face Space: https://huggingface.co/spaces/build-small-hackathon/JudgeGPT
+- Demo video: https://drive.google.com/drive/folders/10pWJ7NVCsnVV7wOlqm4MGWg4Kmh4rMY2?usp=sharing
+- Social post: TODO paste final public social post URL
+- GitHub repo: https://github.com/aliiqbal24/BuildSmallfinal
+- Field guide validator: https://build-small-hackathon-field-guide.hf.space/submit
+## What Judges Should Try
+1. Open the Space and keep the default `Trial of Socrates`.
+2. Click `Begin Trial`.
+3. Watch the courtroom progress from intake to verdict.
+4. Hover the judge, clerk, lawyers, and jurors to inspect model/agent threads.
+5. Open the `Evidence Drawer` and `Juror Panel` tabs after the verdict.
+6. Try `Greg Heffley vs Mom` for a lighter family-court case.
+7. Try `Custom` to write a short dispute and up to three pieces of evidence per side directly into the docket book.
+## Why It Fits Build Small
+- **Thousand Token Wood:** the app is whimsical, theatrical, and AI-native rather than a generic chatbot.
+- **Best Use of Codex:** Codex was used throughout implementation, debugging, UI iteration, tests, and commit prep in the connected GitHub repo.
+- **Nemotron Hardware Prize:** Nemotron is a core runtime model for the jury and juror vote generation.
+- **Best Use of Modal:** the Gradio Space delegates live model inference to a Modal GPU streaming endpoint.
+- **Off-Brand:** the UI pushes past stock Gradio with a custom courtroom, animated puppets, docket book, evidence props, audio cues, and verdict staging.
+- **Field Notes:** this README documents the build idea, model choices, runtime architecture, limitations, and submission checklist.
+## Small-Model Budget
+Every named model is under the 32B parameter cap.
+| Role | Model | Budgeted size | Used for |
+| --- | --- | ---: | --- |
+| Presiding advocate | `openai/gpt-oss-20b` | 20B | Judge, claimant lawyer, respondent lawyer, verdict voice |
+| Clerk of style | `openbmb/AgentCPM-Explore` | 4B | Clerk/stage voice |
+| Jury ring | `nvidia/Nemotron-Orchestrator-8B` | 8B | Jury panel and six juror votes |
+Displayed aggregate budget: 20B + 4B + 8B = 32B. No single model exceeds 32B.
+## How It Works
+Judge-GPT runs a deterministic courtroom sequence over a `CasePacket`:
+1. Clerk opens the docket.
+2. Judge frames the dispute.
+3. Mike OSS argues for the claimant.
+4. Harvey Vector argues for the respondent.
+5. The evidence record is displayed without adding a third lawyer.
+6. The judge asks a hinge question.
+7. Each lawyer answers from their side.
+8. Nemotron Jury retires the panel.
+9. Six named jurors vote from distinct worldviews.
+10. The judge announces the final verdict.
+The shipped demo cases are:
+- `The Polis v. Socrates`
+- `Greg Heffley v. Mom`
+- `Custom`, built from the docket-book fields in the UI
+## Runtime Architecture
+- `app.py` renders the Gradio UI, courtroom HTML/CSS, audio hooks, case preview book, and live event stream.
+- `sovereign_bench/engine.py` orchestrates trial phases, model calls, evidence events, jury votes, verdict assembly, and trace metadata.
+- `sovereign_bench/llm.py` builds role prompts, calls Hugging Face-compatible chat models, and rejects hidden reasoning or instruction echoes.
+- `sovereign_bench/cases.py` contains the cached demo case packets.
+- `modal_app.py` hosts the GPU-backed streaming endpoint used by the Space.
+- `tests/` contains engine, case, and rendering regression tests.
+The Gradio app uses `MODAL_TRIAL_URL` when it is set; otherwise it falls back to the built-in deployed Modal endpoint. The Modal app owns the Hugging Face token through a Modal secret named `huggingface`; no real credentials are committed.
+## Run Locally
+```powershell
+python -m pip install -r requirements.txt
+python app.py
+```
+Open:
+```text
+http://127.0.0.1:7860
+```
+## Deploy Modal Backend
+```powershell
+python -m modal deploy modal_app.py
+```
+After deployment, pre-warm every configured courtroom model in the deployed `sovereign-bench` app so the first trial does not wait for all GPU containers to cold start. Run this after each deploy because deployments reset Modal autoscaler overrides:
+```powershell
+python -m modal run modal_app.py::warm_models
+```
+If the endpoint changes, set the Hugging Face Space variable:
+```text
+MODAL_TRIAL_URL=https://your-modal-endpoint.example
+```
+## Deploy Hugging Face Space
+Create or upload this repo as a Gradio Space inside the official Build Small org:
+```text
+build-small-hackathon/<your-space-name>
+```
+Space settings:
+- SDK: Gradio
+- App file: `app.py`
+- Python requirements: `requirements.txt`
+- Optional variable: `MODAL_TRIAL_URL`
+- No Space secret is required if using the hosted Modal endpoint.
+## Verification
+```powershell
+python -m pytest
+```
+Focused checks used during final prep:
+```powershell
+python -m pytest tests/test_engine.py tests/test_ui_rendering.py
+```
+## Limitations
+- Judge-GPT is not legal advice and should not be used for real legal decisions.
+- The demo packets are compact, staged evidence packets, not exhaustive source research.
+- Model, Modal, or retrieval failures stop the current trial instead of substituting fake dialogue.
+- Trial results are not persisted across sessions.
+- Custom trials require a short case context and evidence from both sides.
+## Final Submission Checklist
+- [ ] Push the repo to the Build Small Hugging Face org as a Gradio Space.
+- [ ] Confirm the Space launches and can complete `Trial of Socrates`.
+- [ ] Record a short demo video showing the trial flow and verdict.
+- [ ] Confirm the `Demo video` link above points to the final public URL.
+- [ ] Publish one social post about the app.
+- [ ] Replace the `Social post` TODO above with the final public URL.
+- [ ] Run the README through the Build Small validator.
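The Small-Model Budget table in the README above can be checked with a few lines of Python. This is a minimal sketch, not the project's code: the `BUDGET` rows are transcribed by hand from the table, and `CAP_B` and `check_budget` are hypothetical names introduced only for illustration. It assumes the cap applies both to each model and to the aggregate, which is how the README presents it.

```python
# Hypothetical budget check; rows are transcribed from the README table.
CAP_B = 32  # parameter cap in billions, as stated in the README

BUDGET = [
    ("Presiding advocate", "openai/gpt-oss-20b", 20),
    ("Clerk of style", "openbmb/AgentCPM-Explore", 4),
    ("Jury ring", "nvidia/Nemotron-Orchestrator-8B", 8),
]


def check_budget(rows, cap_b=CAP_B):
    """Return the aggregate size, raising if a model or the total exceeds the cap."""
    for role, model, size_b in rows:
        if size_b > cap_b:
            raise ValueError(f"{model} ({role}) exceeds the {cap_b}B cap")
    total = sum(size_b for _, _, size_b in rows)
    if total > cap_b:
        raise ValueError(f"aggregate {total}B exceeds the {cap_b}B cap")
    return total


if __name__ == "__main__":
    print(f"aggregate budget: {check_budget(BUDGET)}B")
```

A check like this could sit in `tests/` so that swapping a model in the table fails fast when the total drifts past the cap.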

app.py CHANGED

The diff for this file is too large to render. See raw diff

assets/ATTRIBUTION.md CHANGED

@@ -1,10 +1,10 @@
-# Asset Attribution
-## `courtroom-dickinson.jpg`
-- Source: https://commons.wikimedia.org/wiki/File:Dickinson_Law_Courtroom.jpg
-- Description: Penn State University, Dickinson School of Law courtroom
-- Author: Jeremy Hess Photography
-- License: Creative Commons CC0 1.0 Universal Public Domain Dedication
-- Local use: cinematic courtroom background for Sovereign Bench

+# Asset Attribution
+## `courtroom-dickinson.jpg`
+- Source: https://commons.wikimedia.org/wiki/File:Dickinson_Law_Courtroom.jpg
+- Description: Penn State University, Dickinson School of Law courtroom
+- Author: Jeremy Hess Photography
+- License: Creative Commons CC0 1.0 Universal Public Domain Dedication
+- Local use: cinematic courtroom background for Sovereign Bench

assets/audio/ATTRIBUTION.md CHANGED

@@ -1,51 +1,51 @@
-# Audio Attribution
-All selected audio is stored locally in `assets/audio/` for the animated courtroom episode.
-## Courtroom score and judgement sting
-- Files: `courtroom.ogg`, `Judgement.ogg`
-- Source: OpenGameArt, "Courtroom and Judgement"
-- Author: Spring Spring
-- License: CC0
-- URL: https://opengameart.org/content/courtroom-and-judgement
-## Courtroom chatter and crowd reaction
-- File: `crowd_shouting.ogg`
-- Source: OpenGameArt, "Crowd Shouting/Speaking Ambience"
-- Author: StarNinjas
-- License: CC0
-- URL: https://opengameart.org/content/crowd-shoutingspeaking-ambience
-## Gavel and wood hits
-- Files: `wood_hammer_01.ogg`, `wood_hit_03.ogg`
-- Source: OpenGameArt, "100 CC0 metal and wood SFX"
-- Author: rubberduck
-- License: CC0
-- URL: https://opengameart.org/content/100-cc0-metal-and-wood-sfx
-## Lawyer footsteps
-- File: `steps_in_wood_floor.wav`
-- Source: OpenGameArt, "Steps in wood floor"
-- Author: mikeask
-- License: CC0
-- URL: https://opengameart.org/content/steps-in-wood-floor
-## Book and paper movement
-- Files: `paper_sound_1.mp3`, `paper_sound_4.mp3`
-- Source: OpenGameArt, "Various Paper Sound Effects"
-- Author: Luckius
-- License: CC0
-- URL: https://opengameart.org/content/various-paper-sound-effects
-## Docket selection UI cue
-- File: `select_001.ogg`
-- Source: OpenGameArt, "Interface Sounds"
-- Author: Kenney
-- License: CC0
-- URL: https://opengameart.org/content/interface-sounds

+# Audio Attribution
+All selected audio is stored locally in `assets/audio/` for the animated courtroom episode.
+## Courtroom score and judgement sting
+- Files: `courtroom.ogg`, `Judgement.ogg`
+- Source: OpenGameArt, "Courtroom and Judgement"
+- Author: Spring Spring
+- License: CC0
+- URL: https://opengameart.org/content/courtroom-and-judgement
+## Courtroom chatter and crowd reaction
+- File: `crowd_shouting.ogg`
+- Source: OpenGameArt, "Crowd Shouting/Speaking Ambience"
+- Author: StarNinjas
+- License: CC0
+- URL: https://opengameart.org/content/crowd-shoutingspeaking-ambience
+## Gavel and wood hits
+- Files: `wood_hammer_01.ogg`, `wood_hit_03.ogg`
+- Source: OpenGameArt, "100 CC0 metal and wood SFX"
+- Author: rubberduck
+- License: CC0
+- URL: https://opengameart.org/content/100-cc0-metal-and-wood-sfx
+## Lawyer footsteps
+- File: `steps_in_wood_floor.wav`
+- Source: OpenGameArt, "Steps in wood floor"
+- Author: mikeask
+- License: CC0
+- URL: https://opengameart.org/content/steps-in-wood-floor
+## Book and paper movement
+- Files: `paper_sound_1.mp3`, `paper_sound_4.mp3`
+- Source: OpenGameArt, "Various Paper Sound Effects"
+- Author: Luckius
+- License: CC0
+- URL: https://opengameart.org/content/various-paper-sound-effects
+## Docket selection UI cue
+- File: `select_001.ogg`
+- Source: OpenGameArt, "Interface Sounds"
+- Author: Kenney
+- License: CC0
+- URL: https://opengameart.org/content/interface-sounds

assets/book/README.md CHANGED

@@ -1,14 +1,14 @@
-# Docket Book Assets
-These project-bound UI prop assets were generated with the built-in Codex image generation tool, then processed locally from a chroma-key background to transparent PNGs.
-- `docket-book-open.png`: open docket book used before the trial starts.
-- `docket-book-closed.png`: closed docket book used after the trial begins.
-- `docket-book-open-keyed.png`: preserved chroma-key source.
-- `docket-book-closed-keyed.png`: preserved chroma-key source.
-Generation prompt summary:
-- Antique legal docket book, warm parchment or dark leather, gold corner protectors, polished painterly game UI prop, centered with generous padding.
-- No text, no logos, no watermark, no hands, no pen.
-- Generated on a flat `#00ff00` background for local alpha extraction.

+# Docket Book Assets
+These project-bound UI prop assets were generated with the built-in Codex image generation tool, then processed locally from a chroma-key background to transparent PNGs.
+- `docket-book-open.png`: open docket book used before the trial starts.
+- `docket-book-closed.png`: closed docket book used after the trial begins.
+- `docket-book-open-keyed.png`: preserved chroma-key source.
+- `docket-book-closed-keyed.png`: preserved chroma-key source.
+Generation prompt summary:
+- Antique legal docket book, warm parchment or dark leather, gold corner protectors, polished painterly game UI prop, centered with generous padding.
+- No text, no logos, no watermark, no hands, no pen.
+- Generated on a flat `#00ff00` background for local alpha extraction.

data/README.md CHANGED

@@ -1,5 +1,5 @@
-# Sovereign Bench Agent Trace Sample
-This sample contains compact phase-level trace rows from the cached Barnaby Buttons trial. Runtime traces exported by the Gradio app include the full structured `TrialEvent` objects.
-The trace is synthetic and intended for hackathon demonstration, reproducibility, and UI testing.

+# Sovereign Bench Agent Trace Sample
+This sample contains compact phase-level trace rows from the cached Barnaby Buttons trial. Runtime traces exported by the Gradio app include the full structured `TrialEvent` objects.
+The trace is synthetic and intended for hackathon demonstration, reproducibility, and UI testing.

data/agent_trace_sample.json CHANGED

@@ -1,23 +1,23 @@
-[
-  {
-    "phase": "intake",
-    "case_id": "barnaby",
-    "agent": "Clerk Meridian",
-    "model": "openbmb/AgentCPM-Explore",
-    "summary": "Opened The People v. Barnaby Buttons and recorded source provenance for cached demo reliability."
-  },
-  {
-    "phase": "evidence",
-    "case_id": "barnaby",
-    "agent": "Auditor Prism",
-    "model": "nvidia/Nemotron-Orchestrator-8B",
-    "summary": "Scored ledger ink, crumb trail, calendar motive, and biscuit alibi as directional evidence with uncertainty."
-  },
-  {
-    "phase": "verdict",
-    "case_id": "barnaby",
-    "agent": "Marcus Aurelius",
-    "model": "openai/gpt-oss-20b",
-    "summary": "Issued a narrow claimant finding with cited evidence IDs and an explicit uncertainty warning."
-  }
-]

+[
+  {
+    "phase": "intake",
+    "case_id": "barnaby",
+    "agent": "Clerk Meridian",
+    "model": "openbmb/AgentCPM-Explore",
+    "summary": "Opened The People v. Barnaby Buttons and recorded source provenance for cached demo reliability."
+  },
+  {
+    "phase": "evidence",
+    "case_id": "barnaby",
+    "agent": "Auditor Prism",
+    "model": "nvidia/Nemotron-Orchestrator-8B",
+    "summary": "Scored ledger ink, crumb trail, calendar motive, and biscuit alibi as directional evidence with uncertainty."
+  },
+  {
+    "phase": "verdict",
+    "case_id": "barnaby",
+    "agent": "Marcus Aurelius",
+    "model": "openai/gpt-oss-20b",
+    "summary": "Issued a narrow claimant finding with cited evidence IDs and an explicit uncertainty warning."
+  }
+]
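The sample trace above is a flat list of phase-level rows, so it is easy to group by model and see which agent ran where. The following is a rough sketch under stated assumptions: the rows are copied from the sample with their `summary` fields shortened for space, and `agents_by_model` is a hypothetical helper, not part of `sovereign_bench`.

```python
import json
from collections import defaultdict

# Rows copied from the sample trace; summaries are shortened for this sketch.
SAMPLE = """[
  {"phase": "intake", "case_id": "barnaby", "agent": "Clerk Meridian",
   "model": "openbmb/AgentCPM-Explore", "summary": "Opened the docket."},
  {"phase": "evidence", "case_id": "barnaby", "agent": "Auditor Prism",
   "model": "nvidia/Nemotron-Orchestrator-8B", "summary": "Scored the evidence."},
  {"phase": "verdict", "case_id": "barnaby", "agent": "Marcus Aurelius",
   "model": "openai/gpt-oss-20b", "summary": "Issued a narrow finding."}
]"""


def agents_by_model(rows):
    """Map each model ID to the (phase, agent) pairs recorded for it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append((row["phase"], row["agent"]))
    return dict(grouped)


if __name__ == "__main__":
    for model, turns in agents_by_model(json.loads(SAMPLE)).items():
        print(model, turns)
```

To run it against the real file, the `SAMPLE` string could be replaced by reading `data/agent_trace_sample.json` from disk.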

modal_app.py CHANGED

@@ -1,193 +1,212 @@
-import os
-import time
-import modal
-from sovereign_bench.engine import stream_trial_jsonl
-from sovereign_bench.llm import (
-    ModelCall,
-    ModelResult,
-    build_role_messages,
-    messages_hash,
-)
-from sovereign_bench.models import TrialRequest
-app = modal.App("sovereign-bench")
-GPU_NAME = "H100"
-GPU_TIMEOUT_SECONDS = 20 * 60
-HF_CACHE_DIR = "/root/.cache/huggingface"
-image = (
-    modal.Image.debian_slim(python_version="3.12")
-    .pip_install("fastapi", "huggingface_hub", "httpx", "pydantic")
-    .add_local_dir("sovereign_bench", remote_path="/root/sovereign_bench")
-)
-model_cache = modal.Volume.from_name("sovereign-bench-model-cache", create_if_missing=True)
-vllm_image = (
-    modal.Image.from_registry("nvidia/cuda:12.8.1-devel-ubuntu22.04", add_python="3.12")
-    .entrypoint([])
-    .uv_pip_install(
-        "vllm==0.18.1",
-        "huggingface_hub[hf_transfer]==0.36.0",
-        "transformers",
-        "httpx",
-        "pydantic",
-    )
-    .env(
-        {
-            "HF_HUB_ENABLE_HF_TRANSFER": "1",
-            "HF_HOME": HF_CACHE_DIR,
-            "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
-            "VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8": "1",
-        }
-    )
-    .add_local_dir("sovereign_bench", remote_path="/root/sovereign_bench")
-)
-@app.cls(
-    image=vllm_image,
-    gpu=GPU_NAME,
-    secrets=[modal.Secret.from_name("huggingface")],
-    volumes={HF_CACHE_DIR: model_cache},
-    timeout=GPU_TIMEOUT_SECONDS,
-    scaledown_window=10 * 60,
-    max_containers=3,
-)
-class VllmModel:
-    model_id: str = modal.parameter()
-    @modal.enter()
-    def load(self) -> None:
-        from vllm import LLM, SamplingParams
-        self.SamplingParams = SamplingParams
-        self.llm = LLM(
-            model=self.model_id,
-            trust_remote_code=True,
-            max_model_len=4096,
-            gpu_memory_utilization=0.9,
-        )
-    @modal.method()
-    def generate(self, payload: dict) -> dict:
-        from sovereign_bench.llm import ModelCallError, clean_model_text
-        started = time.perf_counter()
-        messages = payload["messages"]
-        max_tokens = int(payload.get("max_tokens") or 120)
-        temperature = float(payload.get("temperature") or 0.45)
-        sampling_params = self.SamplingParams(
-            max_tokens=max_tokens,
-            temperature=temperature,
-            top_p=0.9,
-        )
-        retry_messages = messages + [
-            {
-                "role": "user",
-                "content": (
-                    "Your previous response did not include visible courtroom dialogue. "
-                    "Return only the final spoken dialogue now. Do not include <think>, analysis, reasoning, markdown, or notes. /no_think"
-                ),
-            }
-        ]
-        last_error: Exception | None = None
-        text = ""
-        for attempt_messages in (messages, retry_messages):
-            outputs = self.llm.chat(
-                [attempt_messages],
-                sampling_params=sampling_params,
-                use_tqdm=False,
-                chat_template_kwargs={"enable_thinking": False},
-            )
-            raw_text = outputs[0].outputs[0].text.strip()
-            try:
-                text = clean_model_text(raw_text)
-                break
-            except ModelCallError as exc:
-                last_error = exc
-        if not text and last_error:
-            raise last_error
-        return {
-            "text": text,
-            "latency_ms": int((time.perf_counter() - started) * 1000),
-        }
-def modal_gpu_enabled() -> bool:
-    return os.getenv("SOVEREIGN_DISABLE_MODAL_GPU", "").lower() not in {"1", "true", "yes"}
-def modal_gpu_runner(**kwargs) -> ModelResult:
-    messages = build_role_messages(
-        agent=kwargs["agent"],
-        role=kwargs["role"],
-        case_summary=kwargs["case_summary"],
-        task=kwargs["task"],
-        evidence_summary=kwargs["evidence_summary"],
-    )
-    requested_model = kwargs["model"]
-    prompt_hash = messages_hash(messages)
-    if modal_gpu_enabled():
-        output = VllmModel(model_id=requested_model).generate.remote(
-            {
-                "messages": messages,
-                "max_tokens": kwargs.get("max_tokens", 120),
-                "temperature": 0.45,
-            }
-        )
-        return ModelResult(
-            text=output["text"],
-            input_text="\n\n".join(f"{item.get('role', 'user').upper()}:\n{item.get('content', '')}" for item in messages)
-            + "\n\nASSISTANT:\n",
-            call=ModelCall(
-                model=requested_model,
-                provider="modal-gpu-vllm",
-                ok=True,
-                latency_ms=output["latency_ms"],
-                prompt_hash=prompt_hash,
-                requested_model=requested_model,
-                runtime="modal-gpu-vllm",
-                gpu=GPU_NAME,
-            ),
-        )
-    raise RuntimeError("Modal GPU is disabled; no provider fallback is allowed.")
-@app.function(image=image, secrets=[modal.Secret.from_name("huggingface")])
-def check_huggingface_connection() -> str:
-    token = os.getenv("HF_TOKEN")
-    if not token:
-        return "HF_TOKEN is not available inside Modal."
-    from huggingface_hub import HfApi
-    user = HfApi(token=token).whoami()["name"]
-    return f"Connected to Hugging Face as {user}."
-@app.function(
-    image=image,
-    secrets=[modal.Secret.from_name("huggingface")],
-    min_containers=1,
-    timeout=GPU_TIMEOUT_SECONDS,
-)
-@modal.fastapi_endpoint(method="POST", label="trial-stream")
-def trial_stream(payload: dict):
-    from fastapi.responses import StreamingResponse
-    request = TrialRequest.model_validate(payload)
-    delay = {"swift": 0.02, "measured": 0.12, "ceremonial": 0.25}[request.speed]
-    return StreamingResponse(
-        stream_trial_jsonl(request, delay=delay, model_runner=modal_gpu_runner),
-        media_type="application/x-ndjson",
-    )
-@app.local_entrypoint()
-def main():
-    print(check_huggingface_connection.remote())

+import os
+import time
+import modal
+from sovereign_bench.engine import MODEL_BUDGET, stream_trial_jsonl
+from sovereign_bench.llm import (
+    ModelCall,
+    ModelResult,
+    build_role_messages,
+    messages_hash,
+)
+from sovereign_bench.models import TrialRequest
+MODAL_APP_NAME = "sovereign-bench"
+app = modal.App(MODAL_APP_NAME)
+GPU_NAME = "H100"
+GPU_TIMEOUT_SECONDS = 20 * 60
+HF_CACHE_DIR = "/root/.cache/huggingface"
+USED_MODEL_IDS = tuple(dict.fromkeys(model for _, model, _ in MODEL_BUDGET))
+image = (
+    modal.Image.debian_slim(python_version="3.12")
+    .pip_install("fastapi", "huggingface_hub", "httpx", "pydantic")
+    .add_local_dir("sovereign_bench", remote_path="/root/sovereign_bench")
+)
+model_cache = modal.Volume.from_name("sovereign-bench-model-cache", create_if_missing=True)
+vllm_image = (
+    modal.Image.from_registry("nvidia/cuda:12.8.1-devel-ubuntu22.04", add_python="3.12")
+    .entrypoint([])
+    .uv_pip_install(
+        "vllm==0.18.1",
+        "huggingface_hub[hf_transfer]==0.36.0",
+        "transformers",
+        "httpx",
+        "pydantic",
+    )
+    .env(
+        {
+            "HF_HUB_ENABLE_HF_TRANSFER": "1",
+            "HF_HOME": HF_CACHE_DIR,
+            "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
+            "VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8": "1",
+        }
+    )
+    .add_local_dir("sovereign_bench", remote_path="/root/sovereign_bench")
+)
+@app.cls(
+    image=vllm_image,
+    gpu=GPU_NAME,
+    secrets=[modal.Secret.from_name("huggingface")],
+    volumes={HF_CACHE_DIR: model_cache},
+    timeout=GPU_TIMEOUT_SECONDS,
+    scaledown_window=10 * 60,
+    max_containers=3,
+)
+class VllmModel:
+    model_id: str = modal.parameter()
+    @modal.enter()
+    def load(self) -> None:
+        from vllm import LLM, SamplingParams
+        self.SamplingParams = SamplingParams
+        self.llm = LLM(
+            model=self.model_id,
+            trust_remote_code=True,
+            max_model_len=4096,
+            gpu_memory_utilization=0.9,
+        )
+    @modal.method()
+    def generate(self, payload: dict) -> dict:
+        from sovereign_bench.llm import ModelCallError, clean_model_text
+        started = time.perf_counter()
+        messages = payload["messages"]
+        max_tokens = int(payload.get("max_tokens") or 120)
+        temperature = float(payload.get("temperature") or 0.45)
+        sampling_params = self.SamplingParams(
+            max_tokens=max_tokens,
+            temperature=temperature,
+            top_p=0.9,
+        )
+        retry_messages = messages + [
+            {
+                "role": "user",
+                "content": (
+                    "Your previous response did not include visible courtroom dialogue. "
+                    "Return only the final answer now. Do not mention prompts, tasks, requirements, or that you are following instructions. "
+                    "Do not include <think>, analysis, reasoning, markdown, narration, or notes. /no_think"
+                ),
+            }
+        ]
+        last_error: Exception | None = None
+        text = ""
+        for attempt_messages in (messages, retry_messages):
+            outputs = self.llm.chat(
+                [attempt_messages],
+                sampling_params=sampling_params,
+                use_tqdm=False,
+                chat_template_kwargs={"enable_thinking": False},
+            )
+            raw_text = outputs[0].outputs[0].text.strip()
+            try:
+                text = clean_model_text(raw_text)
+                break
+            except ModelCallError as exc:
+                last_error = exc
+        if not text and last_error:
+            raise last_error
+        return {
+            "text": text,
+            "latency_ms": int((time.perf_counter() - started) * 1000),
+        }
+    @modal.method()
+    def warm(self) -> dict:
+        return {"model": self.model_id, "status": "warm"}
+def modal_gpu_enabled() -> bool:
+    return os.getenv("SOVEREIGN_DISABLE_MODAL_GPU", "").lower() not in {"1", "true", "yes"}
+def modal_gpu_runner(**kwargs) -> ModelResult:
+    messages = build_role_messages(
+        agent=kwargs["agent"],
+        role=kwargs["role"],
+        case_summary=kwargs["case_summary"],
+        task=kwargs["task"],
+        evidence_summary=kwargs["evidence_summary"],
+        trial_history=kwargs.get("trial_history", ""),
+        persona=kwargs.get("persona", ""),
+        objective=kwargs.get("objective", ""),
+    )
+    requested_model = kwargs["model"]
+    prompt_hash = messages_hash(messages)
+    if modal_gpu_enabled():
+        output = VllmModel(model_id=requested_model).generate.remote(
+            {
+                "messages": messages,
+                "max_tokens": kwargs.get("max_tokens", 120),
+                "temperature": 0.45,
+            }
+        )
+        return ModelResult(
+            text=output["text"],
+            input_text="\n\n".join(f"{item.get('role', 'user').upper()}:\n{item.get('content', '')}" for item in messages)
+            + "\n\nASSISTANT:\n",
+            call=ModelCall(
+                model=requested_model,
+                provider="modal-gpu-vllm",
+                ok=True,
+                latency_ms=output["latency_ms"],
+                prompt_hash=prompt_hash,
+                requested_model=requested_model,
+                runtime="modal-gpu-vllm",
+                gpu=GPU_NAME,
+            ),
+        )
+    raise RuntimeError("Modal GPU is disabled; no provider fallback is allowed.")
+@app.function(image=image, secrets=[modal.Secret.from_name("huggingface")])
+def check_huggingface_connection() -> str:
+    token = os.getenv("HF_TOKEN")
+    if not token:
+        return "HF_TOKEN is not available inside Modal."
+    from huggingface_hub import HfApi
+    user = HfApi(token=token).whoami()["name"]
+    return f"Connected to Hugging Face as {user}."
+@app.function(
+    image=image,
+    secrets=[modal.Secret.from_name("huggingface")],
+    min_containers=1,
+    timeout=GPU_TIMEOUT_SECONDS,
+)
+@modal.fastapi_endpoint(method="POST", label="trial-stream")
+def trial_stream(payload: dict):
+    from fastapi.responses import StreamingResponse
+    request = TrialRequest.model_validate(payload)
+    delay = {"swift": 0.02, "measured": 0.12, "ceremonial": 0.25}[request.speed]
+    return StreamingResponse(
+        stream_trial_jsonl(request, delay=delay, model_runner=modal_gpu_runner),
+        media_type="application/x-ndjson",
+    )
+@app.local_entrypoint()
+def main():
+    print(check_huggingface_connection.remote())
+@app.local_entrypoint()
+def warm_models():
+    deployed_model = modal.Cls.from_name(MODAL_APP_NAME, "VllmModel")
+    for model_id in USED_MODEL_IDS:
+        model = deployed_model(model_id=model_id)
+        model.update_autoscaler(min_containers=1)
+        print(model.warm.remote())
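The `trial_stream` endpoint above returns its events as `application/x-ndjson`, meaning one JSON object per line. A client-side reader might look like the sketch below. It is an illustration only: `iter_ndjson` is a hypothetical helper, the demo events are made up, and the real event fields are whatever `stream_trial_jsonl` emits, which this diff does not show.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded object per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical event payloads; the real field names come from the engine.
    fake_stream = ['{"phase": "intake"}\n', "\n", '{"phase": "verdict"}\n']
    for event in iter_ndjson(fake_stream):
        print(event["phase"])
```

With `httpx`, which is already listed in `requirements.txt`, the lines could come from `response.iter_lines()` inside an `httpx.stream("POST", url, json=payload)` context, where `url` would be the value of `MODAL_TRIAL_URL`.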

requirements.txt CHANGED

@@ -1,7 +1,7 @@
-gradio
-huggingface_hub
-httpx
-modal
-pydantic
-pytest
-python-dotenv

+gradio
+huggingface_hub
+httpx
+modal
+pydantic
+pytest
+python-dotenv

sovereign_bench/__init__.py CHANGED

@@ -1,6 +1,6 @@
-"""Sovereign Bench trial engine package."""
-from .engine import run_trial, stream_trial
-from .models import TrialRequest
-__all__ = ["TrialRequest", "run_trial", "stream_trial"]

+"""Sovereign Bench trial engine package."""
+from .engine import run_trial, stream_trial
+from .models import TrialRequest
+__all__ = ["TrialRequest", "run_trial", "stream_trial"]

sovereign_bench/cases.py CHANGED

@@ -1,141 +1,274 @@
-from __future__ import annotations
-from .models import CasePacket, EvidenceItem
-SOCRATES = CasePacket(
-    id="socrates",
-    title="The Polis v. Socrates",
-    subtitle="A miniature retrial of impiety, civic anxiety, and troublesome questions.",
-    claimant="The Athenian Polis",
-    respondent="Socrates",
-    charge="Corrupting the youth and refusing the sanctioned gods of the city.",
-    setting="Athens, 399 BCE, reassembled inside a pocket tribunal.",
-    claimant_claim=(
-        "The city argues that Socrates trained young citizens to mock public authority "
-        "and placed private daimonion guidance above civic religion."
-    ),
-    respondent_claim=(
-        "Socrates answers that cross-examination was a public service, not corruption, "
-        "and that unpopular inquiry should not be confused with civic sabotage."
-    ),
-    source_note=(
-        "Cached public-domain style packet derived from Plato's Apology and Crito, "
-        "Xenophon's Apology, and common historical summaries. It is not a live scholarly edition."
-    ),
-    evidence=[
-        EvidenceItem(
-            id="SOC-E1",
-            title="The Oracle Burden",
-            source="Plato, Apology tradition",
-            excerpt=(
-                "Socrates describes testing reputedly wise citizens after a Delphic oracle "
-                "report, creating public embarrassment but framing the act as duty."
-            ),
-            supports="mixed",
-            reliability=0.78,
-            note="Shows both civic irritation and a claimed religious motivation.",
-        ),
-        EvidenceItem(
-            id="SOC-E2",
-            title="Youthful Imitators",
-            source="Plato, Apology tradition",
-            excerpt=(
-                "Young men with leisure reportedly followed Socrates and copied his questioning, "
-                "which angered the questioned citizens."
-            ),
-            supports="claimant",
-            reliability=0.68,
-            note="Supports social effect, but does not prove intentional corruption.",
-        ),
-        EvidenceItem(
-            id="SOC-E3",
-            title="No Fee, No School",
-            source="Ancient defense tradition",
-            excerpt=(
-                "Socrates distinguishes himself from paid teachers and denies promising technical "
-                "instruction or private doctrine."
-            ),
-            supports="respondent",
-            reliability=0.72,
-            note="Weakens the claim that he operated a formal corrupting academy.",
-        ),
-        EvidenceItem(
-            id="SOC-E4",
-            title="The Daimonion",
-            source="Ancient biographical tradition",
-            excerpt=(
-                "Socrates reports a private divine sign that restrains him from certain actions, "
-                "which the court may read as piety or heterodoxy."
-            ),
-            supports="mixed",
-            reliability=0.64,
-            note="Central ambiguity: private religious experience versus civic irreverence.",
-        ),
-    ],
-)
-BARNABY = CasePacket(
-    id="barnaby",
-    title="The People v. Barnaby Buttons",
-    subtitle="The last office mooncake, a tampered snack ledger, and crumbs shaped like intent.",
-    claimant="The Breakroom Commonwealth",
-    respondent="Barnaby Buttons",
-    charge="Theft of the final mooncake and alteration of the communal snack ledger.",
-    setting="A fluorescent office kitchen at 4:47 p.m., under the humming republic of the fridge.",
-    claimant_claim=(
-        "Barnaby removed the final mooncake, changed the snack ledger from '1 mooncake' "
-        "to '0 mooncakes', and left the team dessertless."
-    ),
-    respondent_claim=(
-        "Barnaby says the mooncake was already abandoned, the ledger pen skipped naturally, "
-        "and the crumbs came from an unrelated biscuit."
-    ),
-    source_note="Cached original whimsical packet made for reliable hackathon demos.",
-    evidence=[
-        EvidenceItem(
-            id="BTN-E1",
-            title="Ledger Ink Discontinuity",
-            source="Clerk's magnifying loupe",
-            excerpt="The zero in '0 mooncakes' uses a darker ink than the previous entries.",
-            supports="claimant",
-            reliability=0.82,
-            note="Strong tampering indicator, though pen swaps happen in offices.",
-        ),
-        EvidenceItem(
-            id="BTN-E2",
-            title="Crumb Constellation",
-            source="Breakroom floor survey",
-            excerpt="Sesame crumbs form a trail from the pantry shelf to Barnaby's keyboard.",
-            supports="claimant",
-            reliability=0.71,
-            note="Suggestive route evidence, vulnerable to shared-desk contamination.",
-        ),
-        EvidenceItem(
-            id="BTN-E3",
-            title="Calendar Entry",
-            source="Respondent's calendar",
-            excerpt="Barnaby had a 4:45 p.m. reminder titled 'Do not forget tea with lunar pastry'.",
-            supports="mixed",
-            reliability=0.76,
-            note="Shows desire and opportunity, but not necessarily theft.",
-        ),
-        EvidenceItem(
-            id="BTN-E4",
-            title="Biscuit Alibi",
-            source="Vending machine receipt",
-            excerpt="A receipt shows Barnaby bought a sesame biscuit at 4:39 p.m.",
-            supports="respondent",
-            reliability=0.67,
-            note="Explains crumbs but not ledger alteration.",
-        ),
-    ],
-)
-CASES = {case.id: case for case in (SOCRATES, BARNABY)}
-def get_case(case_id: str) -> CasePacket:
-    return CASES.get(case_id, SOCRATES)

+from __future__ import annotations
+from .models import CasePacket, EvidenceItem
+SOCRATES = CasePacket(
+    id="socrates",
+    title="The Polis v. Socrates",
+    subtitle="A miniature retrial of impiety, civic anxiety, and troublesome questions.",
+    claimant="The Athenian Polis",
+    respondent="Socrates",
+    charge="Corrupting the youth and refusing the sanctioned gods of the city.",
+    setting="Athens, 399 BCE, reassembled inside a pocket tribunal.",
+    context=(
+        "Athens has brought Socrates back before a civic court after years of public questioning, "
+        "youthful imitators, and anxiety about private religious claims. The city says his method "
+        "weakened civic order; Socrates says he served the public by exposing false wisdom."
+    ),
+    claimant_claim=(
+        "The city argues that Socrates trained young citizens to mock public authority "
+        "and placed private daimonion guidance above civic religion."
+    ),
+    respondent_claim=(
+        "Socrates answers that cross-examination was a public service, not corruption, "
+        "and that unpopular inquiry should not be confused with civic sabotage."
+    ),
+    source_note=(
+        "Cached public-domain style packet derived from Plato's Apology and Crito, "
+        "Xenophon's Apology, and common historical summaries. It is not a live scholarly edition."
+    ),
+    evidence=[
+        EvidenceItem(
+            id="SOC-F1",
+            title="Youthful Imitators",
+            source="Plato, Apology tradition",
+            excerpt=(
+                "Young men with leisure reportedly followed Socrates and copied his questioning, "
+                "which angered the questioned citizens."
+            ),
+            supports="claimant",
+            reliability=0.68,
+            note="Supports social effect, but does not prove intentional corruption.",
+        ),
+        EvidenceItem(
+            id="SOC-F2",
+            title="Public Embarrassment",
+            source="Ancient defense tradition",
+            excerpt=(
+                "Socrates describes testing reputedly wise citizens in public after hearing the "
+                "Delphic oracle report."
+            ),
+            supports="claimant",
+            reliability=0.74,
+            note="Shows a repeated practice that made civic leaders look foolish.",
+        ),
+        EvidenceItem(
+            id="SOC-F3",
+            title="The Daimonion Suspicion",
+            source="Ancient biographical tradition",
+            excerpt=(
+                "Socrates reports a private divine sign that restrains him from certain actions, "
+                "which civic accusers read as religious irregularity."
+            ),
+            supports="claimant",
+            reliability=0.64,
+            note="Supports the impiety theory if private revelation is treated as civic defiance.",
+        ),
+        EvidenceItem(
+            id="SOC-A1",
+            title="No Fee, No School",
+            source="Ancient defense tradition",
+            excerpt=(
+                "Socrates distinguishes himself from paid teachers and denies promising technical "
+                "instruction or private doctrine."
+            ),
+            supports="respondent",
+            reliability=0.72,
+            note="Weakens the claim that he operated a formal corrupting academy.",
+        ),
+        EvidenceItem(
+            id="SOC-A2",
+            title="Oracle as Duty",
+            source="Plato, Apology tradition",
+            excerpt=(
+                "Socrates frames his questioning as obedience to a divine puzzle rather than "
+                "contempt for religion."
+            ),
+            supports="respondent",
+            reliability=0.78,
+            note="Turns the impiety charge into a competing account of piety.",
+        ),
+        EvidenceItem(
+            id="SOC-A3",
+            title="Cross-Examination as Service",
+            source="Defense summary",
+            excerpt=(
+                "The defense treats uncomfortable questioning as civic improvement, not sabotage "
+                "or intentional corruption."
+            ),
+            supports="respondent",
+            reliability=0.7,
+            note="Gives the jury a public-interest reason to tolerate Socrates.",
+        ),
+    ],
+)
+GREG = CasePacket(
+    id="greg",
+    title="Greg Heffley v. Mom",
+    subtitle="A family-court argument over a diary, embarrassment, and parental good intentions.",
+    claimant="Greg Heffley",
+    respondent="Susan Heffley",
+    charge="Whether Mom wrongfully saddled Greg with an embarrassing diary instead of a normal journal.",
+    setting="The Heffley house on the eve of another middle-school year.",
+    context=(
+        "Greg receives a book from his mom meant to help him record his thoughts, but he objects "
+        "that the word diary makes him look childish and vulnerable at school. Mom treats the book "
+        "as a harmless tool for reflection; Greg treats it as social evidence waiting to be used "
+        "against him."
+    ),
+    claimant_claim=(
+        "Greg argues that Mom ignored the obvious social risk of handing a middle-school boy a diary "
+        "and failed to respect how easily classmates can turn an object into humiliation."
+    ),
+    respondent_claim=(
+        "Mom answers that the writing book is a constructive outlet, that Greg can choose how to use it, "
+        "and that parental encouragement is not social sabotage."
+    ),
+    source_note=(
+        "Cached demo packet using paraphrased context from the Diary of a Wimpy Kid setup. "
+        "No book text is quoted."
+    ),
+    evidence=[
+        EvidenceItem(
+            id="GRG-F1",
+            title="The Label Problem",
+            source="Greg's objection",
+            excerpt=(
+                "Greg objects that diary is the wrong label for a middle-school boy and could be "
+                "used to mock him."
+            ),
+            supports="claimant",
+            reliability=0.74,
+            note="Shows a foreseeable embarrassment risk from Greg's perspective.",
+        ),
+        EvidenceItem(
+            id="GRG-F2",
+            title="Middle-School Audience",
+            source="School context",
+            excerpt=(
+                "Greg's social world rewards status and punishes anything classmates can frame "
+                "as childish."
+            ),
+            supports="claimant",
+            reliability=0.7,
+            note="Makes the harm plausible even before anyone finds the book.",
+        ),
+        EvidenceItem(
+            id="GRG-F3",
+            title="Ignored Preference",
+            source="Family exchange summary",
+            excerpt=(
+                "Greg wanted distance from the diary framing, but Mom treated the gift as settled."
+            ),
+            supports="claimant",
+            reliability=0.66,
+            note="Supports Greg's autonomy argument, though parents often choose school supplies.",
+        ),
+        EvidenceItem(
+            id="GRG-A1",
+            title="Private Writing Tool",
+            source="Mom's purpose",
+            excerpt=(
+                "Mom intended the book as a private place for Greg to record his thoughts and school year."
+            ),
+            supports="respondent",
+            reliability=0.78,
+            note="Shows a constructive parental purpose rather than intent to embarrass.",
+        ),
+        EvidenceItem(
+            id="GRG-A2",
+            title="Greg Controls Disclosure",
+            source="Household facts",
+            excerpt=(
+                "The book is not inherently public; Greg can keep it private and decide what to write."
+            ),
+            supports="respondent",
+            reliability=0.68,
+            note="Weakens the claim that the gift itself creates inevitable harm.",
+        ),
+        EvidenceItem(
+            id="GRG-A3",
+            title="Reflection Has Value",
+            source="Parenting rationale",
+            excerpt=(
+                "A journal can help a student process school, family, and growing-up pressures."
+            ),
+            supports="respondent",
+            reliability=0.71,
+            note="Gives Mom a reasonable-benefit argument even if the branding is awkward.",
+        ),
+    ],
+)
+BARNABY = CasePacket(
+    id="barnaby",
+    title="The People v. Barnaby Buttons",
+    subtitle="The last office mooncake, a tampered snack ledger, and crumbs shaped like intent.",
+    claimant="The Breakroom Commonwealth",
+    respondent="Barnaby Buttons",
+    charge="Theft of the final mooncake and alteration of the communal snack ledger.",
+    setting="A fluorescent office kitchen at 4:47 p.m., under the humming republic of the fridge.",
+    context=(
+        "An office breakroom has lost its final mooncake after a suspicious ledger update and "
+        "a trail of crumbs. The commonwealth blames Barnaby Buttons; Barnaby says the evidence "
+        "is ordinary office mess and coincidence."
+    ),
+    claimant_claim=(
+        "Barnaby removed the final mooncake, changed the snack ledger from '1 mooncake' "
+        "to '0 mooncakes', and left the team dessertless."
+    ),
+    respondent_claim=(
+        "Barnaby says the mooncake was already abandoned, the ledger pen skipped naturally, "
+        "and the crumbs came from an unrelated biscuit."
+    ),
+    source_note="Cached original whimsical packet kept for compatibility with older tests.",
+    evidence=[
+        EvidenceItem(
+            id="BTN-E1",
+            title="Ledger Ink Discontinuity",
+            source="Clerk's magnifying loupe",
+            excerpt="The zero in '0 mooncakes' uses a darker ink than the previous entries.",
+            supports="claimant",
+            reliability=0.82,
+            note="Strong tampering indicator, though pen swaps happen in offices.",
+        ),
+        EvidenceItem(
+            id="BTN-E2",
+            title="Crumb Constellation",
+            source="Breakroom floor survey",
+            excerpt="Sesame crumbs form a trail from the pantry shelf to Barnaby's keyboard.",
+            supports="claimant",
+            reliability=0.71,
+            note="Suggestive route evidence, vulnerable to shared-desk contamination.",
+        ),
+        EvidenceItem(
+            id="BTN-E3",
+            title="Calendar Entry",
+            source="Respondent's calendar",
+            excerpt="Barnaby had a 4:45 p.m. reminder titled 'Do not forget tea with lunar pastry'.",
+            supports="mixed",
+            reliability=0.76,
+            note="Shows desire and opportunity, but not necessarily theft.",
+        ),
+        EvidenceItem(
+            id="BTN-E4",
+            title="Biscuit Alibi",
+            source="Vending machine receipt",
+            excerpt="A receipt shows Barnaby bought a sesame biscuit at 4:39 p.m.",
+            supports="respondent",
+            reliability=0.67,
+            note="Explains crumbs but not ledger alteration.",
+        ),
+    ],
+)
+CASES = {case.id: case for case in (SOCRATES, GREG, BARNABY)}
+def get_case(case_id: str) -> CasePacket:
+    return CASES.get(case_id, SOCRATES)
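
Reviewer note: `get_case` above returns the Socrates packet for any unknown `case_id`, while `resolve_case` in `engine.py` refuses to substitute a fallback when live retrieval is too weak. If the same fail-loudly behaviour were wanted for cached cases, a strict lookup might look like the sketch below. `UnknownCaseError` and `get_case_strict` are hypothetical names, and the registry here uses placeholder strings instead of real `CasePacket` objects so the example runs on its own.

```python
from __future__ import annotations


class UnknownCaseError(KeyError):
    """Raised when a case id is not in the registry (hypothetical name)."""


def get_case_strict(cases: dict[str, object], case_id: str) -> object:
    """Return the registered case or raise, instead of falling back to a default."""
    try:
        return cases[case_id]
    except KeyError:
        known = ", ".join(sorted(cases))
        raise UnknownCaseError(f"unknown case id {case_id!r}; known ids: {known}") from None


# Placeholder registry standing in for CASES; values would be CasePacket objects.
demo_cases = {"socrates": "SOCRATES packet", "greg": "GREG packet", "barnaby": "BARNABY packet"}
print(get_case_strict(demo_cases, "greg"))
```

Whether an unknown id should raise or quietly open the Socrates demo is a product decision; the sketch only shows the stricter option.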

sovereign_bench/engine.py CHANGED

@@ -9,7 +9,7 @@ from collections.abc import Callable, Iterable
 from pydantic import ValidationError
 from .cases import get_case
-from .llm import ModelCall, ModelResult, call_small_model
 from .models import AgentTurn, CasePacket, JurorVote, TrialEvent, TrialRequest, Verdict
 from .retrieval import build_live_case
@@ -23,22 +23,22 @@ NEMOTRON_PROVIDER = "featherless-ai"
 MODEL_BUDGET = [
     ("Presiding Advocate", GPT_OSS_MODEL, 20.0),
     ("Clerk of Style", OPENBMB_MODEL, 4.0),
-    ("Juror/Auditor Ring", NEMOTRON_MODEL, 8.0),
 ]
-TOTAL_PARAMS_B = sum(item[2] for item in MODEL_BUDGET)
-JUDGE_NAME = "Marcus Aurelius"
-JUDGE_PERSONA = "Stoic duty, restraint, public reason, and disciplined judgment"
-JUROR_PERSONAS = {
-    "Karl Marx": "class power, material conditions, exploitation, institutional incentives",
-    "John Stuart Mill": "liberty, harm principle, utility, individual rights",
-    "Confucius": "social harmony, role duty, ritual order, moral cultivation",
-    "Cleopatra VII": "sovereign pragmatism, diplomacy, survival, legitimacy under pressure",
-    "Niccolo Machiavelli": "political realism, stability, power, consequences over ideals",
-    "Jensen Huang": "technological optimism, operator mindset, systems thinking, innovation tradeoffs",
-}
-JUROR_NAMES = list(JUROR_PERSONAS)
 class RequiredModelError(RuntimeError):
@@ -60,8 +60,10 @@ def _turn(agent: str, role: str, result: ModelResult, model: str, confidence: fl
 def _case_summary(packet: CasePacket) -> str:
     return (
         f"{packet.title}. Charge: {packet.charge}\n"
         f"Claimant: {packet.claimant_claim}\n"
         f"Respondent: {packet.respondent_claim}"
     )
@@ -79,12 +81,16 @@ def _call_trace(calls: list[ModelCall]) -> list[dict]:
 def resolve_case(request: TrialRequest) -> tuple[CasePacket, dict]:
     if request.case_id == "live":
-        packet = build_live_case(request.search_query, request.hypothetical)
-        if packet:
-            return packet, {"mode": "live"}
-        raise RuntimeError("Live retrieval produced too little usable evidence; no fallback case will be substituted.")
-    return get_case(request.case_id), {"mode": "cached"}
 def _generate_role(model_runner: ModelRunner | None = None, **kwargs) -> ModelResult:
@@ -93,15 +99,19 @@ def _generate_role(model_runner: ModelRunner | None = None, **kwargs) -> ModelRe
     return call_small_model(**kwargs)
-def _required_role(model_runner: ModelRunner | None, model_calls: list[ModelCall], **kwargs) -> ModelResult:
-    try:
-        result = _generate_role(model_runner, **kwargs)
-    except Exception as exc:
-        raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {exc}") from exc
-    model_calls.append(result.call)
     if not result.call.ok:
         error = result.call.error or "model call did not complete"
         raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {error}")
     if not result.text.strip():
         raise RequiredModelError(f"{kwargs.get('agent', 'Model')} returned an empty response.")
     return result
@@ -132,6 +142,43 @@ def _emit(
     return event
 def _extract_json(text: str) -> object:
     stripped = text.strip()
     if stripped.startswith("```"):
@@ -146,38 +193,34 @@ def _extract_json(text: str) -> object:
         return json.loads(match.group(1))
-def _parse_jury_votes(result: ModelResult, packet: CasePacket) -> list[JurorVote]:
     try:
         data = _extract_json(result.text)
     except json.JSONDecodeError as exc:
-        raise RequiredModelError(f"Nemotron Jury returned invalid JSON: {exc.msg}") from exc
-    raw_votes = data.get("votes") if isinstance(data, dict) else data
-    if not isinstance(raw_votes, list):
-        raise RequiredModelError("Nemotron Jury output must contain a votes list.")
-    if len(raw_votes) != len(JUROR_NAMES):
-        raise RequiredModelError("Nemotron Jury must return exactly six juror votes.")
-    known_evidence = {item.id for item in packet.evidence}
-    votes: list[JurorVote] = []
     try:
-        for item in raw_votes:
-            vote = JurorVote.model_validate(item)
-            votes.append(vote)
     except ValidationError as exc:
-        raise RequiredModelError(f"Nemotron Jury vote schema is invalid: {exc.errors()[0]['msg']}") from exc
-    if [vote.juror for vote in votes] != JUROR_NAMES:
-        raise RequiredModelError("Nemotron Jury must return votes in the fixed juror order.")
-    for vote in votes:
-        expected_persona = JUROR_PERSONAS[vote.juror]
-        if vote.persona.strip().lower() != expected_persona:
-            raise RequiredModelError(f"{vote.juror} persona must be '{expected_persona}'.")
-        if not vote.reason.strip():
-            raise RequiredModelError(f"{vote.juror} must include a rationale.")
-        if not vote.evidence_ids or any(evidence_id not in known_evidence for evidence_id in vote.evidence_ids):
-            raise RequiredModelError(f"{vote.juror} must cite known evidence IDs.")
-    return votes
 def _majority_finding(votes: list[JurorVote]) -> str:
@@ -227,16 +270,13 @@ def _verdict_from_votes(votes: list[JurorVote]) -> Verdict:
     )
-def _jury_task() -> str:
-    personas = "\n".join(f"- {name}: {persona}" for name, persona in JUROR_PERSONAS.items())
-    return (
-        "Return JSON only with a top-level 'votes' array. Create exactly one vote for each juror, in this order: "
-        f"{', '.join(JUROR_NAMES)}. Valid vote values are liable, not_liable, uncertain. Each item must contain "
-        "juror, persona, vote, reason, and evidence_ids. The persona value must exactly match the profile below. "
-        "Each reason should be one concise sentence and each evidence_ids list must cite evidence IDs from the record. "
-        "Vote through the named public-history worldview, not a generic juror role.\n"
-        f"{personas}"
-    )
 def run_trial(request: TrialRequest, model_runner: ModelRunner | None = None) -> list[TrialEvent]:
@@ -252,6 +292,7 @@ def stream_trial(
     case_summary = _case_summary(packet)
     evidence_summary = _evidence_summary(packet)
     model_calls: list[ModelCall] = []
     hypo = request.hypothetical.strip()
     hypo_line = f"\n\nUser hypothetical admitted as a blue-ribbon sidebar: {hypo}" if hypo else ""
@@ -263,11 +304,12 @@ def stream_trial(
         model=OPENBMB_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
-        task="Announce the case by name, identify the parties, and read the charge.",
         provider=OPENBMB_PROVIDER,
         max_tokens=110,
     )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
@@ -281,48 +323,55 @@ def stream_trial(
         delay,
     )
-    judge_open = _required_role(
-        model_runner,
-        model_calls,
-        agent=JUDGE_NAME,
-        role="judge",
-        model=GPT_OSS_MODEL,
-        case_summary=case_summary,
-        evidence_summary=evidence_summary,
-        task=(
-            f"As {JUDGE_NAME}, a Stoic courtroom judge guided by {JUDGE_PERSONA}, explain the proceeding "
-            "and the burden of proof in one or two disciplined sentences."
-        ),
-        provider=OPENAI_PROVIDER,
-        max_tokens=110,
-    )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
         TrialEvent(
-            phase="intake",
-            title="The Burden Is Set",
-            body="The bench defines how the miniature court will weigh the record.",
-            turns=[_turn(JUDGE_NAME, "judge", judge_open, GPT_OSS_MODEL, 0.88)],
-            evidence=packet.evidence,
-        ),
-        delay,
     )
     claimant_opening = _required_role(
         model_runner,
         model_calls,
-        agent="Advocate Auric",
         role="claimant advocate",
         model=GPT_OSS_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
-        task="Make the claimant's opening statement alone. Cite the strongest claimant-side exhibit.",
         provider=OPENAI_PROVIDER,
         max_tokens=130,
     )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
@@ -330,7 +379,7 @@ def stream_trial(
             phase="claims",
             title="Claimant Opening",
             body=packet.claimant_claim,
-            turns=[_turn("Advocate Auric", "claimant advocate", claimant_opening, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
@@ -339,16 +388,19 @@ def stream_trial(
     respondent_opening = _required_role(
         model_runner,
         model_calls,
-        agent="Counsel Sable",
         role="respondent advocate",
         model=GPT_OSS_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
-        task="Make the respondent's opening statement alone. Emphasize uncertainty and cite a helpful exhibit.",
         provider=OPENAI_PROVIDER,
         max_tokens=130,
     )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
@@ -356,80 +408,76 @@ def stream_trial(
             phase="opening",
             title="Respondent Opening",
             body=packet.respondent_claim,
-            turns=[_turn("Counsel Sable", "respondent advocate", respondent_opening, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
     )
-    auditor = _required_role(
-        model_runner,
-        model_calls,
-        agent="Auditor Prism",
-        role="evidence auditor",
-        model=NEMOTRON_MODEL,
-        case_summary=case_summary,
-        evidence_summary=evidence_summary,
-        task="Present the evidence record. Identify the strongest exhibit and the weakest inference.",
-        provider=NEMOTRON_PROVIDER,
-        max_tokens=150,
-    )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
         TrialEvent(
             phase="evidence",
-            title="The Record Is Audited",
             body="\n".join(f"{item.id}: {item.title} | reliability {item.reliability:.2f} | {item.note}" for item in packet.evidence),
-            turns=[_turn("Auditor Prism", "evidence auditor", auditor, NEMOTRON_MODEL, 0.86)],
             evidence=packet.evidence,
         ),
         delay,
     )
-    judge_question = _required_role(
-        model_runner,
-        model_calls,
-        agent=JUDGE_NAME,
-        role="judge",
-        model=GPT_OSS_MODEL,
-        case_summary=case_summary,
-        evidence_summary=evidence_summary,
-        task=(
-            f"As {JUDGE_NAME}, ask one sharp hinge question that would change the outcome if answered. "
-            "Use Stoic restraint and public reason."
-        ),
-        provider=OPENAI_PROVIDER,
-        max_tokens=100,
-    )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
         TrialEvent(
-            phase="questions",
-            title="The Hinge Question",
-            body="The bench asks the single question that could turn the record.",
-            turns=[_turn(JUDGE_NAME, "judge", judge_question, GPT_OSS_MODEL, 0.88)],
-            evidence=packet.evidence,
-        ),
-        delay,
     )
     claimant_answer = _required_role(
         model_runner,
         model_calls,
-        agent="Advocate Auric",
         role="claimant advocate",
-        model=GPT_OSS_MODEL,
-        case_summary=case_summary,
-        evidence_summary=evidence_summary,
-        task=f"Answer {JUDGE_NAME}'s hinge question for the claimant: {judge_question.text}",
-        provider=OPENAI_PROVIDER,
-        max_tokens=130,
-    )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
@@ -437,7 +485,7 @@ def stream_trial(
             phase="questions",
             title="Claimant Answers the Bench",
             body="The claimant answers the hinge question.",
-            turns=[_turn("Advocate Auric", "claimant advocate", claimant_answer, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
@@ -446,16 +494,19 @@ def stream_trial(
     respondent_answer = _required_role(
         model_runner,
         model_calls,
-        agent="Counsel Sable",
         role="respondent advocate",
-        model=GPT_OSS_MODEL,
-        case_summary=case_summary,
-        evidence_summary=evidence_summary,
-        task=f"Answer {JUDGE_NAME}'s hinge question for the respondent: {judge_question.text}",
-        provider=OPENAI_PROVIDER,
-        max_tokens=130,
-    )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
@@ -463,7 +514,7 @@ def stream_trial(
             phase="questions",
             title="Respondent Answers the Bench",
             body="The respondent answers the hinge question.",
-            turns=[_turn("Counsel Sable", "respondent advocate", respondent_answer, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
@@ -477,11 +528,14 @@ def stream_trial(
         model=NEMOTRON_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
-        task="Announce that the six named jurors retire to vote. Do not reveal the votes yet.",
         provider=NEMOTRON_PROVIDER,
         max_tokens=100,
     )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
@@ -495,26 +549,32 @@ def stream_trial(
         delay,
     )
-    jury_votes_result = _required_role(
-        model_runner,
-        model_calls,
-        agent="Nemotron Jury",
-        role="juror vote generator",
-        model=NEMOTRON_MODEL,
-        case_summary=case_summary,
-        evidence_summary=evidence_summary,
-        task=_jury_task(),
-        provider=NEMOTRON_PROVIDER,
-        max_tokens=650,
-    )
-    votes = _parse_jury_votes(jury_votes_result, packet)
-    for vote in votes:
         juror_result = ModelResult(
-            text=f"{vote.vote.replace('_', ' ').title()}. {vote.reason}",
-            call=jury_votes_result.call,
-            input_text=jury_votes_result.input_text,
         )
-        yield _emit(
             packet,
             source_trace,
             model_calls,
@@ -530,23 +590,27 @@ def stream_trial(
         )
     verdict = _verdict_from_votes(votes)
-    verdict_voice = _required_role(
-        model_runner,
-        model_calls,
-        agent=JUDGE_NAME,
-        role="verdict writer",
-        model=GPT_OSS_MODEL,
-        case_summary=case_summary,
-        evidence_summary=evidence_summary,
-        task=(
-            f"As {JUDGE_NAME}, announce the final legal finding after the jury vote with Stoic restraint. "
-            f"Finding: {verdict.finding}. "
-            f"Jury rationale: {verdict.rationale} Remedy: {verdict.remedy}. Include uncertainty without disclaiming the role."
-        ),
         provider=OPENAI_PROVIDER,
         max_tokens=160,
     )
-    yield _emit(
         packet,
         source_trace,
         model_calls,
@@ -554,13 +618,13 @@ def stream_trial(
             phase="verdict",
             title="The Court Announces Judgment",
             body=f"{verdict_voice.text}\n\n{verdict.rationale}\n\nRemedy: {verdict.remedy}",
-            verdict=verdict,
-            votes=votes,
-            evidence=packet.evidence,
-            turns=[_turn(JUDGE_NAME, "verdict writer", verdict_voice, GPT_OSS_MODEL, 0.88)],
-        ),
-        delay,
-    )
 def stream_trial_jsonl(

 from pydantic import ValidationError
 from .cases import get_case
+from .llm import ModelCall, ModelCallError, ModelResult, call_small_model, clean_model_text
 from .models import AgentTurn, CasePacket, JurorVote, TrialEvent, TrialRequest, Verdict
 from .retrieval import build_live_case
 MODEL_BUDGET = [
     ("Presiding Advocate", GPT_OSS_MODEL, 20.0),
     ("Clerk of Style", OPENBMB_MODEL, 4.0),
+    ("Jury Ring", NEMOTRON_MODEL, 8.0),
 ]
+TOTAL_PARAMS_B = sum(item[2] for item in MODEL_BUDGET)
+JUDGE_NAME = "Marcus Aurelius"
+JUDGE_PERSONA = "Stoic duty, restraint, public reason, and disciplined judgment"
+JUROR_PERSONAS = {
+    "Karl Marx": "class power, material conditions, exploitation, institutional incentives",
+    "John Stuart Mill": "liberty, harm principle, utility, individual rights",
+    "Confucius": "social harmony, role duty, ritual order, moral cultivation",
+    "Cleopatra VII": "sovereign pragmatism, diplomacy, survival, legitimacy under pressure",
+    "Niccolo Machiavelli": "political realism, stability, power, consequences over ideals",
+    "Jensen Huang": "technological optimism, operator mindset, systems thinking, innovation tradeoffs",
+}
+JUROR_NAMES = list(JUROR_PERSONAS)
 class RequiredModelError(RuntimeError):
 def _case_summary(packet: CasePacket) -> str:
+    context = packet.context or packet.setting
     return (
         f"{packet.title}. Charge: {packet.charge}\n"
+        f"Context: {context}\n"
         f"Claimant: {packet.claimant_claim}\n"
         f"Respondent: {packet.respondent_claim}"
     )
 def resolve_case(request: TrialRequest) -> tuple[CasePacket, dict]:
+    if request.case_id == "custom":
+        if request.custom_case is None:
+            raise RuntimeError("Custom case requires trial details and evidence before the court can begin.")
+        return request.custom_case, {"mode": "custom"}
     if request.case_id == "live":
+        packet = build_live_case(request.search_query, request.hypothetical)
+        if packet:
+            return packet, {"mode": "live"}
+        raise RuntimeError("Live retrieval produced too little usable evidence; no fallback case will be substituted.")
+    return get_case(request.case_id), {"mode": "cached"}
 def _generate_role(model_runner: ModelRunner | None = None, **kwargs) -> ModelResult:
     return call_small_model(**kwargs)
+def _required_role(model_runner: ModelRunner | None, model_calls: list[ModelCall], **kwargs) -> ModelResult:
+    try:
+        result = _generate_role(model_runner, **kwargs)
+    except Exception as exc:
+        raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {exc}") from exc
+    model_calls.append(result.call)
     if not result.call.ok:
         error = result.call.error or "model call did not complete"
         raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {error}")
+    try:
+        result.text = clean_model_text(result.text)
+    except ModelCallError as exc:
+        raise RequiredModelError(f"{kwargs.get('agent', 'Model')} returned non-dialogue output: {exc}") from exc
     if not result.text.strip():
         raise RequiredModelError(f"{kwargs.get('agent', 'Model')} returned an empty response.")
     return result
     return event
+def _record_and_emit(
+    events: list[TrialEvent],
+    packet: CasePacket,
+    source_trace: dict,
+    model_calls: list[ModelCall],
+    event: TrialEvent,
+    delay: float,
+) -> TrialEvent:
+    emitted = _emit(packet, source_trace, model_calls, event, delay)
+    events.append(emitted)
+    return emitted
+def _compact(value: str, limit: int = 420) -> str:
+    text = " ".join(value.split())
+    return text if len(text) <= limit else text[: limit - 3].rstrip() + "..."
+def _trial_history(events: list[TrialEvent]) -> str:
+    if not events:
+        return "No trial statements have been made yet."
+    lines = []
+    for index, event in enumerate(events, start=1):
+        if event.turns:
+            turn = event.turns[0]
+            lines.append(
+                f"{index}. {event.phase} / {event.title} - {turn.agent} ({turn.role}): {_compact(turn.content)}"
+            )
+        elif event.body:
+            lines.append(f"{index}. {event.phase} / {event.title}: {_compact(event.body)}")
+        for vote in event.votes:
+            lines.append(
+                f"   Vote - {vote.juror}: {vote.vote}; reason: {_compact(vote.reason, 220)}; evidence: {', '.join(vote.evidence_ids)}"
+            )
+    return "\n".join(lines)
 def _extract_json(text: str) -> object:
     stripped = text.strip()
     if stripped.startswith("```"):
         return json.loads(match.group(1))
+def _parse_juror_vote(result: ModelResult, packet: CasePacket, juror: str) -> JurorVote:
     try:
         data = _extract_json(result.text)
     except json.JSONDecodeError as exc:
+        raise RequiredModelError(f"{juror} returned invalid JSON: {exc.msg}") from exc
+    if isinstance(data, dict) and isinstance(data.get("votes"), list):
+        if len(data["votes"]) != 1:
+            raise RequiredModelError(f"{juror} must return exactly one vote.")
+        data = data["votes"][0]
+    if not isinstance(data, dict):
+        raise RequiredModelError(f"{juror} vote output must be a JSON object.")
     try:
+        vote = JurorVote.model_validate(data)
     except ValidationError as exc:
+        raise RequiredModelError(f"{juror} vote schema is invalid: {exc.errors()[0]['msg']}") from exc
+    known_evidence = {item.id for item in packet.evidence}
+    expected_persona = JUROR_PERSONAS[juror]
+    if vote.juror != juror:
+        raise RequiredModelError(f"{juror} vote must use juror '{juror}'.")
+    if vote.persona.strip().lower() != expected_persona:
+        raise RequiredModelError(f"{juror} persona must be '{expected_persona}'.")
+    if not vote.reason.strip():
+        raise RequiredModelError(f"{juror} must include a rationale.")
+    if not vote.evidence_ids or any(evidence_id not in known_evidence for evidence_id in vote.evidence_ids):
+        raise RequiredModelError(f"{juror} must cite known evidence IDs.")
+    return vote
 def _majority_finding(votes: list[JurorVote]) -> str:
     )
+def _juror_task(juror: str, persona: str) -> str:
+    return (
+        f"After watching the trial, vote as {juror}. Your worldview is: {persona}. "
+        "Return exactly one JSON object with keys juror, persona, vote, reason, and evidence_ids. "
+        "Valid vote values are liable, not_liable, uncertain. The persona value must exactly match your worldview. "
+        "The reason must be one concise sentence grounded in your beliefs and the record. Cite evidence IDs from the record."
+    )
 def run_trial(request: TrialRequest, model_runner: ModelRunner | None = None) -> list[TrialEvent]:
     case_summary = _case_summary(packet)
     evidence_summary = _evidence_summary(packet)
     model_calls: list[ModelCall] = []
+    events: list[TrialEvent] = []
     hypo = request.hypothetical.strip()
     hypo_line = f"\n\nUser hypothetical admitted as a blue-ribbon sidebar: {hypo}" if hypo else ""
         model=OPENBMB_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
+        task="Begin with 'I call'. Announce the case by name, identify the parties, and read the charge.",
         provider=OPENBMB_PROVIDER,
         max_tokens=110,
     )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
         delay,
     )
+    judge_open = _required_role(
+        model_runner,
+        model_calls,
+        agent=JUDGE_NAME,
+        role="judge",
+        model=GPT_OSS_MODEL,
+        case_summary=case_summary,
+        evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        persona=JUDGE_PERSONA,
+        objective="Set a fair standard for hearing both sides.",
+        task=(
+            f"As {JUDGE_NAME}, a Stoic courtroom judge guided by {JUDGE_PERSONA}, explain the proceeding "
+            "and the burden of proof in one or two disciplined sentences using I or we."
+        ),
+        provider=OPENAI_PROVIDER,
+        max_tokens=110,
+    )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
         TrialEvent(
+            phase="intake",
+            title="The Burden Is Set",
+            body="The bench defines how the miniature court will weigh the record.",
+            turns=[_turn(JUDGE_NAME, "judge", judge_open, GPT_OSS_MODEL, 0.88)],
+            evidence=packet.evidence,
+        ),
+        delay,
     )
     claimant_opening = _required_role(
         model_runner,
         model_calls,
+        agent="Mike OSS",
         role="claimant advocate",
         model=GPT_OSS_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        objective="Win the case for the claimant using the strongest fair reading of the record.",
+        task="Make the claimant's opening statement alone, speaking as I for the claimant. Cite the strongest claimant-side exhibit.",
         provider=OPENAI_PROVIDER,
         max_tokens=130,
     )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
             phase="claims",
             title="Claimant Opening",
             body=packet.claimant_claim,
+            turns=[_turn("Mike OSS", "claimant advocate", claimant_opening, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
     respondent_opening = _required_role(
         model_runner,
         model_calls,
+        agent="Harvey Vector",
         role="respondent advocate",
         model=GPT_OSS_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        objective="Win the case for the respondent using doubt, context, and the strongest fair reading of the record.",
+        task="Make the respondent's opening statement alone, speaking as I for the respondent. Emphasize uncertainty and cite a helpful exhibit.",
         provider=OPENAI_PROVIDER,
         max_tokens=130,
     )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
             phase="opening",
             title="Respondent Opening",
             body=packet.respondent_claim,
+            turns=[_turn("Harvey Vector", "respondent advocate", respondent_opening, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
     )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
         TrialEvent(
             phase="evidence",
+            title="The Evidence Record",
             body="\n".join(f"{item.id}: {item.title} | reliability {item.reliability:.2f} | {item.note}" for item in packet.evidence),
+            turns=[],
             evidence=packet.evidence,
         ),
         delay,
     )
+    judge_question = _required_role(
+        model_runner,
+        model_calls,
+        agent=JUDGE_NAME,
+        role="judge",
+        model=GPT_OSS_MODEL,
+        case_summary=case_summary,
+        evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        persona=JUDGE_PERSONA,
+        objective="Ask the question most likely to reveal which side has met its burden.",
+        task=(
+            f"As {JUDGE_NAME}, ask one sharp hinge question that would change the outcome if answered. "
+            "Use Stoic restraint and public reason, speaking from the bench as I or we."
+        ),
+        provider=OPENAI_PROVIDER,
+        max_tokens=100,
+    )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
         TrialEvent(
+            phase="questions",
+            title="The Hinge Question",
+            body="The bench asks the single question that could turn the record.",
+            turns=[_turn(JUDGE_NAME, "judge", judge_question, GPT_OSS_MODEL, 0.88)],
+            evidence=packet.evidence,
+        ),
+        delay,
     )
     claimant_answer = _required_role(
         model_runner,
         model_calls,
+        agent="Mike OSS",
         role="claimant advocate",
+        model=GPT_OSS_MODEL,
+        case_summary=case_summary,
+        evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        objective="Answer the judge in the way most favorable to the claimant.",
+        task=f"Answer {JUDGE_NAME}'s hinge question as I for the claimant: {judge_question.text}",
+        provider=OPENAI_PROVIDER,
+        max_tokens=130,
+    )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
             phase="questions",
             title="Claimant Answers the Bench",
             body="The claimant answers the hinge question.",
+            turns=[_turn("Mike OSS", "claimant advocate", claimant_answer, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
     respondent_answer = _required_role(
         model_runner,
         model_calls,
+        agent="Harvey Vector",
         role="respondent advocate",
+        model=GPT_OSS_MODEL,
+        case_summary=case_summary,
+        evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        objective="Answer the judge in the way most favorable to the respondent.",
+        task=f"Answer {JUDGE_NAME}'s hinge question as I for the respondent: {judge_question.text}",
+        provider=OPENAI_PROVIDER,
+        max_tokens=130,
+    )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
             phase="questions",
             title="Respondent Answers the Bench",
             body="The respondent answers the hinge question.",
+            turns=[_turn("Harvey Vector", "respondent advocate", respondent_answer, GPT_OSS_MODEL, 0.88)],
             evidence=packet.evidence,
         ),
         delay,
         model=NEMOTRON_MODEL,
         case_summary=case_summary,
         evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        objective="Move the court from arguments into individual jury votes.",
+        task="Announce as we, the six named jurors, that we retire to vote. Do not reveal the votes yet.",
         provider=NEMOTRON_PROVIDER,
         max_tokens=100,
     )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
         delay,
     )
+    votes: list[JurorVote] = []
+    for juror, persona in JUROR_PERSONAS.items():
+        juror_vote_result = _required_role(
+            model_runner,
+            model_calls,
+            agent=juror,
+            role="juror",
+            model=NEMOTRON_MODEL,
+            case_summary=case_summary,
+            evidence_summary=evidence_summary,
+            trial_history=_trial_history(events),
+            persona=persona,
+            objective="Reach the verdict this historical worldview would consider right after watching the trial.",
+            task=_juror_task(juror, persona),
+            provider=NEMOTRON_PROVIDER,
+            max_tokens=220,
+        )
+        vote = _parse_juror_vote(juror_vote_result, packet, juror)
+        votes.append(vote)
         juror_result = ModelResult(
+            text=f"I vote {vote.vote.replace('_', ' ').title()}. {vote.reason}",
+            call=juror_vote_result.call,
+            input_text=juror_vote_result.input_text,
         )
+        yield _record_and_emit(
+            events,
             packet,
             source_trace,
             model_calls,
         )
     verdict = _verdict_from_votes(votes)
+    verdict_voice = _required_role(
+        model_runner,
+        model_calls,
+        agent=JUDGE_NAME,
+        role="verdict writer",
+        model=GPT_OSS_MODEL,
+        case_summary=case_summary,
+        evidence_summary=evidence_summary,
+        trial_history=_trial_history(events),
+        persona=JUDGE_PERSONA,
+        objective="Announce the jury result fairly, summarize both sides, and do not override the jury.",
+        task=(
+            f"As {JUDGE_NAME}, announce the final legal finding after the jury vote with Stoic restraint. "
+            f"Finding: {verdict.finding}. "
+            f"Jury rationale: {verdict.rationale} Remedy: {verdict.remedy}. Speak as I from the bench and include uncertainty without disclaiming the role."
+        ),
         provider=OPENAI_PROVIDER,
         max_tokens=160,
     )
+    yield _record_and_emit(
+        events,
         packet,
         source_trace,
         model_calls,
             phase="verdict",
             title="The Court Announces Judgment",
             body=f"{verdict_voice.text}\n\n{verdict.rationale}\n\nRemedy: {verdict.remedy}",
+            verdict=verdict,
+            votes=votes,
+            evidence=packet.evidence,
+            turns=[_turn(JUDGE_NAME, "verdict writer", verdict_voice, GPT_OSS_MODEL, 0.88)],
+        ),
+        delay,
+    )
 def stream_trial_jsonl(

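The added lines above pass `trial_history=_trial_history(events)` into every role call, so each agent sees a compact transcript of what has already been said. Below is a minimal, self-contained sketch of that accumulation. `Turn` and `Event` are hypothetical dataclass stand-ins for `AgentTurn` and `TrialEvent` (the real classes live in `.models` and are not shown in this diff), and the per-vote lines are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    # Hypothetical stand-in for AgentTurn; only the fields the history reads.
    agent: str
    role: str
    content: str


@dataclass
class Event:
    # Hypothetical stand-in for TrialEvent.
    phase: str
    title: str
    body: str = ""
    turns: list[Turn] = field(default_factory=list)


def compact(value: str, limit: int = 420) -> str:
    # Collapse whitespace runs, then truncate with "..." when over the limit.
    text = " ".join(value.split())
    return text if len(text) <= limit else text[: limit - 3].rstrip() + "..."


def trial_history(events: list[Event]) -> str:
    # One numbered line per event: the first turn if present, else the body.
    if not events:
        return "No trial statements have been made yet."
    lines = []
    for index, event in enumerate(events, start=1):
        if event.turns:
            turn = event.turns[0]
            lines.append(
                f"{index}. {event.phase} / {event.title} - {turn.agent} ({turn.role}): {compact(turn.content)}"
            )
        elif event.body:
            lines.append(f"{index}. {event.phase} / {event.title}: {compact(event.body)}")
    return "\n".join(lines)


events: list[Event] = []
print(trial_history(events))  # nothing recorded yet

events.append(
    Event("intake", "The Burden Is Set",
          turns=[Turn("Marcus Aurelius", "judge", "We weigh   the record\nwith restraint.")])
)
events.append(Event("evidence", "The Evidence Record", body="E1: Ledger | reliability 0.80 | partial copy"))
print(trial_history(events))
```

Because the history is rebuilt from the shared `events` list before each call, later speakers (the jurors in particular) receive every earlier statement, at the cost of a prompt that grows with each phase. The 420-character cap per entry is what keeps that growth bounded.
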
sovereign_bench/export.py CHANGED

@@ -1,35 +1,35 @@
-from __future__ import annotations
-import json
-import tempfile
-from pathlib import Path
-from .models import TrialEvent
-def write_trace(events: list[TrialEvent]) -> str:
-    path = Path(tempfile.gettempdir()) / "sovereign_bench_trace.json"
-    path.write_text(
-        json.dumps([event.model_dump() for event in events], indent=2, ensure_ascii=True),
-        encoding="utf-8",
-    )
-    return str(path)
-def write_decree(events: list[TrialEvent]) -> str:
-    verdict_event = next((event for event in events if event.verdict), events[-1])
-    verdict = verdict_event.verdict
-    path = Path(tempfile.gettempdir()) / "sovereign_bench_decree.md"
-    if verdict is None:
-        text = "# Sovereign Bench Decree\n\nNo verdict was recorded."
-    else:
-        text = (
-            "# Sovereign Bench Decree\n\n"
-            f"## Finding\n{verdict.finding}\n\n"
-            f"## Decree\n{verdict.decree}\n\n"
-            f"## Rationale\n{verdict.rationale}\n\n"
-            f"## Remedy\n{verdict.remedy}\n\n"
-            f"## Uncertainty\n{verdict.uncertainty}\n"
-        )
-    path.write_text(text, encoding="utf-8")
-    return str(path)

+from __future__ import annotations
+import json
+import tempfile
+from pathlib import Path
+from .models import TrialEvent
+def write_trace(events: list[TrialEvent]) -> str:
+    path = Path(tempfile.gettempdir()) / "sovereign_bench_trace.json"
+    path.write_text(
+        json.dumps([event.model_dump() for event in events], indent=2, ensure_ascii=True),
+        encoding="utf-8",
+    )
+    return str(path)
+def write_decree(events: list[TrialEvent]) -> str:
+    verdict_event = next((event for event in events if event.verdict), events[-1])
+    verdict = verdict_event.verdict
+    path = Path(tempfile.gettempdir()) / "sovereign_bench_decree.md"
+    if verdict is None:
+        text = "# Sovereign Bench Decree\n\nNo verdict was recorded."
+    else:
+        text = (
+            "# Sovereign Bench Decree\n\n"
+            f"## Finding\n{verdict.finding}\n\n"
+            f"## Decree\n{verdict.decree}\n\n"
+            f"## Rationale\n{verdict.rationale}\n\n"
+            f"## Remedy\n{verdict.remedy}\n\n"
+            f"## Uncertainty\n{verdict.uncertainty}\n"
+        )
+    path.write_text(text, encoding="utf-8")
+    return str(path)

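As a rough illustration of what `write_decree` produces, the sketch below rebuilds the decree text from a list of events. `Verdict` and `Event` are hypothetical stand-ins whose field names mirror the attributes read in the diff above. The `None` default in `next(...)` is an assumption made here so that an empty event list yields the "no verdict" text, whereas the original falls back to `events[-1]`.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Verdict:
    # Hypothetical stand-in; fields follow the attributes write_decree reads.
    finding: str
    decree: str
    rationale: str
    remedy: str
    uncertainty: str


@dataclass
class Event:
    # Hypothetical stand-in for TrialEvent.
    title: str
    verdict: Verdict | None = None


def decree_text(events: list[Event]) -> str:
    # Take the first verdict found; None covers empty or verdict-free lists.
    verdict = next((event.verdict for event in events if event.verdict), None)
    if verdict is None:
        return "# Sovereign Bench Decree\n\nNo verdict was recorded."
    return (
        "# Sovereign Bench Decree\n\n"
        f"## Finding\n{verdict.finding}\n\n"
        f"## Decree\n{verdict.decree}\n\n"
        f"## Rationale\n{verdict.rationale}\n\n"
        f"## Remedy\n{verdict.remedy}\n\n"
        f"## Uncertainty\n{verdict.uncertainty}\n"
    )


events = [
    Event("Claimant Opening"),
    Event(
        "The Court Announces Judgment",
        Verdict("liable", "Judgment for the claimant.", "Four of six jurors agreed.",
                "Public apology.", "Two jurors were uncertain."),
    ),
]

# Write to a throwaway directory rather than a fixed temp-file name.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "decree.md"
    path.write_text(decree_text(events), encoding="utf-8")
    print(path.read_text(encoding="utf-8"))
```
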
sovereign_bench/llm.py CHANGED

@@ -1,209 +1,296 @@
-from __future__ import annotations
-import os
-import re
-import time
-from dataclasses import dataclass
-from hashlib import sha256
-@dataclass
-class ModelCall:
-    model: str
-    provider: str
-    ok: bool
-    latency_ms: int
-    prompt_hash: str
-    error: str | None = None
-    requested_model: str | None = None
-    runtime: str | None = None
-    gpu: str | None = None
-@dataclass
-class ModelResult:
-    text: str
-    call: ModelCall
-    input_text: str = ""
-class ModelCallError(RuntimeError):
-    pass
-def _short_error(exc: Exception) -> str:
-    message = str(exc).replace("\n", " ").strip()
-    return f"{exc.__class__.__name__}: {message[:220]}"
-def messages_hash(messages: list[dict[str, str]]) -> str:
-    joined = "\n".join(f"{item.get('role', '')}:{item.get('content', '')}" for item in messages)
-    return sha256(joined.encode("utf-8")).hexdigest()[:16]
-def _prompt_from_messages(messages: list[dict[str, str]]) -> str:
-    return "\n\n".join(f"{item.get('role', 'user').upper()}:\n{item.get('content', '')}" for item in messages) + "\n\nASSISTANT:\n"
-def _response_text(response: object) -> str:
-    choice = response.choices[0]  # type: ignore[attr-defined]
-    message = choice.message
-    for attr in ("content", "reasoning_content", "reasoning"):
-        value = getattr(message, attr, None)
-        if isinstance(value, str) and value.strip():
-            return value.strip()
-        if isinstance(value, list):
-            pieces = []
-            for item in value:
-                text = getattr(item, "text", None) or (item.get("text") if isinstance(item, dict) else None)
-                if text:
-                    pieces.append(str(text))
-            if pieces:
-                return " ".join(pieces).strip()
-    if hasattr(message, "model_dump"):
-        data = message.model_dump()
-        for key in ("content", "reasoning_content", "reasoning"):
-            value = data.get(key)
-            if isinstance(value, str) and value.strip():
-                return value.strip()
-    return ""
-def clean_model_text(text: str) -> str:
-    cleaned = re.sub(r"(?is)<think>.*?</think>", "", text).strip()
-    if re.search(r"(?i)<think>", cleaned):
-        raise ModelCallError("model returned unclosed hidden reasoning")
-    cleaned = re.sub(r"(?is)<analysis>.*?</analysis>", "", cleaned).strip()
-    cleaned = re.sub(r"(?is)<reasoning>.*?</reasoning>", "", cleaned).strip()
-    cleaned = cleaned.replace("</think>", "").strip()
-    if not cleaned:
-        raise ModelCallError("model returned no visible output")
-    return cleaned
-def model_enabled() -> bool:
-    return os.getenv("SOVEREIGN_DISABLE_LIVE_MODELS", "").lower() not in {"1", "true", "yes"}
-def call_hf_chat_model(
-    *,
-    model: str,
-    messages: list[dict[str, str]],
-    provider: str = "auto",
-    max_tokens: int = 140,
-    temperature: float = 0.45,
-) -> ModelResult:
-    prompt_hash = messages_hash(messages)
-    started = time.perf_counter()
-    token = os.getenv("HF_TOKEN")
-    if not token or not model_enabled():
-        raise ModelCallError("HF_TOKEN missing or live models disabled")
-    try:
-        from huggingface_hub import InferenceClient
-        client = InferenceClient(model=model, provider=provider, token=token, timeout=45.0)
-        retry_messages = messages + [
-            {
-                "role": "user",
-                "content": (
-                    "Your previous response did not include visible courtroom dialogue. "
-                    "Return only the final spoken dialogue now. Do not include <think>, analysis, reasoning, markdown, or notes. /no_think"
-                ),
-            }
-        ]
-        last_error: Exception | None = None
-        text = ""
-        for attempt_messages in (messages, retry_messages):
-            try:
-                response = client.chat_completion(
-                    messages=attempt_messages,
-                    max_tokens=max_tokens,
-                    temperature=temperature,
-                    top_p=0.9,
-                )
-                raw_text = _response_text(response)
-            except Exception as chat_exc:
-                prompt = _prompt_from_messages(attempt_messages)
-                generated = client.text_generation(
-                    prompt,
-                    max_new_tokens=max_tokens,
-                    temperature=temperature,
-                    top_p=0.9,
-                    return_full_text=False,
-                )
-                raw_text = str(generated).strip()
-                if not raw_text:
-                    raise chat_exc
-            try:
-                text = clean_model_text(raw_text)
-                break
-            except ModelCallError as exc:
-                last_error = exc
-        if not text:
-            raise last_error or RuntimeError("empty model response")
-        return ModelResult(
-            text=text,
-            call=ModelCall(
-                model=model,
-                provider=provider,
-                ok=True,
-                latency_ms=int((time.perf_counter() - started) * 1000),
-                prompt_hash=prompt_hash,
-            ),
-        )
-    except Exception as exc:
-        raise ModelCallError(
-            f"{model} via {provider} failed after {int((time.perf_counter() - started) * 1000)}ms: {_short_error(exc)}"
-        ) from exc
-def call_small_model(
-    *,
-    agent: str,
-    role: str,
-    model: str,
-    case_summary: str,
-    task: str,
-    evidence_summary: str,
-    provider: str = "auto",
-    max_tokens: int = 120,
-) -> ModelResult:
-    messages = build_role_messages(
-        agent=agent,
-        role=role,
-        case_summary=case_summary,
-        task=task,
-        evidence_summary=evidence_summary,
-    )
-    result = call_hf_chat_model(
-        model=model,
-        provider=provider,
-        messages=messages,
-        max_tokens=max_tokens,
-    )
-    result.input_text = _prompt_from_messages(messages)
-    return result
-def build_role_messages(
-    *,
-    agent: str,
-    role: str,
-    case_summary: str,
-    task: str,
-    evidence_summary: str,
-) -> list[dict[str, str]]:
-    system = (
-        "You are one AI character in Sovereign Bench, a miniature virtual courtroom. "
-        "Write concise courtroom dialogue only. Cite evidence IDs when relevant. "
-        "Do not claim certainty beyond the record. Do not add markdown. "
-        "Return final spoken dialogue only; never reveal hidden reasoning, analysis, or <think> text. "
-        "Do not use thinking mode."
-    )
-    user = (
-        f"Agent: {agent}\nRole: {role}\nCase:\n{case_summary}\n\n"
-        f"Evidence:\n{evidence_summary}\n\nTask: {task}\n"
-        "Answer in 1-3 sentences, theatrical but clear.\n/no_think"
-    )
-    return [{"role": "system", "content": system}, {"role": "user", "content": user}]

+from __future__ import annotations
+import os
+import re
+import time
+from dataclasses import dataclass
+from hashlib import sha256
+@dataclass
+class ModelCall:
+    model: str
+    provider: str
+    ok: bool
+    latency_ms: int
+    prompt_hash: str
+    error: str | None = None
+    requested_model: str | None = None
+    runtime: str | None = None
+    gpu: str | None = None
+@dataclass
+class ModelResult:
+    text: str
+    call: ModelCall
+    input_text: str = ""
+class ModelCallError(RuntimeError):
+    pass
+def _short_error(exc: Exception) -> str:
+    message = str(exc).replace("\n", " ").strip()
+    return f"{exc.__class__.__name__}: {message[:220]}"
+def messages_hash(messages: list[dict[str, str]]) -> str:
+    joined = "\n".join(f"{item.get('role', '')}:{item.get('content', '')}" for item in messages)
+    return sha256(joined.encode("utf-8")).hexdigest()[:16]
+def _prompt_from_messages(messages: list[dict[str, str]]) -> str:
+    return "\n\n".join(f"{item.get('role', 'user').upper()}:\n{item.get('content', '')}" for item in messages) + "\n\nASSISTANT:\n"
+def _response_text(response: object) -> str:
+    choice = response.choices[0]  # type: ignore[attr-defined]
+    message = choice.message
+    for attr in ("content", "reasoning_content", "reasoning"):
+        value = getattr(message, attr, None)
+        if isinstance(value, str) and value.strip():
+            return value.strip()
+        if isinstance(value, list):
+            pieces = []
+            for item in value:
+                text = getattr(item, "text", None) or (item.get("text") if isinstance(item, dict) else None)
+                if text:
+                    pieces.append(str(text))
+            if pieces:
+                return " ".join(pieces).strip()
+    if hasattr(message, "model_dump"):
+        data = message.model_dump()
+        for key in ("content", "reasoning_content", "reasoning"):
+            value = data.get(key)
+            if isinstance(value, str) and value.strip():
+                return value.strip()
+    return ""
+INSTRUCTION_ECHO_RE = re.compile(
+    r"(?is)\b("
+    r"as requested|"
+    r"first[- ]person|"
+    r"pronoun|"
+    r"1\s*-\s*3 sentences|"
+    r"theatrical but clear|"
+    r"i will speak as|"
+    r"i will now (?:announce|answer|respond|deliver|speak)|"
+    r"as the assigned agent|"
+    r"the task"
+    r")\b"
+)
+def clean_model_text(text: str) -> str:
+    cleaned = re.sub(r"(?is)<think>.*?</think>", "", text).strip()
+    if re.search(r"(?i)<think>", cleaned):
+        raise ModelCallError("model returned unclosed hidden reasoning")
+    cleaned = re.sub(r"(?is)<analysis>.*?</analysis>", "", cleaned).strip()
+    cleaned = re.sub(r"(?is)<reasoning>.*?</reasoning>", "", cleaned).strip()
+    cleaned = cleaned.replace("</think>", "").strip()
+    channel_match = re.search(r"(?ims)^\s*(?:final|assistant_final)\s*:?\s*(.+)\Z", cleaned)
+    if channel_match:
+        cleaned = channel_match.group(1).strip()
+    else:
+        final_after_analysis = re.search(
+            r"(?ims)^\s*(?:analysis|reasoning|assistant_analysis)\s*:?.*?^\s*(?:final|assistant_final)\s*:?\s*(.+)\Z",
+            cleaned,
+        )
+        if final_after_analysis:
+            cleaned = final_after_analysis.group(1).strip()
+        elif re.search(r"(?im)^\s*(?:analysis|reasoning|assistant_analysis)\s*:?", cleaned):
+            raise ModelCallError("model returned hidden analysis instead of courtroom dialogue")
+    if re.search(r"(?i)\b(?:analysis|reasoning)\s*:", cleaned[:80]):
+        raise ModelCallError("model returned hidden analysis instead of courtroom dialogue")
+    if INSTRUCTION_ECHO_RE.search(cleaned[:420]):
+        pieces = [piece.strip() for piece in re.split(r"\n\s*\n", cleaned) if piece.strip()]
+        dialogue_pieces = [piece for piece in pieces if not INSTRUCTION_ECHO_RE.search(piece)]
+        if not dialogue_pieces:
+            raise ModelCallError("model echoed instructions instead of courtroom dialogue")
+        cleaned = "\n\n".join(dialogue_pieces).strip()
+    if not cleaned:
+        raise ModelCallError("model returned no visible output")
+    return cleaned
+def model_enabled() -> bool:
+    return os.getenv("SOVEREIGN_DISABLE_LIVE_MODELS", "").lower() not in {"1", "true", "yes"}
+def call_hf_chat_model(
+    *,
+    model: str,
+    messages: list[dict[str, str]],
+    provider: str = "auto",
+    max_tokens: int = 140,
+    temperature: float = 0.45,
+) -> ModelResult:
+    prompt_hash = messages_hash(messages)
+    started = time.perf_counter()
+    token = os.getenv("HF_TOKEN")
+    if not token or not model_enabled():
+        raise ModelCallError("HF_TOKEN missing or live models disabled")
+    try:
+        from huggingface_hub import InferenceClient
+        client = InferenceClient(model=model, provider=provider, token=token, timeout=45.0)
+        retry_messages = messages + [
+            {
+                "role": "user",
+                "content": (
+                    "Your previous response did not include visible courtroom dialogue. "
+                    "Return only the final first-person spoken dialogue now, as the assigned agent. "
+                    "Do not mention prompts, tasks, requirements, pronouns, sentence counts, or that you are following instructions. "
+                    "Do not include <think>, analysis, reasoning, markdown, narration, or notes. /no_think"
+                ),
+            }
+        ]
+        last_error: Exception | None = None
+        text = ""
+        for attempt_messages in (messages, retry_messages):
+            try:
+                response = client.chat_completion(
+                    messages=attempt_messages,
+                    max_tokens=max_tokens,
+                    temperature=temperature,
+                    top_p=0.9,
+                )
+                raw_text = _response_text(response)
+            except Exception as chat_exc:
+                prompt = _prompt_from_messages(attempt_messages)
+                generated = client.text_generation(
+                    prompt,
+                    max_new_tokens=max_tokens,
+                    temperature=temperature,
+                    top_p=0.9,
+                    return_full_text=False,
+                )
+                raw_text = str(generated).strip()
+                if not raw_text:
+                    raise chat_exc
+            try:
+                text = clean_model_text(raw_text)
+                break
+            except ModelCallError as exc:
+                last_error = exc
+        if not text:
+            raise last_error or RuntimeError("empty model response")
+        return ModelResult(
+            text=text,
+            call=ModelCall(
+                model=model,
+                provider=provider,
+                ok=True,
+                latency_ms=int((time.perf_counter() - started) * 1000),
+                prompt_hash=prompt_hash,
+            ),
+        )
+    except Exception as exc:
+        raise ModelCallError(
+            f"{model} via {provider} failed after {int((time.perf_counter() - started) * 1000)}ms: {_short_error(exc)}"
+        ) from exc
+def call_small_model(
+    *,
+    agent: str,
+    role: str,
+    model: str,
+    case_summary: str,
+    task: str,
+    evidence_summary: str,
+    trial_history: str = "",
+    persona: str = "",
+    objective: str = "",
+    provider: str = "auto",
+    max_tokens: int = 120,
+) -> ModelResult:
+    messages = build_role_messages(
+        agent=agent,
+        role=role,
+        case_summary=case_summary,
+        task=task,
+        evidence_summary=evidence_summary,
+        trial_history=trial_history,
+        persona=persona,
+        objective=objective,
+    )
+    result = call_hf_chat_model(
+        model=model,
+        provider=provider,
+        messages=messages,
+        max_tokens=max_tokens,
+    )
+    result.input_text = _prompt_from_messages(messages)
+    return result
+def build_role_messages(
+    *,
+    agent: str,
+    role: str,
+    case_summary: str,
+    task: str,
+    evidence_summary: str,
+    trial_history: str = "",
+    persona: str = "",
+    objective: str = "",
+) -> list[dict[str, str]]:
+    vote_role = role == "juror"
+    dialogue_role = not vote_role
+    system = (
+        "You are one AI character in Sovereign Bench, a miniature virtual courtroom. "
+        "Stay fully in character as the assigned Agent and Role. "
+        "Use the case facts and evidence provided below; cite evidence IDs when relevant. "
+        "Do not claim certainty beyond the record. Do not add markdown. "
+        "Never reveal hidden reasoning, analysis, or <think> text. "
+        "Do not use thinking mode."
+    )
+    if role in {"claimant advocate", "respondent advocate"}:
+        system += (
+            " You are a lawyer trying to win for your side. Use the evidence, the other side's claims, "
+            "and the trial record to make the strongest fair argument available."
+        )
+    elif role in {"judge", "verdict writer"}:
+        system += (
+            " You are a fair judge. Consider both sides, the evidence, and the trial record. "
+            "At verdict, announce and contextualize the jury result rather than replacing it with your own preferred outcome."
+        )
+    elif role == "juror":
+        system += (
+            " You are an individual juror. Decide through your named worldview and the trial transcript, "
+            "not a generic juror role. Output only valid JSON for your vote."
+        )
+    elif role == "juror panel":
+        system += " You speak for the jury panel procedurally; do not reveal votes before deliberation."
+    elif role == "clerk":
+        system += " You are a procedural courtroom role; present the record clearly without deciding the verdict."
+    if dialogue_role:
+        system += (
+            " Output only the words this character says aloud in court. "
+            "Use I, me, my, we, or our naturally when the role calls for it. "
+            "Do not narrate about yourself in the third person. Do not summarize what the agent would say."
+        )
+        answer_instruction = (
+            f"Speak as {agent}. Give only the in-scene court line, 1-3 concise sentences."
+        )
+    else:
+        answer_instruction = (
+            "Return only the requested JSON object. "
+            "Do not add dialogue, markdown, or commentary."
+        )
+    persona_block = f"\nPersona / worldview:\n{persona}\n" if persona else ""
+    objective_block = f"\nObjective:\n{objective}\n" if objective else ""
+    history_block = f"\nTrial history so far:\n{trial_history}\n" if trial_history else ""
+    user = (
+        f"Agent: {agent}\nRole: {role}\nCase:\n{case_summary}\n\n"
+        f"Evidence:\n{evidence_summary}\n"
+        f"{persona_block}{objective_block}{history_block}\nTask: {task}\n"
+        f"{answer_instruction}\n/no_think"
+    )
+    return [{"role": "system", "content": system}, {"role": "user", "content": user}]
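
Note on the retry loop in `call_hf_chat_model` above: the function tries the original messages first, then tries once more with a corrective user turn appended, and re-raises the last cleaning error if both attempts fail. The following is a minimal standalone sketch of that pattern, not the code from this change. `generate_with_retry`, `CORRECTIVE_TURN`, `fake_generate`, and `fake_clean` are hypothetical names introduced for illustration, the local `ModelCallError` is a stand-in for the one in `sovereign_bench/llm.py`, and the provider call is replaced by a plain callable.

```python
from typing import Callable, Optional


class ModelCallError(RuntimeError):
    """Stand-in for the ModelCallError used in the diff (assumption)."""


Messages = list[dict[str, str]]

# Hypothetical corrective turn, shortened from the retry prompt in the diff.
CORRECTIVE_TURN = {
    "role": "user",
    "content": "Return only the final first-person spoken dialogue now.",
}


def generate_with_retry(
    messages: Messages,
    generate: Callable[[Messages], str],
    clean: Callable[[str], str],
) -> str:
    """Try the original prompt, then once more with a corrective turn."""
    retry_messages = messages + [CORRECTIVE_TURN]
    last_error: Optional[Exception] = None
    for attempt in (messages, retry_messages):
        try:
            # A cleaning failure is recoverable: remember it and retry.
            return clean(generate(attempt))
        except ModelCallError as exc:
            last_error = exc
    # Both attempts failed cleaning; surface the most recent reason.
    raise last_error or ModelCallError("empty model response")


def fake_generate(msgs: Messages) -> str:
    # Leaks analysis unless the corrective turn is present.
    if msgs[-1] == CORRECTIVE_TURN:
        return "I object, and SOC-E1 supports me."
    return "analysis: I should think first."


def fake_clean(text: str) -> str:
    if text.lower().startswith("analysis"):
        raise ModelCallError("hidden analysis")
    return text.strip()


result = generate_with_retry(
    [{"role": "user", "content": "Open for the respondent."}],
    fake_generate,
    fake_clean,
)
```

Only cleaning failures trigger the second attempt in this sketch. Errors raised by `generate` itself propagate immediately, which roughly matches how the diff lets transport errors reach the outer `except`.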

sovereign_bench/models.py

@@ -1,86 +1,88 @@
-from __future__ import annotations
-from typing import Literal
-from pydantic import BaseModel, Field
-TrialPhase = Literal[
-    "intake",
-    "claims",
-    "opening",
-    "evidence",
-    "questions",
-    "deliberation",
-    "verdict",
-    "appeal",
-]
-class EvidenceItem(BaseModel):
-    id: str
-    title: str
-    source: str
-    excerpt: str
-    supports: Literal["claimant", "respondent", "mixed", "context"]
-    reliability: float = Field(ge=0.0, le=1.0)
-    note: str
-class CasePacket(BaseModel):
-    id: str
-    title: str
-    subtitle: str
-    claimant: str
-    respondent: str
-    charge: str
-    setting: str
-    claimant_claim: str
-    respondent_claim: str
-    source_note: str
-    evidence: list[EvidenceItem]
-class TrialRequest(BaseModel):
-    case_id: str = "socrates"
-    search_query: str = ""
-    hypothetical: str = ""
-    speed: Literal["swift", "measured", "ceremonial"] = "swift"
-    mind_layer: bool = True
-class AgentTurn(BaseModel):
-    agent: str
-    role: str
-    content: str
-    model: str
-    confidence: float = Field(ge=0.0, le=1.0)
-    input: str = ""
-class JurorVote(BaseModel):
-    juror: str
-    persona: str = ""
-    vote: Literal["liable", "not_liable", "uncertain"]
-    reason: str
-    evidence_ids: list[str]
-class Verdict(BaseModel):
-    finding: Literal["liable", "not_liable", "mixed", "uncertain"]
-    decree: str
-    rationale: str
-    evidence_ids: list[str]
-    uncertainty: str
-    remedy: str
-class TrialEvent(BaseModel):
-    phase: TrialPhase
-    title: str
-    body: str
-    turns: list[AgentTurn] = Field(default_factory=list)
-    evidence: list[EvidenceItem] = Field(default_factory=list)
-    votes: list[JurorVote] = Field(default_factory=list)
-    verdict: Verdict | None = None
-    trace: dict = Field(default_factory=dict)

+from __future__ import annotations
+from typing import Literal
+from pydantic import BaseModel, Field
+TrialPhase = Literal[
+    "intake",
+    "claims",
+    "opening",
+    "evidence",
+    "questions",
+    "deliberation",
+    "verdict",
+    "appeal",
+]
+class EvidenceItem(BaseModel):
+    id: str
+    title: str
+    source: str
+    excerpt: str
+    supports: Literal["claimant", "respondent", "mixed", "context"]
+    reliability: float = Field(ge=0.0, le=1.0)
+    note: str
+class CasePacket(BaseModel):
+    id: str
+    title: str
+    subtitle: str
+    claimant: str
+    respondent: str
+    charge: str
+    setting: str
+    context: str = ""
+    claimant_claim: str
+    respondent_claim: str
+    source_note: str
+    evidence: list[EvidenceItem]
+class TrialRequest(BaseModel):
+    case_id: str = "socrates"
+    search_query: str = ""
+    hypothetical: str = ""
+    custom_case: CasePacket | None = None
+    speed: Literal["swift", "measured", "ceremonial"] = "swift"
+    mind_layer: bool = True
+class AgentTurn(BaseModel):
+    agent: str
+    role: str
+    content: str
+    model: str
+    confidence: float = Field(ge=0.0, le=1.0)
+    input: str = ""
+class JurorVote(BaseModel):
+    juror: str
+    persona: str = ""
+    vote: Literal["liable", "not_liable", "uncertain"]
+    reason: str
+    evidence_ids: list[str]
+class Verdict(BaseModel):
+    finding: Literal["liable", "not_liable", "mixed", "uncertain"]
+    decree: str
+    rationale: str
+    evidence_ids: list[str]
+    uncertainty: str
+    remedy: str
+class TrialEvent(BaseModel):
+    phase: TrialPhase
+    title: str
+    body: str
+    turns: list[AgentTurn] = Field(default_factory=list)
+    evidence: list[EvidenceItem] = Field(default_factory=list)
+    votes: list[JurorVote] = Field(default_factory=list)
+    verdict: Verdict | None = None
+    trace: dict = Field(default_factory=dict)

sovereign_bench/retrieval.py

@@ -1,70 +1,70 @@
-from __future__ import annotations
-import re
-from urllib.parse import quote_plus
-import httpx
-from .models import CasePacket, EvidenceItem
-def _plain_text(html: str) -> str:
-    html = re.sub(r"(?is)<script.*?</script>|<style.*?</style>", " ", html)
-    html = re.sub(r"(?s)<[^>]+>", " ", html)
-    html = re.sub(r"\s+", " ", html)
-    return html.strip()
-def build_live_case(query: str, hypothetical: str = "") -> CasePacket | None:
-    clean_query = " ".join(query.split())
-    if len(clean_query) < 8:
-        return None
-    try:
-        url = f"https://r.jina.ai/http://r.jina.ai/http://duckduckgo.com/html/?q={quote_plus(clean_query)}"
-        response = httpx.get(url, timeout=8.0, follow_redirects=True)
-        text = _plain_text(response.text)
-    except Exception:
-        return None
-    snippets = [
-        segment.strip()
-        for segment in re.split(r"(?<=[.!?])\s+", text)
-        if 80 <= len(segment.strip()) <= 320 and "http" not in segment[:20].lower()
-    ]
-    unique: list[str] = []
-    for snippet in snippets:
-        if snippet.lower() not in {item.lower() for item in unique}:
-            unique.append(snippet)
-        if len(unique) == 4:
-            break
-    if len(unique) < 2:
-        return None
-    evidence = [
-        EvidenceItem(
-            id=f"WEB-E{i}",
-            title=f"Retrieved fragment {i}",
-            source=f"Web retrieval for: {clean_query}",
-            excerpt=snippet,
-            supports="context" if i == 1 else "mixed",
-            reliability=max(0.45, 0.72 - (i * 0.06)),
-            note="Live retrieval fragment; the court treats it as context until corroborated.",
-        )
-        for i, snippet in enumerate(unique, start=1)
-    ]
-    framing = hypothetical.strip() or "the parties dispute how the retrieved facts should be interpreted"
-    return CasePacket(
-        id="live",
-        title=f"Live Search Tribunal: {clean_query[:58]}",
-        subtitle="A search-fed miniature proceeding with uncertainty kept visible.",
-        claimant="The Search Record",
-        respondent="The Counter-Interpretation",
-        charge=f"Whether {framing}.",
-        setting="A temporary court assembled from retrieved public web fragments.",
-        claimant_claim="The retrieved record supports a coherent claim that should be credited.",
-        respondent_claim="The retrieved record is incomplete, ambiguous, or overread by the claimant.",
-        source_note="Live web retrieval via public search snippets. Treat as unverified context, not ground truth.",
-        evidence=evidence,
-    )

+from __future__ import annotations
+import re
+from urllib.parse import quote_plus
+import httpx
+from .models import CasePacket, EvidenceItem
+def _plain_text(html: str) -> str:
+    html = re.sub(r"(?is)<script.*?</script>|<style.*?</style>", " ", html)
+    html = re.sub(r"(?s)<[^>]+>", " ", html)
+    html = re.sub(r"\s+", " ", html)
+    return html.strip()
+def build_live_case(query: str, hypothetical: str = "") -> CasePacket | None:
+    clean_query = " ".join(query.split())
+    if len(clean_query) < 8:
+        return None
+    try:
+        url = f"https://r.jina.ai/http://r.jina.ai/http://duckduckgo.com/html/?q={quote_plus(clean_query)}"
+        response = httpx.get(url, timeout=8.0, follow_redirects=True)
+        text = _plain_text(response.text)
+    except Exception:
+        return None
+    snippets = [
+        segment.strip()
+        for segment in re.split(r"(?<=[.!?])\s+", text)
+        if 80 <= len(segment.strip()) <= 320 and "http" not in segment[:20].lower()
+    ]
+    unique: list[str] = []
+    for snippet in snippets:
+        if snippet.lower() not in {item.lower() for item in unique}:
+            unique.append(snippet)
+        if len(unique) == 4:
+            break
+    if len(unique) < 2:
+        return None
+    evidence = [
+        EvidenceItem(
+            id=f"WEB-E{i}",
+            title=f"Retrieved fragment {i}",
+            source=f"Web retrieval for: {clean_query}",
+            excerpt=snippet,
+            supports="context" if i == 1 else "mixed",
+            reliability=max(0.45, 0.72 - (i * 0.06)),
+            note="Live retrieval fragment; the court treats it as context until corroborated.",
+        )
+        for i, snippet in enumerate(unique, start=1)
+    ]
+    framing = hypothetical.strip() or "the parties dispute how the retrieved facts should be interpreted"
+    return CasePacket(
+        id="live",
+        title=f"Live Search Tribunal: {clean_query[:58]}",
+        subtitle="A search-fed miniature proceeding with uncertainty kept visible.",
+        claimant="The Search Record",
+        respondent="The Counter-Interpretation",
+        charge=f"Whether {framing}.",
+        setting="A temporary court assembled from retrieved public web fragments.",
+        claimant_claim="The retrieved record supports a coherent claim that should be credited.",
+        respondent_claim="The retrieved record is incomplete, ambiguous, or overread by the claimant.",
+        source_note="Live web retrieval via public search snippets. Treat as unverified context, not ground truth.",
+        evidence=evidence,
+    )
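
Note on snippet selection in `build_live_case` above: the function splits the page text into sentence-like segments, keeps those between 80 and 320 characters that do not start with a URL, removes case-insensitive duplicates, and stops at four. The following is a minimal standalone sketch of that filter, not the code from this change. `select_snippets` and `reliability_for` are hypothetical helper names, and the `seen` set is a small variation on the per-iteration set rebuild in the diff.

```python
import re


def select_snippets(text: str, limit: int = 4) -> list[str]:
    """Pick up to `limit` unique sentence-like fragments of 80-320 chars."""
    seen: set[str] = set()
    unique: list[str] = []
    # Split after sentence-ending punctuation that is followed by whitespace.
    for segment in re.split(r"(?<=[.!?])\s+", text):
        snippet = segment.strip()
        if not 80 <= len(snippet) <= 320:
            continue
        # Skip fragments that begin with a bare link.
        if "http" in snippet[:20].lower():
            continue
        key = snippet.lower()
        if key in seen:
            continue
        seen.add(key)
        unique.append(snippet)
        if len(unique) == limit:
            break
    return unique


def reliability_for(rank: int) -> float:
    """Same formula as the diff: decays by 0.06 per rank, floored at 0.45."""
    return max(0.45, 0.72 - rank * 0.06)


long_a = "Alpha " * 15 + "ends here."   # 100 characters
long_b = "Beta " * 20 + "ends here."    # 110 characters
sample = " ".join([long_a, "Too short.", long_a.upper(), long_b])
picked = select_snippets(sample)
```

With the four-snippet cap, ranks run from 1 to 4, so the scores fall from about 0.66 to about 0.48 and the 0.45 floor is never reached. It would only apply if the cap were raised.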

tests/test_cases.py

@@ -1,8 +1,16 @@
-from sovereign_bench.cases import CASES
-def test_cached_cases_have_evidence():
-    assert {"socrates", "barnaby"} <= set(CASES)
-    for case in CASES.values():
-        assert len(case.evidence) >= 4
-        assert all(item.id and item.excerpt for item in case.evidence)

+from sovereign_bench.cases import CASES
+def test_cached_cases_have_evidence():
+    assert {"socrates", "greg", "barnaby"} <= set(CASES)
+    for case in CASES.values():
+        assert len(case.evidence) >= 4
+        assert all(item.id and item.excerpt for item in case.evidence)
+def test_demo_cases_have_book_context_and_three_items_per_side():
+    for case_id in ["socrates", "greg"]:
+        case = CASES[case_id]
+        assert case.context
+        assert len([item for item in case.evidence if item.supports == "claimant"]) >= 3
+        assert len([item for item in case.evidence if item.supports == "respondent"]) >= 3

tests/test_engine.py

@@ -1,149 +1,326 @@
-import json
-import re
-import pytest
-from sovereign_bench.engine import JUDGE_NAME, JUROR_PERSONAS, RequiredModelError, run_trial
-from sovereign_bench.llm import ModelCall, ModelResult
-from sovereign_bench.models import TrialRequest
-def _jury_json(evidence_summary: str, vote: str = "liable") -> str:
-    evidence_ids = re.findall(r"^([A-Z]+-E\d+):", evidence_summary, flags=re.M)
-    evidence_ids = (evidence_ids or ["SOC-E1"]) * 6
-    return json.dumps(
-        {
-            "votes": [
-                {
-                    "juror": name,
-                    "persona": persona,
-                    "vote": vote if idx < 4 else "not_liable",
-                    "reason": f"{name} applies a {persona} lens to exhibit {evidence_ids[idx]}.",
-                    "evidence_ids": [evidence_ids[idx]],
-                }
-                for idx, (name, persona) in enumerate(JUROR_PERSONAS.items())
-            ]
-        }
-    )
-def fake_model_runner(**kwargs):
-    text = (
-        _jury_json(kwargs["evidence_summary"])
-        if kwargs["role"] == "juror vote generator"
-        else f"{kwargs['agent']} responds to: {kwargs['task']}"
-    )
-    prompt = (
-        f"SYSTEM:\nFake live model for tests.\n\nUSER:\n"
-        f"Agent: {kwargs['agent']}\nRole: {kwargs['role']}\nTask: {kwargs['task']}\n\nASSISTANT:\n"
-    )
-    return ModelResult(
-        text=text,
-        input_text=prompt,
-        call=ModelCall(
-            model=kwargs["model"],
-            provider=kwargs.get("provider", "test"),
-            ok=True,
-            latency_ms=1,
-            prompt_hash="test-prompt",
-        ),
-    )
-def test_cached_cases_emit_sequential_speaker_order():
-    expected_speakers = [
-        "Clerk Meridian",
-        JUDGE_NAME,
-        "Advocate Auric",
-        "Counsel Sable",
-        "Auditor Prism",
-        JUDGE_NAME,
-        "Advocate Auric",
-        "Counsel Sable",
-        "Nemotron Jury",
-        *list(JUROR_PERSONAS),
-        JUDGE_NAME,
-    ]
-    for case_id in ["socrates", "barnaby"]:
-        events = run_trial(TrialRequest(case_id=case_id), model_runner=fake_model_runner)
-        assert [event.turns[0].agent for event in events] == expected_speakers
-        assert [event.phase for event in events].count("deliberation") == 7
-        assert events[0].turns[0].input
-        assert "SYSTEM:" in events[0].turns[0].input
-        assert events[-1].verdict is not None
-        assert events[-1].votes and len(events[-1].votes) == 6
-        assert "uncertainty" in events[-1].verdict.uncertainty.lower()
-def test_no_event_contains_both_lawyers_speaking_together():
-    events = run_trial(TrialRequest(case_id="socrates"), model_runner=fake_model_runner)
-    for event in events:
-        agents = {turn.agent for turn in event.turns}
-        assert not {"Advocate Auric", "Counsel Sable"}.issubset(agents)
-def test_juror_vote_events_have_fixed_personas_and_evidence():
-    events = run_trial(TrialRequest(case_id="socrates"), model_runner=fake_model_runner)
-    juror_events = [event for event in events if event.turns[0].agent in JUROR_PERSONAS]
-    assert len(juror_events) == 6
-    for event in juror_events:
-        vote = event.votes[0]
-        assert vote.juror == event.turns[0].agent
-        assert vote.persona == JUROR_PERSONAS[vote.juror]
-        assert vote.vote in {"liable", "not_liable", "uncertain"}
-        assert vote.reason
-        assert vote.evidence_ids
-    final = events[-1]
-    assert final.phase == "verdict"
-    assert [vote.juror for vote in final.votes] == list(JUROR_PERSONAS)
-def test_jury_contract_uses_public_history_personas():
-    assert JUDGE_NAME == "Marcus Aurelius"
-    assert JUROR_PERSONAS == {
-        "Karl Marx": "class power, material conditions, exploitation, institutional incentives",
-        "John Stuart Mill": "liberty, harm principle, utility, individual rights",
-        "Confucius": "social harmony, role duty, ritual order, moral cultivation",
-        "Cleopatra VII": "sovereign pragmatism, diplomacy, survival, legitimacy under pressure",
-        "Niccolo Machiavelli": "political realism, stability, power, consequences over ideals",
-        "Jensen Huang": "technological optimism, operator mindset, systems thinking, innovation tradeoffs",
-    }
-def test_required_model_failure_stops_trial_without_canned_dialogue():
-    def failing_runner(**kwargs):
-        return ModelResult(
-            text="",
-            input_text="SYSTEM:\nfailed",
-            call=ModelCall(
-                model=kwargs["model"],
-                provider=kwargs.get("provider", "test"),
-                ok=False,
-                latency_ms=1,
-                prompt_hash="test-prompt",
-                error="offline",
-            ),
-        )
-    with pytest.raises(RequiredModelError, match="unavailable"):
-        run_trial(TrialRequest(case_id="socrates"), model_runner=failing_runner)
-def test_invalid_jury_output_stops_trial_without_fallback_votes():
-    def invalid_jury_runner(**kwargs):
-        result = fake_model_runner(**kwargs)
-        if kwargs["role"] == "juror vote generator":
-            result.text = "the jury refuses structured output"
-        return result
-    with pytest.raises(RequiredModelError, match="invalid JSON"):
-        run_trial(TrialRequest(case_id="socrates"), model_runner=invalid_jury_runner)
-def test_live_search_stops_when_query_is_weak():
-    with pytest.raises(RuntimeError, match="no fallback case"):
-        run_trial(TrialRequest(case_id="live", search_query="x"), model_runner=fake_model_runner)

+import json
+import re
+import pytest
+from sovereign_bench.engine import JUDGE_NAME, JUROR_PERSONAS, RequiredModelError, run_trial, stream_trial
+from sovereign_bench.llm import ModelCall, ModelResult, build_role_messages, clean_model_text
+from sovereign_bench.models import CasePacket, EvidenceItem, TrialRequest
+def _juror_json(kwargs, vote: str = "liable") -> str:
+    evidence_ids = re.findall(r"^([A-Z]+-[A-Z]\d+):", kwargs["evidence_summary"], flags=re.M)
+    evidence_id = (evidence_ids or ["SOC-E1"])[0]
+    return json.dumps(
+        {
+            "juror": kwargs["agent"],
+            "persona": kwargs["persona"],
+            "vote": vote,
+            "reason": f"{kwargs['agent']} applies {kwargs['persona']} to exhibit {evidence_id}.",
+            "evidence_ids": [evidence_id],
+        }
+    )
+def fake_model_runner(**kwargs):
+    text = (
+        _juror_json(kwargs, vote="liable" if list(JUROR_PERSONAS).index(kwargs["agent"]) < 4 else "not_liable")
+        if kwargs["role"] == "juror"
+        else f"{kwargs['agent']} responds to: {kwargs['task']}"
+    )
+    prompt = (
+        f"SYSTEM:\nFake live model for tests.\n\nUSER:\n"
+        f"Agent: {kwargs['agent']}\nRole: {kwargs['role']}\n"
+        f"Persona: {kwargs.get('persona', '')}\nObjective: {kwargs.get('objective', '')}\n"
+        f"History: {kwargs.get('trial_history', '')}\nTask: {kwargs['task']}\n\nASSISTANT:\n"
+    )
+    return ModelResult(
+        text=text,
+        input_text=prompt,
+        call=ModelCall(
+            model=kwargs["model"],
+            provider=kwargs.get("provider", "test"),
+            ok=True,
+            latency_ms=1,
+            prompt_hash="test-prompt",
+        ),
+    )
+def test_cached_cases_emit_sequential_speaker_order():
+    expected_speakers = [
+        "Clerk Meridian",
+        JUDGE_NAME,
+        "Mike OSS",
+        "Harvey Vector",
+        JUDGE_NAME,
+        "Mike OSS",
+        "Harvey Vector",
+        "Nemotron Jury",
+        *list(JUROR_PERSONAS),
+        JUDGE_NAME,
+    ]
+    for case_id in ["socrates", "barnaby"]:
+        events = run_trial(TrialRequest(case_id=case_id), model_runner=fake_model_runner)
+        assert [event.turns[0].agent for event in events if event.turns] == expected_speakers
+        evidence_event = next(event for event in events if event.phase == "evidence")
+        assert evidence_event.title == "The Evidence Record"
+        assert evidence_event.turns == []
+        assert [event.phase for event in events].count("deliberation") == 7
+        assert events[0].turns[0].input
+        assert "SYSTEM:" in events[0].turns[0].input
+        assert events[-1].verdict is not None
+        assert events[-1].votes and len(events[-1].votes) == 6
+        assert "uncertainty" in events[-1].verdict.uncertainty.lower()
+def test_no_event_contains_both_lawyers_speaking_together():
+    events = run_trial(TrialRequest(case_id="socrates"), model_runner=fake_model_runner)
+    for event in events:
+        agents = {turn.agent for turn in event.turns}
+        assert not {"Mike OSS", "Harvey Vector"}.issubset(agents)
+def test_juror_vote_events_have_fixed_personas_and_evidence():
+    events = run_trial(TrialRequest(case_id="socrates"), model_runner=fake_model_runner)
+    juror_events = [event for event in events if event.turns and event.turns[0].agent in JUROR_PERSONAS]
+    assert len(juror_events) == 6
+    for event in juror_events:
+        vote = event.votes[0]
+        assert vote.juror == event.turns[0].agent
+        assert vote.persona == JUROR_PERSONAS[vote.juror]
+        assert vote.vote in {"liable", "not_liable", "uncertain"}
+        assert event.turns[0].content.startswith("I vote ")
+        assert vote.reason
+        assert vote.evidence_ids
+    final = events[-1]
+    assert final.phase == "verdict"
+    assert [vote.juror for vote in final.votes] == list(JUROR_PERSONAS)
+def test_jurors_are_called_independently_with_personas_and_trial_history():
+    calls = []
+    def recording_runner(**kwargs):
+        calls.append(kwargs.copy())
+        return fake_model_runner(**kwargs)
+    run_trial(TrialRequest(case_id="socrates"), model_runner=recording_runner)
+    juror_calls = [call for call in calls if call["role"] == "juror"]
+    assert [call["agent"] for call in juror_calls] == list(JUROR_PERSONAS)
+    assert len(juror_calls) == 6
+    for call in juror_calls:
+        assert call["persona"] == JUROR_PERSONAS[call["agent"]]
+        assert "Claimant Opening" in call["trial_history"]
+        assert "Respondent Opening" in call["trial_history"]
+        assert "The Evidence Record" in call["trial_history"]
+        assert "historical worldview" in call["objective"]
+def test_lawyers_and_judge_receive_trial_history_and_objectives():
+    calls = []
+    def recording_runner(**kwargs):
+        calls.append(kwargs.copy())
+        return fake_model_runner(**kwargs)
+    run_trial(TrialRequest(case_id="socrates"), model_runner=recording_runner)
+    claimant_answer = next(call for call in calls if call["agent"] == "Mike OSS" and "hinge question" in call["task"])
+    respondent_answer = next(call for call in calls if call["agent"] == "Harvey Vector" and "hinge question" in call["task"])
+    verdict_call = next(call for call in calls if call["role"] == "verdict writer")
+    assert "The Hinge Question" in claimant_answer["trial_history"]
+    assert "The Hinge Question" in respondent_answer["trial_history"]
+    assert "most favorable to the claimant" in claimant_answer["objective"]
+    assert "most favorable to the respondent" in respondent_answer["objective"]
+    assert all(name in verdict_call["trial_history"] for name in JUROR_PERSONAS)
+    assert "do not override the jury" in verdict_call["objective"]
+def test_custom_case_context_and_evidence_reach_lawyer_prompts():
+    custom = CasePacket(
+        id="custom",
+        title="Custom Trial",
+        subtitle="Entered by user.",
+        claimant="Claimant",
+        respondent="Respondent",
+        charge="Whether the custom record favors the claimant.",
+        setting="A custom courtroom.",
+        context="A bicycle disappeared after a disputed garage visit.",
+        claimant_claim="The claimant says the visit explains the missing bicycle.",
+        respondent_claim="The respondent says the timing and evidence are ambiguous.",
+        source_note="Custom test packet.",
+        evidence=[
+            EvidenceItem(
+                id="CUS-F1",
+                title="Garage Text",
+                source="Custom",
+                excerpt="The respondent asked to enter the garage.",
+                supports="claimant",
+                reliability=0.65,
+                note="Supports access.",
+            ),
+            EvidenceItem(
+                id="CUS-A1",
+                title="Neighbor Sighting",
+                source="Custom",
+                excerpt="A neighbor saw the bicycle later that day.",
+                supports="respondent",
+                reliability=0.65,
+                note="Supports alternative timing.",
+            ),
+        ],
+    )
+    calls = []
+    def recording_runner(**kwargs):
+        calls.append(kwargs.copy())
+        return fake_model_runner(**kwargs)
+    run_trial(TrialRequest(case_id="custom", custom_case=custom), model_runner=recording_runner)
+    claimant_opening = next(call for call in calls if call["agent"] == "Mike OSS" and call["role"] == "claimant advocate")
+    assert "A bicycle disappeared" in claimant_opening["case_summary"]
+    assert "CUS-F1" in claimant_opening["evidence_summary"]
+    assert "CUS-A1" in claimant_opening["evidence_summary"]
+def test_jury_contract_uses_public_history_personas():
+    assert JUDGE_NAME == "Marcus Aurelius"
+    assert JUROR_PERSONAS == {
+        "Karl Marx": "class power, material conditions, exploitation, institutional incentives",
+        "John Stuart Mill": "liberty, harm principle, utility, individual rights",
+        "Confucius": "social harmony, role duty, ritual order, moral cultivation",
+        "Cleopatra VII": "sovereign pragmatism, diplomacy, survival, legitimacy under pressure",
+        "Niccolo Machiavelli": "political realism, stability, power, consequences over ideals",
+        "Jensen Huang": "technological optimism, operator mindset, systems thinking, innovation tradeoffs",
+    }
+def test_role_prompt_requires_first_person_in_character_speech():
+    messages = build_role_messages(
+        agent="Harvey Vector",
+        role="respondent advocate",
+        case_summary="A short case summary.",
+        evidence_summary="SOC-E1: A record excerpt.",
+        task="Answer the bench for the respondent.",
+    )
+    system = messages[0]["content"]
+    user = messages[1]["content"]
+    assert "Stay fully in character as the assigned Agent and Role." in system
+    assert "Output only the words this character says aloud in court." in system
+    assert "Do not narrate about yourself in the third person." in system
+    assert "Use the case facts and evidence provided below" in system
+    assert "Speak as Harvey Vector." in user
+    assert "Give only the in-scene court line" in user
+    assert "SOC-E1" in user
+def test_juror_vote_prompt_uses_persona_history_and_json_contract():
+    messages = build_role_messages(
+        agent="Karl Marx",
+        role="juror",
+        case_summary="A short case summary.",
+        evidence_summary="SOC-E1: A record excerpt.",
+        trial_history="Mike OSS argued from SOC-E1.",
+        persona=JUROR_PERSONAS["Karl Marx"],
+        objective="Vote as Karl Marx would after watching the trial.",
+        task="Return one juror vote as JSON.",
+    )
+    system = messages[0]["content"]
+    user = messages[1]["content"]
+    assert "Output only the words this character says aloud in court." not in messages[0]["content"]
+    assert "You are an individual juror." in system
+    assert JUROR_PERSONAS["Karl Marx"] in user
+    assert "Mike OSS argued from SOC-E1." in user
+    assert "Return only the requested JSON object." in user
+def test_model_cleaner_extracts_final_speech_after_analysis_channel():
+    text = clean_model_text(
+        "analysis\nI should reason about the case first.\n\nfinal\nI stand for the respondent, and SOC-E1 leaves doubt."
+    )
+    assert text == "I stand for the respondent, and SOC-E1 leaves doubt."
+    assert "analysis" not in text.lower()
+def test_model_cleaner_rejects_visible_analysis_without_final_speech():
+    def analysis_runner(**kwargs):
+        return ModelResult(
+            text="analysis: I should think through the case before answering.",
+            input_text="SYSTEM:\nanalysis leak",
+            call=ModelCall(
+                model=kwargs["model"],
+                provider=kwargs.get("provider", "test"),
+                ok=True,
+                latency_ms=1,
+                prompt_hash="test-prompt",
+            ),
+        )
+    with pytest.raises(RequiredModelError):
+        next(stream_trial(TrialRequest(case_id="socrates"), model_runner=analysis_runner))
+def test_model_cleaner_removes_instruction_echo_when_dialogue_remains():
+    text = clean_model_text(
+        "I will now announce the case as requested, while maintaining the theatrical but clear tone required. "
+        "I will speak as Clerk Meridian in first person, starting with a pronoun.\n\n"
+        "I call The Polis v. Socrates before this court."
+    )
+    assert text == "I call The Polis v. Socrates before this court."
+def test_model_cleaner_rejects_instruction_echo_without_dialogue():
+    with pytest.raises(Exception, match="echoed instructions"):
+        clean_model_text(
+            "I will now announce the case as requested, while maintaining the theatrical but clear tone required. "
+            "I will speak as Clerk Meridian in first person, starting with a pronoun."
+        )
+def test_required_model_failure_stops_trial_without_canned_dialogue():
+    def failing_runner(**kwargs):
+        return ModelResult(
+            text="",
+            input_text="SYSTEM:\nfailed",
+            call=ModelCall(
+                model=kwargs["model"],
+                provider=kwargs.get("provider", "test"),
+                ok=False,
+                latency_ms=1,
+                prompt_hash="test-prompt",
+                error="offline",
+            ),
+        )
+    with pytest.raises(RequiredModelError, match="unavailable"):
+        run_trial(TrialRequest(case_id="socrates"), model_runner=failing_runner)
+def test_invalid_jury_output_stops_trial_without_fallback_votes():
+    def invalid_jury_runner(**kwargs):
+        result = fake_model_runner(**kwargs)
+        if kwargs["role"] == "juror":
+            result.text = "the jury refuses structured output"
+        return result
+    with pytest.raises(RequiredModelError, match="invalid JSON"):
+        run_trial(TrialRequest(case_id="socrates"), model_runner=invalid_jury_runner)
+def test_live_search_stops_when_query_is_weak():
+    with pytest.raises(RuntimeError, match="no fallback case"):
+        run_trial(TrialRequest(case_id="live", search_query="x"), model_runner=fake_model_runner)

tests/test_ui_rendering.py CHANGED

@@ -1,283 +1,578 @@
-import inspect
-from pathlib import Path
-from PIL import Image
-import app
-from sovereign_bench.models import AgentTurn, EvidenceItem, JurorVote, TrialEvent
-OLD_CARD_CLASSES = [
-    "paper-panel",
-    "juror-panel",
-    "mind-panel",
-    "empty-state",
-    "trial-downloads",
-]
-def _event_with_lower_tab_data() -> TrialEvent:
-    evidence = EvidenceItem(
-        id="E1",
-        title="Ledger entry",
-        source="Archive",
-        excerpt="A short exhibit excerpt.",
-        supports="claimant",
-        reliability=0.82,
-        note="Useful but incomplete.",
-    )
-    vote = JurorVote(
-        juror="Karl Marx",
-        persona=app.JUROR_PERSONAS["Karl Marx"],
-        vote="liable",
-        reason="The exhibit supports the claim.",
-        evidence_ids=["E1"],
-    )
-    return TrialEvent(
-        phase="deliberation",
-        title="Jury weighs the record",
-        body="The jury reviews the record.",
-        turns=[
-            AgentTurn(
-                agent="Nemotron Jury",
-                role="juror panel",
-                content="The jurors compare E1 and state their votes.",
-                model="nvidia/Nemotron-Orchestrator-8B",
-                confidence=0.84,
-                input="SYSTEM:\nYou are the jury.\n\nUSER:\nWeigh E1 and explain the vote.",
-            )
-        ],
-        evidence=[evidence],
-        votes=[vote],
-        trace={"mode": "test"},
-    )
-def _speaker_event(agent: str, phase: str = "questions") -> TrialEvent:
-    return TrialEvent(
-        phase=phase,
-        title=f"{agent} speaks",
-        body="A single speaker takes the floor.",
-        turns=[
-            AgentTurn(
-                agent=agent,
-                role="test speaker",
-                content=f"{agent} has the visible floor.",
-                model="test-model",
-                confidence=0.9,
-                input="SYSTEM:\nTest prompt.",
-            )
-        ],
-    )
-def test_lower_tab_renderers_emit_plain_text_classes():
-    event = _event_with_lower_tab_data()
-    html = "\n".join(
-        [
-            app.render_evidence([]),
-            app.render_evidence([event]),
-            app.render_jurors([]),
-            app.render_jurors([event]),
-            app.render_mind([], True),
-            app.render_mind([event], True),
-            app.render_mind([event], False),
-        ]
-    )
-    for class_name in OLD_CARD_CLASSES:
-        assert class_name not in html
-    assert "drawer-text-block" in html
-    assert "drawer-empty" in html
-    assert "mind-text" in html
-def test_download_controls_are_not_wired_into_app():
-    source = inspect.getsource(app.build_app)
-    assert "DownloadButton" not in source
-    assert "Download decree" not in source
-    assert "Download agent trace" not in source
-def test_courtroom_splits_six_jurors_between_side_benches():
-    html = app.render_court([_event_with_lower_tab_data()], started=True)
-    assert "jury-benches left" in html
-    assert "jury-benches right" in html
-    assert html.count("<a class='juror") == 6
-    assert html.find("jury-benches left") < html.find("jury-benches right")
-    assert ".jury-benches.left {\n  left: 1%;" in app.CSS
-    assert ".jury-benches.right {\n  right: 1%;" in app.CSS
-    assert ".jury-benches.left {\n    left: .5%;" in app.CSS
-    assert ".jury-benches.right {\n    right: .5%;" in app.CSS
-def test_courtroom_threads_show_model_input_output_on_hover_and_click():
-    html = app.render_court([_event_with_lower_tab_data()], started=True)
-    assert "tooltip-io-label'>Input" in html
-    assert "tooltip-io-label'>Output" in html
-    assert "Click to open full thread" in html
-    assert "class='ai-thread-modal'" in html
-    assert "thread-block'>SYSTEM:" in html
-    assert "The jurors compare E1 and state their votes." in html
-    assert "href='#ai-thread-karl-marx'" in html
-def test_courtroom_renders_historical_judge_and_juror_assets():
-    html = app.render_court([_event_with_lower_tab_data()], started=True)
-    assert "Marcus Aurelius" in html
-    assert "assets/characters/marcus-aurelius.png" in html
-    for name, image in app.JUROR_IMAGES.items():
-        assert name in html
-        assert image in html
-    assert html.count("class='juror-portrait'") == 6
-def test_courtroom_renders_foreground_fences_and_judge_table_above_characters():
-    html = app.render_court([_event_with_lower_tab_data()], started=True)
-    assert html.count("assets/foreground/foregroundFence.png") == 2
-    assert "assets/foreground/JudgeTable.png" in html
-    assert html.find("class='puppet judge") < html.find("class='foreground-props'")
-    assert ".foreground-props {\n  position: absolute;\n  inset: 0;\n  z-index: 13;" in app.CSS
-    assert ".puppet {\n  --skin: #c99257;" in app.CSS
-    assert "z-index: 8;" in app.CSS
-def test_foreground_prop_assets_have_real_transparency():
-    for path in [
-        Path("assets/foreground/foregroundFence.png"),
-        Path("assets/foreground/JudgeTable.png"),
-    ]:
-        alpha = Image.open(path).convert("RGBA").getchannel("A")
-        histogram = alpha.histogram()
-        assert histogram[0] > 0, f"{path} has no fully transparent pixels"
-        assert histogram[255] > 0, f"{path} has no fully opaque prop pixels"
-def test_latest_speaker_sets_stage_class_and_speech_bubble():
-    html = app.render_court([_speaker_event("Advocate Auric", phase="claims")], started=True)
-    assert "speaker-auric" in html
-    assert "class='speech-bubble'" in html
-    assert "Advocate Auric has the visible floor." in html
-    assert "puppet auric active walking" in html
-    assert "puppet sable active" not in html
-def test_individual_juror_can_be_active_speaker():
-    event = TrialEvent(
-        phase="deliberation",
-        title="Juror Karl Marx Votes",
-        body=app.JUROR_PERSONAS["Karl Marx"],
-        turns=[
-            AgentTurn(
-                agent="Karl Marx",
-                role="juror",
-                content="Liable. E1 carries the record.",
-                model="nvidia/Nemotron-Orchestrator-8B",
-                confidence=0.86,
-                input="SYSTEM:\nJury JSON prompt.",
-            )
-        ],
-        votes=[
-            JurorVote(
-                juror="Karl Marx",
-                persona=app.JUROR_PERSONAS["Karl Marx"],
-                vote="liable",
-                reason="E1 carries the record.",
-                evidence_ids=["E1"],
-            )
-        ],
-    )
-    html = app.render_court([event], started=True)
-    assert "speaker-karl-marx" in html
-    assert "<a class='juror active'" in html
-    assert "Liable. E1 carries the record." in html
-def test_lawyer_movement_css_is_speaker_specific_not_phase_wide():
-    assert ".speaker-auric .puppet.auric" in app.CSS
-    assert ".speaker-sable .puppet.sable" in app.CSS
-    assert ".phase-claims .puppet.auric" not in app.CSS
-    assert ".phase-opening .puppet.sable" not in app.CSS
-def test_closed_book_is_smaller_and_key_characters_are_lowered():
-    assert ".episode-book.closed {\n  top: 61%;\n  width: min(163px, 20vw);" in app.CSS
-    assert ".puppet.judge {\n  left: 50%;\n  top: 56%;" in app.CSS
-    assert ".puppet.auric {\n  left: 24%;\n  top: 87%;" in app.CSS
-    assert ".speaker-auric .puppet.auric {\n  left: 43%;\n  top: 91%;" in app.CSS
-    assert ".puppet.auditor {\n  left: 71%;\n  top: 80%;" in app.CSS
-    assert ".episode-book.closed {\n    top: 750px;\n    width: 140px;" in app.CSS
-    assert ".puppet.judge {\n    top: 717px;" in app.CSS
-    assert ".puppet.auric {\n    left: 20%;\n    top: 970px;" in app.CSS
-    assert ".puppet.auditor {\n    left: 78%;\n    top: 860px;" in app.CSS
-def test_run_ui_yields_five_outputs_without_download_status(monkeypatch):
-    event = _event_with_lower_tab_data()
-    monkeypatch.setattr(app, "get_events", lambda request: iter([event]))
-    outputs = list(app.run_ui("Trial of Socrates", "", "", "swift", True))
-    assert outputs
-    assert all(len(output) == 5 for output in outputs)
-    assert outputs[1][-1] == "Step 1: Jury weighs the record"
-    assert outputs[-1][-1] == "Verdict sealed."
-    assert "download" not in outputs[-1][-1].lower()
-def test_run_ui_stops_with_model_unavailable_error(monkeypatch):
-    def broken_events(request):
-        raise RuntimeError("Marcus Aurelius unavailable: offline")
-        yield
-    monkeypatch.setattr(app, "get_events", broken_events)
-    outputs = list(app.run_ui("Trial of Socrates", "", "", "swift", True))
-    assert outputs[-1][-1] == "Model response required. Trial stopped: Marcus Aurelius unavailable: offline"
-    assert "Claimant score" not in outputs[-1][0]
-def test_court_renders_sound_toggle():
-    html = app.render_court([])
-    assert "sound-toggle" in html
-    assert "aria-label='Toggle sound'" in html
-    assert "aria-pressed='false'" in html
-def test_audio_controller_has_score_breathing_and_mute_toggle():
-    assert "SCORE_BREATH_INTERVAL_MS = 20000" in app.APP_JS
-    assert "SCORE_BREATH_DURATION_MS = 5000" in app.APP_JS
-    assert "toggleMuted()" in app.APP_JS
-    assert "this.fadeScore(SCORE_QUIET_VOLUME, halfDuration" in app.APP_JS
-def test_courtroom_background_has_no_overlay_or_character_shadow():
-    assert "background: #141413 !important;" in app.CSS
-    assert "background-color: #141413 !important;" in app.CSS
-    assert "cover fixed no-repeat" not in app.CSS
-    assert ".court-episode-stage::before {\n  content: \"\";\n  display: none;" in app.CSS
-    assert ".court-episode-stage::after {\n  content: \"\";\n  display: none;" in app.CSS
-    assert "url('/gradio_api/file=assets/background/CourtRoom.png') center center / 100% 100% no-repeat" in app.CSS
-    assert "filter: drop-shadow(0 12px 14px" not in app.CSS
-    assert "filter: drop-shadow(0 8px 10px" not in app.CSS
-def test_synthetic_stage_props_do_not_tint_background():
-    assert ".bench-front {\n  display: none;" in app.CSS
-    assert ".trial-floor-mark {\n  display: none;" in app.CSS
-    assert ".gallery-benches {\n  display: none;" in app.CSS
-    assert ".prop-label {\n  display: none;" in app.CSS
-    assert ".counsel-table" in app.CSS
-    assert "background: transparent;\n  box-shadow: none;" in app.CSS
-    assert ".witness-area" in app.CSS

+import inspect
+import json
+from pathlib import Path
+from PIL import Image
+import app
+from sovereign_bench.models import AgentTurn, EvidenceItem, JurorVote, TrialEvent, Verdict
+OLD_CARD_CLASSES = [
+    "paper-panel",
+    "juror-panel",
+    "mind-panel",
+    "empty-state",
+    "trial-downloads",
+]
+def _event_with_lower_tab_data() -> TrialEvent:
+    evidence = EvidenceItem(
+        id="E1",
+        title="Ledger entry",
+        source="Archive",
+        excerpt="A short exhibit excerpt.",
+        supports="claimant",
+        reliability=0.82,
+        note="Useful but incomplete.",
+    )
+    vote = JurorVote(
+        juror="Karl Marx",
+        persona=app.JUROR_PERSONAS["Karl Marx"],
+        vote="liable",
+        reason="The exhibit supports the claim.",
+        evidence_ids=["E1"],
+    )
+    return TrialEvent(
+        phase="deliberation",
+        title="Jury weighs the record",
+        body="The jury reviews the record.",
+        turns=[
+            AgentTurn(
+                agent="Nemotron Jury",
+                role="juror panel",
+                content="The jurors compare E1 and state their votes.",
+                model="nvidia/Nemotron-Orchestrator-8B",
+                confidence=0.84,
+                input="SYSTEM:\nYou are the jury.\n\nUSER:\nWeigh E1 and explain the vote.",
+            )
+        ],
+        evidence=[evidence],
+        votes=[vote],
+        trace={"mode": "test"},
+    )
+def _speaker_event(agent: str, phase: str = "questions") -> TrialEvent:
+    return TrialEvent(
+        phase=phase,
+        title=f"{agent} speaks",
+        body="A single speaker takes the floor.",
+        turns=[
+            AgentTurn(
+                agent=agent,
+                role="test speaker",
+                content=f"{agent} has the visible floor.",
+                model="test-model",
+                confidence=0.9,
+                input="SYSTEM:\nTest prompt.",
+            )
+        ],
+    )
+def _verdict_event(finding: str = "liable") -> TrialEvent:
+    return TrialEvent(
+        phase="verdict",
+        title="The Court Announces Judgment",
+        body="Judgment is announced.",
+        verdict=Verdict(
+            finding=finding,
+            decree="The court enters the final judgment.",
+            rationale="The jury majority decides the record.",
+            evidence_ids=["E1"],
+            uncertainty="Some uncertainty remains.",
+            remedy="Record the judgment.",
+        ),
+        turns=[
+            AgentTurn(
+                agent=app.JUDGE_NAME,
+                role="verdict writer",
+                content="The judgment of the court is guilty.",
+                model="test-model",
+                confidence=0.9,
+                input="SYSTEM:\nAnnounce verdict.",
+            )
+        ],
+    )
+def test_lower_tab_renderers_emit_plain_text_classes():
+    event = _event_with_lower_tab_data()
+    html = "\n".join(
+        [
+            app.render_evidence([]),
+            app.render_evidence([event]),
+            app.render_jurors([]),
+            app.render_jurors([event]),
+            app.render_mind([], True),
+            app.render_mind([event], True),
+            app.render_mind([event], False),
+        ]
+    )
+    for class_name in OLD_CARD_CLASSES:
+        assert class_name not in html
+    assert "drawer-text-block" in html
+    assert "drawer-empty" in html
+    assert "mind-text" in html
+def test_download_controls_are_not_wired_into_app():
+    source = inspect.getsource(app.build_app)
+    assert "DownloadButton" not in source
+    assert "Download decree" not in source
+    assert "Download agent trace" not in source
+def test_case_dropdown_only_exposes_demo_and_custom_cases():
+    assert list(app.CASE_OPTIONS) == ["Trial of Socrates", "Greg Heffley vs Mom", "Custom"]
+    assert "The People v. Barnaby Buttons" not in app.CASE_OPTIONS
+    assert "Live Search Tribunal" not in app.CASE_OPTIONS
+def test_courtroom_splits_six_jurors_between_side_benches():
+    html = app.render_court([_event_with_lower_tab_data()], started=True)
+    assert "jury-benches left" in html
+    assert "jury-benches right" in html
+    assert html.count("<a class='juror") == 6
+    assert html.find("jury-benches left") < html.find("jury-benches right")
+    assert ".jury-benches.left {\n  left: 1%;" in app.CSS
+    assert ".jury-benches.right {\n  right: 1%;" in app.CSS
+    assert ".jury-benches.left {\n    left: .5%;" in app.CSS
+    assert ".jury-benches.right {\n    right: .5%;" in app.CSS
+def test_courtroom_threads_show_model_input_output_on_hover_and_click():
+    html = app.render_court([_event_with_lower_tab_data()], started=True)
+    assert "tooltip-io-label'>Input" in html
+    assert "tooltip-io-label'>Output" in html
+    assert "Click to open full thread" in html
+    assert "class='ai-thread-modal'" in html
+    assert "thread-block'>SYSTEM:" in html
+    assert "The jurors compare E1 and state their votes." in html
+    assert "href='#ai-thread-karl-marx'" in html
+def test_courtroom_renders_historical_judge_and_juror_assets():
+    html = app.render_court([_event_with_lower_tab_data()], started=True)
+    assert "Marcus Aurelius" in html
+    assert "assets/characters/marcus-aurelius.png" in html
+    assert "<img class='puppet-portrait' src='/gradio_api/file=assets/characters/marcus-aurelius.png'" in html
+    assert ".puppet.judge::before,\n.puppet.judge::after {\n  display: none;\n}" in app.CSS
+    assert ".puppet.judge .mouth {\n  display: none;\n}" in app.CSS
+    for name, image in app.JUROR_IMAGES.items():
+        assert name in html
+        assert image in html
+    assert html.count("class='juror-portrait'") == 6
+    assert "class='juror-face'" not in html
+    assert "class='juror-body'" not in html
+def test_courtroom_renders_foreground_fences_and_judge_table_above_characters():
+    html = app.render_court([_event_with_lower_tab_data()], started=True)
+    assert html.count("assets/foreground/foregroundFence.png") == 2
+    assert "assets/foreground/JudgeTable.png" in html
+    assert html.find("class='puppet judge") < html.find("class='foreground-props'")
+    assert ".foreground-props {\n  position: absolute;\n  inset: 0;\n  z-index: 13;" in app.CSS
+    assert ".puppet {\n  --skin: #c99257;" in app.CSS
+    assert "z-index: 8;" in app.CSS
+    assert ".puppet.clerk {\n  left: 43%;\n  top: 66%;\n  z-index: 14;" in app.CSS
+def test_trial_progress_defaults_to_pretrial_and_renders_all_stages():
+    html = app.render_court([])
+    assert "class='trial-progress'" in html
+    assert "data-phase='pretrial' aria-current='step'" in html
+    for _key, label in app.TRIAL_PROGRESS_STAGES:
+        assert label in html
+def test_trial_progress_marks_questions_current():
+    html = app.render_court([_speaker_event("Mike OSS", phase="questions")], started=True)
+    assert "class='trial-progress-segment current' data-phase='questions' aria-current='step'" in html
+    assert "data-phase='evidence'" in html
+def test_trial_progress_marks_deliberation_current():
+    html = app.render_court([_event_with_lower_tab_data()], started=True)
+    assert "class='trial-progress-segment current' data-phase='deliberation' aria-current='step'" in html
+    assert "class='trial-progress-segment complete' data-phase='questions'" in html
+def test_trial_progress_marks_verdict_current_and_complete():
+    html = app.render_court([_speaker_event(app.JUDGE_NAME, phase="verdict")], started=True)
+    assert "class='trial-progress-segment current complete' data-phase='verdict' aria-current='step'" in html
+    assert "class='trial-progress-segment complete' data-phase='deliberation'" in html
+def test_verdict_popup_renders_only_when_final_verdict_is_revealed():
+    event = _verdict_event("liable")
+    announcement = app.render_court([event], started=True)
+    sealed = app.render_court([event], started=True, show_verdict_popup=True)
+    assert "class='speech-bubble active-dialogue speaker-judge'" in announcement
+    assert "class='verdict-popup'" not in announcement
+    assert "class='speech-bubble active-dialogue speaker-judge'" in sealed
+    assert "class='verdict-popup'" in sealed
+    assert "data-finding='liable'" in sealed
+    assert "Verdict: Guilty" in sealed
+def test_run_ui_reveals_verdict_popup_after_judge_speech(monkeypatch):
+    event = _verdict_event("not_liable")
+    monkeypatch.setattr(app, "get_events", lambda request: iter([event]))
+    monkeypatch.setattr(app, "_reading_duration", lambda text: 0)
+    outputs = list(app.run_ui("Trial of Socrates", "", "", "", "swift", True))
+    assert "class='speech-bubble active-dialogue speaker-judge'" in outputs[1][0]
+    assert "class='verdict-popup'" not in outputs[1][0]
+    assert outputs[-1][-1] == "Verdict sealed."
+    assert "class='verdict-popup'" in outputs[-1][0]
+    assert "Verdict: Not Guilty" in outputs[-1][0]
+def test_trial_progress_ignores_unknown_phase_without_extra_segment():
+    html = app.render_court([_speaker_event("Clerk Meridian", phase="appeal")], started=True)
+    assert "class='trial-progress'" in html
+    assert html.count("class='trial-progress-segment") == len(app.TRIAL_PROGRESS_STAGES)
+    assert "aria-current='step'" not in html
+    assert "class='trial-progress-segment' data-phase='appeal'" not in html
+def test_trial_progress_css_is_fixed_and_translucent_theme_matched():
+    assert ".trial-progress {\n  position: fixed;\n  top: 0;" in app.CSS
+    assert "background: rgba(23, 13, 8, .58);" in app.CSS
+    assert "backdrop-filter: blur(8px);" in app.CSS
+    assert "background: #ffd675;" in app.CSS
+    assert ".trial-progress-abbrev {\n    display: inline;" in app.CSS
+def test_foreground_prop_assets_have_real_transparency():
+    for path in [
+        Path("assets/foreground/foregroundFence.png"),
+        Path("assets/foreground/JudgeTable.png"),
+    ]:
+        alpha = Image.open(path).convert("RGBA").getchannel("A")
+        histogram = alpha.histogram()
+        assert histogram[0] > 0, f"{path} has no fully transparent pixels"
+        assert histogram[255] > 0, f"{path} has no fully opaque prop pixels"
+def test_latest_speaker_sets_stage_class_and_speech_bubble():
+    html = app.render_court([_speaker_event("Mike OSS", phase="claims")], started=True)
+    assert "speaker-auric" in html
+    assert "class='speech-bubble active-dialogue speaker-auric'" in html
+    assert "data-speaker='Mike OSS'" in html
+    assert "<strong>Mike OSS</strong>" in html
+    assert "test speaker" in html
+    assert "Mike OSS has the visible floor." in html
+    assert "puppet auric active walking" in html
+    assert "puppet sable active" not in html
+    assert html.count("class='speech-bubble") == 1
+    assert html.find("class='foreground-props'") < html.find("class='speech-bubble active-dialogue")
+    assert ".speech-bubble.active-dialogue,\n.speech-bubble.active-dialogue * {\n  color: #141413 !important;\n}" in app.CSS
+    assert "border: 2px solid #141413;" in app.CSS
+    assert "font-size: 12px;" in app.CSS
+def test_speech_bubble_uses_full_turn_content_not_event_body():
+    long_text = " ".join(["The record speaks plainly"] * 18) + " with a final visible phrase."
+    event = TrialEvent(
+        phase="questions",
+        title="Counsel answers",
+        body="Narration only, not spoken dialogue.",
+        turns=[
+            AgentTurn(
+                agent="Harvey Vector",
+                role="respondent advocate",
+                content=long_text,
+                model="test-model",
+                confidence=0.9,
+            )
+        ],
+    )
+    html = app.render_court([event], started=True)
+    bubble = html[html.index("<div class='speech-bubble") : html.index("<div class='gallery-benches")]
+    assert "with a final visible phrase." in bubble
+    assert "Narration only" not in bubble
+    assert "..." not in bubble
+def test_pending_speaker_renders_single_preparing_bubble():
+    pending = app.SpeakerCue(
+        name="Harvey Vector",
+        role="respondent advocate",
+        text="Harvey Vector is preparing a response.",
+        pending=True,
+    )
+    html = app.render_court([], started=True, pending_speaker=pending)
+    assert "class='speech-bubble active-dialogue speaker-sable pending'" in html
+    assert "data-pending='true'" in html
+    assert "Harvey Vector is preparing a response." in html
+    assert "puppet sable active walking" in html
+    assert html.count("class='speech-bubble") == 1
+def test_reading_duration_scales_with_words_and_caps():
+    assert app._reading_duration("short line") == app.MIN_READ_SECONDS
+    assert app._reading_duration("word " * 18) > app.MIN_READ_SECONDS
+    assert app._reading_duration("word " * 200) == app.MAX_READ_SECONDS
+def test_individual_juror_can_be_active_speaker():
+    event = TrialEvent(
+        phase="deliberation",
+        title="Juror Karl Marx Votes",
+        body=app.JUROR_PERSONAS["Karl Marx"],
+        turns=[
+            AgentTurn(
+                agent="Karl Marx",
+                role="juror",
+                content="Liable. E1 carries the record.",
+                model="nvidia/Nemotron-Orchestrator-8B",
+                confidence=0.86,
+                input="SYSTEM:\nJury JSON prompt.",
+            )
+        ],
+        votes=[
+            JurorVote(
+                juror="Karl Marx",
+                persona=app.JUROR_PERSONAS["Karl Marx"],
+                vote="liable",
+                reason="E1 carries the record.",
+                evidence_ids=["E1"],
+            )
+        ],
+    )
+    html = app.render_court([event], started=True)
+    assert "speaker-karl-marx" in html
+    assert "<a class='juror active'" in html
+    assert "class='speech-bubble active-dialogue speaker-karl-marx juror-dialogue'" in html
+    assert "Liable. E1 carries the record." in html
+    assert html.count("class='speech-bubble") == 1
+def test_juror_speech_bubbles_anchor_above_side_benches():
+    assert ".speech-bubble.active-dialogue.juror-dialogue {\n  top: 42%;" in app.CSS
+    assert ".speech-bubble.active-dialogue.speaker-karl-marx,\n.speech-bubble.active-dialogue.speaker-john-stuart-mill,\n.speech-bubble.active-dialogue.speaker-confucius {\n  left: 1.5%;" in app.CSS
+    assert ".speech-bubble.active-dialogue.speaker-cleopatra-vii,\n.speech-bubble.active-dialogue.speaker-niccolo-machiavelli,\n.speech-bubble.active-dialogue.speaker-jensen-huang {\n  right: 1.5%;" in app.CSS
+    assert "--bubble-tail-x: 19%;" in app.CSS
+    assert "--bubble-tail-x: 81%;" in app.CSS
+    assert ".speech-bubble.active-dialogue.juror-dialogue,\n  .speech-bubble.active-dialogue.speaker-karl-marx" in app.CSS
+    assert "top: 500px;" in app.CSS
+def test_lawyer_movement_css_is_speaker_specific_not_phase_wide():
+    assert ".speaker-auric .puppet.auric" in app.CSS
+    assert ".speaker-sable .puppet.sable" in app.CSS
+    assert ".phase-claims .puppet.auric" not in app.CSS
+    assert ".phase-opening .puppet.sable" not in app.CSS
+def test_closed_book_and_key_characters_align_with_judge_table():
+    assert ".episode-book {\n  position: absolute;\n  left: 50%;\n  top: 122px;\n  z-index: 14;" in app.CSS
+    assert "width: min(980px, calc(100% - 32px));" in app.CSS
+    assert ".episode-book.closed {\n  top: 50%;\n  width: min(163px, 20vw);" in app.CSS
+    assert ".foreground-fence {\n  bottom: -6.5%;\n  width: 47%;" in app.CSS
+    assert ".judge-table-foreground {\n  left: 50%;\n  top: 20%;\n  z-index: 1;\n  width: 39.1%;" in app.CSS
+    assert ".puppet.judge {\n  left: 50%;\n  top: calc(40% + 156px);" in app.CSS
+    assert ".puppet.auric {\n  left: 24%;\n  top: 87%;" in app.CSS
+    assert ".speaker-auric .puppet.auric {\n  left: 43%;\n  top: 87%;" in app.CSS
+    assert ".puppet.sable {\n  left: 75%;\n  top: 87%;" in app.CSS
+    assert ".speaker-sable .puppet.sable {\n  left: 75%;\n  top: 87%;" in app.CSS
+    assert ".puppet.clerk {\n  left: 43%;\n  top: 66%;" in app.CSS
+    assert ".puppet.auditor" not in app.CSS
+    assert ".episode-book.closed {\n    top: 640px;\n    width: 140px;" in app.CSS
+    assert ".episode-book {\n    top: 218px;\n    width: min(680px, calc(100% - 20px));" in app.CSS
+    assert ".foreground-fence {\n    bottom: -66px;\n    width: 64%;" in app.CSS
+    assert ".judge-table-foreground {\n    top: 213px;\n    width: 646px;" in app.CSS
+    assert ".puppet.auric {\n    left: 20%;\n    top: 970px;" in app.CSS
+    assert ".puppet.sable {\n    left: 80%;\n    top: 970px;" in app.CSS
+    assert ".speaker-sable .puppet.sable {\n    left: 80%;\n    top: 970px;" in app.CSS
+    assert ".puppet.judge {\n    top: 576px;" not in app.CSS
+    assert ".puppet.sable {\n    left: 80%;\n    top: 640px;" not in app.CSS
+    assert ".speaker-sable .puppet.sable {\n    left: 80%;\n    top: 640px;" not in app.CSS
+    assert ".puppet.clerk {\n    left: 35%;\n    top: 880px;" in app.CSS
+    assert ".speech-bubble.active-dialogue.speaker-auditor" not in app.CSS
+def test_open_docket_book_renders_text_above_book_art():
+    html = app.render_court([])
+    assert "class='episode-book'" in html
+    assert "class='book-open-content'" in html
+    assert "Trial details" in html
+    assert "Evidence" in html
+def test_greg_case_preview_uses_cached_context_and_evidence_columns():
+    html = app.render_case_preview("Greg Heffley vs Mom")
+    assert "Greg Heffley v. Mom" in html
+    assert "diary" in html
+    assert "Evidence for Greg Heffley" in html
+    assert "Evidence for Susan Heffley" in html
+def test_custom_case_preview_renders_fillable_book_fields():
+    html = app.render_case_preview("Custom")
+    assert "episode-book custom-book" in html
+    assert "book-context-field" in html
+    assert html.count("book-claimant-field") == 3
+    assert html.count("book-respondent-field") == 3
+def test_custom_payload_builds_trial_request_packet(monkeypatch):
+    captured = {}
+    def fake_events(request):
+        captured["request"] = request
+        return iter([_event_with_lower_tab_data()])
+    monkeypatch.setattr(app, "get_events", fake_events)
+    monkeypatch.setattr(app, "_reading_duration", lambda text: 0)
+    payload = json.dumps(
+        {
+            "context": "A missing bicycle is traced to a disputed garage visit.",
+            "claimant_evidence": ["Garage text", "", "Scuffed tire mark"],
+            "respondent_evidence": ["Neighbor saw bike later", "", ""],
+        }
+    )
+    outputs = list(app.run_ui("Custom", "", "", payload, "swift", True))
+    assert outputs[-1][-1] == "Verdict sealed."
+    request = captured["request"]
+    assert request.case_id == "custom"
+    assert request.custom_case is not None
+    assert request.custom_case.context.startswith("A missing bicycle")
+    assert [item.supports for item in request.custom_case.evidence] == ["claimant", "claimant", "respondent"]
+def test_custom_payload_requires_context_and_both_evidence_sides():
+    payload = json.dumps({"context": "", "claimant_evidence": ["Only one side"], "respondent_evidence": []})
+    outputs = list(app.run_ui("Custom", "", "", payload, "swift", True))
+    assert outputs[-1][-1] == "Custom requires a trial details paragraph."
+def test_run_ui_yields_five_outputs_without_download_status(monkeypatch):
+    event = _event_with_lower_tab_data()
+    monkeypatch.setattr(app, "get_events", lambda request: iter([event]))
+    monkeypatch.setattr(app, "_reading_duration", lambda text: 0)
+    outputs = list(app.run_ui("Trial of Socrates", "", "", "", "swift", True))
+    assert outputs
+    assert all(len(output) == 5 for output in outputs)
+    assert outputs[0][-1] == "Clerk Meridian is preparing their response."
+    assert outputs[1][-1] == "Step 1: Nemotron Jury - Jury weighs the record"
+    assert outputs[-1][-1] == "Verdict sealed."
+    assert "download" not in outputs[-1][-1].lower()
+def test_run_ui_stops_with_model_unavailable_error(monkeypatch):
+    def broken_events(request):
+        raise RuntimeError("Marcus Aurelius unavailable: offline")
+        yield  # unreachable; makes broken_events a generator, so the error is raised on iteration
+    monkeypatch.setattr(app, "get_events", broken_events)
+    outputs = list(app.run_ui("Trial of Socrates", "", "", "", "swift", True))
+    assert outputs[-1][-1] == "Model response required. Trial stopped: Marcus Aurelius unavailable: offline"
+    assert "Claimant score" not in outputs[-1][0]
+def test_remote_events_uses_default_modal_endpoint_without_local_token(monkeypatch):
+    captured = {}
+    class FakeResponse:
+        def __enter__(self):
+            return self
+        def __exit__(self, exc_type, exc, traceback):
+            return False
+        def raise_for_status(self):
+            return None
+        def iter_lines(self):
+            event = _speaker_event("Clerk Meridian", phase="intake")
+            yield json.dumps(event.model_dump())
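+    def fake_stream(method, endpoint, json, timeout):
+        # `json` shadows the json module inside this function only; the name matches httpx.stream's `json` keyword.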
+        captured["method"] = method
+        captured["endpoint"] = endpoint
+        captured["payload"] = json
+        captured["timeout"] = timeout
+        return FakeResponse()
+    monkeypatch.delenv("MODAL_TRIAL_URL", raising=False)
+    monkeypatch.delenv("HF_TOKEN", raising=False)
+    monkeypatch.setattr(app.httpx, "stream", fake_stream)
+    event = next(app.get_events(app.TrialRequest(case_id="socrates"), delay=0.0))
+    assert captured["method"] == "POST"
+    assert captured["endpoint"] == app.DEFAULT_MODAL_TRIAL_URL
+    assert captured["timeout"] == 900.0
+    assert event.turns[0].agent == "Clerk Meridian"
+def test_court_renders_sound_toggle():
+    html = app.render_court([])
+    assert "sound-toggle" in html
+    assert "aria-label='Toggle sound'" in html
+    assert "aria-pressed='false'" in html
+def test_audio_controller_has_score_breathing_and_mute_toggle():
+    assert "SCORE_BREATH_INTERVAL_MS = 20000" in app.APP_JS
+    assert "SCORE_BREATH_DURATION_MS = 5000" in app.APP_JS
+    assert "toggleMuted()" in app.APP_JS
+    assert "this.fadeScore(SCORE_QUIET_VOLUME, halfDuration" in app.APP_JS
+def test_courtroom_background_has_no_overlay_or_character_shadow():
+    assert "background: #141413 !important;" in app.CSS
+    assert "background-color: #141413 !important;" in app.CSS
+    assert "cover fixed no-repeat" not in app.CSS
+    assert ".court-episode-stage::before {\n  content: \"\";\n  display: none;" in app.CSS
+    assert ".court-episode-stage::after {\n  content: \"\";\n  display: none;" in app.CSS
+    assert "url('/gradio_api/file=assets/background/CourtRoom.png') center center / 100% 100% no-repeat" in app.CSS
+    assert "filter: drop-shadow(0 12px 14px" not in app.CSS
+    assert "filter: drop-shadow(0 8px 10px" not in app.CSS
+def test_synthetic_stage_props_do_not_tint_background():
+    assert ".bench-front {\n  display: none;" in app.CSS
+    assert ".trial-floor-mark {\n  display: none;" in app.CSS
+    assert ".gallery-benches {\n  display: none;" in app.CSS
+    assert ".prop-label {\n  display: none;" in app.CSS
+    assert ".counsel-table" in app.CSS
+    assert "background: transparent;\n  box-shadow: none;" in app.CSS
+    assert ".witness-area" in app.CSS