Update Judge-GPT code and README

#3
by AliIqbal05 - opened
README.md CHANGED
@@ -9,97 +9,165 @@ app_file: app.py
9
  pinned: false
10
  license: mit
11
  short_description: AI-native miniature trials under 32B.
12
  ---
13
 
14
  # Judge-GPT
15
 
16
- Judge-GPT is a cinematic Gradio Space for the Build Small Hackathon's Thousand Token Wood track. It runs two-minute AI-native miniature trials where small-model agents act as advocates, judge, jurors, clerk, and evidence auditor.
17
 
18
- The app is built to stay under the 32B named-model budget:
19
 
20
- - `openai/gpt-oss-20b` for primary legal reasoning.
21
- - `openbmb/AgentCPM-Explore` for clerk/stage/verdict style.
22
- - `nvidia/Nemotron-Orchestrator-8B` for juror and evidence-auditor review.
23
 
24
- Total named budget: 32B parameters.
25
 
26
- ## What the app can do
27
 
28
- - Run cached trials for the Socrates and Barnaby demo cases without network search.
29
- - Run the Live Search Tribunal path, which builds a search packet from a user query and stops if live material is too weak to support a trial.
30
- - Add a hypothetical sidebar to shift the framing of a trial without editing cached case files.
31
- - Switch trial pacing between swift, measured, and ceremonial speeds.
32
- - Stage the courtroom with phase-specific visuals, agent puppets, evidence props, captions, and browser audio cues.
33
- - Show the Mind Layer as a compact JSON trace of agent turns and phase metadata.
34
- - Call a Modal streaming endpoint when `MODAL_TRIAL_URL` is configured. Endpoint or model failures stop the trial instead of substituting cached dialogue.
35
- - Retain decree and agent-trace export helpers in `sovereign_bench/export.py` for future UI restoration.
36
 
37
- ## Limitations
38
 
39
- - Judge-GPT is not legal advice and should not be used for real legal decisions.
40
- - Live search snippets are not independently verified by the app.
41
- - Output quality depends on Modal GPU availability, token limits, and the configured Hugging Face models.
42
- - Model, Modal, or live retrieval failures stop the current trial rather than returning substitute courtroom dialogue.
43
- - Trial results are not persisted across sessions.
44
- - Export generation remains in the codebase, but the visible download UI is currently hidden.
45
 
46
- ## Run locally
47
 
48
  ```powershell
49
  python -m pip install -r requirements.txt
50
  python app.py
51
  ```
52
 
53
- ## Modal backend
54
 
55
- The Gradio app works locally without Modal. If `MODAL_TRIAL_URL` is set, the Space calls the Modal streaming endpoint and stops the trial if the endpoint is unavailable.
56
 
57
- The deployed Modal endpoint runs each role prompt through a GPU-backed vLLM class on H100 by default. Traces mark successful GPU calls with `runtime: modal-gpu-vllm`, `provider: modal-gpu-vllm`, and `gpu: H100`. If a GPU/model load fails, the trial stops; the app does not substitute provider or cached dialogue.
58
 
59
  ```powershell
60
  python -m modal deploy modal_app.py
61
  ```
62
 
63
- Keep the deployed endpoint URL as a Hugging Face Space variable named `MODAL_TRIAL_URL`.
64
 
65
- ## Project targets
66
 
67
- Workspace connected to:
68
 
69
- - GitHub: `https://github.com/aliiqbal24/BuildSmallfinal.git`
70
- - Modal profile: `ali-j-iqbal24`
71
- - Hugging Face user: `AliIqbal05`
72
 
73
- ## Secrets
74
 
75
- Credentials are not committed to this repo.
76
 
77
- - Local Hugging Face CLI auth is stored in the Hugging Face cache.
78
- - Modal auth is stored in the local Modal profile.
79
- - Modal has a secret named `huggingface` with `HF_TOKEN`.
80
 
81
- Use the Modal secret in functions like this:
82
 
83
- ```python
84
- @app.function(secrets=[modal.Secret.from_name("huggingface")])
85
- def run_model():
86
- token = os.getenv("HF_TOKEN")
87
- ```
88
 
89
- ## Developer guide
90
 
91
- - `app.py`: Gradio UI, CSS, JavaScript audio hooks, HTML renderers, and Modal/local streaming switch.
92
- - `sovereign_bench/engine.py`: trial phases, agent orchestration, verdict assembly, and trace construction.
93
- - `sovereign_bench/llm.py`: Hugging Face calls, strict model error handling, and prompt building.
94
- - `sovereign_bench/retrieval.py`: live search packet construction.
95
- - `sovereign_bench/models.py`: Pydantic schemas for cases, evidence, events, turns, votes, and verdicts.
96
- - `sovereign_bench/cases.py`: cached demo case packets.
97
- - `sovereign_bench/export.py`: dormant decree and trace writers.
98
- - `modal_app.py`: Modal deployment and GPU-backed streaming endpoint.
99
- - `tests/`: engine, case, and rendering regression coverage.
100
 
101
- ## Verify Modal to Hugging Face
102
 
103
  ```powershell
104
- python -m modal run modal_app.py
105
  ```
9
  pinned: false
10
  license: mit
11
  short_description: AI-native miniature trials under 32B.
12
+ tags:
13
+ - track:wood
14
+ - sponsor:openai
15
+ - sponsor:nvidia
16
+ - sponsor:modal
17
+ - achievement:offbrand
18
+ - achievement:fieldnotes
19
  ---
20
 
21
  # Judge-GPT
22
 
23
+ Judge-GPT is a cinematic Gradio courtroom for the Build Small Hackathon's Thousand Token Wood track. It turns a compact evidence packet into a two-minute AI-native trial: a clerk opens the docket, two lawyers argue opposite sides, Marcus Aurelius presides, six fixed-perspective jurors vote, and the court seals a verdict.
24
 
25
+ The point is not legal advice. It is a small-model theater for structured disagreement: evidence is visible, roles are constrained, hidden reasoning is stripped, and every trial leaves a trace of which agent said what.
26
 
27
+ ## Submission Links
28
 
29
+ - Hugging Face Space: https://huggingface.co/spaces/build-small-hackathon/JudgeGPT
30
+ - Demo video: https://drive.google.com/drive/folders/10pWJ7NVCsnVV7wOlqm4MGWg4Kmh4rMY2?usp=sharing
31
+ - Social post: TODO paste final public social post URL
32
+ - GitHub repo: https://github.com/aliiqbal24/BuildSmallfinal
33
+ - Field guide validator: https://build-small-hackathon-field-guide.hf.space/submit
34
 
35
+ ## What Judges Should Try
36
 
37
+ 1. Open the Space and keep the default `Trial of Socrates`.
38
+ 2. Click `Begin Trial`.
39
+ 3. Watch the courtroom progress from intake to verdict.
40
+ 4. Hover the judge, clerk, lawyers, and jurors to inspect model/agent threads.
41
+ 5. Open the `Evidence Drawer` and `Juror Panel` tabs after the verdict.
42
+ 6. Try `Greg Heffley vs Mom` for a lighter family-court case.
43
+ 7. Try `Custom` to write a short dispute and up to three pieces of evidence per side directly into the docket book.
44
 
45
+ ## Why It Fits Build Small
46
 
47
+ - **Thousand Token Wood:** the app is whimsical, theatrical, and AI-native rather than a generic chatbot.
48
+ - **Best Use of Codex:** Codex was used throughout implementation, debugging, UI iteration, tests, and commit prep in the connected GitHub repo.
49
+ - **Nemotron Hardware Prize:** Nemotron is a core runtime model for the jury and juror vote generation.
50
+ - **Best Use of Modal:** the Gradio Space delegates live model inference to a Modal GPU streaming endpoint.
51
+ - **Off-Brand:** the UI pushes past stock Gradio with a custom courtroom, animated puppets, docket book, evidence props, audio cues, and verdict staging.
52
+ - **Field Notes:** this README documents the build idea, model choices, runtime architecture, limitations, and submission checklist.
53
+
54
+ ## Small-Model Budget
55
+
56
+ Every named model is under the 32B parameter cap.
57
+
58
+ | Role | Model | Budgeted size | Used for |
59
+ | --- | --- | ---: | --- |
60
+ | Presiding advocate | `openai/gpt-oss-20b` | 20B | Judge, claimant lawyer, respondent lawyer, verdict voice |
61
+ | Clerk of style | `openbmb/AgentCPM-Explore` | 4B | Clerk/stage voice |
62
+ | Jury ring | `nvidia/Nemotron-Orchestrator-8B` | 8B | Jury panel and six juror votes |
63
+
64
+ Displayed aggregate budget: 32B. The app does not use a model above 32B.
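
The budget arithmetic in the table can be checked with a short sketch. The sizes are the budgeted figures from the table; `MODEL_BUDGET_B`, `CAP_B`, and `total_budget_b` are illustrative names assumed for this example, not identifiers from the repo.

```python
# Budgeted sizes (billions of parameters) copied from the table above.
# MODEL_BUDGET_B, CAP_B, and total_budget_b are hypothetical names for this sketch.
MODEL_BUDGET_B = {
    "openai/gpt-oss-20b": 20,
    "openbmb/AgentCPM-Explore": 4,
    "nvidia/Nemotron-Orchestrator-8B": 8,
}
CAP_B = 32


def total_budget_b(budget: dict[str, int]) -> int:
    """Sum the budgeted sizes of all named models."""
    return sum(budget.values())


# Each model is individually within the cap, and so is the aggregate.
assert all(size <= CAP_B for size in MODEL_BUDGET_B.values())
assert total_budget_b(MODEL_BUDGET_B) <= CAP_B
print(total_budget_b(MODEL_BUDGET_B))  # 20 + 4 + 8 = 32
```
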
65
+
66
+ ## How It Works
67
+
68
+ Judge-GPT runs a deterministic courtroom sequence over a `CasePacket`:
69
+
70
+ 1. Clerk opens the docket.
71
+ 2. Judge frames the dispute.
72
+ 3. Mike OSS argues for the claimant.
73
+ 4. Harvey Vector argues for the respondent.
74
+ 5. The evidence record is displayed without adding a third lawyer.
75
+ 6. The judge asks a hinge question.
76
+ 7. Each lawyer answers from their side.
77
+ 8. Nemotron Jury retires the panel.
78
+ 9. Six named jurors vote from distinct worldviews.
79
+ 10. The judge announces the final verdict.
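
The fixed ordering of the ten steps above can be sketched as a plain step table. This is a minimal illustration of "deterministic sequence" (the order never changes, even though model output does); `TRIAL_STEPS` and `run_sequence` are assumed names, and the real phase names and orchestration in `sovereign_bench/engine.py` may differ.

```python
from typing import Iterator

# Hypothetical step table for illustration only; it mirrors the README list,
# not the actual engine implementation.
TRIAL_STEPS: tuple[tuple[str, str], ...] = (
    ("Clerk", "opens the docket"),
    ("Judge", "frames the dispute"),
    ("Mike OSS", "argues for the claimant"),
    ("Harvey Vector", "argues for the respondent"),
    ("Court", "displays the evidence record"),
    ("Judge", "asks a hinge question"),
    ("Lawyers", "answer from their side"),
    ("Nemotron Jury", "retires the panel"),
    ("Jurors", "vote from distinct worldviews"),
    ("Judge", "announces the final verdict"),
)


def run_sequence() -> Iterator[str]:
    """Yield one caption per step, always in the same fixed order."""
    for number, (speaker, action) in enumerate(TRIAL_STEPS, start=1):
        yield f"Step {number}: {speaker} {action}"


for caption in run_sequence():
    print(caption)
```
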
80
+
81
+ The shipped demo cases are:
82
+
83
+ - `The Polis v. Socrates`
84
+ - `Greg Heffley v. Mom`
85
+ - `Custom`, built from the docket-book fields in the UI
86
+
87
+ ## Runtime Architecture
88
+
89
+ - `app.py` renders the Gradio UI, courtroom HTML/CSS, audio hooks, case preview book, and live event stream.
90
+ - `sovereign_bench/engine.py` orchestrates trial phases, model calls, evidence events, jury votes, verdict assembly, and trace metadata.
91
+ - `sovereign_bench/llm.py` builds role prompts, calls Hugging Face-compatible chat models, and rejects hidden reasoning or instruction echoes.
92
+ - `sovereign_bench/cases.py` contains the cached demo case packets.
93
+ - `modal_app.py` hosts the GPU-backed streaming endpoint used by the Space.
94
+ - `tests/` contains engine, case, and rendering regression tests.
95
+
96
+ The Gradio app uses `MODAL_TRIAL_URL` when it is set; otherwise it falls back to the built-in deployed Modal endpoint. The Modal app reads the Hugging Face token from a Modal secret named `huggingface`; no real credentials are committed.
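
The endpoint selection described in this paragraph might look roughly like the sketch below. It assumes a helper named `resolve_trial_endpoint` and a constant `DEFAULT_MODAL_TRIAL_URL`, both hypothetical; the default URL is a placeholder because the real built-in endpoint is not shown in this README.

```python
import os
from collections.abc import Mapping

# Placeholder only: the real built-in endpoint URL is not given in the README.
DEFAULT_MODAL_TRIAL_URL = "https://your-modal-endpoint.example"


def resolve_trial_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Return MODAL_TRIAL_URL when set and non-blank, else the default."""
    configured = env.get("MODAL_TRIAL_URL", "").strip()
    return configured or DEFAULT_MODAL_TRIAL_URL


# Usage: an unset or whitespace-only variable falls back to the default.
print(resolve_trial_endpoint({}))
print(resolve_trial_endpoint({"MODAL_TRIAL_URL": "https://custom.example"}))
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.
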
97
 
98
+ ## Run Locally
99
 
100
  ```powershell
101
  python -m pip install -r requirements.txt
102
  python app.py
103
  ```
104
 
105
+ Open:
106
 
107
+ ```text
108
+ http://127.0.0.1:7860
109
+ ```
110
 
111
+ ## Deploy Modal Backend
112
 
113
  ```powershell
114
  python -m modal deploy modal_app.py
115
  ```
116
 
117
+ After deployment, pre-warm every configured courtroom model in the deployed `sovereign-bench` app so the first trial does not wait for all GPU containers to cold start. Run this after each deploy because deployments reset Modal autoscaler overrides:
118
 
119
+ ```powershell
120
+ python -m modal run modal_app.py::warm_models
121
+ ```
122
 
123
+ If the endpoint changes, set the Hugging Face Space variable:
124
 
125
+ ```text
126
+ MODAL_TRIAL_URL=https://your-modal-endpoint.example
127
+ ```
128
 
129
+ ## Deploy Hugging Face Space
130
 
131
+ Create or upload this repo as a Gradio Space inside the official Build Small org:
132
 
133
+ ```text
134
+ build-small-hackathon/<your-space-name>
135
+ ```
136
 
137
+ Space settings:
138
 
139
+ - SDK: Gradio
140
+ - App file: `app.py`
141
+ - Python requirements: `requirements.txt`
142
+ - Optional variable: `MODAL_TRIAL_URL`
143
+ - No Space secret is required if using the hosted Modal endpoint.
144
 
145
+ ## Verification
146
 
147
+ ```powershell
148
+ python -m pytest
149
+ ```
150
 
151
+ Focused checks used during final prep:
152
 
153
  ```powershell
154
+ python -m pytest tests/test_engine.py tests/test_ui_rendering.py
155
  ```
156
+
157
+ ## Limitations
158
+
159
+ - Judge-GPT is not legal advice and should not be used for real legal decisions.
160
+ - The demo packets are compact, staged evidence packets, not exhaustive source research.
161
+ - Model, Modal, or retrieval failures stop the current trial instead of substituting fake dialogue.
162
+ - Trial results are not persisted across sessions.
163
+ - Custom trials require a short case context and evidence from both sides.
164
+
165
+ ## Final Submission Checklist
166
+
167
+ - [ ] Push the repo to the Build Small Hugging Face org as a Gradio Space.
168
+ - [ ] Confirm the Space launches and can complete `Trial of Socrates`.
169
+ - [ ] Record a short demo video showing the trial flow and verdict.
170
+ - [ ] Confirm the `Demo video` link above points to the final public URL.
171
+ - [ ] Publish one social post about the app.
172
+ - [ ] Replace the `Social post` TODO above with the final public URL.
173
+ - [ ] Run the README through the Build Small validator.
app.py CHANGED
@@ -2,13 +2,18 @@ from __future__ import annotations
2
 
3
  import json
4
  import os
5
  from collections.abc import Iterable
6
 
7
  import gradio as gr
8
  import httpx
9
 
10
  from sovereign_bench.engine import JUDGE_NAME, JUROR_PERSONAS, stream_trial
11
- from sovereign_bench.models import TrialEvent, TrialRequest
12
 
13
 
14
  def _load_env_file() -> None:
@@ -28,10 +33,16 @@ _load_env_file()
28
 
29
  CASE_OPTIONS = {
30
  "Trial of Socrates": "socrates",
31
- "The People v. Barnaby Buttons": "barnaby",
32
- "Live Search Tribunal": "live",
33
  }
34
 
35
  PHASE_GLYPHS = {
36
  "pretrial": "00",
37
  "intake": "01",
@@ -44,6 +55,24 @@ PHASE_GLYPHS = {
44
  "appeal": "08",
45
  }
46
 
47
  AUDIO_PATHS = {
48
  "score": "/gradio_api/file=assets/audio/courtroom.ogg",
49
  "judgement": "/gradio_api/file=assets/audio/Judgement.ogg",
@@ -102,9 +131,9 @@ body,
102
  .docket-book-controls {
103
  position: fixed;
104
  left: 50%;
105
- top: clamp(172px, 21vh, 212px);
106
  z-index: 9999;
107
- width: min(620px, calc(100vw - 160px));
108
  max-width: none;
109
  margin: 0;
110
  padding: 0;
@@ -202,21 +231,6 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
202
  line-height: 1.25;
203
  }
204
 
205
- .trial-options {
206
- max-width: 1120px;
207
- margin: 0 auto 14px;
208
- border: 1px solid rgba(255, 226, 154, .18);
209
- border-radius: 6px;
210
- background: rgba(18, 9, 5, .78);
211
- color: #f5dfb5;
212
- }
213
-
214
- .trial-options label,
215
- .trial-options span,
216
- .trial-options .prose {
217
- color: #f5dfb5 !important;
218
- }
219
-
220
  .court-episode-stage {
221
  --spot-x: 50%;
222
  --spot-y: 36%;
@@ -250,6 +264,70 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
250
  z-index: 4;
251
  }
252
 
253
  .episode-room {
254
  position: absolute;
255
  inset: 0;
@@ -388,9 +466,9 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
388
  .episode-book {
389
  position: absolute;
390
  left: 50%;
391
- top: 12%;
392
- z-index: 12;
393
- width: min(760px, calc(100% - 32px));
394
  aspect-ratio: 3 / 2;
395
  transform: translateX(-50%) rotateX(0) rotateZ(-1deg);
396
  transform-origin: center bottom;
@@ -400,6 +478,10 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
400
  transition: top .85s ease, width .85s ease, transform .85s ease, filter .85s ease, opacity .85s ease;
401
  }
402
 
403
  .book-art {
404
  position: absolute;
405
  inset: 0;
@@ -416,8 +498,8 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
416
  }
417
 
418
  .episode-book.closed {
419
- top: 36%;
420
- width: min(245px, 30vw);
421
  transform: translateX(-50%) rotateX(56deg) rotateZ(1deg);
422
  opacity: .92;
423
  filter: drop-shadow(0 18px 18px rgba(0, 0, 0, .45));
@@ -438,35 +520,91 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
438
 
439
  .book-open-content {
440
  position: absolute;
441
- inset: 17% 10% 13%;
442
  z-index: 2;
443
  display: grid;
444
  grid-template-columns: 1fr 1fr;
445
- gap: 72px;
446
- padding: 0 28px;
447
  transition: opacity .35s ease;
448
  }
449
 
450
  .book-open-content h2 {
451
- margin: 0 0 10px;
452
  color: #4c2a12;
453
- font-size: 30px;
454
  letter-spacing: 0;
455
  }
456
 
457
  .book-open-content p,
458
  .book-entry {
459
  color: #3c2615;
460
- font-size: 15px;
461
- line-height: 1.34;
462
  }
463
 
464
  .book-entry {
465
- margin: 11px 0;
466
  padding-left: 12px;
467
  border-left: 3px solid rgba(111, 61, 23, .36);
468
  }
469
 
470
  .judge-dais {
471
  position: absolute;
472
  left: 50%;
@@ -536,11 +674,11 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
536
  }
537
 
538
  .jury-benches.left {
539
- left: 4.5%;
540
  }
541
 
542
  .jury-benches.right {
543
- right: 4.5%;
544
  }
545
 
546
  .jury-benches.left .jury-row {
@@ -594,7 +732,7 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
594
  }
595
 
596
  .foreground-fence {
597
- bottom: -1.5%;
598
  width: 47%;
599
  }
600
 
@@ -610,9 +748,9 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
610
 
611
  .judge-table-foreground {
612
  left: 50%;
613
- top: 35%;
614
  z-index: 1;
615
- width: 46%;
616
  transform: translateX(-50%);
617
  }
618
 
@@ -650,7 +788,7 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
650
 
651
  .puppet.judge {
652
  left: 50%;
653
- top: 31%;
654
  --skin: #c38a55;
655
  --robe: #1b1b20;
656
  --accent: #79242a;
@@ -660,7 +798,8 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
660
 
661
  .puppet.clerk {
662
  left: 43%;
663
- top: 41%;
664
  --skin: #b77b52;
665
  --robe: #365548;
666
  --accent: #2f6f5e;
@@ -668,7 +807,7 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
668
 
669
  .puppet.auric {
670
  left: 24%;
671
- top: 62%;
672
  --skin: #c9975d;
673
  --robe: #5b2719;
674
  --accent: #a45c25;
@@ -676,28 +815,20 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
676
 
677
  .speaker-auric .puppet.auric {
678
  left: 43%;
679
- top: 66%;
680
  }
681
 
682
  .puppet.sable {
683
  left: 75%;
684
- top: 62%;
685
  --skin: #a86d4a;
686
  --robe: #1d3045;
687
  --accent: #254f7a;
688
  }
689
 
690
  .speaker-sable .puppet.sable {
691
- left: 57%;
692
- top: 66%;
693
- }
694
-
695
- .puppet.auditor {
696
- left: 71%;
697
- top: 55%;
698
- --skin: #c6a65b;
699
- --robe: #4b3d1b;
700
- --accent: #8d6b1f;
701
  }
702
 
703
  .puppet-portrait {
@@ -713,10 +844,6 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
713
  pointer-events: none;
714
  }
715
 
716
- .phase-evidence .puppet.auditor {
717
- animation: evidence-focus 1.35s ease-in-out infinite;
718
- }
719
-
720
  .puppet::before {
721
  content: "";
722
  position: absolute;
@@ -749,6 +876,11 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
749
  linear-gradient(180deg, var(--accent), var(--robe) 52%, #130a07);
750
  }
751
 
752
  .puppet .mouth {
753
  position: absolute;
754
  left: 50%;
@@ -761,42 +893,169 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
761
  border-radius: 0 0 18px 18px;
762
  }
763
 
764
  .puppet.active .mouth,
765
  .puppet.walking .mouth {
766
  animation: speak-mouth .5s ease-in-out infinite;
767
  }
768
 
769
- .speech-bubble {
770
  position: absolute;
771
  left: 50%;
772
- bottom: calc(100% + 12px);
773
- z-index: 18;
774
- width: 260px;
775
- max-width: min(320px, calc(100vw - 32px));
776
- transform: translateX(-50%);
777
- padding: 10px 12px;
778
- border: 1px solid rgba(255, 226, 154, .48);
779
- border-radius: 6px;
780
- background: rgba(255, 244, 215, .94);
781
- color: #2d1b0d;
782
- box-shadow: 0 14px 30px rgba(0, 0, 0, .34);
783
  font-size: 12px;
784
- font-weight: 700;
785
- line-height: 1.3;
786
  pointer-events: none;
787
  }
788
 
789
- .speech-bubble::after {
790
  content: "";
791
  position: absolute;
792
- left: 50%;
793
- bottom: -8px;
794
- width: 14px;
795
- height: 14px;
796
  transform: translateX(-50%) rotate(45deg);
797
- border-right: 1px solid rgba(255, 226, 154, .48);
798
- border-bottom: 1px solid rgba(255, 226, 154, .48);
799
- background: rgba(255, 244, 215, .94);
800
  }
801
 
802
  .tooltip {
@@ -931,11 +1190,6 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
931
  animation: juror-react .82s ease-in-out infinite alternate;
932
  }
933
 
934
- .juror .speech-bubble {
935
- bottom: calc(100% + 6px);
936
- width: 230px;
937
- }
938
-
939
  .juror-face {
940
  position: absolute;
941
  left: 50%;
@@ -1195,14 +1449,43 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
1195
  100% { transform: rotate(-18deg) translateY(0); }
1196
  }
1197
 
1198
  @media (max-width: 820px) {
1199
  .docket-book-controls {
1200
  position: fixed;
1201
- top: 262px;
1202
  width: calc(100vw - 52px);
1203
  transform: translateX(-50%) rotate(-1deg);
1204
  }
1205
 
1206
  .court-episode-stage {
1207
  height: 1280px;
1208
  min-height: 1280px;
@@ -1225,21 +1508,64 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
1225
  max-width: calc(100% - 32px);
1226
  }
1227
 
1228
  .episode-book {
1229
- top: 220px;
1230
  width: min(680px, calc(100% - 20px));
1231
  }
1232
 
1233
  .episode-book.closed {
1234
- top: 430px;
1235
- width: 210px;
1236
  }
1237
 
1238
  .book-open-content {
1239
  grid-template-columns: 1fr;
1240
  gap: 10px;
1241
- inset: 17% 12% 14%;
1242
- padding: 0 18px;
1243
  }
1244
 
1245
  .book-open-content h2 {
@@ -1257,6 +1583,25 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
1257
  margin: 5px 0;
1258
  }
1259
 
1260
  .judge-dais {
1261
  top: 390px;
1262
  width: 280px;
@@ -1278,32 +1623,27 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
1278
 
1279
  .puppet.auric {
1280
  left: 20%;
1281
- top: 650px;
1282
  }
1283
 
1284
  .puppet.sable {
1285
  left: 80%;
1286
- top: 650px;
1287
  }
1288
 
1289
  .speaker-auric .puppet.auric {
1290
  left: 42%;
1291
- top: 730px;
1292
  }
1293
 
1294
  .speaker-sable .puppet.sable {
1295
- left: 58%;
1296
- top: 730px;
1297
  }
1298
 
1299
  .puppet.clerk {
1300
  left: 35%;
1301
- top: 560px;
1302
- }
1303
-
1304
- .puppet.auditor {
1305
- left: 78%;
1306
- top: 540px;
1307
  }
1308
 
1309
  .witness-area {
@@ -1319,15 +1659,15 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
1319
  }
1320
 
1321
  .jury-benches.left {
1322
- left: 5%;
1323
  }
1324
 
1325
  .jury-benches.right {
1326
- right: 5%;
1327
  }
1328
 
1329
  .foreground-fence {
1330
- bottom: -2px;
1331
  width: 64%;
1332
  }
1333
 
@@ -1340,8 +1680,8 @@ body.trial-has-started .docket-book-controls .docket-book-controls {
1340
  }
1341
 
1342
  .judge-table-foreground {
1343
- top: 405px;
1344
- width: 760px;
1345
  }
1346
 
1347
  .evidence-props {
@@ -1530,12 +1870,28 @@ APP_HEAD = f"""
1530
  """
1531
 
1532
  START_JS = """
1533
- (case_label, search_query, hypothetical, speed, mind_layer) => {
1534
  document.body.classList.add('trial-has-started');
1535
  if (window.SovereignCourtAudio) {
1536
  window.SovereignCourtAudio.begin();
1537
  }
1538
- return [case_label, search_query, hypothetical, speed, mind_layer];
1539
  }
1540
  """
1541
 
@@ -1553,24 +1909,18 @@ CHARACTERS = {
1553
  "role": "Court clerk",
1554
  "model": "AgentCPM-Explore",
1555
  },
1556
- "Advocate Auric": {
1557
  "class": "auric",
1558
- "name": "Advocate Auric",
1559
  "role": "Claimant advocate",
1560
  "model": "gpt-oss-20b",
1561
  },
1562
- "Counsel Sable": {
1563
  "class": "sable",
1564
- "name": "Counsel Sable",
1565
  "role": "Respondent advocate",
1566
  "model": "gpt-oss-20b",
1567
  },
1568
- "Auditor Prism": {
1569
- "class": "auditor",
1570
- "name": "Auditor Prism",
1571
- "role": "Evidence auditor",
1572
- "model": "Nemotron-Orchestrator-8B",
1573
- },
1574
  "Nemotron Jury": {
1575
  "class": "jury",
1576
  "name": "Nemotron Jury",
@@ -1597,13 +1947,37 @@ JUROR_IMAGES = {
1597
  "Jensen Huang": "/gradio_api/file=assets/characters/jensen-huang.png",
1598
  }
1599
 
 
1600
  PHASE_AGENTS = {
1601
  "pretrial": ["Clerk Meridian"],
1602
  }
1603
 
1604
 
1605
  def _remote_events(request: TrialRequest) -> Iterable[TrialEvent] | None:
1606
- endpoint = os.getenv("MODAL_TRIAL_URL", "").strip()
1607
  if not endpoint:
1608
  return None
1609
 
@@ -1617,13 +1991,13 @@ def _remote_events(request: TrialRequest) -> Iterable[TrialEvent] | None:
1617
  return iterator()
1618
 
1619
 
1620
- def get_events(request: TrialRequest) -> Iterable[TrialEvent]:
1621
  remote = _remote_events(request)
1622
  if remote is not None:
1623
  yield from remote
1624
  return
1625
- delay = {"swift": 1.4, "measured": 2.4, "ceremonial": 3.4}[request.speed]
1626
- yield from stream_trial(request, delay=delay)
1627
 
1628
 
1629
  def _escape(value: str) -> str:
@@ -1663,6 +2037,26 @@ def _active_speaker_for(event: TrialEvent | None) -> str:
1663
  return event.turns[0].agent
1664
 
1665
 
1666
  def _speaker_class_for(speaker: str) -> str:
1667
  if not speaker:
1668
  return ""
@@ -1680,6 +2074,61 @@ def _latest_turn_text(event: TrialEvent | None, agent: str) -> str:
1680
  return _short_text(turn.content, 210)
1681
 
1682
 
1683
  def _thread_id(name: str) -> str:
1684
  return "ai-thread-" + "".join(ch.lower() if ch.isalnum() else "-" for ch in name).strip("-")
1685
 
@@ -1767,17 +2216,51 @@ def _thread_modal(name: str, role: str, model: str, turns: list[dict[str, str]])
1767
  )
1768
 
1769
 
1770
  def _puppet(agent: str, active_agents: set[str], phase: str, events: list[TrialEvent], latest: TrialEvent | None) -> str:
1771
  meta = CHARACTERS[agent]
1772
  active = " active" if agent in active_agents else ""
1773
- walking = " walking" if agent in {"Advocate Auric", "Counsel Sable"} and agent in active_agents else ""
1774
- small = " small" if agent in {"Clerk Meridian", "Auditor Prism"} else ""
1775
  turns = _thread_for_character(events, agent)
1776
- bubble = ""
1777
- if agent in active_agents:
1778
- speech = _latest_turn_text(latest, agent)
1779
- if speech:
1780
- bubble = f"<span class='speech-bubble'>{_escape(speech)}</span>"
1781
  portrait = ""
1782
  if meta.get("image"):
1783
  portrait = (
@@ -1788,7 +2271,6 @@ def _puppet(agent: str, active_agents: set[str], phase: str, events: list[TrialE
1788
  f"<a class='puppet {meta['class']}{active}{walking}{small}' href='#{_escape(_thread_id(agent))}' aria-label='Open {_escape(agent)} model thread'>"
1789
  f"{portrait}"
1790
  "<span class='mouth'></span>"
1791
- f"{bubble}"
1792
  f"{_tooltip(meta['name'], meta['role'], meta['model'], turns)}"
1793
  "</a>"
1794
  )
@@ -1799,36 +2281,103 @@ def _juror(name: str, active: bool, events: list[TrialEvent] | None = None, late
1799
  image = JUROR_IMAGES.get(name, "")
1800
  active_cls = " active" if active else ""
1801
  turns = _thread_for_character(events or [], name)
1802
- bubble = ""
1803
- if active:
1804
- vote = next((vote for vote in (latest.votes if latest else []) if vote.juror == name), None)
1805
- speech = _latest_turn_text(latest, name)
1806
- if vote:
1807
- speech = f"{vote.vote.replace('_', ' ').title()}. {vote.reason}"
1808
- if speech:
1809
- bubble = f"<span class='speech-bubble'>{_escape(_short_text(speech, 190))}</span>"
1810
  portrait = (
1811
  f"<img class='juror-portrait' src='{_escape(image)}' alt='{_escape(name)} bust' "
1812
  "onerror=\"this.style.display='none'\">"
1813
  if image
1814
  else ""
1815
  )
1816
  return (
1817
  f"<a class='juror{active_cls}' href='#{_escape(_thread_id(name))}' style='--face: {face}' aria-label='Open {_escape(name)} model thread'>"
1818
  f"{portrait}"
1819
- "<span class='juror-face'></span><span class='juror-body'></span>"
1820
- f"{bubble}"
1821
  f"{_tooltip(name, 'HF-style juror', 'Nemotron panel', turns)}"
1822
  "</a>"
1823
  )
1824
 
1825
 
1826
- def _book(open_book: bool) -> str:
1827
  closed = "" if open_book else " closed"
1828
  return (
1829
- f"<div class='episode-book{closed}'>"
1830
  "<img class='book-art open-art' src='/gradio_api/file=assets/book/docket-book-open.png' alt='Open docket book'>"
1831
  "<img class='book-art closed-art' src='/gradio_api/file=assets/book/docket-book-closed.png' alt='Closed docket book'>"
1832
  "</div>"
1833
  )
1834
 
@@ -1871,6 +2420,36 @@ def _foreground_props() -> str:
1871
  )
1872
 
1873
 
1874
  def _courtroom_juror_names(votes: list) -> list[str]:
1875
  names = list(JUROR_FACES)
1876
  names.extend(vote.juror for vote in votes if vote.juror not in names)
@@ -1887,12 +2466,20 @@ def _latest_votes(events: list[TrialEvent]) -> list:
1887
  return ordered
1888
 
1889
 
1890
- def render_court(events: list[TrialEvent], started: bool = False) -> str:
1891
  latest = events[-1] if events else None
1892
  phase = latest.phase if latest else "pretrial"
1893
  title, subtitle = _latest_packet_title(events)
1894
- active_agents = _active_agents_for(latest)
1895
- active_speaker = _active_speaker_for(latest)
1896
  speaker_cls = _speaker_class_for(active_speaker)
1897
  caption_phase, caption_title, caption_body = _caption(latest, phase)
1898
  latest_votes = _latest_votes(events)
@@ -1901,7 +2488,7 @@ def render_court(events: list[TrialEvent], started: bool = False) -> str:
1901
  book_open = not started and not events
1902
  puppets = "".join(
1903
  _puppet(agent, active_agents, phase, events, latest)
1904
- for agent in [JUDGE_NAME, "Clerk Meridian", "Advocate Auric", "Counsel Sable", "Auditor Prism"]
1905
  )
1906
  left_jurors = "".join(_juror(name, name == active_speaker, events, latest) for name in juror_names[:3])
1907
  right_jurors = "".join(_juror(name, name == active_speaker, events, latest) for name in juror_names[3:6])
@@ -1915,6 +2502,7 @@ def render_court(events: list[TrialEvent], started: bool = False) -> str:
1915
  )
1916
  return (
1917
  f"<section id='court-stage' class='court-episode-stage phase-{_escape(phase)}{_escape(speaker_cls)}{started_cls}' data-phase='{_escape(phase)}'>"
1918
  "<div class='episode-room'></div>"
1919
  "<div class='audio-deck' aria-hidden='true'>"
1920
  + "".join(f"<audio preload='auto' src='{_escape(src)}'></audio>" for src in AUDIO_PATHS.values())
@@ -1926,7 +2514,7 @@ def render_court(events: list[TrialEvent], started: bool = False) -> str:
1926
  f"<h1>{_escape(title)}</h1>"
1927
  f"<p>{_escape(subtitle)}</p></div>"
1928
  f"<div class='decree-ribbon'>Step {len(events) if events else 0}: {caption_title}<br>Hover characters for agent and model details</div>"
1929
- f"{_book(book_open)}"
1930
  f"<div class='judge-dais'><div class='prop-label'>{_escape(JUDGE_NAME)}</div><div class='bench-front'></div><span class='gavel'></span></div>"
1931
  "<div class='counsel-table left'><div class='prop-label'>Claimant Table</div></div>"
1932
  "<div class='counsel-table right'><div class='prop-label'>Respondent Table</div></div>"
@@ -1939,6 +2527,8 @@ def render_court(events: list[TrialEvent], started: bool = False) -> str:
1939
  f"{puppets}"
1940
  f"{evidence_props}"
1941
  f"{_foreground_props()}"
1942
  "<div class='gallery-benches'><div></div><div></div><div></div><div></div><div></div><div></div></div>"
1943
  "<div class='trial-caption'>"
1944
  f"<div class='caption-phase'>Live Trial Feed / {_escape(caption_phase)}</div>"
@@ -2009,33 +2599,137 @@ def render_mind(events: list[TrialEvent], enabled: bool) -> str:
2009
  return f"<pre class='mind-text'>{_escape(json.dumps(compact, indent=2))}</pre>"
2010
 
2011
 
2012
- def run_ui(case_label: str, search_query: str, hypothetical: str, speed: str, mind_layer: bool):
2013
  request = TrialRequest(
2014
- case_id=CASE_OPTIONS.get(case_label, "socrates"),
2015
  search_query=search_query or "",
2016
  hypothetical=hypothetical or "",
 
2017
  speed=speed or "swift",
2018
  mind_layer=bool(mind_layer),
2019
  )
2020
  events: list[TrialEvent] = []
 
 
2021
  yield (
2022
- render_court(events, started=True),
2023
  render_evidence(events),
2024
  render_jurors(events),
2025
  render_mind(events, mind_layer),
2026
- "The docket closes and the bailiff calls the room to order.",
2027
  )
2028
  try:
2029
- for event in get_events(request):
2030
  events.append(event)
2031
- status = f"Step {len(events)}: {event.title}"
2032
  yield (
2033
  render_court(events, started=True),
2034
  render_evidence(events),
2035
  render_jurors(events),
2036
  render_mind(events, mind_layer),
2037
- status,
2038
  )
2039
  except Exception as exc:
2040
  yield (
2041
  render_court(events, started=True),
@@ -2046,7 +2740,7 @@ def run_ui(case_label: str, search_query: str, hypothetical: str, speed: str, mi
2046
  )
2047
  return
2048
  yield (
2049
- render_court(events, started=True),
2050
  render_evidence(events),
2051
  render_jurors(events),
2052
  render_mind(events, mind_layer),
@@ -2067,13 +2761,12 @@ def build_app() -> gr.Blocks:
2067
  )
2068
  start = gr.Button("Begin Trial", variant="primary", scale=1)
2069
  status = gr.Markdown("Ready.", elem_classes=["book-status"])
2070
- courtroom = gr.HTML(render_court([]), label="Live courtroom")
2071
  search = gr.State("")
 
 
2072
  speed = gr.State("swift")
2073
  mind = gr.State(True)
2074
- with gr.Accordion("Advanced trial options", open=False, elem_classes=["trial-options"]):
2075
- with gr.Row():
2076
- hypo = gr.Textbox(label="Hypothetical sidebar", lines=1)
2077
  with gr.Row(elem_classes=["drawer-shell"]):
2078
  with gr.Column(scale=1):
2079
  with gr.Tab("Evidence Drawer"):
@@ -2081,9 +2774,14 @@ def build_app() -> gr.Blocks:
2081
  with gr.Tab("Juror Panel"):
2082
  jurors = gr.HTML(render_jurors([]))
2083
  mind_html = gr.HTML(render_mind([], True), visible=False)
 
 
 
 
 
2084
  start.click(
2085
  run_ui,
2086
- inputs=[case, search, hypo, speed, mind],
2087
  outputs=[courtroom, evidence, jurors, mind_html, status],
2088
  js=START_JS,
2089
  )
 
2
 
3
  import json
4
  import os
5
+ import queue
6
+ import threading
7
+ import time
8
  from collections.abc import Iterable
9
+ from dataclasses import dataclass
10
 
11
  import gradio as gr
12
  import httpx
13
 
14
  from sovereign_bench.engine import JUDGE_NAME, JUROR_PERSONAS, stream_trial
15
+ from sovereign_bench.cases import CASES, get_case
16
+ from sovereign_bench.models import CasePacket, EvidenceItem, TrialEvent, TrialRequest
17
 
18
 
19
  def _load_env_file() -> None:
 
33
 
34
  CASE_OPTIONS = {
35
  "Trial of Socrates": "socrates",
36
+ "Greg Heffley vs Mom": "greg",
37
+ "Custom": "custom",
38
  }
39
 
40
+ DEFAULT_MODAL_TRIAL_URL = "https://ali-j-iqbal24--trial-stream.modal.run"
41
+ MIN_READ_SECONDS = 2.2
42
+ WORDS_PER_SECOND = 3.2
43
+ READ_BUFFER_SECONDS = 0.8
44
+ MAX_READ_SECONDS = 8.5
45
+
46
  PHASE_GLYPHS = {
47
  "pretrial": "00",
48
  "intake": "01",
 
55
  "appeal": "08",
56
  }
57
 
58
+ TRIAL_PROGRESS_STAGES = (
59
+ ("pretrial", "Pretrial"),
60
+ ("intake", "Intake"),
61
+ ("claims", "Claims"),
62
+ ("opening", "Opening"),
63
+ ("evidence", "Evidence"),
64
+ ("questions", "Questions"),
65
+ ("deliberation", "Deliberation"),
66
+ ("verdict", "Verdict"),
67
+ )
68
+
69
+ VERDICT_LABELS = {
70
+ "liable": "Guilty",
71
+ "not_liable": "Not Guilty",
72
+ "mixed": "Mixed",
73
+ "uncertain": "Uncertain",
74
+ }
75
+
76
  AUDIO_PATHS = {
77
  "score": "/gradio_api/file=assets/audio/courtroom.ogg",
78
  "judgement": "/gradio_api/file=assets/audio/Judgement.ogg",
 
131
  .docket-book-controls {
132
  position: fixed;
133
  left: 50%;
134
+ top: 72px;
135
  z-index: 9999;
136
+ width: min(760px, calc(100vw - 160px));
137
  max-width: none;
138
  margin: 0;
139
  padding: 0;
 
231
  line-height: 1.25;
232
  }
233
 
234
  .court-episode-stage {
235
  --spot-x: 50%;
236
  --spot-y: 36%;
 
264
  z-index: 4;
265
  }
266
 
267
+ .trial-progress {
268
+ position: fixed;
269
+ top: 0;
270
+ left: 0;
271
+ right: 0;
272
+ z-index: 70;
273
+ display: grid;
274
+ grid-template-columns: repeat(8, minmax(0, 1fr));
275
+ gap: 1px;
276
+ padding: 3px clamp(10px, 2vw, 24px) 4px;
277
+ border-bottom: 1px solid rgba(217, 176, 96, .2);
278
+ background: rgba(23, 13, 8, .58);
279
+ backdrop-filter: blur(8px);
280
+ box-shadow: 0 8px 18px rgba(8, 4, 2, .22);
281
+ pointer-events: none;
282
+ }
283
+
284
+ .trial-progress-segment {
285
+ position: relative;
286
+ min-width: 0;
287
+ padding-top: 5px;
288
+ overflow: hidden;
289
+ color: rgba(244, 213, 143, .38);
290
+ font: 800 10px/1 ui-monospace, SFMono-Regular, Consolas, monospace;
291
+ letter-spacing: .04em;
292
+ text-align: center;
293
+ text-transform: uppercase;
294
+ white-space: nowrap;
295
+ }
296
+
297
+ .trial-progress-segment::before {
298
+ content: "";
299
+ position: absolute;
300
+ left: 3px;
301
+ right: 3px;
302
+ top: 0;
303
+ height: 2px;
304
+ border-radius: 999px;
305
+ background: rgba(217, 176, 96, .18);
306
+ }
307
+
308
+ .trial-progress-segment.complete {
309
+ color: rgba(217, 176, 96, .68);
310
+ }
311
+
312
+ .trial-progress-segment.complete::before {
313
+ background: rgba(217, 176, 96, .48);
314
+ }
315
+
316
+ .trial-progress-segment.current {
317
+ color: #ffe6a6;
318
+ text-shadow: 0 0 10px rgba(255, 211, 116, .52);
319
+ }
320
+
321
+ .trial-progress-segment.current::before {
322
+ height: 3px;
323
+ background: #ffd675;
324
+ box-shadow: 0 0 12px rgba(255, 214, 117, .68);
325
+ }
326
+
327
+ .trial-progress-abbrev {
328
+ display: none;
329
+ }
330
+
331
  .episode-room {
332
  position: absolute;
333
  inset: 0;
 
466
  .episode-book {
467
  position: absolute;
468
  left: 50%;
469
+ top: 122px;
470
+ z-index: 14;
471
+ width: min(980px, calc(100% - 32px));
472
  aspect-ratio: 3 / 2;
473
  transform: translateX(-50%) rotateX(0) rotateZ(-1deg);
474
  transform-origin: center bottom;
 
478
  transition: top .85s ease, width .85s ease, transform .85s ease, filter .85s ease, opacity .85s ease;
479
  }
480
 
481
+ .episode-book.custom-book {
482
+ pointer-events: auto;
483
+ }
484
+
485
  .book-art {
486
  position: absolute;
487
  inset: 0;
 
498
  }
499
 
500
  .episode-book.closed {
501
+ top: 50%;
502
+ width: min(163px, 20vw);
503
  transform: translateX(-50%) rotateX(56deg) rotateZ(1deg);
504
  opacity: .92;
505
  filter: drop-shadow(0 18px 18px rgba(0, 0, 0, .45));
 
520
 
521
  .book-open-content {
522
  position: absolute;
523
+ inset: 15% 9% 12%;
524
  z-index: 2;
525
  display: grid;
526
  grid-template-columns: 1fr 1fr;
527
+ gap: 82px;
528
+ padding: 0 20px;
529
  transition: opacity .35s ease;
530
  }
531
 
532
  .book-open-content h2 {
533
+ margin: 0 0 8px;
534
  color: #4c2a12;
535
+ font-size: 28px;
536
  letter-spacing: 0;
537
  }
538
 
539
  .book-open-content p,
540
  .book-entry {
541
  color: #3c2615;
542
+ font-size: 14px;
543
+ line-height: 1.28;
544
  }
545
 
546
  .book-entry {
547
+ margin: 8px 0;
548
  padding-left: 12px;
549
  border-left: 3px solid rgba(111, 61, 23, .36);
550
  }
551
 
552
+ .book-context {
553
+ margin-top: 8px;
554
+ }
555
+
556
+ .book-case-title {
557
+ margin: 0 0 6px;
558
+ color: #4c2a12;
559
+ font-weight: 800;
560
+ }
561
+
562
+ .book-evidence-columns {
563
+ display: grid;
564
+ grid-template-columns: 1fr 1fr;
565
+ gap: 12px;
566
+ }
567
+
568
+ .book-evidence-column h3 {
569
+ margin: 0 0 6px;
570
+ color: #4c2a12;
571
+ font-size: 15px;
572
+ line-height: 1.12;
573
+ }
574
+
575
+ .book-evidence-list {
576
+ margin: 0;
577
+ padding: 0;
578
+ list-style: none;
579
+ }
580
+
581
+ .book-evidence-list li {
582
+ margin: 0 0 6px;
583
+ padding-left: 9px;
584
+ border-left: 2px solid rgba(111, 61, 23, .32);
585
+ color: #3c2615;
586
+ font-size: 12px;
587
+ line-height: 1.2;
588
+ }
589
+
590
+ .book-field {
591
+ width: 100%;
592
+ min-height: 42px;
593
+ resize: none;
594
+ border: 1px solid rgba(90, 50, 20, .34);
595
+ border-radius: 4px;
596
+ background: rgba(255, 247, 224, .7);
597
+ color: #2b1b10;
598
+ font: 12px/1.22 Georgia, "Times New Roman", serif;
599
+ box-shadow: inset 0 1px 2px rgba(59, 29, 10, .16);
600
+ pointer-events: auto;
601
+ }
602
+
603
+ .book-context-field {
604
+ min-height: 138px;
605
+ font-size: 13px;
606
+ }
607
+
608
  .judge-dais {
609
  position: absolute;
610
  left: 50%;
 
674
  }
675
 
676
  .jury-benches.left {
677
+ left: 1%;
678
  }
679
 
680
  .jury-benches.right {
681
+ right: 1%;
682
  }
683
 
684
  .jury-benches.left .jury-row {
 
732
  }
733
 
734
  .foreground-fence {
735
+ bottom: -6.5%;
736
  width: 47%;
737
  }
738
 
 
748
 
749
  .judge-table-foreground {
750
  left: 50%;
751
+ top: 20%;
752
  z-index: 1;
753
+ width: 39.1%;
754
  transform: translateX(-50%);
755
  }
756
 
 
788
 
789
  .puppet.judge {
790
  left: 50%;
791
+ top: calc(40% + 156px);
792
  --skin: #c38a55;
793
  --robe: #1b1b20;
794
  --accent: #79242a;
 
798
 
799
  .puppet.clerk {
800
  left: 43%;
801
+ top: 66%;
802
+ z-index: 14;
803
  --skin: #b77b52;
804
  --robe: #365548;
805
  --accent: #2f6f5e;
 
807
 
808
  .puppet.auric {
809
  left: 24%;
810
+ top: 87%;
811
  --skin: #c9975d;
812
  --robe: #5b2719;
813
  --accent: #a45c25;
 
815
 
816
  .speaker-auric .puppet.auric {
817
  left: 43%;
818
+ top: 87%;
819
  }
820
 
821
  .puppet.sable {
822
  left: 75%;
823
+ top: 87%;
824
  --skin: #a86d4a;
825
  --robe: #1d3045;
826
  --accent: #254f7a;
827
  }
828
 
829
  .speaker-sable .puppet.sable {
830
+ left: 75%;
831
+ top: 87%;
832
  }
833
 
834
  .puppet-portrait {
 
844
  pointer-events: none;
845
  }
846
 
 
 
 
 
847
  .puppet::before {
848
  content: "";
849
  position: absolute;
 
876
  linear-gradient(180deg, var(--accent), var(--robe) 52%, #130a07);
877
  }
878
 
879
+ .puppet.judge::before,
880
+ .puppet.judge::after {
881
+ display: none;
882
+ }
883
+
884
  .puppet .mouth {
885
  position: absolute;
886
  left: 50%;
 
893
  border-radius: 0 0 18px 18px;
894
  }
895
 
896
+ .puppet.judge .mouth {
897
+ display: none;
898
+ }
899
+
900
  .puppet.active .mouth,
901
  .puppet.walking .mouth {
902
  animation: speak-mouth .5s ease-in-out infinite;
903
  }
904
 
905
+ .speech-bubble.active-dialogue {
906
  position: absolute;
907
  left: 50%;
908
+ top: 43%;
909
+ bottom: auto;
910
+ z-index: 30;
911
+ width: min(500px, calc(100vw - 44px));
912
+ max-height: 34vh;
913
+ overflow: visible;
914
+ transform: translate(-50%, -100%);
915
+ padding: 10px 13px 11px;
916
+ border: 2px solid #141413;
917
+ border-radius: 20px;
918
+ background: rgba(255, 253, 247, .97);
919
+ color: #141413 !important;
920
+ box-shadow: 0 12px 24px rgba(0, 0, 0, .32);
921
  font-size: 12px;
922
+ font-weight: 650;
923
+ line-height: 1.32;
924
  pointer-events: none;
925
  }
926
 
927
+ .speech-bubble.active-dialogue,
928
+ .speech-bubble.active-dialogue * {
929
+ color: #141413 !important;
930
+ }
931
+
932
+ .speech-bubble.active-dialogue::before,
933
+ .speech-bubble.active-dialogue::after {
934
  content: "";
935
  position: absolute;
936
+ left: var(--bubble-tail-x, 50%);
937
+ display: block;
 
 
938
  transform: translateX(-50%) rotate(45deg);
939
+ }
940
+
941
+ .speech-bubble.active-dialogue::before {
942
+ bottom: -13px;
943
+ width: 22px;
944
+ height: 22px;
945
+ background: #141413;
946
+ border-radius: 0 0 5px 0;
947
+ }
948
+
949
+ .speech-bubble.active-dialogue::after {
950
+ bottom: -9px;
951
+ width: 16px;
952
+ height: 16px;
953
+ transform: translateX(-50%) rotate(45deg);
954
+ background: rgba(255, 253, 247, .97);
955
+ border-radius: 0 0 3px 0;
956
+ }
957
+
958
+ .speech-bubble.active-dialogue.pending {
959
+ opacity: .82;
960
+ }
961
+
962
+ .dialogue-meta {
963
+ display: flex;
964
+ align-items: baseline;
965
+ gap: 6px;
966
+ margin-bottom: 5px;
967
+ font: 800 9px/1.2 ui-monospace, SFMono-Regular, Consolas, monospace;
968
+ text-transform: uppercase;
969
+ }
970
+
971
+ .dialogue-meta strong {
972
+ font-size: 10px;
973
+ }
974
+
975
+ .dialogue-text {
976
+ max-height: calc(34vh - 42px);
977
+ overflow: auto;
978
+ white-space: pre-wrap;
979
+ }
980
+
981
+ .speech-bubble.active-dialogue.speaker-clerk { left: 43%; top: 62%; }
982
+ .speech-bubble.active-dialogue.speaker-judge { left: 50%; top: 43%; }
983
+ .speech-bubble.active-dialogue.speaker-auric { left: 43%; top: 78%; }
984
+ .speech-bubble.active-dialogue.speaker-sable { left: 75%; top: 78%; }
985
+ .speech-bubble.active-dialogue.juror-dialogue { left: 50%; }
986
+ .speech-bubble.active-dialogue.juror-dialogue {
987
+ top: 42%;
988
+ width: min(340px, calc(50vw - 24px));
989
+ }
990
+
991
+ .speech-bubble.active-dialogue.speaker-karl-marx,
992
+ .speech-bubble.active-dialogue.speaker-john-stuart-mill,
993
+ .speech-bubble.active-dialogue.speaker-confucius {
994
+ left: 1.5%;
995
+ transform: translateY(-100%);
996
+ }
997
+
998
+ .speech-bubble.active-dialogue.speaker-cleopatra-vii,
999
+ .speech-bubble.active-dialogue.speaker-niccolo-machiavelli,
1000
+ .speech-bubble.active-dialogue.speaker-jensen-huang {
1001
+ right: 1.5%;
1002
+ left: auto;
1003
+ transform: translateY(-100%);
1004
+ }
1005
+
1006
+ .speech-bubble.active-dialogue.speaker-karl-marx,
1007
+ .speech-bubble.active-dialogue.speaker-cleopatra-vii {
1008
+ --bubble-tail-x: 19%;
1009
+ }
1010
+
1011
+ .speech-bubble.active-dialogue.speaker-john-stuart-mill,
1012
+ .speech-bubble.active-dialogue.speaker-niccolo-machiavelli {
1013
+ --bubble-tail-x: 50%;
1014
+ }
1015
+
1016
+ .speech-bubble.active-dialogue.speaker-confucius,
1017
+ .speech-bubble.active-dialogue.speaker-jensen-huang {
1018
+ --bubble-tail-x: 81%;
1019
+ }
1020
+
1021
+ .verdict-popup {
1022
+ position: absolute;
1023
+ left: 50%;
1024
+ top: 54%;
1025
+ z-index: 42;
1026
+ width: min(460px, calc(100vw - 44px));
1027
+ transform: translate(-50%, -50%);
1028
+ padding: 18px 22px 20px;
1029
+ border: 2px solid rgba(255, 235, 178, .94);
1030
+ border-radius: 8px;
1031
+ background: rgba(20, 12, 7, .95);
1032
+ color: #fff4d6;
1033
+ text-align: center;
1034
+ box-shadow: 0 28px 58px rgba(0, 0, 0, .5);
1035
+ animation: verdict-pop .34s ease-out both;
1036
+ }
1037
+
1038
+ .verdict-popup-kicker {
1039
+ display: block;
1040
+ margin-bottom: 7px;
1041
+ color: #d9b060;
1042
+ font: 800 11px/1 ui-monospace, SFMono-Regular, Consolas, monospace;
1043
+ letter-spacing: 0;
1044
+ text-transform: uppercase;
1045
+ }
1046
+
1047
+ .verdict-popup-finding {
1048
+ display: block;
1049
+ color: #fff8e6;
1050
+ font: 900 clamp(28px, 5vw, 48px)/1.02 Georgia, serif;
1051
+ }
1052
+
1053
+ .verdict-popup-decree {
1054
+ margin: 10px auto 0;
1055
+ max-width: 38ch;
1056
+ color: rgba(255, 244, 214, .86);
1057
+ font-size: 13px;
1058
+ line-height: 1.35;
1059
  }
1060
 
1061
  .tooltip {
 
1190
  animation: juror-react .82s ease-in-out infinite alternate;
1191
  }
1192
 
 
 
 
 
 
1193
  .juror-face {
1194
  position: absolute;
1195
  left: 50%;
 
1449
  100% { transform: rotate(-18deg) translateY(0); }
1450
  }
1451
 
1452
+ @keyframes verdict-pop {
1453
+ 0% {
1454
+ opacity: 0;
1455
+ transform: translate(-50%, -46%) scale(.94);
1456
+ }
1457
+ 100% {
1458
+ opacity: 1;
1459
+ transform: translate(-50%, -50%) scale(1);
1460
+ }
1461
+ }
1462
+
1463
  @media (max-width: 820px) {
1464
  .docket-book-controls {
1465
  position: fixed;
1466
+ top: 130px;
1467
  width: calc(100vw - 52px);
1468
  transform: translateX(-50%) rotate(-1deg);
1469
  }
1470
 
1471
+ .trial-progress {
1472
+ grid-template-columns: repeat(8, minmax(24px, 1fr));
1473
+ padding: 2px 5px 3px;
1474
+ }
1475
+
1476
+ .trial-progress-segment {
1477
+ font-size: 9px;
1478
+ letter-spacing: 0;
1479
+ }
1480
+
1481
+ .trial-progress-label {
1482
+ display: none;
1483
+ }
1484
+
1485
+ .trial-progress-abbrev {
1486
+ display: inline;
1487
+ }
1488
+
1489
  .court-episode-stage {
1490
  height: 1280px;
1491
  min-height: 1280px;
 
1508
  max-width: calc(100% - 32px);
1509
  }
1510
 
1511
+ .speech-bubble.active-dialogue,
1512
+ .speech-bubble.active-dialogue.speaker-clerk,
1513
+ .speech-bubble.active-dialogue.speaker-judge,
1514
+ .speech-bubble.active-dialogue.speaker-auric,
1515
+ .speech-bubble.active-dialogue.speaker-sable,
1516
+ .speech-bubble.active-dialogue.juror-dialogue {
1517
+ left: 50%;
1518
+ top: 218px;
1519
+ width: calc(100% - 28px);
1520
+ max-height: 260px;
1521
+ transform: translateX(-50%);
1522
+ }
1523
+
1524
+ .speech-bubble.active-dialogue::after {
1525
+ display: none;
1526
+ }
1527
+
1528
+ .speech-bubble.active-dialogue.juror-dialogue,
1529
+ .speech-bubble.active-dialogue.speaker-karl-marx,
1530
+ .speech-bubble.active-dialogue.speaker-john-stuart-mill,
1531
+ .speech-bubble.active-dialogue.speaker-confucius,
1532
+ .speech-bubble.active-dialogue.speaker-cleopatra-vii,
1533
+ .speech-bubble.active-dialogue.speaker-niccolo-machiavelli,
1534
+ .speech-bubble.active-dialogue.speaker-jensen-huang {
1535
+ top: 500px;
1536
+ width: min(320px, calc(100vw - 28px));
1537
+ transform: translateY(-100%);
1538
+ }
1539
+
1540
+ .speech-bubble.active-dialogue.speaker-karl-marx,
1541
+ .speech-bubble.active-dialogue.speaker-john-stuart-mill,
1542
+ .speech-bubble.active-dialogue.speaker-confucius {
1543
+ left: 14px;
1544
+ right: auto;
1545
+ }
1546
+
1547
+ .speech-bubble.active-dialogue.speaker-cleopatra-vii,
1548
+ .speech-bubble.active-dialogue.speaker-niccolo-machiavelli,
1549
+ .speech-bubble.active-dialogue.speaker-jensen-huang {
1550
+ right: 14px;
1551
+ left: auto;
1552
+ }
1553
+
1554
  .episode-book {
1555
+ top: 218px;
1556
  width: min(680px, calc(100% - 20px));
1557
  }
1558
 
1559
  .episode-book.closed {
1560
+ top: 640px;
1561
+ width: 140px;
1562
  }
1563
 
1564
  .book-open-content {
1565
  grid-template-columns: 1fr;
1566
  gap: 10px;
1567
+ inset: 15% 11% 13%;
1568
+ padding: 0 16px;
1569
  }
1570
 
1571
  .book-open-content h2 {
 
1583
  margin: 5px 0;
1584
  }
1585
 
1586
+ .book-evidence-columns {
1587
+ grid-template-columns: 1fr 1fr;
1588
+ gap: 8px;
1589
+ }
1590
+
1591
+ .book-evidence-list li {
1592
+ font-size: 10px;
1593
+ line-height: 1.12;
1594
+ }
1595
+
1596
+ .book-field {
1597
+ min-height: 34px;
1598
+ font-size: 10px;
1599
+ }
1600
+
1601
+ .book-context-field {
1602
+ min-height: 84px;
1603
+ }
1604
+
1605
  .judge-dais {
1606
  top: 390px;
1607
  width: 280px;
 
1623
 
1624
  .puppet.auric {
1625
  left: 20%;
1626
+ top: 970px;
1627
  }
1628
 
1629
  .puppet.sable {
1630
  left: 80%;
1631
+ top: 970px;
1632
  }
1633
 
1634
  .speaker-auric .puppet.auric {
1635
  left: 42%;
1636
+ top: 970px;
1637
  }
1638
 
1639
  .speaker-sable .puppet.sable {
1640
+ left: 80%;
1641
+ top: 970px;
1642
  }
1643
 
1644
  .puppet.clerk {
1645
  left: 35%;
1646
+ top: 880px;
 
 
 
 
 
1647
  }
1648
 
1649
  .witness-area {
 
1659
  }
1660
 
1661
  .jury-benches.left {
1662
+ left: .5%;
1663
  }
1664
 
1665
  .jury-benches.right {
1666
+ right: .5%;
1667
  }
1668
 
1669
  .foreground-fence {
1670
+ bottom: -66px;
1671
  width: 64%;
1672
  }
1673
 
 
1680
  }
1681
 
1682
  .judge-table-foreground {
1683
+ top: 213px;
1684
+ width: 646px;
1685
  }
1686
 
1687
  .evidence-props {
 
1870
  """
1871
 
1872
  START_JS = """
1873
+ (case_label, search_query, hypothetical, custom_payload, speed, mind_layer) => {
1874
+ const book = document.querySelector('.episode-book.custom-book');
1875
+ const collect = (selector) => Array.from(document.querySelectorAll(selector)).map((node) => node.value || '');
1876
+ const payload = book ? JSON.stringify({
1877
+ context: document.querySelector('.book-context-field')?.value || '',
1878
+ claimant_evidence: collect('.book-claimant-field'),
1879
+ respondent_evidence: collect('.book-respondent-field')
1880
+ }) : (custom_payload || '');
1881
+ if (book) {
1882
+ const data = JSON.parse(payload);
1883
+ const hasContext = data.context.trim().length > 0;
1884
+ const hasClaimant = data.claimant_evidence.some((value) => value.trim().length > 0);
1885
+ const hasRespondent = data.respondent_evidence.some((value) => value.trim().length > 0);
1886
+ if (!hasContext || !hasClaimant || !hasRespondent) {
1887
+ return [case_label, search_query, hypothetical, payload, speed, mind_layer];
1888
+ }
1889
+ }
1890
  document.body.classList.add('trial-has-started');
1891
  if (window.SovereignCourtAudio) {
1892
  window.SovereignCourtAudio.begin();
1893
  }
1894
+ return [case_label, search_query, hypothetical, payload, speed, mind_layer];
1895
  }
1896
  """
1897
 
 
1909
  "role": "Court clerk",
1910
  "model": "AgentCPM-Explore",
1911
  },
1912
+ "Mike OSS": {
1913
  "class": "auric",
1914
+ "name": "Mike OSS",
1915
  "role": "Claimant advocate",
1916
  "model": "gpt-oss-20b",
1917
  },
1918
+ "Harvey Vector": {
1919
  "class": "sable",
1920
+ "name": "Harvey Vector",
1921
  "role": "Respondent advocate",
1922
  "model": "gpt-oss-20b",
1923
  },
 
 
 
 
 
 
1924
  "Nemotron Jury": {
1925
  "class": "jury",
1926
  "name": "Nemotron Jury",
 
1947
  "Jensen Huang": "/gradio_api/file=assets/characters/jensen-huang.png",
1948
  }
1949
 
1950
+ TRIAL_TURN_ORDER = (
1951
+ "Clerk Meridian",
1952
+ JUDGE_NAME,
1953
+ "Mike OSS",
1954
+ "Harvey Vector",
1955
+ JUDGE_NAME,
1956
+ "Mike OSS",
1957
+ "Harvey Vector",
1958
+ "Nemotron Jury",
1959
+ *JUROR_PERSONAS.keys(),
1960
+ JUDGE_NAME,
1961
+ )
1962
+
1963
  PHASE_AGENTS = {
1964
  "pretrial": ["Clerk Meridian"],
1965
  }
1966
 
1967
 
1968
+ @dataclass(frozen=True)
1969
+ class SpeakerCue:
1970
+ name: str
1971
+ role: str
1972
+ text: str
1973
+ pending: bool = False
1974
+
1975
+
1976
+ _EVENT_STREAM_DONE = object()
1977
+
1978
+
1979
  def _remote_events(request: TrialRequest) -> Iterable[TrialEvent] | None:
1980
+ endpoint = os.getenv("MODAL_TRIAL_URL", DEFAULT_MODAL_TRIAL_URL).strip()
1981
  if not endpoint:
1982
  return None
1983
 
 
1991
  return iterator()
1992
 
1993
 
1994
+ def get_events(request: TrialRequest, delay: float | None = None) -> Iterable[TrialEvent]:
1995
  remote = _remote_events(request)
1996
  if remote is not None:
1997
  yield from remote
1998
  return
1999
+ stream_delay = {"swift": 1.4, "measured": 2.4, "ceremonial": 3.4}[request.speed] if delay is None else delay
2000
+ yield from stream_trial(request, delay=stream_delay)
2001
 
2002
 
2003
  def _escape(value: str) -> str:
 
2037
  return event.turns[0].agent
2038
 
2039
 
2040
+ def _role_for_speaker(name: str, event: TrialEvent | None = None) -> str:
2041
+ if event is not None:
2042
+ turn = next((turn for turn in event.turns if turn.agent == name), None)
2043
+ if turn is not None:
2044
+ return turn.role
2045
+ if name in CHARACTERS:
2046
+ return CHARACTERS[name]["role"]
2047
+ if name in JUROR_FACES:
2048
+ return "juror"
2049
+ return "speaker"
2050
+
2051
+
2052
+ def _expected_next_speaker(events: list[TrialEvent]) -> SpeakerCue | None:
2053
+ if len(events) >= len(TRIAL_TURN_ORDER):
2054
+ return None
2055
+ name = TRIAL_TURN_ORDER[len(events)]
2056
+ role = _role_for_speaker(name)
2057
+ return SpeakerCue(name=name, role=role, text=f"{name} is preparing a response.", pending=True)
2058
+
2059
+
2060
  def _speaker_class_for(speaker: str) -> str:
2061
  if not speaker:
2062
  return ""
 
2074
  return _short_text(turn.content, 210)
2075
 
2076
 
2077
+ def _active_speaker_cue(event: TrialEvent | None, pending_speaker: SpeakerCue | None = None) -> SpeakerCue | None:
2078
+ if pending_speaker is not None:
2079
+ return pending_speaker
2080
+ if event is None or not event.turns:
2081
+ return None
2082
+ turn = event.turns[0]
2083
+ text = turn.content.strip()
2084
+ if not text:
2085
+ return None
2086
+ return SpeakerCue(name=turn.agent, role=turn.role, text=text)
2087
+
2088
+
2089
+ def _reading_duration(text: str) -> float:
2090
+ word_count = len(text.split())
2091
+ return min(MAX_READ_SECONDS, max(MIN_READ_SECONDS, (word_count / WORDS_PER_SECOND) + READ_BUFFER_SECONDS))
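To see how the clamp in `_reading_duration` behaves, here is a standalone sketch that copies the four constants from this diff. The function name `reading_duration` is only an illustrative stand-in for the private helper.

```python
# Constants copied from the diff above.
MIN_READ_SECONDS = 2.2
WORDS_PER_SECOND = 3.2
READ_BUFFER_SECONDS = 0.8
MAX_READ_SECONDS = 8.5


def reading_duration(text: str) -> float:
    """Estimate on-screen time for a dialogue line, clamped to [MIN, MAX]."""
    word_count = len(text.split())
    estimate = word_count / WORDS_PER_SECOND + READ_BUFFER_SECONDS
    return min(MAX_READ_SECONDS, max(MIN_READ_SECONDS, estimate))


short = reading_duration("Order in the court.")   # clamped up to the minimum
medium = reading_duration("word " * 16)           # 16 words, inside the range
long = reading_duration("word " * 100)            # clamped down to the maximum
```

A 16-word line works out to 16 / 3.2 + 0.8, or about 5.8 seconds. Any line of four words or fewer stays on screen for the 2.2-second floor, and anything above roughly 24 words hits the 8.5-second ceiling.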
2092
+
2093
+
2094
+ def _event_dialogue_text(event: TrialEvent) -> str:
2095
+ if event.turns:
2096
+ return event.turns[0].content
2097
+ return event.body
2098
+
2099
+
2100
+ def _event_status(event: TrialEvent, step: int) -> str:
2101
+ if event.turns:
2102
+ return f"Step {step}: {event.turns[0].agent} - {event.title}"
2103
+ return f"Step {step}: {event.title}"
2104
+
2105
+
2106
+ def _pending_status(cue: SpeakerCue | None) -> str:
2107
+ if cue is None:
2108
+ return "The court is preparing the next turn."
2109
+ return f"{cue.name} is preparing their response."
2110
+
2111
+
2112
+ def _start_event_producer(request: TrialRequest) -> queue.Queue[object]:
2113
+ events: queue.Queue[object] = queue.Queue()
2114
+
2115
+ def produce() -> None:
2116
+ try:
2117
+ try:
2118
+ stream = get_events(request, delay=0.0)
2119
+ except TypeError:
2120
+ stream = get_events(request)
2121
+ for event in stream:
2122
+ events.put(event)
2123
+ except Exception as exc:
2124
+ events.put(exc)
2125
+ finally:
2126
+ events.put(_EVENT_STREAM_DONE)
2127
+
2128
+ threading.Thread(target=produce, name="trial-event-producer", daemon=True).start()
2129
+ return events
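The consumer side of this queue is not shown in this part of the diff. The sketch below illustrates the general pattern that `_start_event_producer` sets up: a daemon thread pushes items, then any exception, then a sentinel, and the consumer drains until it sees the sentinel. The names `start_producer` and `drain` and the `timeout` value are assumptions for illustration, not the PR's actual consumer.

```python
import queue
import threading
from collections.abc import Iterable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def start_producer(source: Iterable[str]) -> queue.Queue:
    """Push every item from source onto a queue from a background thread."""
    events: queue.Queue = queue.Queue()

    def produce() -> None:
        try:
            for item in source:
                events.put(item)
        except Exception as exc:
            # Forward the failure to the consumer instead of losing it.
            events.put(exc)
        finally:
            # Always signal completion, even after an error.
            events.put(_DONE)

    threading.Thread(target=produce, daemon=True).start()
    return events


def drain(events: queue.Queue, timeout: float = 5.0) -> Iterator[str]:
    """Yield queued items until the sentinel; re-raise forwarded exceptions."""
    while True:
        item = events.get(timeout=timeout)
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

Because the sentinel is put in a `finally` block, the consumer never blocks forever after a producer failure: it first receives the exception object and can stop there.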
2130
+
2131
+
2132
  def _thread_id(name: str) -> str:
2133
  return "ai-thread-" + "".join(ch.lower() if ch.isalnum() else "-" for ch in name).strip("-")
2134
 
 
2216
  )
2217
 
2218
 
2219
+ def _active_dialogue(cue: SpeakerCue | None) -> str:
2220
+ if cue is None:
2221
+ return ""
2222
+ speaker_cls = _speaker_class_for(cue.name).strip()
2223
+ classes = ["speech-bubble", "active-dialogue"]
2224
+ if speaker_cls:
2225
+ classes.append(speaker_cls)
2226
+ if cue.name in JUROR_FACES:
2227
+ classes.append("juror-dialogue")
2228
+ if cue.pending:
2229
+ classes.append("pending")
2230
+ pending_attr = " data-pending='true'" if cue.pending else ""
2231
+ return (
2232
+ f"<div class='{' '.join(classes)}' data-speaker='{_escape(cue.name)}'{pending_attr}>"
2233
+ "<div class='dialogue-meta'>"
2234
+ f"<strong>{_escape(cue.name)}</strong>"
2235
+ f"<span>{_escape(cue.role)}</span>"
2236
+ "</div>"
2237
+ f"<div class='dialogue-text'>{_escape(cue.text)}</div>"
2238
+ "</div>"
2239
+ )
2240
+
2241
+
2242
+ def _verdict_popup(events: list[TrialEvent], show: bool) -> str:
2243
+ if not show:
2244
+ return ""
2245
+ verdict = next((event.verdict for event in reversed(events) if event.verdict is not None), None)
2246
+ if verdict is None:
2247
+ return ""
2248
+ finding = VERDICT_LABELS.get(verdict.finding, verdict.finding.replace("_", " ").title())
2249
+ return (
2250
+ f"<div class='verdict-popup' role='alert' aria-live='assertive' data-finding='{_escape(verdict.finding)}'>"
2251
+ "<span class='verdict-popup-kicker'>Verdict</span>"
2252
+ f"<strong class='verdict-popup-finding'>Verdict: {_escape(finding)}</strong>"
2253
+ f"<p class='verdict-popup-decree'>{_escape(verdict.decree)}</p>"
2254
+ "</div>"
2255
+ )
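The label lookup in `_verdict_popup` maps known findings through `VERDICT_LABELS` and title-cases anything else. A small standalone sketch of that fallback is below. `verdict_label` is a hypothetical wrapper name, and `"hung_jury"` is an invented finding used only to exercise the fallback path.

```python
# Mapping copied from the diff above.
VERDICT_LABELS = {
    "liable": "Guilty",
    "not_liable": "Not Guilty",
    "mixed": "Mixed",
    "uncertain": "Uncertain",
}


def verdict_label(finding: str) -> str:
    """Return the display label, title-casing findings missing from the map."""
    return VERDICT_LABELS.get(finding, finding.replace("_", " ").title())


known = verdict_label("not_liable")
unknown = verdict_label("hung_jury")  # hypothetical finding, not in the map
```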
2256
+
2257
+
2258
  def _puppet(agent: str, active_agents: set[str], phase: str, events: list[TrialEvent], latest: TrialEvent | None) -> str:
2259
  meta = CHARACTERS[agent]
2260
  active = " active" if agent in active_agents else ""
2261
+ walking = " walking" if agent in {"Mike OSS", "Harvey Vector"} and agent in active_agents else ""
2262
+ small = " small" if agent == "Clerk Meridian" else ""
2263
  turns = _thread_for_character(events, agent)
 
 
 
 
 
2264
  portrait = ""
2265
  if meta.get("image"):
2266
  portrait = (
 
2271
  f"<a class='puppet {meta['class']}{active}{walking}{small}' href='#{_escape(_thread_id(agent))}' aria-label='Open {_escape(agent)} model thread'>"
2272
  f"{portrait}"
2273
  "<span class='mouth'></span>"
 
2274
  f"{_tooltip(meta['name'], meta['role'], meta['model'], turns)}"
2275
  "</a>"
2276
  )
 
2281
  image = JUROR_IMAGES.get(name, "")
2282
  active_cls = " active" if active else ""
2283
  turns = _thread_for_character(events or [], name)
2284
  portrait = (
2285
  f"<img class='juror-portrait' src='{_escape(image)}' alt='{_escape(name)} bust' "
2286
  "onerror=\"this.style.display='none'\">"
2287
  if image
2288
  else ""
2289
  )
2290
+ fallback_art = "" if image else "<span class='juror-face'></span><span class='juror-body'></span>"
2291
  return (
2292
  f"<a class='juror{active_cls}' href='#{_escape(_thread_id(name))}' style='--face: {face}' aria-label='Open {_escape(name)} model thread'>"
2293
  f"{portrait}"
2294
+ f"{fallback_art}"
 
2295
  f"{_tooltip(name, 'HF-style juror', 'Nemotron panel', turns)}"
2296
  "</a>"
2297
  )
2298
 
2299
 
2300
+ def _packet_for_label(case_label: str) -> CasePacket:
2301
+ return get_case(CASE_OPTIONS.get(case_label, "socrates"))
2302
+
2303
+
2304
+ def _split_evidence(packet: CasePacket) -> tuple[list[EvidenceItem], list[EvidenceItem]]:
2305
+ claimant = [item for item in packet.evidence if item.supports == "claimant"]
2306
+ respondent = [item for item in packet.evidence if item.supports == "respondent"]
2307
+ if len(claimant) < 3:
2308
+ claimant.extend(item for item in packet.evidence if item.supports in {"mixed", "context"} and item not in claimant)
2309
+ if len(respondent) < 3:
2310
+ respondent.extend(item for item in packet.evidence if item.supports in {"mixed", "context"} and item not in respondent)
2311
+ return claimant[:3], respondent[:3]
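The padding rule in `_split_evidence` can be checked in isolation. The sketch below uses a hypothetical `Item` dataclass in place of `EvidenceItem`, whose full definition is not shown in this diff, and keeps only the `supports` field that the split depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for EvidenceItem with only the fields used here."""
    title: str
    supports: str


def split_evidence(items: list[Item], limit: int = 3) -> tuple[list[Item], list[Item]]:
    """Split evidence by side, padding short sides with mixed/context items."""
    claimant = [item for item in items if item.supports == "claimant"]
    respondent = [item for item in items if item.supports == "respondent"]
    shared = [item for item in items if item.supports in {"mixed", "context"}]
    if len(claimant) < limit:
        claimant.extend(shared)
    if len(respondent) < limit:
        respondent.extend(shared)
    # Truncate after padding so each column shows at most `limit` entries.
    return claimant[:limit], respondent[:limit]


packet = [
    Item("a", "claimant"),
    Item("b", "respondent"),
    Item("c", "mixed"),
    Item("d", "context"),
    Item("e", "claimant"),
]
claimant_side, respondent_side = split_evidence(packet)
```

With this input the claimant column holds `a`, `e`, `c` and the respondent column holds `b`, `c`, `d`, so a shared item can appear on both pages of the book.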
2312
+
2313
+
2314
+ def _book_evidence_column(title: str, items: list[EvidenceItem]) -> str:
2315
+ entries = "".join(
2316
+ "<li>"
2317
+ f"<strong>{_escape(item.title)}</strong><br>"
2318
+ f"{_escape(item.note)}"
2319
+ "</li>"
2320
+ for item in items
2321
+ )
2322
+ return (
2323
+ "<section class='book-evidence-column'>"
2324
+ f"<h3>{_escape(title)}</h3>"
2325
+ f"<ul class='book-evidence-list'>{entries}</ul>"
2326
+ "</section>"
2327
+ )
2328
+
2329
+
2330
+ def _custom_evidence_fields(class_name: str, label: str) -> str:
2331
+ fields = "".join(
2332
+ f"<textarea class='book-field {class_name}' aria-label='{_escape(label)} {index}' "
2333
+ f"placeholder='{_escape(label)} {index}'></textarea>"
2334
+ for index in range(1, 4)
2335
+ )
2336
+ return f"<section class='book-evidence-column'><h3>{_escape(label)}</h3>{fields}</section>"
2337
+
2338
+
2339
+ def _book(open_book: bool, packet: CasePacket | None = None, custom_mode: bool = False) -> str:
2340
  closed = "" if open_book else " closed"
2341
+ custom_class = " custom-book" if custom_mode and open_book else ""
2342
+ hidden_attr = "" if custom_mode and open_book else " aria-hidden='true'"
2343
+ packet = packet or get_case("socrates")
2344
+ if custom_mode and open_book:
2345
+ left_page = (
2346
+ "<section><h2>Trial details</h2>"
2347
+ "<textarea class='book-field book-context-field' aria-label='Custom trial details' "
2348
+ "placeholder='Write a short paragraph describing what happened and why the court is hearing it.'></textarea>"
2349
+ "</section>"
2350
+ )
2351
+ right_page = (
2352
+ "<section><h2>Evidence</h2><div class='book-evidence-columns'>"
2353
+ f"{_custom_evidence_fields('book-claimant-field', 'Evidence for Claimant')}"
2354
+ f"{_custom_evidence_fields('book-respondent-field', 'Evidence against Claimant')}"
2355
+ "</div></section>"
2356
+ )
2357
+ else:
2358
+ claimant_evidence, respondent_evidence = _split_evidence(packet)
2359
+ left_page = (
2360
+ "<section><h2>Trial details</h2>"
2361
+ f"<p class='book-case-title'>{_escape(packet.title)}</p>"
2362
+ f"<p class='book-context'>{_escape(packet.context or packet.setting)}</p>"
2363
+ f"<div class='book-entry'><strong>{_escape(packet.claimant)}</strong><br>{_escape(packet.claimant_claim)}</div>"
2364
+ f"<div class='book-entry'><strong>{_escape(packet.respondent)}</strong><br>{_escape(packet.respondent_claim)}</div>"
2365
+ "</section>"
2366
+ )
2367
+ right_page = (
2368
+ "<section><h2>Evidence</h2><div class='book-evidence-columns'>"
2369
+ f"{_book_evidence_column(f'Evidence for {packet.claimant}', claimant_evidence)}"
2370
+ f"{_book_evidence_column(f'Evidence for {packet.respondent}', respondent_evidence)}"
2371
+ "</div></section>"
2372
+ )
2373
  return (
2374
+ f"<div class='episode-book{closed}{custom_class}'>"
2375
  "<img class='book-art open-art' src='/gradio_api/file=assets/book/docket-book-open.png' alt='Open docket book'>"
2376
  "<img class='book-art closed-art' src='/gradio_api/file=assets/book/docket-book-closed.png' alt='Closed docket book'>"
2377
+ f"<div class='book-open-content'{hidden_attr}>"
2378
+ f"{left_page}"
2379
+ f"{right_page}"
2380
+ "</div>"
2381
  "</div>"
2382
  )
2383
 
 
2420
  )
2421
 
2422
 
2423
+ def _trial_progress(events: list[TrialEvent]) -> str:
2424
+ latest = events[-1] if events else None
2425
+ current_phase = latest.phase if latest else "pretrial"
2426
+ stage_keys = [key for key, _label in TRIAL_PROGRESS_STAGES]
2427
+ current_index = stage_keys.index(current_phase) if current_phase in stage_keys else None
2428
+ segments = []
2429
+ for index, (key, label) in enumerate(TRIAL_PROGRESS_STAGES):
2430
+ classes = ["trial-progress-segment"]
2431
+ attrs = [f"data-phase='{_escape(key)}'"]
2432
+ if current_index is not None and index < current_index:
2433
+ classes.append("complete")
2434
+ if current_index == index:
2435
+ classes.append("current")
2436
+ attrs.append("aria-current='step'")
2437
+ if key == "verdict":
2438
+ classes.append("complete")
2439
+ abbrev = label[:3]
2440
+ segments.append(
2441
+ f"<span class='{' '.join(classes)}' {' '.join(attrs)}>"
2442
+ f"<span class='trial-progress-label'>{_escape(label)}</span>"
2443
+ f"<span class='trial-progress-abbrev' aria-hidden='true'>{_escape(abbrev)}</span>"
2444
+ "</span>"
2445
+ )
2446
+ return (
2447
+ "<nav class='trial-progress' aria-label='Trial progress'>"
2448
+ + "".join(segments)
2449
+ + "</nav>"
2450
+ )
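
The segment logic in `_trial_progress` can be checked without rendering any HTML. The sketch below is a simplified illustration, not the PR's code: `STAGES` is a hypothetical stand-in for `TRIAL_PROGRESS_STAGES` (which this diff does not show), the `"evidence"` key is an assumed stage name, and each segment gets a single state instead of a list of CSS classes.

```python
# Hypothetical stage list; the real TRIAL_PROGRESS_STAGES is not visible in this diff.
STAGES = [
    ("intake", "Intake"),
    ("claims", "Claims"),
    ("evidence", "Evidence"),  # assumed stage key, for illustration only
    ("verdict", "Verdict"),
]


def segment_states(current_phase: str) -> list[tuple[str, str]]:
    """Classify every stage as complete, current, or upcoming for a given phase."""
    keys = [key for key, _label in STAGES]
    # Phases outside the stage list (for example "pretrial") highlight nothing.
    current = keys.index(current_phase) if current_phase in keys else None
    states = []
    for index, key in enumerate(keys):
        if current is not None and index < current:
            state = "complete"
        elif current == index:
            state = "current"
        else:
            state = "upcoming"
        states.append((key, state))
    return states
```

Calling `segment_states("pretrial")` marks every stage as upcoming, which corresponds to the progress bar rendering with no `complete` or `current` classes before the trial starts.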
2451
+
2452
+
2453
  def _courtroom_juror_names(votes: list) -> list[str]:
2454
  names = list(JUROR_FACES)
2455
  names.extend(vote.juror for vote in votes if vote.juror not in names)
 
2466
  return ordered
2467
 
2468
 
2469
+ def render_court(
2470
+ events: list[TrialEvent],
2471
+ started: bool = False,
2472
+ pending_speaker: SpeakerCue | None = None,
2473
+ show_verdict_popup: bool = False,
2474
+ pretrial_case: CasePacket | None = None,
2475
+ custom_mode: bool = False,
2476
+ ) -> str:
2477
  latest = events[-1] if events else None
2478
  phase = latest.phase if latest else "pretrial"
2479
  title, subtitle = _latest_packet_title(events)
2480
+ active_cue = _active_speaker_cue(latest, pending_speaker)
2481
+ active_speaker = active_cue.name if active_cue is not None else _active_speaker_for(latest)
2482
+ active_agents = {active_speaker} if active_speaker else _active_agents_for(latest)
2483
  speaker_cls = _speaker_class_for(active_speaker)
2484
  caption_phase, caption_title, caption_body = _caption(latest, phase)
2485
  latest_votes = _latest_votes(events)
 
2488
  book_open = not started and not events
2489
  puppets = "".join(
2490
  _puppet(agent, active_agents, phase, events, latest)
2491
+ for agent in [JUDGE_NAME, "Clerk Meridian", "Mike OSS", "Harvey Vector"]
2492
  )
2493
  left_jurors = "".join(_juror(name, name == active_speaker, events, latest) for name in juror_names[:3])
2494
  right_jurors = "".join(_juror(name, name == active_speaker, events, latest) for name in juror_names[3:6])
 
2502
  )
2503
  return (
2504
  f"<section id='court-stage' class='court-episode-stage phase-{_escape(phase)}{_escape(speaker_cls)}{started_cls}' data-phase='{_escape(phase)}'>"
2505
+ f"{_trial_progress(events)}"
2506
  "<div class='episode-room'></div>"
2507
  "<div class='audio-deck' aria-hidden='true'>"
2508
  + "".join(f"<audio preload='auto' src='{_escape(src)}'></audio>" for src in AUDIO_PATHS.values())
 
2514
  f"<h1>{_escape(title)}</h1>"
2515
  f"<p>{_escape(subtitle)}</p></div>"
2516
  f"<div class='decree-ribbon'>Step {len(events)}: {_escape(caption_title)}<br>Hover characters for agent and model details</div>"
2517
+ f"{_book(book_open, pretrial_case, custom_mode)}"
2518
  f"<div class='judge-dais'><div class='prop-label'>{_escape(JUDGE_NAME)}</div><div class='bench-front'></div><span class='gavel'></span></div>"
2519
  "<div class='counsel-table left'><div class='prop-label'>Claimant Table</div></div>"
2520
  "<div class='counsel-table right'><div class='prop-label'>Respondent Table</div></div>"
 
2527
  f"{puppets}"
2528
  f"{evidence_props}"
2529
  f"{_foreground_props()}"
2530
+ f"{_active_dialogue(active_cue)}"
2531
+ f"{_verdict_popup(events, show_verdict_popup)}"
2532
  "<div class='gallery-benches'><div></div><div></div><div></div><div></div><div></div><div></div></div>"
2533
  "<div class='trial-caption'>"
2534
  f"<div class='caption-phase'>Live Trial Feed / {_escape(caption_phase)}</div>"
 
2599
  return f"<pre class='mind-text'>{_escape(json.dumps(compact, indent=2))}</pre>"
2600
 
2601
 
2602
+ def _clean_custom_items(values: list[str]) -> list[str]:
2603
+ return [cleaned for value in values if (cleaned := " ".join(value.split()))]
2604
+
2605
+
2606
+ def _custom_case_from_payload(payload: str) -> CasePacket:
2607
+ try:
2608
+ data = json.loads(payload or "{}")
2609
+ except json.JSONDecodeError as exc:
2610
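+ raise ValueError("Custom case details could not be read from the docket book.") from exc
+ # Valid JSON that is not an object (for example a list) has no .get(); reject it as unreadable.
+ if not isinstance(data, dict):
+ raise ValueError("Custom case details could not be read from the docket book.")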
2611
+ context = " ".join(str(data.get("context", "")).split())
2612
+ claimant_items = _clean_custom_items([str(value) for value in data.get("claimant_evidence", [])])
2613
+ respondent_items = _clean_custom_items([str(value) for value in data.get("respondent_evidence", [])])
2614
+ if not context:
2615
+ raise ValueError("A custom trial requires a trial details paragraph.")
2616
+ if not claimant_items or not respondent_items:
2617
+ raise ValueError("A custom trial requires at least one evidence item for each side.")
2618
+ evidence = [
2619
+ EvidenceItem(
2620
+ id=f"CUS-F{index}",
2621
+ title=f"Claimant Evidence {index}",
2622
+ source="Custom docket entry",
2623
+ excerpt=item,
2624
+ supports="claimant",
2625
+ reliability=0.65,
2626
+ note=item,
2627
+ )
2628
+ for index, item in enumerate(claimant_items[:3], start=1)
2629
+ ]
2630
+ evidence.extend(
2631
+ EvidenceItem(
2632
+ id=f"CUS-A{index}",
2633
+ title=f"Respondent Evidence {index}",
2634
+ source="Custom docket entry",
2635
+ excerpt=item,
2636
+ supports="respondent",
2637
+ reliability=0.65,
2638
+ note=item,
2639
+ )
2640
+ for index, item in enumerate(respondent_items[:3], start=1)
2641
+ )
2642
+ return CasePacket(
2643
+ id="custom",
2644
+ title="Custom Trial",
2645
+ subtitle="A custom docket assembled in the opening book.",
2646
+ claimant="Claimant",
2647
+ respondent="Respondent",
2648
+ charge="Whether the custom record supports the claimant or the respondent.",
2649
+ setting="A custom courtroom packet entered by the user.",
2650
+ context=context,
2651
+ claimant_claim="The claimant says the custom context and supporting evidence justify a favorable finding.",
2652
+ respondent_claim="The respondent says the custom context is incomplete, overread, or answered by contrary evidence.",
2653
+ source_note="Custom user-entered case packet from the docket book.",
2654
+ evidence=evidence,
2655
+ )
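
The normalization rules in `_custom_case_from_payload` (collapse whitespace, drop empty entries, require context plus at least one item per side, keep at most three items per side) can be exercised in isolation. The following is a minimal sketch under stated assumptions: `parse_custom_payload` and `clean_items` are hypothetical names, the function returns a plain dict instead of a `CasePacket`, and the error messages are placeholders.

```python
import json


def clean_items(values: list[str]) -> list[str]:
    """Collapse runs of whitespace and drop entries that end up empty."""
    return [cleaned for value in values if (cleaned := " ".join(value.split()))]


def parse_custom_payload(payload: str) -> dict:
    """Hypothetical stand-in for the docket-book parser; returns a plain dict."""
    try:
        data = json.loads(payload or "{}")
    except json.JSONDecodeError as exc:
        raise ValueError("payload is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    context = " ".join(str(data.get("context", "")).split())
    claimant = clean_items([str(v) for v in data.get("claimant_evidence", [])])
    respondent = clean_items([str(v) for v in data.get("respondent_evidence", [])])
    if not context:
        raise ValueError("missing trial details")
    if not claimant or not respondent:
        raise ValueError("each side needs at least one evidence item")
    # Only the first three items per side are kept, mirroring the [:3] slices above.
    return {"context": context, "claimant": claimant[:3], "respondent": respondent[:3]}


example = json.dumps(
    {
        "context": "  The   last mooncake\nvanished. ",
        "claimant_evidence": ["crumbs on desk", "   ", "ledger  edit"],
        "respondent_evidence": ["pen skipped"],
    }
)
parsed = parse_custom_payload(example)
```

Raising `ValueError` for every rejected payload keeps the caller simple, since `run_ui` only needs one `except ValueError` branch to show the message in the status area.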
2656
+
2657
+
2658
+ def render_case_preview(case_label: str) -> str:
2659
+ case_id = CASE_OPTIONS.get(case_label, "socrates")
2660
+ return render_court(
2661
+ [],
2662
+ pretrial_case=get_case(case_id) if case_id != "custom" else None,
2663
+ custom_mode=case_id == "custom",
2664
+ )
2665
+
2666
+
2667
+ def run_ui(
2668
+ case_label: str,
2669
+ search_query: str,
2670
+ hypothetical: str,
2671
+ custom_payload: str,
2672
+ speed: str,
2673
+ mind_layer: bool,
2674
+ ):
2675
+ case_id = CASE_OPTIONS.get(case_label, "socrates")
2676
+ try:
2677
+ custom_case = _custom_case_from_payload(custom_payload) if case_id == "custom" else None
2678
+ except ValueError as exc:
2679
+ yield (
2680
+ render_court([], pretrial_case=None, custom_mode=True),
2681
+ render_evidence([]),
2682
+ render_jurors([]),
2683
+ render_mind([], mind_layer),
2684
+ str(exc),
2685
+ )
2686
+ return
2687
  request = TrialRequest(
2688
+ case_id=case_id,
2689
  search_query=search_query or "",
2690
  hypothetical=hypothetical or "",
2691
+ custom_case=custom_case,
2692
  speed=speed or "swift",
2693
  mind_layer=bool(mind_layer),
2694
  )
2695
  events: list[TrialEvent] = []
2696
+ produced_events = _start_event_producer(request)
2697
+ pending_speaker = _expected_next_speaker(events)
2698
  yield (
2699
+ render_court(events, started=True, pending_speaker=pending_speaker),
2700
  render_evidence(events),
2701
  render_jurors(events),
2702
  render_mind(events, mind_layer),
2703
+ _pending_status(pending_speaker),
2704
  )
2705
  try:
2706
+ while True:
2707
+ item = produced_events.get()
2708
+ if item is _EVENT_STREAM_DONE:
2709
+ break
2710
+ if isinstance(item, Exception):
2711
+ raise item
2712
+ event = item
2713
  events.append(event)
 
2714
  yield (
2715
  render_court(events, started=True),
2716
  render_evidence(events),
2717
  render_jurors(events),
2718
  render_mind(events, mind_layer),
2719
+ _event_status(event, len(events)),
2720
  )
2721
+ duration = _reading_duration(_event_dialogue_text(event))
2722
+ if duration > 0:
2723
+ time.sleep(duration)
2724
+ pending_speaker = _expected_next_speaker(events)
2725
+ if pending_speaker is not None and produced_events.empty():
2726
+ yield (
2727
+ render_court(events, started=True, pending_speaker=pending_speaker),
2728
+ render_evidence(events),
2729
+ render_jurors(events),
2730
+ render_mind(events, mind_layer),
2731
+ _pending_status(pending_speaker),
2732
+ )
2733
  except Exception as exc:
2734
  yield (
2735
  render_court(events, started=True),
 
2740
  )
2741
  return
2742
  yield (
2743
+ render_court(events, started=True, show_verdict_popup=True),
2744
  render_evidence(events),
2745
  render_jurors(events),
2746
  render_mind(events, mind_layer),
 
2761
  )
2762
  start = gr.Button("Begin Trial", variant="primary", scale=1)
2763
  status = gr.Markdown("Ready.", elem_classes=["book-status"])
2764
+ courtroom = gr.HTML(render_case_preview("Trial of Socrates"), label="Live courtroom")
2765
  search = gr.State("")
2766
+ hypo = gr.State("")
2767
+ custom_payload = gr.State("")
2768
  speed = gr.State("swift")
2769
  mind = gr.State(True)
 
 
 
2770
  with gr.Row(elem_classes=["drawer-shell"]):
2771
  with gr.Column(scale=1):
2772
  with gr.Tab("Evidence Drawer"):
 
2774
  with gr.Tab("Juror Panel"):
2775
  jurors = gr.HTML(render_jurors([]))
2776
  mind_html = gr.HTML(render_mind([], True), visible=False)
2777
+ case.change(
2778
+ render_case_preview,
2779
+ inputs=[case],
2780
+ outputs=[courtroom],
2781
+ )
2782
  start.click(
2783
  run_ui,
2784
+ inputs=[case, search, hypo, custom_payload, speed, mind],
2785
  outputs=[courtroom, evidence, jurors, mind_html, status],
2786
  js=START_JS,
2787
  )
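
The streaming loop in `run_ui` reads from `produced_events` until it sees `_EVENT_STREAM_DONE` and re-raises any `Exception` it receives. The diff does not show `_start_event_producer`, so the following is only a minimal sketch of how such a producer could be built with a background thread and `queue.Queue`. The names `start_event_producer`, `EVENT_STREAM_DONE`, `drain`, and `fake_trial` are hypothetical and are not taken from the PR.

```python
import queue
import threading
from collections.abc import Callable, Iterable

# Hypothetical sentinel marking the end of the stream (stand-in for _EVENT_STREAM_DONE).
EVENT_STREAM_DONE = object()


def start_event_producer(make_events: Callable[[], Iterable[object]]) -> queue.Queue:
    """Run an event generator on a daemon thread and feed its items into a queue."""
    out: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            for event in make_events():
                out.put(event)
        except Exception as exc:
            # Hand the failure to the consumer instead of losing it on the worker thread.
            out.put(exc)
        finally:
            # Always signal completion so the consumer never blocks forever.
            out.put(EVENT_STREAM_DONE)

    threading.Thread(target=worker, daemon=True).start()
    return out


def drain(events: queue.Queue) -> list[object]:
    """Consume the queue like the loop in run_ui: stop on the sentinel, re-raise errors."""
    seen: list[object] = []
    while True:
        item = events.get()
        if item is EVENT_STREAM_DONE:
            break
        if isinstance(item, Exception):
            raise item
        seen.append(item)
    return seen


def fake_trial():
    """Stand-in generator that yields phase names instead of TrialEvent objects."""
    yield "intake"
    yield "claims"
    yield "verdict"
```

Because the producer runs ahead of the reader, checking whether the queue is empty after an event has been displayed is a cheap way to tell that the next turn is still being generated, which is the moment a "pending speaker" frame is useful.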
modal_app.py CHANGED
@@ -3,7 +3,7 @@ import time
3
 
4
  import modal
5
 
6
- from sovereign_bench.engine import stream_trial_jsonl
7
  from sovereign_bench.llm import (
8
  ModelCall,
9
  ModelResult,
@@ -12,10 +12,12 @@ from sovereign_bench.llm import (
12
  )
13
  from sovereign_bench.models import TrialRequest
14
 
15
- app = modal.App("sovereign-bench")
 
16
  GPU_NAME = "H100"
17
  GPU_TIMEOUT_SECONDS = 20 * 60
18
  HF_CACHE_DIR = "/root/.cache/huggingface"
 
19
 
20
  image = (
21
  modal.Image.debian_slim(python_version="3.12")
@@ -89,7 +91,8 @@ class VllmModel:
89
  "role": "user",
90
  "content": (
91
  "Your previous response did not include visible courtroom dialogue. "
92
- "Return only the final spoken dialogue now. Do not include <think>, analysis, reasoning, markdown, or notes. /no_think"
 
93
  ),
94
  }
95
  ]
@@ -115,6 +118,10 @@ class VllmModel:
115
  "latency_ms": int((time.perf_counter() - started) * 1000),
116
  }
117
 
 
 
 
 
118
 
119
  def modal_gpu_enabled() -> bool:
120
  return os.getenv("SOVEREIGN_DISABLE_MODAL_GPU", "").lower() not in {"1", "true", "yes"}
@@ -127,6 +134,9 @@ def modal_gpu_runner(**kwargs) -> ModelResult:
127
  case_summary=kwargs["case_summary"],
128
  task=kwargs["task"],
129
  evidence_summary=kwargs["evidence_summary"],
 
 
 
130
  )
131
  requested_model = kwargs["model"]
132
  prompt_hash = messages_hash(messages)
@@ -191,3 +201,12 @@ def trial_stream(payload: dict):
191
  @app.local_entrypoint()
192
  def main():
193
  print(check_huggingface_connection.remote())
 
 
 
 
 
 
 
 
 
 
3
 
4
  import modal
5
 
6
+ from sovereign_bench.engine import MODEL_BUDGET, stream_trial_jsonl
7
  from sovereign_bench.llm import (
8
  ModelCall,
9
  ModelResult,
 
12
  )
13
  from sovereign_bench.models import TrialRequest
14
 
15
+ MODAL_APP_NAME = "sovereign-bench"
16
+ app = modal.App(MODAL_APP_NAME)
17
  GPU_NAME = "H100"
18
  GPU_TIMEOUT_SECONDS = 20 * 60
19
  HF_CACHE_DIR = "/root/.cache/huggingface"
20
+ USED_MODEL_IDS = tuple(dict.fromkeys(model for _, model, _ in MODEL_BUDGET))
21
 
22
  image = (
23
  modal.Image.debian_slim(python_version="3.12")
 
91
  "role": "user",
92
  "content": (
93
  "Your previous response did not include visible courtroom dialogue. "
94
+ "Return only the final answer now. Do not mention prompts, tasks, requirements, or that you are following instructions. "
95
+ "Do not include <think>, analysis, reasoning, markdown, narration, or notes. /no_think"
96
  ),
97
  }
98
  ]
 
118
  "latency_ms": int((time.perf_counter() - started) * 1000),
119
  }
120
 
121
+ @modal.method()
122
+ def warm(self) -> dict:
123
+ return {"model": self.model_id, "status": "warm"}
124
+
125
 
126
  def modal_gpu_enabled() -> bool:
127
  return os.getenv("SOVEREIGN_DISABLE_MODAL_GPU", "").lower() not in {"1", "true", "yes"}
 
134
  case_summary=kwargs["case_summary"],
135
  task=kwargs["task"],
136
  evidence_summary=kwargs["evidence_summary"],
137
+ trial_history=kwargs.get("trial_history", ""),
138
+ persona=kwargs.get("persona", ""),
139
+ objective=kwargs.get("objective", ""),
140
  )
141
  requested_model = kwargs["model"]
142
  prompt_hash = messages_hash(messages)
 
201
  @app.local_entrypoint()
202
  def main():
203
  print(check_huggingface_connection.remote())
204
+
205
+
206
+ @app.local_entrypoint()
207
+ def warm_models():
208
+ deployed_model = modal.Cls.from_name(MODAL_APP_NAME, "VllmModel")
209
+ for model_id in USED_MODEL_IDS:
210
+ model = deployed_model(model_id=model_id)
211
+ model.update_autoscaler(min_containers=1)
212
+ print(model.warm.remote())
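
`USED_MODEL_IDS` relies on `dict.fromkeys` to remove duplicate model IDs while keeping first-seen order, so each model is warmed once even if several roles share it. A small sketch of that idiom follows. The budget rows are assumptions for illustration: the role names and parameter counts follow the earlier `MODEL_BUDGET`, the model IDs follow the README, and the last row is made up to show a repeated model.

```python
# Hypothetical rows in the (role, model_id, params_in_billions) shape used by MODEL_BUDGET.
MODEL_BUDGET = [
    ("Presiding Advocate", "openai/gpt-oss-20b", 20.0),
    ("Clerk of Style", "openbmb/AgentCPM-Explore", 4.0),
    ("Juror/Auditor Ring", "nvidia/Nemotron-Orchestrator-8B", 8.0),
    ("Second Advocate", "openai/gpt-oss-20b", 0.0),  # made-up row that reuses a model
]

# dict keys are unique and keep insertion order, so this drops repeats without reordering.
used_model_ids = tuple(dict.fromkeys(model for _, model, _ in MODEL_BUDGET))

# Summing the third column gives the named parameter budget in billions.
total_params_b = sum(params for _, _, params in MODEL_BUDGET)
```

A `set` would also remove duplicates, but it would not guarantee a stable order, which matters if the warm-up output is compared across runs.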
sovereign_bench/cases.py CHANGED
@@ -11,6 +11,11 @@ SOCRATES = CasePacket(
11
  respondent="Socrates",
12
  charge="Corrupting the youth and refusing the sanctioned gods of the city.",
13
  setting="Athens, 399 BCE, reassembled inside a pocket tribunal.",
 
 
 
 
 
14
  claimant_claim=(
15
  "The city argues that Socrates trained young citizens to mock public authority "
16
  "and placed private daimonion guidance above civic religion."
@@ -25,19 +30,7 @@ SOCRATES = CasePacket(
25
  ),
26
  evidence=[
27
  EvidenceItem(
28
- id="SOC-E1",
29
- title="The Oracle Burden",
30
- source="Plato, Apology tradition",
31
- excerpt=(
32
- "Socrates describes testing reputedly wise citizens after a Delphic oracle "
33
- "report, creating public embarrassment but framing the act as duty."
34
- ),
35
- supports="mixed",
36
- reliability=0.78,
37
- note="Shows both civic irritation and a claimed religious motivation.",
38
- ),
39
- EvidenceItem(
40
- id="SOC-E2",
41
  title="Youthful Imitators",
42
  source="Plato, Apology tradition",
43
  excerpt=(
@@ -49,7 +42,31 @@ SOCRATES = CasePacket(
49
  note="Supports social effect, but does not prove intentional corruption.",
50
  ),
51
  EvidenceItem(
52
- id="SOC-E3",
53
  title="No Fee, No School",
54
  source="Ancient defense tradition",
55
  excerpt=(
@@ -61,16 +78,127 @@ SOCRATES = CasePacket(
61
  note="Weakens the claim that he operated a formal corrupting academy.",
62
  ),
63
  EvidenceItem(
64
- id="SOC-E4",
65
- title="The Daimonion",
66
- source="Ancient biographical tradition",
67
  excerpt=(
68
- "Socrates reports a private divine sign that restrains him from certain actions, "
69
- "which the court may read as piety or heterodoxy."
70
  ),
71
- supports="mixed",
72
- reliability=0.64,
73
- note="Central ambiguity: private religious experience versus civic irreverence.",
74
  ),
75
  ],
76
  )
@@ -84,6 +212,11 @@ BARNABY = CasePacket(
84
  respondent="Barnaby Buttons",
85
  charge="Theft of the final mooncake and alteration of the communal snack ledger.",
86
  setting="A fluorescent office kitchen at 4:47 p.m., under the humming republic of the fridge.",
 
 
 
 
 
87
  claimant_claim=(
88
  "Barnaby removed the final mooncake, changed the snack ledger from '1 mooncake' "
89
  "to '0 mooncakes', and left the team dessertless."
@@ -92,7 +225,7 @@ BARNABY = CasePacket(
92
  "Barnaby says the mooncake was already abandoned, the ledger pen skipped naturally, "
93
  "and the crumbs came from an unrelated biscuit."
94
  ),
95
- source_note="Cached original whimsical packet made for reliable hackathon demos.",
96
  evidence=[
97
  EvidenceItem(
98
  id="BTN-E1",
@@ -134,7 +267,7 @@ BARNABY = CasePacket(
134
  )
135
 
136
 
137
- CASES = {case.id: case for case in (SOCRATES, BARNABY)}
138
 
139
 
140
  def get_case(case_id: str) -> CasePacket:
 
11
  respondent="Socrates",
12
  charge="Corrupting the youth and refusing the sanctioned gods of the city.",
13
  setting="Athens, 399 BCE, reassembled inside a pocket tribunal.",
14
+ context=(
15
+ "Athens has brought Socrates back before a civic court after years of public questioning, "
16
+ "youthful imitators, and anxiety about private religious claims. The city says his method "
17
+ "weakened civic order; Socrates says he served the public by exposing false wisdom."
18
+ ),
19
  claimant_claim=(
20
  "The city argues that Socrates trained young citizens to mock public authority "
21
  "and placed private daimonion guidance above civic religion."
 
30
  ),
31
  evidence=[
32
  EvidenceItem(
33
+ id="SOC-F1",
 
 
 
 
 
 
 
 
 
 
 
 
34
  title="Youthful Imitators",
35
  source="Plato, Apology tradition",
36
  excerpt=(
 
42
  note="Supports social effect, but does not prove intentional corruption.",
43
  ),
44
  EvidenceItem(
45
+ id="SOC-F2",
46
+ title="Public Embarrassment",
47
+ source="Ancient defense tradition",
48
+ excerpt=(
49
+ "Socrates describes testing reputedly wise citizens in public after hearing the "
50
+ "Delphic oracle report."
51
+ ),
52
+ supports="claimant",
53
+ reliability=0.74,
54
+ note="Shows a repeated practice that made civic leaders look foolish.",
55
+ ),
56
+ EvidenceItem(
57
+ id="SOC-F3",
58
+ title="The Daimonion Suspicion",
59
+ source="Ancient biographical tradition",
60
+ excerpt=(
61
+ "Socrates reports a private divine sign that restrains him from certain actions, "
62
+ "which civic accusers read as religious irregularity."
63
+ ),
64
+ supports="claimant",
65
+ reliability=0.64,
66
+ note="Supports the impiety theory if private revelation is treated as civic defiance.",
67
+ ),
68
+ EvidenceItem(
69
+ id="SOC-A1",
70
  title="No Fee, No School",
71
  source="Ancient defense tradition",
72
  excerpt=(
 
78
  note="Weakens the claim that he operated a formal corrupting academy.",
79
  ),
80
  EvidenceItem(
81
+ id="SOC-A2",
82
+ title="Oracle as Duty",
83
+ source="Plato, Apology tradition",
84
  excerpt=(
85
+ "Socrates frames his questioning as obedience to a divine puzzle rather than "
86
+ "contempt for religion."
87
  ),
88
+ supports="respondent",
89
+ reliability=0.78,
90
+ note="Turns the impiety charge into a competing account of piety.",
91
+ ),
92
+ EvidenceItem(
93
+ id="SOC-A3",
94
+ title="Cross-Examination as Service",
95
+ source="Defense summary",
96
+ excerpt=(
97
+ "The defense treats uncomfortable questioning as civic improvement, not sabotage "
98
+ "or intentional corruption."
99
+ ),
100
+ supports="respondent",
101
+ reliability=0.7,
102
+ note="Gives the jury a public-interest reason to tolerate Socrates.",
103
+ ),
104
+ ],
105
+ )
106
+
107
+
108
+ GREG = CasePacket(
109
+ id="greg",
110
+ title="Greg Heffley v. Mom",
111
+ subtitle="A family-court argument over a diary, embarrassment, and parental good intentions.",
112
+ claimant="Greg Heffley",
113
+ respondent="Susan Heffley",
114
+ charge="Whether Mom wrongfully saddled Greg with an embarrassing diary instead of a normal journal.",
115
+ setting="The Heffley house on the eve of another middle-school year.",
116
+ context=(
117
+ "Greg receives a book from his mom meant to help him record his thoughts, but he objects "
118
+ "that the word diary makes him look childish and vulnerable at school. Mom treats the book "
119
+ "as a harmless tool for reflection; Greg treats it as social evidence waiting to be used "
120
+ "against him."
121
+ ),
122
+ claimant_claim=(
123
+ "Greg argues that Mom ignored the obvious social risk of handing a middle-school boy a diary "
124
+ "and failed to respect how easily classmates can turn an object into humiliation."
125
+ ),
126
+ respondent_claim=(
127
+ "Mom answers that the writing book is a constructive outlet, that Greg can choose how to use it, "
128
+ "and that parental encouragement is not social sabotage."
129
+ ),
130
+ source_note=(
131
+ "Cached demo packet using paraphrased context from the Diary of a Wimpy Kid setup. "
132
+ "No book text is quoted."
133
+ ),
134
+ evidence=[
135
+ EvidenceItem(
136
+ id="GRG-F1",
137
+ title="The Label Problem",
138
+ source="Greg's objection",
139
+ excerpt=(
140
+ "Greg objects that diary is the wrong label for a middle-school boy and could be "
141
+ "used to mock him."
142
+ ),
143
+ supports="claimant",
144
+ reliability=0.74,
145
+ note="Shows a foreseeable embarrassment risk from Greg's perspective.",
146
+ ),
147
+ EvidenceItem(
148
+ id="GRG-F2",
149
+ title="Middle-School Audience",
150
+ source="School context",
151
+ excerpt=(
152
+ "Greg's social world rewards status and punishes anything classmates can frame "
153
+ "as childish."
154
+ ),
155
+ supports="claimant",
156
+ reliability=0.7,
157
+ note="Makes the harm plausible even before anyone finds the book.",
158
+ ),
159
+ EvidenceItem(
160
+ id="GRG-F3",
161
+ title="Ignored Preference",
162
+ source="Family exchange summary",
163
+ excerpt=(
164
+ "Greg wanted distance from the diary framing, but Mom treated the gift as settled."
165
+ ),
166
+ supports="claimant",
167
+ reliability=0.66,
168
+ note="Supports Greg's autonomy argument, though parents often choose school supplies.",
169
+ ),
170
+ EvidenceItem(
171
+ id="GRG-A1",
172
+ title="Private Writing Tool",
173
+ source="Mom's purpose",
174
+ excerpt=(
175
+ "Mom intended the book as a private place for Greg to record his thoughts and school year."
176
+ ),
177
+ supports="respondent",
178
+ reliability=0.78,
179
+ note="Shows a constructive parental purpose rather than intent to embarrass.",
180
+ ),
181
+ EvidenceItem(
182
+ id="GRG-A2",
183
+ title="Greg Controls Disclosure",
184
+ source="Household facts",
185
+ excerpt=(
186
+ "The book is not inherently public; Greg can keep it private and decide what to write."
187
+ ),
188
+ supports="respondent",
189
+ reliability=0.68,
190
+ note="Weakens the claim that the gift itself creates inevitable harm.",
191
+ ),
192
+ EvidenceItem(
193
+ id="GRG-A3",
194
+ title="Reflection Has Value",
195
+ source="Parenting rationale",
196
+ excerpt=(
197
+ "A journal can help a student process school, family, and growing-up pressures."
198
+ ),
199
+ supports="respondent",
200
+ reliability=0.71,
201
+ note="Gives Mom a reasonable-benefit argument even if the branding is awkward.",
202
  ),
203
  ],
204
  )
 
212
  respondent="Barnaby Buttons",
213
  charge="Theft of the final mooncake and alteration of the communal snack ledger.",
214
  setting="A fluorescent office kitchen at 4:47 p.m., under the humming republic of the fridge.",
215
+ context=(
216
+ "An office breakroom has lost its final mooncake after a suspicious ledger update and "
217
+ "a trail of crumbs. The commonwealth blames Barnaby Buttons; Barnaby says the evidence "
218
+ "is ordinary office mess and coincidence."
219
+ ),
220
  claimant_claim=(
221
  "Barnaby removed the final mooncake, changed the snack ledger from '1 mooncake' "
222
  "to '0 mooncakes', and left the team dessertless."
 
225
  "Barnaby says the mooncake was already abandoned, the ledger pen skipped naturally, "
226
  "and the crumbs came from an unrelated biscuit."
227
  ),
228
+ source_note="Cached original whimsical packet kept for compatibility with older tests.",
229
  evidence=[
230
  EvidenceItem(
231
  id="BTN-E1",
 
267
  )
268
 
269
 
270
+ CASES = {case.id: case for case in (SOCRATES, GREG, BARNABY)}
271
 
272
 
273
  def get_case(case_id: str) -> CasePacket:
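
The reworked packets tag every exhibit with `supports="claimant"` or `supports="respondent"` and use `-F` and `-A` ID suffixes to match, which is what lets the docket book show one evidence column per side. The helper `_split_evidence` is not shown in this diff, so the sketch below is only an assumption about how such a split could work. `Evidence` is a trimmed, hypothetical stand-in for `EvidenceItem`, and `split_evidence` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Trimmed, hypothetical stand-in for EvidenceItem with only the fields used here."""

    id: str
    title: str
    supports: str  # "claimant" or "respondent" in the new packets


def split_evidence(items: list[Evidence]) -> tuple[list[Evidence], list[Evidence]]:
    """Return (claimant_side, respondent_side); items tagged otherwise are left out."""
    claimant = [item for item in items if item.supports == "claimant"]
    respondent = [item for item in items if item.supports == "respondent"]
    return claimant, respondent


# IDs, titles, and sides copied from the Socrates packet in this diff.
record = [
    Evidence("SOC-F2", "Public Embarrassment", "claimant"),
    Evidence("SOC-A2", "Oracle as Duty", "respondent"),
    Evidence("SOC-F3", "The Daimonion Suspicion", "claimant"),
]
for_claimant, for_respondent = split_evidence(record)
```

Filtering on the `supports` field instead of parsing the ID suffix keeps the split working for custom packets, whose IDs follow the `CUS-F` and `CUS-A` pattern only by convention.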
sovereign_bench/engine.py CHANGED
@@ -9,7 +9,7 @@ from collections.abc import Callable, Iterable
9
  from pydantic import ValidationError
10
 
11
  from .cases import get_case
12
- from .llm import ModelCall, ModelResult, call_small_model
13
  from .models import AgentTurn, CasePacket, JurorVote, TrialEvent, TrialRequest, Verdict
14
  from .retrieval import build_live_case
15
 
@@ -20,11 +20,11 @@ OPENAI_PROVIDER = "auto"
20
  OPENBMB_PROVIDER = "featherless-ai"
21
  NEMOTRON_PROVIDER = "featherless-ai"
22
 
23
- MODEL_BUDGET = [
24
- ("Presiding Advocate", GPT_OSS_MODEL, 20.0),
25
- ("Clerk of Style", OPENBMB_MODEL, 4.0),
26
- ("Juror/Auditor Ring", NEMOTRON_MODEL, 8.0),
27
- ]
28
  TOTAL_PARAMS_B = sum(item[2] for item in MODEL_BUDGET)
29
 
30
  JUDGE_NAME = "Marcus Aurelius"
@@ -59,12 +59,14 @@ def _turn(agent: str, role: str, result: ModelResult, model: str, confidence: fl
59
  )
60
 
61
 
62
- def _case_summary(packet: CasePacket) -> str:
63
- return (
64
- f"{packet.title}. Charge: {packet.charge}\n"
65
- f"Claimant: {packet.claimant_claim}\n"
66
- f"Respondent: {packet.respondent_claim}"
67
- )
 
 
68
 
69
 
70
  def _evidence_summary(packet: CasePacket) -> str:
@@ -78,8 +80,12 @@ def _call_trace(calls: list[ModelCall]) -> list[dict]:
78
  return [call.__dict__ for call in calls]
79
 
80
 
81
- def resolve_case(request: TrialRequest) -> tuple[CasePacket, dict]:
82
- if request.case_id == "live":
 
 
 
 
83
  packet = build_live_case(request.search_query, request.hypothetical)
84
  if packet:
85
  return packet, {"mode": "live"}
@@ -99,12 +105,16 @@ def _required_role(model_runner: ModelRunner | None, model_calls: list[ModelCall
99
  except Exception as exc:
100
  raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {exc}") from exc
101
  model_calls.append(result.call)
102
- if not result.call.ok:
103
- error = result.call.error or "model call did not complete"
104
- raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {error}")
105
- if not result.text.strip():
106
- raise RequiredModelError(f"{kwargs.get('agent', 'Model')} returned an empty response.")
107
- return result
 
 
 
 
108
 
109
 
110
  def _trace(packet: CasePacket, source_trace: dict, model_calls: list[ModelCall]) -> dict:
@@ -119,7 +129,7 @@ def _trace(packet: CasePacket, source_trace: dict, model_calls: list[ModelCall])
119
  }
120
 
121
 
122
- def _emit(
123
  packet: CasePacket,
124
  source_trace: dict,
125
  model_calls: list[ModelCall],
@@ -129,10 +139,47 @@ def _emit(
129
  event.trace = _trace(packet, source_trace, model_calls)
130
  if delay > 0:
131
  time.sleep(delay)
132
- return event
133
-
134
-
135
- def _extract_json(text: str) -> object:
136
  stripped = text.strip()
137
  if stripped.startswith("```"):
138
  stripped = re.sub(r"^```(?:json)?\s*", "", stripped, flags=re.I)
@@ -146,41 +193,37 @@ def _extract_json(text: str) -> object:
146
  return json.loads(match.group(1))
147
 
148
 
149
- def _parse_jury_votes(result: ModelResult, packet: CasePacket) -> list[JurorVote]:
150
- try:
151
- data = _extract_json(result.text)
152
- except json.JSONDecodeError as exc:
153
- raise RequiredModelError(f"Nemotron Jury returned invalid JSON: {exc.msg}") from exc
154
-
155
- raw_votes = data.get("votes") if isinstance(data, dict) else data
156
- if not isinstance(raw_votes, list):
157
- raise RequiredModelError("Nemotron Jury output must contain a votes list.")
158
- if len(raw_votes) != len(JUROR_NAMES):
159
- raise RequiredModelError("Nemotron Jury must return exactly six juror votes.")
160
-
161
- known_evidence = {item.id for item in packet.evidence}
162
- votes: list[JurorVote] = []
163
- try:
164
- for item in raw_votes:
165
- vote = JurorVote.model_validate(item)
166
- votes.append(vote)
167
- except ValidationError as exc:
168
- raise RequiredModelError(f"Nemotron Jury vote schema is invalid: {exc.errors()[0]['msg']}") from exc
169
-
170
- if [vote.juror for vote in votes] != JUROR_NAMES:
171
- raise RequiredModelError("Nemotron Jury must return votes in the fixed juror order.")
172
- for vote in votes:
173
- expected_persona = JUROR_PERSONAS[vote.juror]
174
- if vote.persona.strip().lower() != expected_persona:
175
- raise RequiredModelError(f"{vote.juror} persona must be '{expected_persona}'.")
176
- if not vote.reason.strip():
177
- raise RequiredModelError(f"{vote.juror} must include a rationale.")
178
- if not vote.evidence_ids or any(evidence_id not in known_evidence for evidence_id in vote.evidence_ids):
179
- raise RequiredModelError(f"{vote.juror} must cite known evidence IDs.")
180
- return votes
181
-
182
-
183
- def _majority_finding(votes: list[JurorVote]) -> str:
184
  counts = Counter(vote.vote for vote in votes)
185
  top = counts.most_common()
186
  if not top:
@@ -227,15 +270,12 @@ def _verdict_from_votes(votes: list[JurorVote]) -> Verdict:
227
  )
228
 
229
 
230
- def _jury_task() -> str:
231
- personas = "\n".join(f"- {name}: {persona}" for name, persona in JUROR_PERSONAS.items())
232
  return (
233
- "Return JSON only with a top-level 'votes' array. Create exactly one vote for each juror, in this order: "
234
- f"{', '.join(JUROR_NAMES)}. Valid vote values are liable, not_liable, uncertain. Each item must contain "
235
- "juror, persona, vote, reason, and evidence_ids. The persona value must exactly match the profile below. "
236
- "Each reason should be one concise sentence and each evidence_ids list must cite evidence IDs from the record. "
237
- "Vote through the named public-history worldview, not a generic juror role.\n"
238
- f"{personas}"
239
  )
240
 
241
 
@@ -249,10 +289,11 @@ def stream_trial(
249
  model_runner: ModelRunner | None = None,
250
  ) -> Iterable[TrialEvent]:
251
  packet, source_trace = resolve_case(request)
252
- case_summary = _case_summary(packet)
253
- evidence_summary = _evidence_summary(packet)
254
- model_calls: list[ModelCall] = []
255
- hypo = request.hypothetical.strip()
 
256
  hypo_line = f"\n\nUser hypothetical admitted as a blue-ribbon sidebar: {hypo}" if hypo else ""
257
 
258
  clerk = _required_role(
@@ -263,14 +304,15 @@ def stream_trial(
263
  model=OPENBMB_MODEL,
264
  case_summary=case_summary,
265
  evidence_summary=evidence_summary,
266
- task="Announce the case by name, identify the parties, and read the charge.",
267
  provider=OPENBMB_PROVIDER,
268
  max_tokens=110,
269
  )
270
- yield _emit(
271
- packet,
272
- source_trace,
273
- model_calls,
 
274
  TrialEvent(
275
  phase="intake",
276
  title="The Court Convenes",
@@ -289,17 +331,21 @@ def stream_trial(
289
  model=GPT_OSS_MODEL,
290
  case_summary=case_summary,
291
  evidence_summary=evidence_summary,
 
 
 
292
  task=(
293
  f"As {JUDGE_NAME}, a Stoic courtroom judge guided by {JUDGE_PERSONA}, explain the proceeding "
294
- "and the burden of proof in one or two disciplined sentences."
295
  ),
296
  provider=OPENAI_PROVIDER,
297
  max_tokens=110,
298
  )
299
- yield _emit(
300
- packet,
301
- source_trace,
302
- model_calls,
 
303
  TrialEvent(
304
  phase="intake",
305
  title="The Burden Is Set",
@@ -313,24 +359,27 @@ def stream_trial(
313
  claimant_opening = _required_role(
314
  model_runner,
315
  model_calls,
316
- agent="Advocate Auric",
317
  role="claimant advocate",
318
  model=GPT_OSS_MODEL,
319
- case_summary=case_summary,
320
- evidence_summary=evidence_summary,
321
- task="Make the claimant's opening statement alone. Cite the strongest claimant-side exhibit.",
 
 
322
  provider=OPENAI_PROVIDER,
323
  max_tokens=130,
324
  )
325
- yield _emit(
326
- packet,
327
- source_trace,
328
- model_calls,
 
329
  TrialEvent(
330
  phase="claims",
331
  title="Claimant Opening",
332
  body=packet.claimant_claim,
333
- turns=[_turn("Advocate Auric", "claimant advocate", claimant_opening, GPT_OSS_MODEL, 0.88)],
334
  evidence=packet.evidence,
335
  ),
336
  delay,
@@ -339,53 +388,45 @@ def stream_trial(
339
  respondent_opening = _required_role(
340
  model_runner,
341
  model_calls,
342
- agent="Counsel Sable",
343
  role="respondent advocate",
344
  model=GPT_OSS_MODEL,
345
- case_summary=case_summary,
346
- evidence_summary=evidence_summary,
347
- task="Make the respondent's opening statement alone. Emphasize uncertainty and cite a helpful exhibit.",
 
 
348
  provider=OPENAI_PROVIDER,
349
  max_tokens=130,
350
  )
351
- yield _emit(
352
- packet,
353
- source_trace,
354
- model_calls,
 
355
  TrialEvent(
356
  phase="opening",
357
  title="Respondent Opening",
358
  body=packet.respondent_claim,
359
- turns=[_turn("Counsel Sable", "respondent advocate", respondent_opening, GPT_OSS_MODEL, 0.88)],
360
  evidence=packet.evidence,
361
  ),
362
  delay,
363
  )
364
 
365
- auditor = _required_role(
366
- model_runner,
367
- model_calls,
368
- agent="Auditor Prism",
369
- role="evidence auditor",
370
- model=NEMOTRON_MODEL,
371
- case_summary=case_summary,
372
- evidence_summary=evidence_summary,
373
- task="Present the evidence record. Identify the strongest exhibit and the weakest inference.",
374
- provider=NEMOTRON_PROVIDER,
375
- max_tokens=150,
376
- )
377
- yield _emit(
378
- packet,
379
- source_trace,
380
- model_calls,
381
- TrialEvent(
382
- phase="evidence",
383
- title="The Record Is Audited",
384
- body="\n".join(f"{item.id}: {item.title} | reliability {item.reliability:.2f} | {item.note}" for item in packet.evidence),
385
- turns=[_turn("Auditor Prism", "evidence auditor", auditor, NEMOTRON_MODEL, 0.86)],
386
- evidence=packet.evidence,
387
- ),
388
- delay,
389
  )
390
 
391
  judge_question = _required_role(
@@ -396,17 +437,21 @@ def stream_trial(
396
  model=GPT_OSS_MODEL,
397
  case_summary=case_summary,
398
  evidence_summary=evidence_summary,
 
 
 
399
  task=(
400
  f"As {JUDGE_NAME}, ask one sharp hinge question that would change the outcome if answered. "
401
- "Use Stoic restraint and public reason."
402
  ),
403
  provider=OPENAI_PROVIDER,
404
  max_tokens=100,
405
  )
406
- yield _emit(
407
- packet,
408
- source_trace,
409
- model_calls,
 
410
  TrialEvent(
411
  phase="questions",
412
  title="The Hinge Question",
@@ -420,24 +465,27 @@ def stream_trial(
420
  claimant_answer = _required_role(
421
  model_runner,
422
  model_calls,
423
- agent="Advocate Auric",
424
  role="claimant advocate",
425
  model=GPT_OSS_MODEL,
426
  case_summary=case_summary,
427
  evidence_summary=evidence_summary,
428
- task=f"Answer {JUDGE_NAME}'s hinge question for the claimant: {judge_question.text}",
 
 
429
  provider=OPENAI_PROVIDER,
430
  max_tokens=130,
431
  )
432
- yield _emit(
433
- packet,
434
- source_trace,
435
- model_calls,
 
436
  TrialEvent(
437
  phase="questions",
438
  title="Claimant Answers the Bench",
439
  body="The claimant answers the hinge question.",
440
- turns=[_turn("Advocate Auric", "claimant advocate", claimant_answer, GPT_OSS_MODEL, 0.88)],
441
  evidence=packet.evidence,
442
  ),
443
  delay,
@@ -446,24 +494,27 @@ def stream_trial(
446
  respondent_answer = _required_role(
447
  model_runner,
448
  model_calls,
449
- agent="Counsel Sable",
450
  role="respondent advocate",
451
  model=GPT_OSS_MODEL,
452
  case_summary=case_summary,
453
  evidence_summary=evidence_summary,
454
- task=f"Answer {JUDGE_NAME}'s hinge question for the respondent: {judge_question.text}",
 
 
455
  provider=OPENAI_PROVIDER,
456
  max_tokens=130,
457
  )
458
- yield _emit(
459
- packet,
460
- source_trace,
461
- model_calls,
 
462
  TrialEvent(
463
  phase="questions",
464
  title="Respondent Answers the Bench",
465
  body="The respondent answers the hinge question.",
466
- turns=[_turn("Counsel Sable", "respondent advocate", respondent_answer, GPT_OSS_MODEL, 0.88)],
467
  evidence=packet.evidence,
468
  ),
469
  delay,
@@ -474,17 +525,20 @@ def stream_trial(
474
  model_calls,
475
  agent="Nemotron Jury",
476
  role="juror panel",
477
- model=NEMOTRON_MODEL,
478
- case_summary=case_summary,
479
- evidence_summary=evidence_summary,
480
- task="Announce that the six named jurors retire to vote. Do not reveal the votes yet.",
 
 
481
  provider=NEMOTRON_PROVIDER,
482
  max_tokens=100,
483
  )
484
- yield _emit(
485
- packet,
486
- source_trace,
487
- model_calls,
 
488
  TrialEvent(
489
  phase="deliberation",
490
  title="The Jury Retires",
@@ -495,29 +549,35 @@ def stream_trial(
495
  delay,
496
  )
497
 
498
- jury_votes_result = _required_role(
499
- model_runner,
500
- model_calls,
501
- agent="Nemotron Jury",
502
- role="juror vote generator",
503
- model=NEMOTRON_MODEL,
504
- case_summary=case_summary,
505
- evidence_summary=evidence_summary,
506
- task=_jury_task(),
507
- provider=NEMOTRON_PROVIDER,
508
- max_tokens=650,
509
- )
510
- votes = _parse_jury_votes(jury_votes_result, packet)
511
- for vote in votes:
512
- juror_result = ModelResult(
513
- text=f"{vote.vote.replace('_', ' ').title()}. {vote.reason}",
514
- call=jury_votes_result.call,
515
- input_text=jury_votes_result.input_text,
516
- )
517
- yield _emit(
518
- packet,
519
- source_trace,
520
- model_calls,
521
  TrialEvent(
522
  phase="deliberation",
523
  title=f"Juror {vote.juror} Votes",
@@ -538,18 +598,22 @@ def stream_trial(
538
  model=GPT_OSS_MODEL,
539
  case_summary=case_summary,
540
  evidence_summary=evidence_summary,
 
 
 
541
  task=(
542
  f"As {JUDGE_NAME}, announce the final legal finding after the jury vote with Stoic restraint. "
543
  f"Finding: {verdict.finding}. "
544
- f"Jury rationale: {verdict.rationale} Remedy: {verdict.remedy}. Include uncertainty without disclaiming the role."
545
  ),
546
  provider=OPENAI_PROVIDER,
547
  max_tokens=160,
548
  )
549
- yield _emit(
550
- packet,
551
- source_trace,
552
- model_calls,
 
553
  TrialEvent(
554
  phase="verdict",
555
  title="The Court Announces Judgment",
 
9
  from pydantic import ValidationError
10
 
11
  from .cases import get_case
12
+ from .llm import ModelCall, ModelCallError, ModelResult, call_small_model, clean_model_text
13
  from .models import AgentTurn, CasePacket, JurorVote, TrialEvent, TrialRequest, Verdict
14
  from .retrieval import build_live_case
15
 
 
20
  OPENBMB_PROVIDER = "featherless-ai"
21
  NEMOTRON_PROVIDER = "featherless-ai"
22
 
23
+ MODEL_BUDGET = [
24
+ ("Presiding Advocate", GPT_OSS_MODEL, 20.0),
25
+ ("Clerk of Style", OPENBMB_MODEL, 4.0),
26
+ ("Jury Ring", NEMOTRON_MODEL, 8.0),
27
+ ]
28
  TOTAL_PARAMS_B = sum(item[2] for item in MODEL_BUDGET)
29
 
30
  JUDGE_NAME = "Marcus Aurelius"
 
59
  )
60
 
61
 
62
+ def _case_summary(packet: CasePacket) -> str:
63
+ context = packet.context or packet.setting
64
+ return (
65
+ f"{packet.title}. Charge: {packet.charge}\n"
66
+ f"Context: {context}\n"
67
+ f"Claimant: {packet.claimant_claim}\n"
68
+ f"Respondent: {packet.respondent_claim}"
69
+ )
70
 
71
 
72
  def _evidence_summary(packet: CasePacket) -> str:
 
80
  return [call.__dict__ for call in calls]
81
 
82
 
83
+ def resolve_case(request: TrialRequest) -> tuple[CasePacket, dict]:
84
+ if request.case_id == "custom":
85
+ if request.custom_case is None:
86
+ raise RuntimeError("Custom case requires trial details and evidence before the court can begin.")
87
+ return request.custom_case, {"mode": "custom"}
88
+ if request.case_id == "live":
89
  packet = build_live_case(request.search_query, request.hypothetical)
90
  if packet:
91
  return packet, {"mode": "live"}
 
105
  except Exception as exc:
106
  raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {exc}") from exc
107
  model_calls.append(result.call)
108
+ if not result.call.ok:
109
+ error = result.call.error or "model call did not complete"
110
+ raise RequiredModelError(f"{kwargs.get('agent', 'Model')} unavailable: {error}")
111
+ try:
112
+ result.text = clean_model_text(result.text)
113
+ except ModelCallError as exc:
114
+ raise RequiredModelError(f"{kwargs.get('agent', 'Model')} returned non-dialogue output: {exc}") from exc
115
+ if not result.text.strip():
116
+ raise RequiredModelError(f"{kwargs.get('agent', 'Model')} returned an empty response.")
117
+ return result
118
 
119
 
120
  def _trace(packet: CasePacket, source_trace: dict, model_calls: list[ModelCall]) -> dict:
 
129
  }
130
 
131
 
132
+ def _emit(
133
  packet: CasePacket,
134
  source_trace: dict,
135
  model_calls: list[ModelCall],
 
139
  event.trace = _trace(packet, source_trace, model_calls)
140
  if delay > 0:
141
  time.sleep(delay)
142
+ return event
143
+
144
+
145
+ def _record_and_emit(
146
+ events: list[TrialEvent],
147
+ packet: CasePacket,
148
+ source_trace: dict,
149
+ model_calls: list[ModelCall],
150
+ event: TrialEvent,
151
+ delay: float,
152
+ ) -> TrialEvent:
153
+ emitted = _emit(packet, source_trace, model_calls, event, delay)
154
+ events.append(emitted)
155
+ return emitted
156
+
157
+
158
+ def _compact(value: str, limit: int = 420) -> str:
159
+ text = " ".join(value.split())
160
+ return text if len(text) <= limit else text[: limit - 3].rstrip() + "..."
161
+
162
+
163
+ def _trial_history(events: list[TrialEvent]) -> str:
164
+ if not events:
165
+ return "No trial statements have been made yet."
166
+ lines = []
167
+ for index, event in enumerate(events, start=1):
168
+ if event.turns:
169
+ turn = event.turns[0]
170
+ lines.append(
171
+ f"{index}. {event.phase} / {event.title} - {turn.agent} ({turn.role}): {_compact(turn.content)}"
172
+ )
173
+ elif event.body:
174
+ lines.append(f"{index}. {event.phase} / {event.title}: {_compact(event.body)}")
175
+ for vote in event.votes:
176
+ lines.append(
177
+ f" Vote - {vote.juror}: {vote.vote}; reason: {_compact(vote.reason, 220)}; evidence: {', '.join(vote.evidence_ids)}"
178
+ )
179
+ return "\n".join(lines)
180
+
181
+
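The two helpers above turn the events recorded so far into a numbered transcript that later prompts receive as `trial_history`. The sketch below reproduces that logic with plain dataclasses so it can run without the package; `Turn` and `Event` are hypothetical stand-ins for `AgentTurn` and `TrialEvent`, the vote lines are omitted, and the sample agent name and dialogue are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:  # stand-in for AgentTurn (assumption: only these fields matter here)
    agent: str
    role: str
    content: str


@dataclass
class Event:  # stand-in for TrialEvent, without votes
    phase: str
    title: str
    body: str = ""
    turns: list = field(default_factory=list)


def compact(value: str, limit: int = 420) -> str:
    """Collapse whitespace and truncate to `limit` characters with an ellipsis."""
    text = " ".join(value.split())
    return text if len(text) <= limit else text[: limit - 3].rstrip() + "..."


def trial_history(events: list) -> str:
    """Render one numbered line per event, preferring the first spoken turn."""
    if not events:
        return "No trial statements have been made yet."
    lines = []
    for index, event in enumerate(events, start=1):
        if event.turns:
            turn = event.turns[0]
            lines.append(
                f"{index}. {event.phase} / {event.title} - {turn.agent} ({turn.role}): {compact(turn.content)}"
            )
        elif event.body:
            lines.append(f"{index}. {event.phase} / {event.title}: {compact(event.body)}")
    return "\n".join(lines)


events = [Event("intake", "The Court Convenes", turns=[Turn("Clerk", "clerk", "I call  the case.")])]
print(trial_history(events))
```

Because each prompt is built from the list as it stands at call time, an agent only sees statements that were recorded before its own turn.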
182
+ def _extract_json(text: str) -> object:
183
  stripped = text.strip()
184
  if stripped.startswith("```"):
185
  stripped = re.sub(r"^```(?:json)?\s*", "", stripped, flags=re.I)
 
193
  return json.loads(match.group(1))
194
 
195
 
196
+ def _parse_juror_vote(result: ModelResult, packet: CasePacket, juror: str) -> JurorVote:
197
+ try:
198
+ data = _extract_json(result.text)
199
+ except json.JSONDecodeError as exc:
200
+ raise RequiredModelError(f"{juror} returned invalid JSON: {exc.msg}") from exc
201
+ if isinstance(data, dict) and isinstance(data.get("votes"), list):
202
+ if len(data["votes"]) != 1:
203
+ raise RequiredModelError(f"{juror} must return exactly one vote.")
204
+ data = data["votes"][0]
205
+ if not isinstance(data, dict):
206
+ raise RequiredModelError(f"{juror} vote output must be a JSON object.")
207
+
208
+ try:
209
+ vote = JurorVote.model_validate(data)
210
+ except ValidationError as exc:
211
+ raise RequiredModelError(f"{juror} vote schema is invalid: {exc.errors()[0]['msg']}") from exc
212
+
213
+ known_evidence = {item.id for item in packet.evidence}
214
+ expected_persona = JUROR_PERSONAS[juror]
215
+ if vote.juror != juror:
216
+ raise RequiredModelError(f"{juror} vote must use juror '{juror}'.")
217
+ if vote.persona.strip().lower() != expected_persona:
218
+ raise RequiredModelError(f"{juror} persona must be '{expected_persona}'.")
219
+ if not vote.reason.strip():
220
+ raise RequiredModelError(f"{juror} must include a rationale.")
221
+ if not vote.evidence_ids or any(evidence_id not in known_evidence for evidence_id in vote.evidence_ids):
222
+ raise RequiredModelError(f"{juror} must cite known evidence IDs.")
223
+ return vote
224
+
225
+
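The per-juror validation above can be exercised without pydantic. The sketch below is a rough, standard-library-only approximation: `extract_json` and `parse_vote` are hypothetical helpers, the juror name and persona are invented sample values, and errors are raised as `ValueError` rather than `RequiredModelError`. Unlike the diff, which lowercases only the model's persona value, the sketch normalizes both sides before comparing.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
VALID_VOTES = {"liable", "not_liable", "uncertain"}


def extract_json(text: str) -> object:
    """Parse JSON, tolerating a Markdown code fence around the payload."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        stripped = re.sub("^" + FENCE + r"(?:json)?\s*", "", stripped, flags=re.I)
        stripped = re.sub(r"\s*" + FENCE + "$", "", stripped)
    return json.loads(stripped)


def parse_vote(text: str, juror: str, persona: str, known_evidence: set) -> dict:
    """Validate one juror's vote and return it as a dict."""
    data = extract_json(text)
    # Accept {"votes": [ {...} ]} as long as it wraps exactly one vote.
    if isinstance(data, dict) and isinstance(data.get("votes"), list):
        if len(data["votes"]) != 1:
            raise ValueError(f"{juror} must return exactly one vote.")
        data = data["votes"][0]
    if not isinstance(data, dict):
        raise ValueError(f"{juror} vote output must be a JSON object.")
    if data.get("juror") != juror:
        raise ValueError(f"{juror} vote must use juror '{juror}'.")
    if str(data.get("persona", "")).strip().lower() != persona.strip().lower():
        raise ValueError(f"{juror} persona must be '{persona}'.")
    if data.get("vote") not in VALID_VOTES:
        raise ValueError(f"{juror} vote value is invalid.")
    if not str(data.get("reason", "")).strip():
        raise ValueError(f"{juror} must include a rationale.")
    ids = data.get("evidence_ids") or []
    if not ids or any(item not in known_evidence for item in ids):
        raise ValueError(f"{juror} must cite known evidence IDs.")
    return data


payload = {"juror": "Juror A", "persona": "stoic", "vote": "liable",
           "reason": "Exhibit SOC-E1 carries the burden.", "evidence_ids": ["SOC-E1"]}
raw = FENCE + "json\n" + json.dumps(payload) + "\n" + FENCE
vote = parse_vote(raw, "Juror A", "stoic", {"SOC-E1", "SOC-E2"})
```

Validating each vote as it arrives means a malformed reply stops the trial at that juror instead of after the whole panel has been generated.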
226
+ def _majority_finding(votes: list[JurorVote]) -> str:
 
 
 
 
227
  counts = Counter(vote.vote for vote in votes)
228
  top = counts.most_common()
229
  if not top:
 
270
  )
271
 
272
 
273
+ def _juror_task(juror: str, persona: str) -> str:
 
274
  return (
275
+ f"After watching the trial, vote as {juror}. Your worldview is: {persona}. "
276
+ "Return exactly one JSON object with keys juror, persona, vote, reason, and evidence_ids. "
277
+ "Valid vote values are liable, not_liable, uncertain. The persona value must exactly match your worldview. "
278
+ "The reason must be one concise sentence grounded in your beliefs and the record. Cite evidence IDs from the record."
 
 
279
  )
280
 
281
 
 
289
  model_runner: ModelRunner | None = None,
290
  ) -> Iterable[TrialEvent]:
291
  packet, source_trace = resolve_case(request)
292
+ case_summary = _case_summary(packet)
293
+ evidence_summary = _evidence_summary(packet)
294
+ model_calls: list[ModelCall] = []
295
+ events: list[TrialEvent] = []
296
+ hypo = request.hypothetical.strip()
297
  hypo_line = f"\n\nUser hypothetical admitted as a blue-ribbon sidebar: {hypo}" if hypo else ""
298
 
299
  clerk = _required_role(
 
304
  model=OPENBMB_MODEL,
305
  case_summary=case_summary,
306
  evidence_summary=evidence_summary,
307
+ task="Begin with 'I call'. Announce the case by name, identify the parties, and read the charge.",
308
  provider=OPENBMB_PROVIDER,
309
  max_tokens=110,
310
  )
311
+ yield _record_and_emit(
312
+ events,
313
+ packet,
314
+ source_trace,
315
+ model_calls,
316
  TrialEvent(
317
  phase="intake",
318
  title="The Court Convenes",
 
331
  model=GPT_OSS_MODEL,
332
  case_summary=case_summary,
333
  evidence_summary=evidence_summary,
334
+ trial_history=_trial_history(events),
335
+ persona=JUDGE_PERSONA,
336
+ objective="Set a fair standard for hearing both sides.",
337
  task=(
338
  f"As {JUDGE_NAME}, a Stoic courtroom judge guided by {JUDGE_PERSONA}, explain the proceeding "
339
+ "and the burden of proof in one or two disciplined sentences using I or we."
340
  ),
341
  provider=OPENAI_PROVIDER,
342
  max_tokens=110,
343
  )
344
+ yield _record_and_emit(
345
+ events,
346
+ packet,
347
+ source_trace,
348
+ model_calls,
349
  TrialEvent(
350
  phase="intake",
351
  title="The Burden Is Set",
 
359
  claimant_opening = _required_role(
360
  model_runner,
361
  model_calls,
362
+ agent="Mike OSS",
363
  role="claimant advocate",
364
  model=GPT_OSS_MODEL,
365
+ case_summary=case_summary,
366
+ evidence_summary=evidence_summary,
367
+ trial_history=_trial_history(events),
368
+ objective="Win the case for the claimant using the strongest fair reading of the record.",
369
+ task="Make the claimant's opening statement alone, speaking as I for the claimant. Cite the strongest claimant-side exhibit.",
370
  provider=OPENAI_PROVIDER,
371
  max_tokens=130,
372
  )
373
+ yield _record_and_emit(
374
+ events,
375
+ packet,
376
+ source_trace,
377
+ model_calls,
378
  TrialEvent(
379
  phase="claims",
380
  title="Claimant Opening",
381
  body=packet.claimant_claim,
382
+ turns=[_turn("Mike OSS", "claimant advocate", claimant_opening, GPT_OSS_MODEL, 0.88)],
383
  evidence=packet.evidence,
384
  ),
385
  delay,
 
388
  respondent_opening = _required_role(
389
  model_runner,
390
  model_calls,
391
+ agent="Harvey Vector",
392
  role="respondent advocate",
393
  model=GPT_OSS_MODEL,
394
+ case_summary=case_summary,
395
+ evidence_summary=evidence_summary,
396
+ trial_history=_trial_history(events),
397
+ objective="Win the case for the respondent using doubt, context, and the strongest fair reading of the record.",
398
+ task="Make the respondent's opening statement alone, speaking as I for the respondent. Emphasize uncertainty and cite a helpful exhibit.",
399
  provider=OPENAI_PROVIDER,
400
  max_tokens=130,
401
  )
402
+ yield _record_and_emit(
403
+ events,
404
+ packet,
405
+ source_trace,
406
+ model_calls,
407
  TrialEvent(
408
  phase="opening",
409
  title="Respondent Opening",
410
  body=packet.respondent_claim,
411
+ turns=[_turn("Harvey Vector", "respondent advocate", respondent_opening, GPT_OSS_MODEL, 0.88)],
412
  evidence=packet.evidence,
413
  ),
414
  delay,
415
  )
416
 
417
+ yield _record_and_emit(
418
+ events,
419
+ packet,
420
+ source_trace,
421
+ model_calls,
422
+ TrialEvent(
423
+ phase="evidence",
424
+ title="The Evidence Record",
425
+ body="\n".join(f"{item.id}: {item.title} | reliability {item.reliability:.2f} | {item.note}" for item in packet.evidence),
426
+ turns=[],
427
+ evidence=packet.evidence,
428
+ ),
429
+ delay,
430
  )
431
 
432
  judge_question = _required_role(
 
437
  model=GPT_OSS_MODEL,
438
  case_summary=case_summary,
439
  evidence_summary=evidence_summary,
440
+ trial_history=_trial_history(events),
441
+ persona=JUDGE_PERSONA,
442
+ objective="Ask the question most likely to reveal which side has met its burden.",
443
  task=(
444
  f"As {JUDGE_NAME}, ask one sharp hinge question that would change the outcome if answered. "
445
+ "Use Stoic restraint and public reason, speaking from the bench as I or we."
446
  ),
447
  provider=OPENAI_PROVIDER,
448
  max_tokens=100,
449
  )
450
+ yield _record_and_emit(
451
+ events,
452
+ packet,
453
+ source_trace,
454
+ model_calls,
455
  TrialEvent(
456
  phase="questions",
457
  title="The Hinge Question",
 
465
  claimant_answer = _required_role(
466
  model_runner,
467
  model_calls,
468
+ agent="Mike OSS",
469
  role="claimant advocate",
470
  model=GPT_OSS_MODEL,
471
  case_summary=case_summary,
472
  evidence_summary=evidence_summary,
473
+ trial_history=_trial_history(events),
474
+ objective="Answer the judge in the way most favorable to the claimant.",
475
+ task=f"Answer {JUDGE_NAME}'s hinge question as I for the claimant: {judge_question.text}",
476
  provider=OPENAI_PROVIDER,
477
  max_tokens=130,
478
  )
479
+ yield _record_and_emit(
480
+ events,
481
+ packet,
482
+ source_trace,
483
+ model_calls,
484
  TrialEvent(
485
  phase="questions",
486
  title="Claimant Answers the Bench",
487
  body="The claimant answers the hinge question.",
488
+ turns=[_turn("Mike OSS", "claimant advocate", claimant_answer, GPT_OSS_MODEL, 0.88)],
489
  evidence=packet.evidence,
490
  ),
491
  delay,
 
494
  respondent_answer = _required_role(
495
  model_runner,
496
  model_calls,
497
+ agent="Harvey Vector",
498
  role="respondent advocate",
499
  model=GPT_OSS_MODEL,
500
  case_summary=case_summary,
501
  evidence_summary=evidence_summary,
502
+ trial_history=_trial_history(events),
503
+ objective="Answer the judge in the way most favorable to the respondent.",
504
+ task=f"Answer {JUDGE_NAME}'s hinge question as I for the respondent: {judge_question.text}",
505
  provider=OPENAI_PROVIDER,
506
  max_tokens=130,
507
  )
508
+ yield _record_and_emit(
509
+ events,
510
+ packet,
511
+ source_trace,
512
+ model_calls,
513
  TrialEvent(
514
  phase="questions",
515
  title="Respondent Answers the Bench",
516
  body="The respondent answers the hinge question.",
517
+ turns=[_turn("Harvey Vector", "respondent advocate", respondent_answer, GPT_OSS_MODEL, 0.88)],
518
  evidence=packet.evidence,
519
  ),
520
  delay,
 
525
  model_calls,
526
  agent="Nemotron Jury",
527
  role="juror panel",
528
+ model=NEMOTRON_MODEL,
529
+ case_summary=case_summary,
530
+ evidence_summary=evidence_summary,
531
+ trial_history=_trial_history(events),
532
+ objective="Move the court from arguments into individual jury votes.",
533
+ task="Announce as we, the six named jurors, that we retire to vote. Do not reveal the votes yet.",
534
  provider=NEMOTRON_PROVIDER,
535
  max_tokens=100,
536
  )
537
+ yield _record_and_emit(
538
+ events,
539
+ packet,
540
+ source_trace,
541
+ model_calls,
542
  TrialEvent(
543
  phase="deliberation",
544
  title="The Jury Retires",
 
549
  delay,
550
  )
551
 
552
+ votes: list[JurorVote] = []
553
+ for juror, persona in JUROR_PERSONAS.items():
554
+ juror_vote_result = _required_role(
555
+ model_runner,
556
+ model_calls,
557
+ agent=juror,
558
+ role="juror",
559
+ model=NEMOTRON_MODEL,
560
+ case_summary=case_summary,
561
+ evidence_summary=evidence_summary,
562
+ trial_history=_trial_history(events),
563
+ persona=persona,
564
+ objective="Reach the verdict this historical worldview would consider right after watching the trial.",
565
+ task=_juror_task(juror, persona),
566
+ provider=NEMOTRON_PROVIDER,
567
+ max_tokens=220,
568
+ )
569
+ vote = _parse_juror_vote(juror_vote_result, packet, juror)
570
+ votes.append(vote)
571
+ juror_result = ModelResult(
572
+ text=f"I vote {vote.vote.replace('_', ' ').title()}. {vote.reason}",
573
+ call=juror_vote_result.call,
574
+ input_text=juror_vote_result.input_text,
575
+ )
576
+ yield _record_and_emit(
577
+ events,
578
+ packet,
579
+ source_trace,
580
+ model_calls,
581
  TrialEvent(
582
  phase="deliberation",
583
  title=f"Juror {vote.juror} Votes",
 
598
  model=GPT_OSS_MODEL,
599
  case_summary=case_summary,
600
  evidence_summary=evidence_summary,
601
+ trial_history=_trial_history(events),
602
+ persona=JUDGE_PERSONA,
603
+ objective="Announce the jury result fairly, summarize both sides, and do not override the jury.",
604
  task=(
605
  f"As {JUDGE_NAME}, announce the final legal finding after the jury vote with Stoic restraint. "
606
  f"Finding: {verdict.finding}. "
607
+ f"Jury rationale: {verdict.rationale} Remedy: {verdict.remedy}. Speak as I from the bench and include uncertainty without disclaiming the role."
608
  ),
609
  provider=OPENAI_PROVIDER,
610
  max_tokens=160,
611
  )
612
+ yield _record_and_emit(
613
+ events,
614
+ packet,
615
+ source_trace,
616
+ model_calls,
617
  TrialEvent(
618
  phase="verdict",
619
  title="The Court Announces Judgment",
sovereign_bench/llm.py CHANGED
@@ -69,6 +69,21 @@ def _response_text(response: object) -> str:
69
  return ""
70
 
71
 
72
  def clean_model_text(text: str) -> str:
73
  cleaned = re.sub(r"(?is)<think>.*?</think>", "", text).strip()
74
  if re.search(r"(?i)<think>", cleaned):
@@ -76,6 +91,26 @@ def clean_model_text(text: str) -> str:
76
  cleaned = re.sub(r"(?is)<analysis>.*?</analysis>", "", cleaned).strip()
77
  cleaned = re.sub(r"(?is)<reasoning>.*?</reasoning>", "", cleaned).strip()
78
  cleaned = cleaned.replace("</think>", "").strip()
79
  if not cleaned:
80
  raise ModelCallError("model returned no visible output")
81
  return cleaned
@@ -108,7 +143,9 @@ def call_hf_chat_model(
108
  "role": "user",
109
  "content": (
110
  "Your previous response did not include visible courtroom dialogue. "
111
- "Return only the final spoken dialogue now. Do not include <think>, analysis, reasoning, markdown, or notes. /no_think"
 
 
112
  ),
113
  }
114
  ]
@@ -166,6 +203,9 @@ def call_small_model(
166
  case_summary: str,
167
  task: str,
168
  evidence_summary: str,
 
 
 
169
  provider: str = "auto",
170
  max_tokens: int = 120,
171
  ) -> ModelResult:
@@ -175,6 +215,9 @@ def call_small_model(
175
  case_summary=case_summary,
176
  task=task,
177
  evidence_summary=evidence_summary,
 
 
 
178
  )
179
  result = call_hf_chat_model(
180
  model=model,
@@ -193,17 +236,61 @@ def build_role_messages(
193
  case_summary: str,
194
  task: str,
195
  evidence_summary: str,
 
 
 
196
  ) -> list[dict[str, str]]:
 
 
197
  system = (
198
  "You are one AI character in Sovereign Bench, a miniature virtual courtroom. "
199
- "Write concise courtroom dialogue only. Cite evidence IDs when relevant. "
 
200
  "Do not claim certainty beyond the record. Do not add markdown. "
201
- "Return final spoken dialogue only; never reveal hidden reasoning, analysis, or <think> text. "
202
  "Do not use thinking mode."
203
  )
204
  user = (
205
  f"Agent: {agent}\nRole: {role}\nCase:\n{case_summary}\n\n"
206
- f"Evidence:\n{evidence_summary}\n\nTask: {task}\n"
207
- "Answer in 1-3 sentences, theatrical but clear.\n/no_think"
 
208
  )
209
  return [{"role": "system", "content": system}, {"role": "user", "content": user}]
 
69
  return ""
70
 
71
 
72
+ INSTRUCTION_ECHO_RE = re.compile(
73
+ r"(?is)\b("
74
+ r"as requested|"
75
+ r"first[- ]person|"
76
+ r"pronoun|"
77
+ r"1\s*-\s*3 sentences|"
78
+ r"theatrical but clear|"
79
+ r"i will speak as|"
80
+ r"i will now (?:announce|answer|respond|deliver|speak)|"
81
+ r"as the assigned agent|"
82
+ r"the task"
83
+ r")\b"
84
+ )
85
+
86
+
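The pattern above flags replies in which the model talks about its instructions instead of speaking in character. A small sketch of how it behaves, with the paragraph filter from `clean_model_text` pulled into a hypothetical helper named `drop_instruction_echo`; it raises `ValueError` where the package raises `ModelCallError`, and the sample sentences are made up.

```python
import re

# Same alternatives as in the diff, written on fewer lines.
INSTRUCTION_ECHO_RE = re.compile(
    r"(?is)\b("
    r"as requested|first[- ]person|pronoun|1\s*-\s*3 sentences|theatrical but clear|"
    r"i will speak as|i will now (?:announce|answer|respond|deliver|speak)|"
    r"as the assigned agent|the task"
    r")\b"
)


def drop_instruction_echo(text: str) -> str:
    """Remove paragraphs that echo the prompt; keep the spoken dialogue."""
    # Only the first 420 characters decide whether filtering happens at all.
    if not INSTRUCTION_ECHO_RE.search(text[:420]):
        return text
    pieces = [piece.strip() for piece in re.split(r"\n\s*\n", text) if piece.strip()]
    kept = [piece for piece in pieces if not INSTRUCTION_ECHO_RE.search(piece)]
    if not kept:
        raise ValueError("model echoed instructions instead of courtroom dialogue")
    return "\n\n".join(kept)


reply = "As requested, I will speak as the clerk.\n\nI call the case of Athens v. Socrates."
print(drop_instruction_echo(reply))
```

The trailing `\b` makes every alternative a whole-word match, so the singular `pronoun` is caught but the plural `pronouns` is not.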
87
  def clean_model_text(text: str) -> str:
88
  cleaned = re.sub(r"(?is)<think>.*?</think>", "", text).strip()
89
  if re.search(r"(?i)<think>", cleaned):
 
91
  cleaned = re.sub(r"(?is)<analysis>.*?</analysis>", "", cleaned).strip()
92
  cleaned = re.sub(r"(?is)<reasoning>.*?</reasoning>", "", cleaned).strip()
93
  cleaned = cleaned.replace("</think>", "").strip()
94
+ channel_match = re.search(r"(?ims)^\s*(?:final|assistant_final)\s*:?\s*(.+)\Z", cleaned)
95
+ if channel_match:
96
+ cleaned = channel_match.group(1).strip()
97
+ else:
98
+ final_after_analysis = re.search(
99
+ r"(?ims)^\s*(?:analysis|reasoning|assistant_analysis)\s*:?.*?^\s*(?:final|assistant_final)\s*:?\s*(.+)\Z",
100
+ cleaned,
101
+ )
102
+ if final_after_analysis:
103
+ cleaned = final_after_analysis.group(1).strip()
104
+ elif re.search(r"(?im)^\s*(?:analysis|reasoning|assistant_analysis)\s*:?", cleaned):
105
+ raise ModelCallError("model returned hidden analysis instead of courtroom dialogue")
106
+ if re.search(r"(?i)\b(?:analysis|reasoning)\s*:", cleaned[:80]):
107
+ raise ModelCallError("model returned hidden analysis instead of courtroom dialogue")
108
+ if INSTRUCTION_ECHO_RE.search(cleaned[:420]):
109
+ pieces = [piece.strip() for piece in re.split(r"\n\s*\n", cleaned) if piece.strip()]
110
+ dialogue_pieces = [piece for piece in pieces if not INSTRUCTION_ECHO_RE.search(piece)]
111
+ if not dialogue_pieces:
112
+ raise ModelCallError("model echoed instructions instead of courtroom dialogue")
113
+ cleaned = "\n\n".join(dialogue_pieces).strip()
114
  if not cleaned:
115
  raise ModelCallError("model returned no visible output")
116
  return cleaned
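The channel handling in `clean_model_text` keeps only what follows a `final` or `assistant_final` marker. The sketch below isolates that step. `PR_FINAL_RE` is copied from the diff; `STRICT_FINAL_RE` and `final_channel` are hypothetical and not part of the PR. One property worth knowing: because the colon is optional, a reply whose first word merely begins with "Final" also matches, so the stricter variant is shown as one possible tightening, under the assumption that a real marker is always followed by a colon or a line break.

```python
import re

# Pattern as it appears in the diff.
PR_FINAL_RE = re.compile(r"(?ims)^\s*(?:final|assistant_final)\s*:?\s*(.+)\Z")

# Hypothetical stricter variant: the marker must be followed by ':' or end of line.
STRICT_FINAL_RE = re.compile(r"(?ims)^\s*(?:final|assistant_final)\s*(?::|$)\s*(.+)\Z")


def final_channel(text: str, pattern: re.Pattern) -> str:
    """Return the text after the final-channel marker, or the input if none is found."""
    match = pattern.search(text)
    return match.group(1).strip() if match else text.strip()


channelled = "analysis: weigh SOC-E1\nfinal: I call the case."
plain = "Finally, I find the respondent liable."

print(final_channel(channelled, PR_FINAL_RE))
print(final_channel(plain, PR_FINAL_RE))
print(final_channel(plain, STRICT_FINAL_RE))
```

Both patterns agree on replies that carry an explicit marker; they differ only on ordinary sentences that start with a word such as "Finally".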
 
143
  "role": "user",
144
  "content": (
145
  "Your previous response did not include visible courtroom dialogue. "
146
+ "Return only the final first-person spoken dialogue now, as the assigned agent. "
147
+ "Do not mention prompts, tasks, requirements, pronouns, sentence counts, or that you are following instructions. "
148
+ "Do not include <think>, analysis, reasoning, markdown, narration, or notes. /no_think"
149
  ),
150
  }
151
  ]
 
203
  case_summary: str,
204
  task: str,
205
  evidence_summary: str,
206
+ trial_history: str = "",
207
+ persona: str = "",
208
+ objective: str = "",
209
  provider: str = "auto",
210
  max_tokens: int = 120,
211
  ) -> ModelResult:
 
215
  case_summary=case_summary,
216
  task=task,
217
  evidence_summary=evidence_summary,
218
+ trial_history=trial_history,
219
+ persona=persona,
220
+ objective=objective,
221
  )
222
  result = call_hf_chat_model(
223
  model=model,
 
236
  case_summary: str,
237
  task: str,
238
  evidence_summary: str,
239
+ trial_history: str = "",
240
+ persona: str = "",
241
+ objective: str = "",
242
  ) -> list[dict[str, str]]:
243
+ vote_role = role == "juror"
244
+ dialogue_role = not vote_role
245
  system = (
246
  "You are one AI character in Sovereign Bench, a miniature virtual courtroom. "
247
+ "Stay fully in character as the assigned Agent and Role. "
248
+ "Use the case facts and evidence provided below; cite evidence IDs when relevant. "
249
  "Do not claim certainty beyond the record. Do not add markdown. "
250
+ "Never reveal hidden reasoning, analysis, or <think> text. "
251
  "Do not use thinking mode."
252
  )
253
+ if role in {"claimant advocate", "respondent advocate"}:
254
+ system += (
255
+ " You are a lawyer trying to win for your side. Use the evidence, the other side's claims, "
256
+ "and the trial record to make the strongest fair argument available."
257
+ )
258
+ elif role in {"judge", "verdict writer"}:
259
+ system += (
260
+ " You are a fair judge. Consider both sides, the evidence, and the trial record. "
261
+ "At verdict, announce and contextualize the jury result rather than replacing it with your own preferred outcome."
262
+ )
263
+ elif role == "juror":
264
+ system += (
265
+ " You are an individual juror. Decide through your named worldview and the trial transcript, "
266
+ "not a generic juror role. Output only valid JSON for your vote."
267
+ )
268
+ elif role == "juror panel":
269
+ system += " You speak for the jury panel procedurally; do not reveal votes before deliberation."
270
+ elif role == "clerk":
271
+ system += " You are a procedural courtroom role; present the record clearly without deciding the verdict."
272
+
273
+ if dialogue_role:
274
+ system += (
275
+ " Output only the words this character says aloud in court. "
276
+ "Use I, me, my, we, or our naturally when the role calls for it. "
277
+ "Do not narrate about yourself in the third person. Do not summarize what the agent would say."
278
+ )
279
+ answer_instruction = (
280
+ f"Speak as {agent}. Give only the in-scene court line, 1-3 concise sentences."
281
+ )
282
+ else:
283
+ answer_instruction = (
284
+ "Return only the requested JSON object. "
285
+ "Do not add dialogue, markdown, or commentary."
286
+ )
287
+ persona_block = f"\nPersona / worldview:\n{persona}\n" if persona else ""
288
+ objective_block = f"\nObjective:\n{objective}\n" if objective else ""
289
+ history_block = f"\nTrial history so far:\n{trial_history}\n" if trial_history else ""
290
  user = (
291
  f"Agent: {agent}\nRole: {role}\nCase:\n{case_summary}\n\n"
292
+ f"Evidence:\n{evidence_summary}\n"
293
+ f"{persona_block}{objective_block}{history_block}\nTask: {task}\n"
294
+ f"{answer_instruction}\n/no_think"
295
  )
296
  return [{"role": "system", "content": system}, {"role": "user", "content": user}]
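The user message built above adds the persona, objective, and history sections only when they are non-empty, and it switches the closing instruction for jurors. A minimal sketch of just that assembly step follows; `build_user_message` is a hypothetical helper that mirrors the f-strings in the diff, and the sample arguments are made up.

```python
def build_user_message(
    agent: str,
    role: str,
    case_summary: str,
    task: str,
    evidence_summary: str,
    trial_history: str = "",
    persona: str = "",
    objective: str = "",
) -> str:
    """Assemble the user prompt; optional sections vanish when left empty."""
    if role == "juror":
        # Jurors return a JSON vote rather than spoken dialogue.
        answer_instruction = (
            "Return only the requested JSON object. "
            "Do not add dialogue, markdown, or commentary."
        )
    else:
        answer_instruction = (
            f"Speak as {agent}. Give only the in-scene court line, 1-3 concise sentences."
        )
    persona_block = f"\nPersona / worldview:\n{persona}\n" if persona else ""
    objective_block = f"\nObjective:\n{objective}\n" if objective else ""
    history_block = f"\nTrial history so far:\n{trial_history}\n" if trial_history else ""
    return (
        f"Agent: {agent}\nRole: {role}\nCase:\n{case_summary}\n\n"
        f"Evidence:\n{evidence_summary}\n"
        f"{persona_block}{objective_block}{history_block}\nTask: {task}\n"
        f"{answer_instruction}\n/no_think"
    )


advocate_msg = build_user_message(
    "Mike OSS", "claimant advocate", "Case text", "Open for the claimant.", "SOC-E1: exhibit"
)
juror_msg = build_user_message(
    "Juror A", "juror", "Case text", "Vote.", "SOC-E1: exhibit", objective="Reach a verdict."
)
print(advocate_msg)
```

Defaulting the three new parameters to empty strings keeps existing callers working, since a call that omits them produces a prompt with no extra sections.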
sovereign_bench/models.py CHANGED
@@ -35,6 +35,7 @@ class CasePacket(BaseModel):
35
  respondent: str
36
  charge: str
37
  setting: str
 
38
  claimant_claim: str
39
  respondent_claim: str
40
  source_note: str
@@ -45,6 +46,7 @@ class TrialRequest(BaseModel):
45
  case_id: str = "socrates"
46
  search_query: str = ""
47
  hypothetical: str = ""
 
48
  speed: Literal["swift", "measured", "ceremonial"] = "swift"
49
  mind_layer: bool = True
50
 
 
35
  respondent: str
36
  charge: str
37
  setting: str
38
+ context: str = ""
39
  claimant_claim: str
40
  respondent_claim: str
41
  source_note: str
 
46
  case_id: str = "socrates"
47
  search_query: str = ""
48
  hypothetical: str = ""
49
+ custom_case: CasePacket | None = None
50
  speed: Literal["swift", "measured", "ceremonial"] = "swift"
51
  mind_layer: bool = True
52
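The two new fields interact with the engine in a simple way: `context` falls back to `setting` when empty, and `custom_case` must be present when `case_id` is `"custom"`. The sketch below illustrates both rules with dataclasses standing in for the pydantic models. The classes carry only a subset of fields, `resolve_custom` covers only the custom branch, and the sample case details are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CasePacket:  # stand-in with a subset of the real model's fields
    title: str
    charge: str
    setting: str
    claimant_claim: str
    respondent_claim: str
    context: str = ""  # new optional field from the diff


@dataclass
class TrialRequest:  # stand-in with only the fields used below
    case_id: str = "socrates"
    custom_case: Optional[CasePacket] = None


def case_summary(packet: CasePacket) -> str:
    """Prefer the richer context; fall back to the setting when it is empty."""
    context = packet.context or packet.setting
    return (
        f"{packet.title}. Charge: {packet.charge}\n"
        f"Context: {context}\n"
        f"Claimant: {packet.claimant_claim}\n"
        f"Respondent: {packet.respondent_claim}"
    )


def resolve_custom(request: TrialRequest):
    """Custom branch only: a custom trial needs a user-supplied packet."""
    if request.case_id == "custom":
        if request.custom_case is None:
            raise RuntimeError(
                "Custom case requires trial details and evidence before the court can begin."
            )
        return request.custom_case, {"mode": "custom"}
    raise LookupError("only the custom branch is sketched here")


packet = CasePacket("Athens v. Socrates", "Impiety", "Athens, 399 BCE",
                    "He corrupted the youth.", "He only asked questions.")
print(case_summary(packet))
```

Giving `context` an empty default means cached cases that never set it still produce a complete summary.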
 
tests/test_cases.py CHANGED
@@ -2,7 +2,15 @@ from sovereign_bench.cases import CASES
2
 
3
 
4
  def test_cached_cases_have_evidence():
5
- assert {"socrates", "barnaby"} <= set(CASES)
6
  for case in CASES.values():
7
  assert len(case.evidence) >= 4
8
  assert all(item.id and item.excerpt for item in case.evidence)

2
 
3
 
4
  def test_cached_cases_have_evidence():
5
+ assert {"socrates", "greg", "barnaby"} <= set(CASES)
6
  for case in CASES.values():
7
  assert len(case.evidence) >= 4
8
  assert all(item.id and item.excerpt for item in case.evidence)
9
+
10
+
11
+ def test_demo_cases_have_book_context_and_three_items_per_side():
12
+ for case_id in ["socrates", "greg"]:
13
+ case = CASES[case_id]
14
+ assert case.context
15
+ assert len([item for item in case.evidence if item.supports == "claimant"]) >= 3
16
+ assert len([item for item in case.evidence if item.supports == "respondent"]) >= 3
tests/test_engine.py CHANGED
@@ -3,39 +3,36 @@ import re
3
 
4
  import pytest
5
 
6
- from sovereign_bench.engine import JUDGE_NAME, JUROR_PERSONAS, RequiredModelError, run_trial
7
- from sovereign_bench.llm import ModelCall, ModelResult
8
- from sovereign_bench.models import TrialRequest
9
 
10
 
11
- def _jury_json(evidence_summary: str, vote: str = "liable") -> str:
12
- evidence_ids = re.findall(r"^([A-Z]+-E\d+):", evidence_summary, flags=re.M)
13
- evidence_ids = (evidence_ids or ["SOC-E1"]) * 6
14
  return json.dumps(
15
  {
16
- "votes": [
17
- {
18
- "juror": name,
19
- "persona": persona,
20
- "vote": vote if idx < 4 else "not_liable",
21
- "reason": f"{name} applies a {persona} lens to exhibit {evidence_ids[idx]}.",
22
- "evidence_ids": [evidence_ids[idx]],
23
- }
24
- for idx, (name, persona) in enumerate(JUROR_PERSONAS.items())
25
- ]
26
  }
27
  )
28
 
29
 
30
  def fake_model_runner(**kwargs):
31
  text = (
32
- _jury_json(kwargs["evidence_summary"])
33
- if kwargs["role"] == "juror vote generator"
34
  else f"{kwargs['agent']} responds to: {kwargs['task']}"
35
  )
36
  prompt = (
37
  f"SYSTEM:\nFake live model for tests.\n\nUSER:\n"
38
- f"Agent: {kwargs['agent']}\nRole: {kwargs['role']}\nTask: {kwargs['task']}\n\nASSISTANT:\n"
 
 
39
  )
40
  return ModelResult(
41
  text=text,
@@ -54,12 +51,11 @@ def test_cached_cases_emit_sequential_speaker_order():
54
  expected_speakers = [
55
  "Clerk Meridian",
56
  JUDGE_NAME,
57
- "Advocate Auric",
58
- "Counsel Sable",
59
- "Auditor Prism",
60
  JUDGE_NAME,
61
- "Advocate Auric",
62
- "Counsel Sable",
63
  "Nemotron Jury",
64
  *list(JUROR_PERSONAS),
65
  JUDGE_NAME,
@@ -67,7 +63,10 @@ def test_cached_cases_emit_sequential_speaker_order():
67
  for case_id in ["socrates", "barnaby"]:
68
  events = run_trial(TrialRequest(case_id=case_id), model_runner=fake_model_runner)
69
 
70
- assert [event.turns[0].agent for event in events] == expected_speakers
 
 
 
71
  assert [event.phase for event in events].count("deliberation") == 7
72
  assert events[0].turns[0].input
73
  assert "SYSTEM:" in events[0].turns[0].input
@@ -81,12 +80,12 @@ def test_no_event_contains_both_lawyers_speaking_together():
81
 
82
  for event in events:
83
  agents = {turn.agent for turn in event.turns}
84
- assert not {"Advocate Auric", "Counsel Sable"}.issubset(agents)
85
 
86
 
87
  def test_juror_vote_events_have_fixed_personas_and_evidence():
88
  events = run_trial(TrialRequest(case_id="socrates"), model_runner=fake_model_runner)
89
- juror_events = [event for event in events if event.turns[0].agent in JUROR_PERSONAS]
90
 
91
  assert len(juror_events) == 6
92
  for event in juror_events:
@@ -94,6 +93,7 @@ def test_juror_vote_events_have_fixed_personas_and_evidence():
94
  assert vote.juror == event.turns[0].agent
95
  assert vote.persona == JUROR_PERSONAS[vote.juror]
96
  assert vote.vote in {"liable", "not_liable", "uncertain"}
 
97
  assert vote.reason
98
  assert vote.evidence_ids
99
 
@@ -102,6 +102,95 @@ def test_juror_vote_events_have_fixed_personas_and_evidence():
102
  assert [vote.juror for vote in final.votes] == list(JUROR_PERSONAS)
103
 
104

105
  def test_jury_contract_uses_public_history_personas():
106
  assert JUDGE_NAME == "Marcus Aurelius"
107
  assert JUROR_PERSONAS == {
@@ -114,6 +203,94 @@ def test_jury_contract_uses_public_history_personas():
114
  }
115
 
116

117
  def test_required_model_failure_stops_trial_without_canned_dialogue():
118
  def failing_runner(**kwargs):
119
  return ModelResult(
@@ -136,7 +313,7 @@ def test_required_model_failure_stops_trial_without_canned_dialogue():
136
  def test_invalid_jury_output_stops_trial_without_fallback_votes():
137
  def invalid_jury_runner(**kwargs):
138
  result = fake_model_runner(**kwargs)
139
- if kwargs["role"] == "juror vote generator":
140
  result.text = "the jury refuses structured output"
141
  return result
142
 
 
3
 
4
  import pytest
5
 
6
+ from sovereign_bench.engine import JUDGE_NAME, JUROR_PERSONAS, RequiredModelError, run_trial, stream_trial
7
+ from sovereign_bench.llm import ModelCall, ModelResult, build_role_messages, clean_model_text
8
+ from sovereign_bench.models import CasePacket, EvidenceItem, TrialRequest
9
 
10
 
11
+ def _juror_json(kwargs, vote: str = "liable") -> str:
12
+ evidence_ids = re.findall(r"^([A-Z]+-[A-Z]\d+):", kwargs["evidence_summary"], flags=re.M)
13
+ evidence_id = (evidence_ids or ["SOC-E1"])[0]
14
  return json.dumps(
15
  {
16
+ "juror": kwargs["agent"],
17
+ "persona": kwargs["persona"],
18
+ "vote": vote,
19
+ "reason": f"{kwargs['agent']} applies {kwargs['persona']} to exhibit {evidence_id}.",
20
+ "evidence_ids": [evidence_id],
21
  }
22
  )
23
 
24
 
25
  def fake_model_runner(**kwargs):
26
  text = (
27
+ _juror_json(kwargs, vote="liable" if list(JUROR_PERSONAS).index(kwargs["agent"]) < 4 else "not_liable")
28
+ if kwargs["role"] == "juror"
29
  else f"{kwargs['agent']} responds to: {kwargs['task']}"
30
  )
31
  prompt = (
32
  f"SYSTEM:\nFake live model for tests.\n\nUSER:\n"
33
+ f"Agent: {kwargs['agent']}\nRole: {kwargs['role']}\n"
34
+ f"Persona: {kwargs.get('persona', '')}\nObjective: {kwargs.get('objective', '')}\n"
35
+ f"History: {kwargs.get('trial_history', '')}\nTask: {kwargs['task']}\n\nASSISTANT:\n"
36
  )
37
  return ModelResult(
38
  text=text,
 
51
  expected_speakers = [
52
  "Clerk Meridian",
53
  JUDGE_NAME,
54
+ "Mike OSS",
55
+ "Harvey Vector",
 
56
  JUDGE_NAME,
57
+ "Mike OSS",
58
+ "Harvey Vector",
59
  "Nemotron Jury",
60
  *list(JUROR_PERSONAS),
61
  JUDGE_NAME,
 
63
  for case_id in ["socrates", "barnaby"]:
64
  events = run_trial(TrialRequest(case_id=case_id), model_runner=fake_model_runner)
65
 
66
+ assert [event.turns[0].agent for event in events if event.turns] == expected_speakers
67
+ evidence_event = next(event for event in events if event.phase == "evidence")
68
+ assert evidence_event.title == "The Evidence Record"
69
+ assert evidence_event.turns == []
70
  assert [event.phase for event in events].count("deliberation") == 7
71
  assert events[0].turns[0].input
72
  assert "SYSTEM:" in events[0].turns[0].input
 
80
 
81
  for event in events:
82
  agents = {turn.agent for turn in event.turns}
83
+ assert not {"Mike OSS", "Harvey Vector"}.issubset(agents)
84
 
85
 
86
  def test_juror_vote_events_have_fixed_personas_and_evidence():
87
  events = run_trial(TrialRequest(case_id="socrates"), model_runner=fake_model_runner)
88
+ juror_events = [event for event in events if event.turns and event.turns[0].agent in JUROR_PERSONAS]
89
 
90
  assert len(juror_events) == 6
91
  for event in juror_events:
 
93
  assert vote.juror == event.turns[0].agent
94
  assert vote.persona == JUROR_PERSONAS[vote.juror]
95
  assert vote.vote in {"liable", "not_liable", "uncertain"}
96
+ assert event.turns[0].content.startswith("I vote ")
97
  assert vote.reason
98
  assert vote.evidence_ids
99
 
 
102
  assert [vote.juror for vote in final.votes] == list(JUROR_PERSONAS)
103
 
104
 
105
+ def test_jurors_are_called_independently_with_personas_and_trial_history():
106
+ calls = []
107
+
108
+ def recording_runner(**kwargs):
109
+ calls.append(kwargs.copy())
110
+ return fake_model_runner(**kwargs)
111
+
112
+ run_trial(TrialRequest(case_id="socrates"), model_runner=recording_runner)
113
+
114
+ juror_calls = [call for call in calls if call["role"] == "juror"]
115
+ assert [call["agent"] for call in juror_calls] == list(JUROR_PERSONAS)
116
+ assert len(juror_calls) == 6
117
+ for call in juror_calls:
118
+ assert call["persona"] == JUROR_PERSONAS[call["agent"]]
119
+ assert "Claimant Opening" in call["trial_history"]
120
+ assert "Respondent Opening" in call["trial_history"]
121
+ assert "The Evidence Record" in call["trial_history"]
122
+ assert "historical worldview" in call["objective"]
123
+
124
+
125
+ def test_lawyers_and_judge_receive_trial_history_and_objectives():
126
+ calls = []
127
+
128
+ def recording_runner(**kwargs):
129
+ calls.append(kwargs.copy())
130
+ return fake_model_runner(**kwargs)
131
+
132
+ run_trial(TrialRequest(case_id="socrates"), model_runner=recording_runner)
133
+
134
+ claimant_answer = next(call for call in calls if call["agent"] == "Mike OSS" and "hinge question" in call["task"])
135
+ respondent_answer = next(call for call in calls if call["agent"] == "Harvey Vector" and "hinge question" in call["task"])
136
+ verdict_call = next(call for call in calls if call["role"] == "verdict writer")
137
+
138
+ assert "The Hinge Question" in claimant_answer["trial_history"]
139
+ assert "The Hinge Question" in respondent_answer["trial_history"]
140
+ assert "most favorable to the claimant" in claimant_answer["objective"]
141
+ assert "most favorable to the respondent" in respondent_answer["objective"]
142
+ assert all(name in verdict_call["trial_history"] for name in JUROR_PERSONAS)
143
+ assert "do not override the jury" in verdict_call["objective"]
144
+
145
+
146
+ def test_custom_case_context_and_evidence_reach_lawyer_prompts():
147
+ custom = CasePacket(
148
+ id="custom",
149
+ title="Custom Trial",
150
+ subtitle="Entered by user.",
151
+ claimant="Claimant",
152
+ respondent="Respondent",
153
+ charge="Whether the custom record favors the claimant.",
154
+ setting="A custom courtroom.",
155
+ context="A bicycle disappeared after a disputed garage visit.",
156
+ claimant_claim="The claimant says the visit explains the missing bicycle.",
157
+ respondent_claim="The respondent says the timing and evidence are ambiguous.",
158
+ source_note="Custom test packet.",
159
+ evidence=[
160
+ EvidenceItem(
161
+ id="CUS-F1",
162
+ title="Garage Text",
163
+ source="Custom",
164
+ excerpt="The respondent asked to enter the garage.",
165
+ supports="claimant",
166
+ reliability=0.65,
167
+ note="Supports access.",
168
+ ),
169
+ EvidenceItem(
170
+ id="CUS-A1",
171
+ title="Neighbor Sighting",
172
+ source="Custom",
173
+ excerpt="A neighbor saw the bicycle later that day.",
174
+ supports="respondent",
175
+ reliability=0.65,
176
+ note="Supports alternative timing.",
177
+ ),
178
+ ],
179
+ )
180
+ calls = []
181
+
182
+ def recording_runner(**kwargs):
183
+ calls.append(kwargs.copy())
184
+ return fake_model_runner(**kwargs)
185
+
186
+ run_trial(TrialRequest(case_id="custom", custom_case=custom), model_runner=recording_runner)
187
+
188
+ claimant_opening = next(call for call in calls if call["agent"] == "Mike OSS" and call["role"] == "claimant advocate")
189
+ assert "A bicycle disappeared" in claimant_opening["case_summary"]
190
+ assert "CUS-F1" in claimant_opening["evidence_summary"]
191
+ assert "CUS-A1" in claimant_opening["evidence_summary"]
192
+
193
+
194
  def test_jury_contract_uses_public_history_personas():
195
  assert JUDGE_NAME == "Marcus Aurelius"
196
  assert JUROR_PERSONAS == {
 
203
  }
204
 
205
 
206
+ def test_role_prompt_requires_first_person_in_character_speech():
207
+ messages = build_role_messages(
208
+ agent="Harvey Vector",
209
+ role="respondent advocate",
210
+ case_summary="A short case summary.",
211
+ evidence_summary="SOC-E1: A record excerpt.",
212
+ task="Answer the bench for the respondent.",
213
+ )
214
+
215
+ system = messages[0]["content"]
216
+ user = messages[1]["content"]
217
+
218
+ assert "Stay fully in character as the assigned Agent and Role." in system
219
+ assert "Output only the words this character says aloud in court." in system
220
+ assert "Do not narrate about yourself in the third person." in system
221
+ assert "Use the case facts and evidence provided below" in system
222
+ assert "Speak as Harvey Vector." in user
223
+ assert "Give only the in-scene court line" in user
224
+ assert "SOC-E1" in user
225
+
226
+
227
+ def test_juror_vote_prompt_uses_persona_history_and_json_contract():
228
+ messages = build_role_messages(
229
+ agent="Karl Marx",
230
+ role="juror",
231
+ case_summary="A short case summary.",
232
+ evidence_summary="SOC-E1: A record excerpt.",
233
+ trial_history="Mike OSS argued from SOC-E1.",
234
+ persona=JUROR_PERSONAS["Karl Marx"],
235
+ objective="Vote as Karl Marx would after watching the trial.",
236
+ task="Return one juror vote as JSON.",
237
+ )
238
+
239
+ system = messages[0]["content"]
240
+ user = messages[1]["content"]
241
+
242
+ assert "Output only the words this character says aloud in court." not in messages[0]["content"]
243
+ assert "You are an individual juror." in system
244
+ assert JUROR_PERSONAS["Karl Marx"] in user
245
+ assert "Mike OSS argued from SOC-E1." in user
246
+ assert "Return only the requested JSON object." in user
247
+
248
+
249
+ def test_model_cleaner_extracts_final_speech_after_analysis_channel():
250
+ text = clean_model_text(
251
+ "analysis\nI should reason about the case first.\n\nfinal\nI stand for the respondent, and SOC-E1 leaves doubt."
252
+ )
253
+
254
+ assert text == "I stand for the respondent, and SOC-E1 leaves doubt."
255
+ assert "analysis" not in text.lower()
256
+
257
+
258
+ def test_model_cleaner_rejects_visible_analysis_without_final_speech():
259
+ def analysis_runner(**kwargs):
260
+ return ModelResult(
261
+ text="analysis: I should think through the case before answering.",
262
+ input_text="SYSTEM:\nanalysis leak",
263
+ call=ModelCall(
264
+ model=kwargs["model"],
265
+ provider=kwargs.get("provider", "test"),
266
+ ok=True,
267
+ latency_ms=1,
268
+ prompt_hash="test-prompt",
269
+ ),
270
+ )
271
+
272
+ with pytest.raises(RequiredModelError):
273
+ next(stream_trial(TrialRequest(case_id="socrates"), model_runner=analysis_runner))
274
+
275
+
276
+ def test_model_cleaner_removes_instruction_echo_when_dialogue_remains():
277
+ text = clean_model_text(
278
+ "I will now announce the case as requested, while maintaining the theatrical but clear tone required. "
279
+ "I will speak as Clerk Meridian in first person, starting with a pronoun.\n\n"
280
+ "I call The Polis v. Socrates before this court."
281
+ )
282
+
283
+ assert text == "I call The Polis v. Socrates before this court."
284
+
285
+
286
+ def test_model_cleaner_rejects_instruction_echo_without_dialogue():
287
+ with pytest.raises(Exception, match="echoed instructions"):
288
+ clean_model_text(
289
+ "I will now announce the case as requested, while maintaining the theatrical but clear tone required. "
290
+ "I will speak as Clerk Meridian in first person, starting with a pronoun."
291
+ )
292
+
293
+
294
  def test_required_model_failure_stops_trial_without_canned_dialogue():
295
  def failing_runner(**kwargs):
296
  return ModelResult(
 
313
  def test_invalid_jury_output_stops_trial_without_fallback_votes():
314
  def invalid_jury_runner(**kwargs):
315
  result = fake_model_runner(**kwargs)
316
+ if kwargs["role"] == "juror":
317
  result.text = "the jury refuses structured output"
318
  return result
319
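The four `test_model_cleaner_*` tests above pin down the observable behaviour of `clean_model_text` without showing its body. A minimal sketch that satisfies those cases might look like the following. Everything in it is an assumption made for illustration: `ModelOutputError`, the `final` marker regex, and the echo phrase list are hypothetical, and the real implementation in `sovereign_bench/llm.py` may work quite differently.

```python
import re


class ModelOutputError(ValueError):
    """Hypothetical error type; the project may raise a different exception."""


# Assumed convention: a line containing only "final" separates hidden analysis from speech.
_FINAL_MARKER = re.compile(r"^final:?[ \t]*$", flags=re.IGNORECASE | re.MULTILINE)
# Assumed heuristic: paragraphs containing these phrases restate the prompt instead of speaking.
_ECHO_PHRASES = ("as requested", "in first person", "tone required")


def clean_model_text_sketch(raw: str) -> str:
    text = raw.strip()
    parts = _FINAL_MARKER.split(text, maxsplit=1)
    if len(parts) == 2:
        # Keep only what follows the first "final" marker line.
        text = parts[1].strip()
    elif re.match(r"analysis\b", text, flags=re.IGNORECASE):
        raise ModelOutputError("visible analysis without final speech")

    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        raise ModelOutputError("empty model output")

    # Drop paragraphs that look like the model repeating its instructions.
    spoken = [p for p in paragraphs if not any(ph in p.lower() for ph in _ECHO_PHRASES)]
    if not spoken:
        raise ModelOutputError("model echoed instructions without dialogue")
    return "\n\n".join(spoken)
```

Raising instead of returning a fallback string matches the PR's stated policy that model failures stop the trial instead of substituting canned dialogue.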
 
tests/test_ui_rendering.py CHANGED
@@ -1,10 +1,11 @@
1
  import inspect
 
2
  from pathlib import Path
3
 
4
  from PIL import Image
5
 
6
  import app
7
- from sovereign_bench.models import AgentTurn, EvidenceItem, JurorVote, TrialEvent
8
 
9
 
10
  OLD_CARD_CLASSES = [
@@ -71,6 +72,32 @@ def _speaker_event(agent: str, phase: str = "questions") -> TrialEvent:
71
  )
72
 
73

74
  def test_lower_tab_renderers_emit_plain_text_classes():
75
  event = _event_with_lower_tab_data()
76
  html = "\n".join(
@@ -101,6 +128,12 @@ def test_download_controls_are_not_wired_into_app():
101
  assert "Download agent trace" not in source
102
 
103

104
  def test_courtroom_splits_six_jurors_between_side_benches():
105
  html = app.render_court([_event_with_lower_tab_data()], started=True)
106
 
@@ -131,10 +164,15 @@ def test_courtroom_renders_historical_judge_and_juror_assets():
131
 
132
  assert "Marcus Aurelius" in html
133
  assert "assets/characters/marcus-aurelius.png" in html
 
 
 
134
  for name, image in app.JUROR_IMAGES.items():
135
  assert name in html
136
  assert image in html
137
  assert html.count("class='juror-portrait'") == 6
 
 
138
 
139
 
140
  def test_courtroom_renders_foreground_fences_and_judge_table_above_characters():
@@ -146,6 +184,82 @@ def test_courtroom_renders_foreground_fences_and_judge_table_above_characters():
146
  assert ".foreground-props {\n position: absolute;\n inset: 0;\n z-index: 13;" in app.CSS
147
  assert ".puppet {\n --skin: #c99257;" in app.CSS
148
  assert "z-index: 8;" in app.CSS
149
 
150
 
151
  def test_foreground_prop_assets_have_real_transparency():
@@ -161,13 +275,67 @@ def test_foreground_prop_assets_have_real_transparency():
161
 
162
 
163
  def test_latest_speaker_sets_stage_class_and_speech_bubble():
164
- html = app.render_court([_speaker_event("Advocate Auric", phase="claims")], started=True)
165
 
166
  assert "speaker-auric" in html
167
- assert "class='speech-bubble'" in html
168
- assert "Advocate Auric has the visible floor." in html
 
 
 
169
  assert "puppet auric active walking" in html
170
  assert "puppet sable active" not in html
171
 
172
 
173
  def test_individual_juror_can_be_active_speaker():
@@ -199,7 +367,19 @@ def test_individual_juror_can_be_active_speaker():
199
 
200
  assert "speaker-karl-marx" in html
201
  assert "<a class='juror active'" in html
 
202
  assert "Liable. E1 carries the record." in html
203
 
204
 
205
  def test_lawyer_movement_css_is_speaker_specific_not_phase_wide():
@@ -209,27 +389,106 @@ def test_lawyer_movement_css_is_speaker_specific_not_phase_wide():
209
  assert ".phase-opening .puppet.sable" not in app.CSS
210
 
211
 
212
- def test_closed_book_is_smaller_and_key_characters_are_lowered():
213
- assert ".episode-book.closed {\n top: 61%;\n width: min(163px, 20vw);" in app.CSS
214
- assert ".puppet.judge {\n left: 50%;\n top: 56%;" in app.CSS
215
  assert ".puppet.auric {\n left: 24%;\n top: 87%;" in app.CSS
216
- assert ".speaker-auric .puppet.auric {\n left: 43%;\n top: 91%;" in app.CSS
217
- assert ".puppet.auditor {\n left: 71%;\n top: 80%;" in app.CSS
218
- assert ".episode-book.closed {\n top: 750px;\n width: 140px;" in app.CSS
219
- assert ".puppet.judge {\n top: 717px;" in app.CSS
220
  assert ".puppet.auric {\n left: 20%;\n top: 970px;" in app.CSS
221
- assert ".puppet.auditor {\n left: 78%;\n top: 860px;" in app.CSS
222
 
223
 
224
  def test_run_ui_yields_five_outputs_without_download_status(monkeypatch):
225
  event = _event_with_lower_tab_data()
226
  monkeypatch.setattr(app, "get_events", lambda request: iter([event]))
 
227
 
228
- outputs = list(app.run_ui("Trial of Socrates", "", "", "swift", True))
229
 
230
  assert outputs
231
  assert all(len(output) == 5 for output in outputs)
232
- assert outputs[1][-1] == "Step 1: Jury weighs the record"
 
233
  assert outputs[-1][-1] == "Verdict sealed."
234
  assert "download" not in outputs[-1][-1].lower()
235
 
@@ -241,12 +500,48 @@ def test_run_ui_stops_with_model_unavailable_error(monkeypatch):
241
 
242
  monkeypatch.setattr(app, "get_events", broken_events)
243
 
244
- outputs = list(app.run_ui("Trial of Socrates", "", "", "swift", True))
245
 
246
  assert outputs[-1][-1] == "Model response required. Trial stopped: Marcus Aurelius unavailable: offline"
247
  assert "Claimant score" not in outputs[-1][0]
248
 
249

250
  def test_court_renders_sound_toggle():
251
  html = app.render_court([])
252
 
 
1
  import inspect
2
+ import json
3
  from pathlib import Path
4
 
5
  from PIL import Image
6
 
7
  import app
8
+ from sovereign_bench.models import AgentTurn, EvidenceItem, JurorVote, TrialEvent, Verdict
9
 
10
 
11
  OLD_CARD_CLASSES = [
 
72
  )
73
 
74
 
75
+ def _verdict_event(finding: str = "liable") -> TrialEvent:
76
+ return TrialEvent(
77
+ phase="verdict",
78
+ title="The Court Announces Judgment",
79
+ body="Judgment is announced.",
80
+ verdict=Verdict(
81
+ finding=finding,
82
+ decree="The court enters the final judgment.",
83
+ rationale="The jury majority decides the record.",
84
+ evidence_ids=["E1"],
85
+ uncertainty="Some uncertainty remains.",
86
+ remedy="Record the judgment.",
87
+ ),
88
+ turns=[
89
+ AgentTurn(
90
+ agent=app.JUDGE_NAME,
91
+ role="verdict writer",
92
+ content="The judgment of the court is guilty.",
93
+ model="test-model",
94
+ confidence=0.9,
95
+ input="SYSTEM:\nAnnounce verdict.",
96
+ )
97
+ ],
98
+ )
99
+
100
+
101
  def test_lower_tab_renderers_emit_plain_text_classes():
102
  event = _event_with_lower_tab_data()
103
  html = "\n".join(
 
128
  assert "Download agent trace" not in source
129
 
130
 
131
+ def test_case_dropdown_only_exposes_demo_and_custom_cases():
132
+ assert list(app.CASE_OPTIONS) == ["Trial of Socrates", "Greg Heffley vs Mom", "Custom"]
133
+ assert "The People v. Barnaby Buttons" not in app.CASE_OPTIONS
134
+ assert "Live Search Tribunal" not in app.CASE_OPTIONS
135
+
136
+
137
  def test_courtroom_splits_six_jurors_between_side_benches():
138
  html = app.render_court([_event_with_lower_tab_data()], started=True)
139
 
 
164
 
165
  assert "Marcus Aurelius" in html
166
  assert "assets/characters/marcus-aurelius.png" in html
167
+ assert "<img class='puppet-portrait' src='/gradio_api/file=assets/characters/marcus-aurelius.png'" in html
168
+ assert ".puppet.judge::before,\n.puppet.judge::after {\n display: none;\n}" in app.CSS
169
+ assert ".puppet.judge .mouth {\n display: none;\n}" in app.CSS
170
  for name, image in app.JUROR_IMAGES.items():
171
  assert name in html
172
  assert image in html
173
  assert html.count("class='juror-portrait'") == 6
174
+ assert "class='juror-face'" not in html
175
+ assert "class='juror-body'" not in html
176
 
177
 
178
  def test_courtroom_renders_foreground_fences_and_judge_table_above_characters():
 
184
  assert ".foreground-props {\n position: absolute;\n inset: 0;\n z-index: 13;" in app.CSS
185
  assert ".puppet {\n --skin: #c99257;" in app.CSS
186
  assert "z-index: 8;" in app.CSS
187
+ assert ".puppet.clerk {\n left: 43%;\n top: 66%;\n z-index: 14;" in app.CSS
188
+
189
+
190
+ def test_trial_progress_defaults_to_pretrial_and_renders_all_stages():
191
+ html = app.render_court([])
192
+
193
+ assert "class='trial-progress'" in html
194
+ assert "data-phase='pretrial' aria-current='step'" in html
195
+ for _key, label in app.TRIAL_PROGRESS_STAGES:
196
+ assert label in html
197
+
198
+
199
+ def test_trial_progress_marks_questions_current():
200
+ html = app.render_court([_speaker_event("Mike OSS", phase="questions")], started=True)
201
+
202
+ assert "class='trial-progress-segment current' data-phase='questions' aria-current='step'" in html
203
+ assert "data-phase='evidence'" in html
204
+
205
+
206
+ def test_trial_progress_marks_deliberation_current():
207
+ html = app.render_court([_event_with_lower_tab_data()], started=True)
208
+
209
+ assert "class='trial-progress-segment current' data-phase='deliberation' aria-current='step'" in html
210
+ assert "class='trial-progress-segment complete' data-phase='questions'" in html
211
+
212
+
213
+ def test_trial_progress_marks_verdict_current_and_complete():
214
+ html = app.render_court([_speaker_event(app.JUDGE_NAME, phase="verdict")], started=True)
215
+
216
+ assert "class='trial-progress-segment current complete' data-phase='verdict' aria-current='step'" in html
217
+ assert "class='trial-progress-segment complete' data-phase='deliberation'" in html
218
+
219
+
220
+ def test_verdict_popup_renders_only_when_final_verdict_is_revealed():
221
+ event = _verdict_event("liable")
222
+
223
+ announcement = app.render_court([event], started=True)
224
+ sealed = app.render_court([event], started=True, show_verdict_popup=True)
225
+
226
+ assert "class='speech-bubble active-dialogue speaker-judge'" in announcement
227
+ assert "class='verdict-popup'" not in announcement
228
+ assert "class='speech-bubble active-dialogue speaker-judge'" in sealed
229
+ assert "class='verdict-popup'" in sealed
230
+ assert "data-finding='liable'" in sealed
231
+ assert "Verdict: Guilty" in sealed
232
+
233
+
234
+ def test_run_ui_reveals_verdict_popup_after_judge_speech(monkeypatch):
235
+ event = _verdict_event("not_liable")
236
+ monkeypatch.setattr(app, "get_events", lambda request: iter([event]))
237
+ monkeypatch.setattr(app, "_reading_duration", lambda text: 0)
238
+
239
+ outputs = list(app.run_ui("Trial of Socrates", "", "", "", "swift", True))
240
+
241
+ assert "class='speech-bubble active-dialogue speaker-judge'" in outputs[1][0]
242
+ assert "class='verdict-popup'" not in outputs[1][0]
243
+ assert outputs[-1][-1] == "Verdict sealed."
244
+ assert "class='verdict-popup'" in outputs[-1][0]
245
+ assert "Verdict: Not Guilty" in outputs[-1][0]
246
+
247
+
248
+ def test_trial_progress_ignores_unknown_phase_without_extra_segment():
249
+ html = app.render_court([_speaker_event("Clerk Meridian", phase="appeal")], started=True)
250
+
251
+ assert "class='trial-progress'" in html
252
+ assert html.count("class='trial-progress-segment") == len(app.TRIAL_PROGRESS_STAGES)
253
+ assert "aria-current='step'" not in html
254
+ assert "class='trial-progress-segment' data-phase='appeal'" not in html
255
+
256
+
257
+ def test_trial_progress_css_is_fixed_and_translucent_theme_matched():
258
+ assert ".trial-progress {\n position: fixed;\n top: 0;" in app.CSS
259
+ assert "background: rgba(23, 13, 8, .58);" in app.CSS
260
+ assert "backdrop-filter: blur(8px);" in app.CSS
261
+ assert "background: #ffd675;" in app.CSS
262
+ assert ".trial-progress-abbrev {\n display: inline;" in app.CSS
263
 
264
 
265
  def test_foreground_prop_assets_have_real_transparency():
 
275
 
276
 
277
  def test_latest_speaker_sets_stage_class_and_speech_bubble():
278
+ html = app.render_court([_speaker_event("Mike OSS", phase="claims")], started=True)
279
 
280
  assert "speaker-auric" in html
281
+ assert "class='speech-bubble active-dialogue speaker-auric'" in html
282
+ assert "data-speaker='Mike OSS'" in html
283
+ assert "<strong>Mike OSS</strong>" in html
284
+ assert "test speaker" in html
285
+ assert "Mike OSS has the visible floor." in html
286
  assert "puppet auric active walking" in html
287
  assert "puppet sable active" not in html
288
+ assert html.count("class='speech-bubble") == 1
289
+ assert html.find("class='foreground-props'") < html.find("class='speech-bubble active-dialogue")
290
+ assert ".speech-bubble.active-dialogue,\n.speech-bubble.active-dialogue * {\n color: #141413 !important;\n}" in app.CSS
291
+ assert "border: 2px solid #141413;" in app.CSS
292
+ assert "font-size: 12px;" in app.CSS
293
+
294
+
295
+ def test_speech_bubble_uses_full_turn_content_not_event_body():
296
+ long_text = " ".join(["The record speaks plainly"] * 18) + " with a final visible phrase."
297
+ event = TrialEvent(
298
+ phase="questions",
299
+ title="Counsel answers",
300
+ body="Narration only, not spoken dialogue.",
301
+ turns=[
302
+ AgentTurn(
303
+ agent="Harvey Vector",
304
+ role="respondent advocate",
305
+ content=long_text,
306
+ model="test-model",
307
+ confidence=0.9,
308
+ )
309
+ ],
310
+ )
311
+ html = app.render_court([event], started=True)
312
+ bubble = html[html.index("<div class='speech-bubble") : html.index("<div class='gallery-benches")]
313
+
314
+ assert "with a final visible phrase." in bubble
315
+ assert "Narration only" not in bubble
316
+ assert "..." not in bubble
317
+
318
+
319
+ def test_pending_speaker_renders_single_preparing_bubble():
320
+ pending = app.SpeakerCue(
321
+ name="Harvey Vector",
322
+ role="respondent advocate",
323
+ text="Harvey Vector is preparing a response.",
324
+ pending=True,
325
+ )
326
+ html = app.render_court([], started=True, pending_speaker=pending)
327
+
328
+ assert "class='speech-bubble active-dialogue speaker-sable pending'" in html
329
+ assert "data-pending='true'" in html
330
+ assert "Harvey Vector is preparing a response." in html
331
+ assert "puppet sable active walking" in html
332
+ assert html.count("class='speech-bubble") == 1
333
+
334
+
335
+ def test_reading_duration_scales_with_words_and_caps():
336
+ assert app._reading_duration("short line") == app.MIN_READ_SECONDS
337
+ assert app._reading_duration("word " * 18) > app.MIN_READ_SECONDS
338
+ assert app._reading_duration("word " * 200) == app.MAX_READ_SECONDS
339
 
340
 
341
  def test_individual_juror_can_be_active_speaker():
 
367
 
368
  assert "speaker-karl-marx" in html
369
  assert "<a class='juror active'" in html
370
+ assert "class='speech-bubble active-dialogue speaker-karl-marx juror-dialogue'" in html
371
  assert "Liable. E1 carries the record." in html
372
+ assert html.count("class='speech-bubble") == 1
373
+
374
+
375
+ def test_juror_speech_bubbles_anchor_above_side_benches():
376
+ assert ".speech-bubble.active-dialogue.juror-dialogue {\n top: 42%;" in app.CSS
377
+ assert ".speech-bubble.active-dialogue.speaker-karl-marx,\n.speech-bubble.active-dialogue.speaker-john-stuart-mill,\n.speech-bubble.active-dialogue.speaker-confucius {\n left: 1.5%;" in app.CSS
378
+ assert ".speech-bubble.active-dialogue.speaker-cleopatra-vii,\n.speech-bubble.active-dialogue.speaker-niccolo-machiavelli,\n.speech-bubble.active-dialogue.speaker-jensen-huang {\n right: 1.5%;" in app.CSS
379
+ assert "--bubble-tail-x: 19%;" in app.CSS
380
+ assert "--bubble-tail-x: 81%;" in app.CSS
381
+ assert ".speech-bubble.active-dialogue.juror-dialogue,\n .speech-bubble.active-dialogue.speaker-karl-marx" in app.CSS
382
+ assert "top: 500px;" in app.CSS
383
 
384
 
385
  def test_lawyer_movement_css_is_speaker_specific_not_phase_wide():
 
389
  assert ".phase-opening .puppet.sable" not in app.CSS
390
 
391
 
+ def test_closed_book_and_key_characters_align_with_judge_table():
+     assert ".episode-book {\n position: absolute;\n left: 50%;\n top: 122px;\n z-index: 14;" in app.CSS
+     assert "width: min(980px, calc(100% - 32px));" in app.CSS
+     assert ".episode-book.closed {\n top: 50%;\n width: min(163px, 20vw);" in app.CSS
+     assert ".foreground-fence {\n bottom: -6.5%;\n width: 47%;" in app.CSS
+     assert ".judge-table-foreground {\n left: 50%;\n top: 20%;\n z-index: 1;\n width: 39.1%;" in app.CSS
+     assert ".puppet.judge {\n left: 50%;\n top: calc(40% + 156px);" in app.CSS
      assert ".puppet.auric {\n left: 24%;\n top: 87%;" in app.CSS
+     assert ".speaker-auric .puppet.auric {\n left: 43%;\n top: 87%;" in app.CSS
+     assert ".puppet.sable {\n left: 75%;\n top: 87%;" in app.CSS
+     assert ".speaker-sable .puppet.sable {\n left: 75%;\n top: 87%;" in app.CSS
+     assert ".puppet.clerk {\n left: 43%;\n top: 66%;" in app.CSS
+     assert ".puppet.auditor" not in app.CSS
+     assert ".episode-book.closed {\n top: 640px;\n width: 140px;" in app.CSS
+     assert ".episode-book {\n top: 218px;\n width: min(680px, calc(100% - 20px));" in app.CSS
+     assert ".foreground-fence {\n bottom: -66px;\n width: 64%;" in app.CSS
+     assert ".judge-table-foreground {\n top: 213px;\n width: 646px;" in app.CSS
      assert ".puppet.auric {\n left: 20%;\n top: 970px;" in app.CSS
+     assert ".puppet.sable {\n left: 80%;\n top: 970px;" in app.CSS
+     assert ".speaker-sable .puppet.sable {\n left: 80%;\n top: 970px;" in app.CSS
+     assert ".puppet.judge {\n top: 576px;" not in app.CSS
+     assert ".puppet.sable {\n left: 80%;\n top: 640px;" not in app.CSS
+     assert ".speaker-sable .puppet.sable {\n left: 80%;\n top: 640px;" not in app.CSS
+     assert ".puppet.clerk {\n left: 35%;\n top: 880px;" in app.CSS
+     assert ".speech-bubble.active-dialogue.speaker-auditor" not in app.CSS
+
+
+ def test_open_docket_book_renders_text_above_book_art():
+     html = app.render_court([])
+
+     assert "class='episode-book'" in html
+     assert "class='book-open-content'" in html
+     assert "Trial details" in html
+     assert "Evidence" in html
+
+
+ def test_greg_case_preview_uses_cached_context_and_evidence_columns():
+     html = app.render_case_preview("Greg Heffley vs Mom")
+
+     assert "Greg Heffley v. Mom" in html
+     assert "diary" in html
+     assert "Evidence for Greg Heffley" in html
+     assert "Evidence for Susan Heffley" in html
+
+
+ def test_custom_case_preview_renders_fillable_book_fields():
+     html = app.render_case_preview("Custom")
+
+     assert "episode-book custom-book" in html
+     assert "book-context-field" in html
+     assert html.count("book-claimant-field") == 3
+     assert html.count("book-respondent-field") == 3
+
+
+ def test_custom_payload_builds_trial_request_packet(monkeypatch):
+     captured = {}
+
+     def fake_events(request):
+         captured["request"] = request
+         return iter([_event_with_lower_tab_data()])
+
+     monkeypatch.setattr(app, "get_events", fake_events)
+     monkeypatch.setattr(app, "_reading_duration", lambda text: 0)
+     payload = json.dumps(
+         {
+             "context": "A missing bicycle is traced to a disputed garage visit.",
+             "claimant_evidence": ["Garage text", "", "Scuffed tire mark"],
+             "respondent_evidence": ["Neighbor saw bike later", "", ""],
+         }
+     )
+
+     outputs = list(app.run_ui("Custom", "", "", payload, "swift", True))
+
+     assert outputs[-1][-1] == "Verdict sealed."
+     request = captured["request"]
+     assert request.case_id == "custom"
+     assert request.custom_case is not None
+     assert request.custom_case.context.startswith("A missing bicycle")
+     assert [item.supports for item in request.custom_case.evidence] == ["claimant", "claimant", "respondent"]
+
+
+ def test_custom_payload_requires_context_and_both_evidence_sides():
+     payload = json.dumps({"context": "", "claimant_evidence": ["Only one side"], "respondent_evidence": []})
+
+     outputs = list(app.run_ui("Custom", "", "", payload, "swift", True))
+
+     assert outputs[-1][-1] == "Custom requires a trial details paragraph."
 
 
  def test_run_ui_yields_five_outputs_without_download_status(monkeypatch):
      event = _event_with_lower_tab_data()
      monkeypatch.setattr(app, "get_events", lambda request: iter([event]))
+     monkeypatch.setattr(app, "_reading_duration", lambda text: 0)
 
+     outputs = list(app.run_ui("Trial of Socrates", "", "", "", "swift", True))
 
      assert outputs
      assert all(len(output) == 5 for output in outputs)
+     assert outputs[0][-1] == "Clerk Meridian is preparing their response."
+     assert outputs[1][-1] == "Step 1: Nemotron Jury - Jury weighs the record"
      assert outputs[-1][-1] == "Verdict sealed."
      assert "download" not in outputs[-1][-1].lower()
494
 
 
500
 
501
  monkeypatch.setattr(app, "get_events", broken_events)
502
 
503
+ outputs = list(app.run_ui("Trial of Socrates", "", "", "", "swift", True))
504
 
505
  assert outputs[-1][-1] == "Model response required. Trial stopped: Marcus Aurelius unavailable: offline"
506
  assert "Claimant score" not in outputs[-1][0]
507
 
508
 
+ def test_remote_events_uses_default_modal_endpoint_without_local_token(monkeypatch):
+     captured = {}
+
+     class FakeResponse:
+         def __enter__(self):
+             return self
+
+         def __exit__(self, exc_type, exc, traceback):
+             return False
+
+         def raise_for_status(self):
+             return None
+
+         def iter_lines(self):
+             event = _speaker_event("Clerk Meridian", phase="intake")
+             yield json.dumps(event.model_dump())
+
+     def fake_stream(method, endpoint, json, timeout):
+         captured["method"] = method
+         captured["endpoint"] = endpoint
+         captured["payload"] = json
+         captured["timeout"] = timeout
+         return FakeResponse()
+
+     monkeypatch.delenv("MODAL_TRIAL_URL", raising=False)
+     monkeypatch.delenv("HF_TOKEN", raising=False)
+     monkeypatch.setattr(app.httpx, "stream", fake_stream)
+
+     event = next(app.get_events(app.TrialRequest(case_id="socrates"), delay=0.0))
+
+     assert captured["method"] == "POST"
+     assert captured["endpoint"] == app.DEFAULT_MODAL_TRIAL_URL
+     assert captured["timeout"] == 900.0
+     assert event.turns[0].agent == "Clerk Meridian"
+
+
  def test_court_renders_sound_toggle():
      html = app.render_court([])
547