Sync JudgeGPT app from GitHub
#4
by AliIqbal05 - opened
- README.md +173 -105
- modal_app.py +212 -193
README.md
CHANGED
@@ -1,105 +1,173 @@
---
title: Judge-GPT
emoji: ⚖️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 6.17.3
app_file: app.py
pinned: false
license: mit
short_description: AI-native miniature trials under 32B.
tags:
- track:wood
- sponsor:openai
- sponsor:nvidia
- sponsor:modal
- achievement:offbrand
- achievement:fieldnotes
---

# Judge-GPT

Judge-GPT is a cinematic Gradio courtroom for the Build Small Hackathon's Thousand Token Wood track. It turns a compact evidence packet into a two-minute AI-native trial: a clerk opens the docket, two lawyers argue opposite sides, Marcus Aurelius presides, six fixed-perspective jurors vote, and the court seals a verdict.

The point is not legal advice. It is a small-model theater for structured disagreement: evidence is visible, roles are constrained, hidden reasoning is stripped, and every trial leaves a trace of which agent said what.

## Submission Links

- Hugging Face Space: https://huggingface.co/spaces/build-small-hackathon/JudgeGPT
- Demo video: https://drive.google.com/drive/folders/10pWJ7NVCsnVV7wOlqm4MGWg4Kmh4rMY2?usp=sharing
- Social post: TODO paste final public social post URL
- GitHub repo: https://github.com/aliiqbal24/BuildSmallfinal
- Field guide validator: https://build-small-hackathon-field-guide.hf.space/submit

## What Judges Should Try

1. Open the Space and keep the default `Trial of Socrates`.
2. Click `Begin Trial`.
3. Watch the courtroom progress from intake to verdict.
4. Hover the judge, clerk, lawyers, and jurors to inspect model/agent threads.
5. Open the `Evidence Drawer` and `Juror Panel` tabs after the verdict.
6. Try `Greg Heffley vs Mom` for a lighter family-court case.
7. Try `Custom` to write a short dispute and up to three pieces of evidence per side directly into the docket book.

## Why It Fits Build Small

- **Thousand Token Wood:** the app is whimsical, theatrical, and AI-native rather than a generic chatbot.
- **Best Use of Codex:** Codex was used throughout implementation, debugging, UI iteration, tests, and commit prep in the connected GitHub repo.
- **Nemotron Hardware Prize:** Nemotron is a core runtime model for the jury and juror vote generation.
- **Best Use of Modal:** the Gradio Space delegates live model inference to a Modal GPU streaming endpoint.
- **Off-Brand:** the UI pushes past stock Gradio with a custom courtroom, animated puppets, docket book, evidence props, audio cues, and verdict staging.
- **Field Notes:** this README documents the build idea, model choices, runtime architecture, limitations, and submission checklist.

## Small-Model Budget

Every named model is under the 32B parameter cap.

| Role | Model | Budgeted size | Used for |
| --- | --- | ---: | --- |
| Presiding advocate | `openai/gpt-oss-20b` | 20B | Judge, claimant lawyer, respondent lawyer, verdict voice |
| Clerk of style | `openbmb/AgentCPM-Explore` | 4B | Clerk/stage voice |
| Jury ring | `nvidia/Nemotron-Orchestrator-8B` | 8B | Jury panel and six juror votes |

Displayed aggregate budget: 32B. The app does not use a model above 32B.
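
The budget table can be checked mechanically. The sketch below is illustrative only: it assumes a `MODEL_BUDGET` tuple of `(role, model, size_in_billions)` triples that mirrors the table. `modal_app.py` does unpack the real `MODEL_BUDGET` as triples, but the field meanings, the values, and the helper names `check_budget` and `unique_model_ids` are assumptions made for this example.

```python
# Hypothetical mirror of the budget table above; the real MODEL_BUDGET in
# sovereign_bench/engine.py may use different fields or values.
MODEL_BUDGET = (
    ("Presiding advocate", "openai/gpt-oss-20b", 20),
    ("Clerk of style", "openbmb/AgentCPM-Explore", 4),
    ("Jury ring", "nvidia/Nemotron-Orchestrator-8B", 8),
)
PARAMETER_CAP_B = 32  # hackathon cap, in billions of parameters


def check_budget(budget, cap_b=PARAMETER_CAP_B):
    """Return the aggregate size after verifying no single model exceeds the cap."""
    for role, model, size_b in budget:
        if size_b > cap_b:
            raise ValueError(f"{model} ({role}) is {size_b}B, above the {cap_b}B cap")
    return sum(size_b for _, _, size_b in budget)


def unique_model_ids(budget):
    """Deduplicate model ids while keeping first-seen order."""
    return tuple(dict.fromkeys(model for _, model, _ in budget))


total_b = check_budget(MODEL_BUDGET)
print(f"aggregate budget: {total_b}B")
```

`dict.fromkeys` is used for deduplication because it preserves insertion order, so a model that serves several roles would still appear only once.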

## How It Works

Judge-GPT runs a deterministic courtroom sequence over a `CasePacket`:

1. Clerk opens the docket.
2. Judge frames the dispute.
3. Mike OSS argues for the claimant.
4. Harvey Vector argues for the respondent.
5. The evidence record is displayed without adding a third lawyer.
6. The judge asks a hinge question.
7. Each lawyer answers from their side.
8. Nemotron Jury retires the panel.
9. Six named jurors vote from distinct worldviews.
10. The judge announces the final verdict.

The shipped demo cases are:

- `The Polis v. Socrates`
- `Greg Heffley v. Mom`
- `Custom`, built from the docket-book fields in the UI
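
As a rough illustration of what "deterministic sequence" means here, the ten steps can be modeled as a fixed phase list that a generator walks in order. Everything in this sketch (`PHASES`, `TrialEvent`, `run_trial`, `stub_speaker`, and the agent labels) is hypothetical and is not taken from `sovereign_bench/engine.py`; the stub speaker stands in for a model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

# Hypothetical phase list mirroring the ten steps above; the real engine's
# phase names and agent identifiers may differ.
PHASES = (
    ("clerk", "open the docket"),
    ("judge", "frame the dispute"),
    ("claimant_lawyer", "argue for the claimant"),
    ("respondent_lawyer", "argue for the respondent"),
    ("record", "display the evidence record"),
    ("judge", "ask a hinge question"),
    ("lawyers", "answer the hinge question from each side"),
    ("jury", "retire the panel"),
    ("jurors", "cast six votes"),
    ("judge", "announce the final verdict"),
)


@dataclass(frozen=True)
class TrialEvent:
    step: int
    agent: str
    task: str
    text: str


def run_trial(case_title: str, speak: Callable[[str, str, str], str]) -> Iterator[TrialEvent]:
    """Yield one event per phase, always in the same order."""
    for step, (agent, task) in enumerate(PHASES, start=1):
        yield TrialEvent(step, agent, task, speak(agent, task, case_title))


def stub_speaker(agent: str, task: str, case_title: str) -> str:
    # Stand-in for a model call so the sketch runs without any backend.
    return f"[{agent}] {task}: {case_title}"


events = list(run_trial("The Polis v. Socrates", stub_speaker))
for event in events:
    print(event.step, event.text)
```

Because the phase order lives in data rather than in model output, the models only fill in dialogue; they cannot reorder or skip steps.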

## Runtime Architecture

- `app.py` renders the Gradio UI, courtroom HTML/CSS, audio hooks, case preview book, and live event stream.
- `sovereign_bench/engine.py` orchestrates trial phases, model calls, evidence events, jury votes, verdict assembly, and trace metadata.
- `sovereign_bench/llm.py` builds role prompts, calls Hugging Face-compatible chat models, and rejects hidden reasoning or instruction echoes.
- `sovereign_bench/cases.py` contains the cached demo case packets.
- `modal_app.py` hosts the GPU-backed streaming endpoint used by the Space.
- `tests/` contains engine, case, and rendering regression tests.

The Gradio app uses `MODAL_TRIAL_URL` when set, otherwise it uses the built-in deployed Modal endpoint. The Modal app owns the Hugging Face token through a Modal secret named `huggingface`; no real credentials are committed.
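
The override-then-default behavior of `MODAL_TRIAL_URL` can be sketched as follows. This is an assumption about how such a lookup might be written, not the code in `app.py`: the function name `resolve_trial_url` is hypothetical, and `DEFAULT_TRIAL_URL` is a placeholder because the real built-in endpoint URL is not shown in this README.

```python
import os

# Placeholder default; the real built-in endpoint URL is not shown in this README.
DEFAULT_TRIAL_URL = "https://your-modal-endpoint.example"


def resolve_trial_url(environ=None):
    """Prefer MODAL_TRIAL_URL when it is set and non-blank, else use the default."""
    environ = os.environ if environ is None else environ
    override = environ.get("MODAL_TRIAL_URL", "").strip()
    return override or DEFAULT_TRIAL_URL


print(resolve_trial_url({}))
print(resolve_trial_url({"MODAL_TRIAL_URL": "https://override.example"}))
```

Treating a blank value the same as an unset one avoids a broken endpoint when the Space variable exists but is left empty.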

## Run Locally

```powershell
python -m pip install -r requirements.txt
python app.py
```

Open:

```text
http://127.0.0.1:7860
```

## Deploy Modal Backend

```powershell
python -m modal deploy modal_app.py
```

After deployment, pre-warm every configured courtroom model in the deployed `sovereign-bench` app so the first trial does not wait for all GPU containers to cold start. Run this after each deploy because deployments reset Modal autoscaler overrides:

```powershell
python -m modal run modal_app.py::warm_models
```

If the endpoint changes, set the Hugging Face Space variable:

```text
MODAL_TRIAL_URL=https://your-modal-endpoint.example
```

## Deploy Hugging Face Space

Create or upload this repo as a Gradio Space inside the official Build Small org:

```text
build-small-hackathon/<your-space-name>
```

Space settings:

- SDK: Gradio
- App file: `app.py`
- Python requirements: `requirements.txt`
- Optional variable: `MODAL_TRIAL_URL`
- No Space secret is required if using the hosted Modal endpoint.

## Verification

```powershell
python -m pytest
```

Focused checks used during final prep:

```powershell
python -m pytest tests/test_engine.py tests/test_ui_rendering.py
```

## Limitations

- Judge-GPT is not legal advice and should not be used for real legal decisions.
- The demo packets are compact, staged evidence packets, not exhaustive source research.
- Model, Modal, or retrieval failures stop the current trial instead of substituting fake dialogue.
- Trial results are not persisted across sessions.
- Custom trials require a short case context and evidence from both sides.
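
The custom-trial constraints (a short context, evidence from both sides, at most three pieces per side) lend themselves to a fail-fast check before any model is called. The sketch below is only one way this could look: `validate_custom_case` is a hypothetical helper, and `MAX_CONTEXT_CHARS` is an invented limit because this README does not quantify "short".

```python
MAX_EVIDENCE_PER_SIDE = 3  # stated under "What Judges Should Try"
MAX_CONTEXT_CHARS = 600    # hypothetical limit; "short" is not quantified in this README


def validate_custom_case(context, claimant_evidence, respondent_evidence):
    """Return cleaned inputs, or raise ValueError so the trial never starts."""
    context = context.strip()
    if not context:
        raise ValueError("case context is required")
    if len(context) > MAX_CONTEXT_CHARS:
        raise ValueError("case context is too long")
    cleaned = {}
    for side, items in (("claimant", claimant_evidence), ("respondent", respondent_evidence)):
        # Drop blank entries so empty form fields do not count as evidence.
        kept = [item.strip() for item in items if item.strip()]
        if not kept:
            raise ValueError(f"{side} needs at least one piece of evidence")
        if len(kept) > MAX_EVIDENCE_PER_SIDE:
            raise ValueError(f"{side} has more than {MAX_EVIDENCE_PER_SIDE} pieces of evidence")
        cleaned[side] = kept
    return {"context": context, **cleaned}


packet = validate_custom_case(
    "Who finished the last slice of pizza?",
    ["Empty box found on the claimant's desk", ""],
    ["Receipt shows two pizzas were ordered"],
)
print(packet)
```

Raising instead of patching the input matches the limitation above that failures stop the trial rather than substituting fake content.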

## Final Submission Checklist

- [ ] Push the repo to the Build Small Hugging Face org as a Gradio Space.
- [ ] Confirm the Space launches and can complete `Trial of Socrates`.
- [ ] Record a short demo video showing the trial flow and verdict.
- [ ] Replace the `Demo video` TODO above with the final public URL.
- [ ] Publish one social post about the app.
- [ ] Replace the `Social post` TODO above with the final public URL.
- [ ] Run the README through the Build Small validator.
modal_app.py
CHANGED
@@ -1,193 +1,212 @@
import os
import time

import modal

from sovereign_bench.engine import MODEL_BUDGET, stream_trial_jsonl
from sovereign_bench.llm import (
    ModelCall,
    ModelResult,
    build_role_messages,
    messages_hash,
)
from sovereign_bench.models import TrialRequest

MODAL_APP_NAME = "sovereign-bench"
app = modal.App(MODAL_APP_NAME)
GPU_NAME = "H100"
GPU_TIMEOUT_SECONDS = 20 * 60
HF_CACHE_DIR = "/root/.cache/huggingface"
USED_MODEL_IDS = tuple(dict.fromkeys(model for _, model, _ in MODEL_BUDGET))

image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("fastapi", "huggingface_hub", "httpx", "pydantic")
    .add_local_dir("sovereign_bench", remote_path="/root/sovereign_bench")
)

model_cache = modal.Volume.from_name("sovereign-bench-model-cache", create_if_missing=True)

vllm_image = (
    modal.Image.from_registry("nvidia/cuda:12.8.1-devel-ubuntu22.04", add_python="3.12")
    .entrypoint([])
    .uv_pip_install(
        "vllm==0.18.1",
        "huggingface_hub[hf_transfer]==0.36.0",
        "transformers",
        "httpx",
        "pydantic",
    )
    .env(
        {
            "HF_HUB_ENABLE_HF_TRANSFER": "1",
            "HF_HOME": HF_CACHE_DIR,
            "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
            "VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8": "1",
        }
    )
    .add_local_dir("sovereign_bench", remote_path="/root/sovereign_bench")
)


@app.cls(
    image=vllm_image,
    gpu=GPU_NAME,
    secrets=[modal.Secret.from_name("huggingface")],
    volumes={HF_CACHE_DIR: model_cache},
    timeout=GPU_TIMEOUT_SECONDS,
    scaledown_window=10 * 60,
    max_containers=3,
)
class VllmModel:
    model_id: str = modal.parameter()

    @modal.enter()
    def load(self) -> None:
        from vllm import LLM, SamplingParams

        self.SamplingParams = SamplingParams
        self.llm = LLM(
            model=self.model_id,
            trust_remote_code=True,
            max_model_len=4096,
            gpu_memory_utilization=0.9,
        )

    @modal.method()
    def generate(self, payload: dict) -> dict:
        from sovereign_bench.llm import ModelCallError, clean_model_text

        started = time.perf_counter()
        messages = payload["messages"]
        max_tokens = int(payload.get("max_tokens") or 120)
        temperature = float(payload.get("temperature") or 0.45)
        sampling_params = self.SamplingParams(
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=0.9,
        )
        retry_messages = messages + [
            {
                "role": "user",
                "content": (
                    "Your previous response did not include visible courtroom dialogue. "
                    "Return only the final answer now. Do not mention prompts, tasks, requirements, or that you are following instructions. "
                    "Do not include <think>, analysis, reasoning, markdown, narration, or notes. /no_think"
                ),
            }
        ]
        last_error: Exception | None = None
        text = ""
        for attempt_messages in (messages, retry_messages):
            outputs = self.llm.chat(
                [attempt_messages],
                sampling_params=sampling_params,
                use_tqdm=False,
                chat_template_kwargs={"enable_thinking": False},
            )
            raw_text = outputs[0].outputs[0].text.strip()
            try:
                text = clean_model_text(raw_text)
                break
            except ModelCallError as exc:
                last_error = exc
        if not text and last_error:
            raise last_error
        return {
            "text": text,
            "latency_ms": int((time.perf_counter() - started) * 1000),
        }

    @modal.method()
    def warm(self) -> dict:
        return {"model": self.model_id, "status": "warm"}


def modal_gpu_enabled() -> bool:
    return os.getenv("SOVEREIGN_DISABLE_MODAL_GPU", "").lower() not in {"1", "true", "yes"}


def modal_gpu_runner(**kwargs) -> ModelResult:
    messages = build_role_messages(
        agent=kwargs["agent"],
        role=kwargs["role"],
        case_summary=kwargs["case_summary"],
        task=kwargs["task"],
        evidence_summary=kwargs["evidence_summary"],
        trial_history=kwargs.get("trial_history", ""),
        persona=kwargs.get("persona", ""),
        objective=kwargs.get("objective", ""),
    )
    requested_model = kwargs["model"]
    prompt_hash = messages_hash(messages)

    if modal_gpu_enabled():
        output = VllmModel(model_id=requested_model).generate.remote(
            {
                "messages": messages,
                "max_tokens": kwargs.get("max_tokens", 120),
                "temperature": 0.45,
            }
        )
        return ModelResult(
            text=output["text"],
            input_text="\n\n".join(f"{item.get('role', 'user').upper()}:\n{item.get('content', '')}" for item in messages)
            + "\n\nASSISTANT:\n",
            call=ModelCall(
                model=requested_model,
                provider="modal-gpu-vllm",
                ok=True,
                latency_ms=output["latency_ms"],
                prompt_hash=prompt_hash,
                requested_model=requested_model,
                runtime="modal-gpu-vllm",
                gpu=GPU_NAME,
            ),
        )

    raise RuntimeError("Modal GPU is disabled; no provider fallback is allowed.")


@app.function(image=image, secrets=[modal.Secret.from_name("huggingface")])
def check_huggingface_connection() -> str:
    token = os.getenv("HF_TOKEN")
    if not token:
        return "HF_TOKEN is not available inside Modal."

    from huggingface_hub import HfApi

    user = HfApi(token=token).whoami()["name"]
    return f"Connected to Hugging Face as {user}."


@app.function(
    image=image,
    secrets=[modal.Secret.from_name("huggingface")],
    min_containers=1,
    timeout=GPU_TIMEOUT_SECONDS,
)
@modal.fastapi_endpoint(method="POST", label="trial-stream")
def trial_stream(payload: dict):
    from fastapi.responses import StreamingResponse

    request = TrialRequest.model_validate(payload)
    delay = {"swift": 0.02, "measured": 0.12, "ceremonial": 0.25}[request.speed]
    return StreamingResponse(
        stream_trial_jsonl(request, delay=delay, model_runner=modal_gpu_runner),
        media_type="application/x-ndjson",
    )


@app.local_entrypoint()
def main():
    print(check_huggingface_connection.remote())


@app.local_entrypoint()
def warm_models():
    deployed_model = modal.Cls.from_name(MODAL_APP_NAME, "VllmModel")
    for model_id in USED_MODEL_IDS:
        model = deployed_model(model_id=model_id)
        model.update_autoscaler(min_containers=1)
        print(model.warm.remote())
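
Since `trial_stream` responds with `application/x-ndjson`, a caller has to decode one JSON object per line. The sketch below shows that decoding step with the standard library only. The helper name `parse_ndjson` and the sample events are made up for illustration; the real event schema is whatever `stream_trial_jsonl` emits, which this diff does not show.

```python
import json
from typing import Iterable, Iterator, Union


def parse_ndjson(lines: Iterable[Union[str, bytes]]) -> Iterator[dict]:
    """Decode newline-delimited JSON, skipping blank lines."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample events; the real fields come from stream_trial_jsonl.
sample = [
    b'{"phase": "intake", "text": "Docket opened."}\n',
    "",
    '{"phase": "verdict", "text": "Sealed."}',
]
parsed = list(parse_ndjson(sample))
for event in parsed:
    print(event["phase"], event["text"])
```

With `httpx`, the same helper could presumably be fed from `response.iter_lines()` inside an `httpx.stream("POST", url, json=payload)` block, so events are handled as they arrive instead of after the whole trial finishes.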