# Split-Brain Speculative Co-Pilot
### Build Small Hackathon — Complete Build Instructions
> **Concept:** A 1.5B model runs entirely in the user's browser via WebGPU + transformers.js, streaming code instantly. A 14B model on Modal verifies the draft in the background. When the verifier catches a bug, the UI rolls back the local generation and replaces it with the corrected cloud block, live and on screen.
>
> **Models:** `Qwen2.5-Coder-1.5B` (browser, WebGPU) + `Qwen2.5-Coder-14B-Instruct` (Modal, GGUF via llama.cpp) — combined 15.5B, well under the 32B cap.
>
> **Bonus badges targeted:** Off the Grid · Llama Champion · Off-Brand · Field Notes
---
## 0. Prerequisites
- Python 3.11+
- Node.js 18+ (for local frontend testing)
- Modal account with `modal` CLI installed and authenticated (`modal token new`)
- Hugging Face account, joined the `build-small-hackathon` org, HF token with write access
- `huggingface-cli` installed and logged in (`huggingface-cli login`)
- Chrome 113+ (WebGPU required — Firefox and Safari are out, document this clearly)
- Git
---
## 1. Repository Structure
Set up the project layout before writing any code.
```
split-brain-copilot/
├── app.py                  # Gradio app entry point (HF Space root)
├── modal_backend/
│   ├── __init__.py
│   ├── verifier.py         # Modal app: 14B inference endpoint
│   └── sandbox.py          # Modal app: code execution sandbox
├── static/
│   ├── engine.js           # transformers.js WebGPU inference engine
│   ├── ui.js               # Stream rendering, rollback animation, diff logic
│   └── style.css           # Custom UI (required for Off-Brand badge)
├── requirements.txt
└── README.md               # HF Space card + demo video embed
```
Initialize git and create a HF Space repo. First confirm that `huggingface-cli` is installed and that you are logged in (`huggingface-cli whoami`):
```bash
git init
huggingface-cli repo create split-brain-copilot --type space --space-sdk gradio
git remote add origin https://huggingface.co/spaces/YOUR_HF_USERNAME/split-brain-copilot
```
---
## 2. Modal Backend — 14B Verifier Endpoint
### 2.1 Download the GGUF model to a Modal Volume
The 14B model is too large to bake into the image. Use a Modal Volume for persistent storage.
```python
# modal_backend/verifier.py
import modal
app = modal.App("split-brain-verifier")
# Persistent volume — survives cold starts
model_volume = modal.Volume.from_name("qwen-14b-volume", create_if_missing=True)
MODEL_DIR = "/models"
MODEL_FILENAME = "qwen2.5-coder-14b-instruct-q4_k_m.gguf"
# Source: bartowski/Qwen2.5-Coder-14B-Instruct-GGUF on HuggingFace
MODEL_REPO = "bartowski/Qwen2.5-Coder-14B-Instruct-GGUF"
```
Create a one-time download function:
```python
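@app.function(
    # The default Modal image does not ship huggingface_hub, so install it here
    image=modal.Image.debian_slim(python_version="3.11").pip_install("huggingface_hub"),
    volumes={MODEL_DIR: model_volume},
    timeout=3600,
    secrets=[modal.Secret.from_name("huggingface-secret")],
)
def download_model():
    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id=MODEL_REPO,
        filename=MODEL_FILENAME,
        local_dir=MODEL_DIR,
    )
    # Persist the downloaded file so later containers can see it
    model_volume.commit()
    print(f"Downloaded to {MODEL_DIR}/{MODEL_FILENAME}")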
```
Run this once: `modal run modal_backend/verifier.py::download_model`
### 2.2 Build the llama.cpp image
```python
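llama_image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("build-essential", "cmake", "git", "libgomp1")
    .run_commands(
        "git clone https://github.com/ggerganov/llama.cpp /llama.cpp",
        "cd /llama.cpp && cmake -B build -DLLAMA_CURL=OFF && cmake --build build --config Release -j$(nproc)",
        "cd /llama.cpp && pip install -e .",
    )
    # Note: a default pip install of llama-cpp-python typically produces a CPU-only build.
    # For n_gpu_layers to take effect, the wheel has to be built with CUDA enabled
    # (CMAKE_ARGS="-DGGML_CUDA=on") on an image that includes the CUDA toolkit.
    .pip_install("llama-cpp-python==0.3.4", "fastapi", "uvicorn")
)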
```
### 2.3 Verifier inference function
The verifier receives the speculated code draft and the original prompt, and returns a verdict: `PASS`, `FIX`, or `REWRITE` with corrected code.
```python
@app.cls(
    image=llama_image,
    gpu=modal.gpu.A10G(),
    volumes={MODEL_DIR: model_volume},
    container_idle_timeout=300,
    allow_concurrent_inputs=10,
)
class Verifier:
    @modal.enter()
    def load_model(self):
        from llama_cpp import Llama

        self.llm = Llama(
            model_path=f"{MODEL_DIR}/{MODEL_FILENAME}",
            n_gpu_layers=-1,  # all layers on GPU
            n_ctx=8192,
            n_batch=512,
            verbose=False,
        )

    @modal.method()
    def verify(self, prompt: str, draft_code: str, language: str = "python") -> dict:
        import json

        system = f"""You are a code verifier. A smaller model drafted the following {language} code.
Your job:
1. Check for bugs, logic errors, type errors, off-by-one errors, and security issues.
2. If the code is correct, respond with exactly: {{"verdict": "PASS"}}
3. If fixable, respond with: {{"verdict": "FIX", "corrected_code": "<fixed code here>", "reason": "<one line>"}}
4. If fundamentally wrong, respond with: {{"verdict": "REWRITE", "corrected_code": "<rewritten code>", "reason": "<one line>"}}
Respond ONLY with valid JSON. No markdown, no explanation outside the JSON."""
        user = f"Original prompt:\n{prompt}\n\nDrafted code:\n```{language}\n{draft_code}\n```"
        response = self.llm.create_chat_completion(
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            max_tokens=2048,
            temperature=0.1,
        )
        raw = response["choices"][0]["message"]["content"].strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Fallback: treat as PASS if we can't parse
            return {"verdict": "PASS"}
```
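The fallback above turns any unparseable reply into a `PASS`, which can hide a real verifier failure, and models often wrap JSON in a markdown fence despite the instruction not to. A minimal sketch of a stricter parser follows. The `parse_verdict` helper and the `UNPARSED` verdict are hypothetical names introduced here for illustration; they are not part of the original design, and the UI would need to handle `UNPARSED` explicitly.

```python
import json

VALID_VERDICTS = {"PASS", "FIX", "REWRITE"}


def parse_verdict(raw: str) -> dict:
    """Extract a verdict dict from raw model output.

    Returns {"verdict": "UNPARSED"} (a hypothetical marker) when the reply
    cannot be trusted, instead of silently reporting PASS.
    """
    # Take the outermost {...} span; this also skips markdown fences or
    # stray prose that the model may have added around the JSON.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return {"verdict": "UNPARSED"}
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {"verdict": "UNPARSED"}
    if not isinstance(data, dict) or data.get("verdict") not in VALID_VERDICTS:
        return {"verdict": "UNPARSED"}
    # FIX and REWRITE are only actionable when they carry replacement code.
    if data["verdict"] != "PASS" and not isinstance(data.get("corrected_code"), str):
        return {"verdict": "UNPARSED"}
    return data
```

With something like this in place, `verify` could return `parse_verdict(raw)` and let the frontend show a distinct "verifier unavailable" state rather than a green check.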
### 2.4 Modal Sandbox — code execution (optional but impressive)
Sandboxed execution confirms the corrected code actually runs. This is what earns you extra credibility in the demo.
```python
# modal_backend/sandbox.py
import modal
app = modal.App("split-brain-sandbox")
@app.function(timeout=30)
def execute_python(code: str) -> dict:
    """Run untrusted code in a Modal sandbox and return stdout/stderr."""
    sandbox = modal.Sandbox.create(
        "python3", "-c", code,
        image=modal.Image.debian_slim().pip_install("numpy"),
        timeout=10,
        cpu=0.5,
    )
    sandbox.wait()
    return {
        "stdout": sandbox.stdout.read(),
        "stderr": sandbox.stderr.read(),
        "returncode": sandbox.returncode,
    }
```
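While wiring up the UI it can help to have a local stand-in that returns the same `stdout` / `stderr` / `returncode` shape without a Modal round trip. The sketch below uses a child Python process for that. `execute_python_locally` is a hypothetical helper, and it provides no isolation at all, so it should only ever see code you trust; the Modal sandbox remains the path for untrusted drafts.

```python
import subprocess
import sys


def execute_python_locally(code: str, timeout: float = 10.0) -> dict:
    """Run code in a child interpreter and mirror the sandbox result shape.

    NOT a sandbox: the child process has full access to the local machine.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Same keys as the normal result so callers need no special case
        return {"stdout": "", "stderr": f"Timed out after {timeout}s", "returncode": -1}
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
    }
```

Because the return shape matches `execute_python`, the Gradio side can switch between the two with a single flag during development.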
### 2.5 Deploy the Modal backend
```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```
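Note the endpoint URL printed after deploy; you'll need it in `app.py`. Only functions exposed as web endpoints (see Section 5) get a URL.
Store your HF token as a Modal secret. `download_model` in 2.1 references `huggingface-secret`, so create it before running that step: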
```bash
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```
---
## 3. Browser Engine — transformers.js + WebGPU
### 3.1 Model choice for the browser
Use `Qwen2.5-Coder-1.5B-Instruct` in ONNX/WebGPU format. Xenova and onnx-community maintain these on HF Hub. Target:
`onnx-community/Qwen2.5-Coder-1.5B-Instruct` with `dtype: "q4"` for fast WebGPU loading (~800MB, fits comfortably in browser VRAM on a modern GPU).
### 3.2 engine.js — WebGPU inference
```javascript
// static/engine.js
import { pipeline, TextStreamer } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.5.0/dist/transformers.min.js";

const MODEL_ID = "onnx-community/Qwen2.5-Coder-1.5B-Instruct";
let generator = null;
let isLoaded = false;

export async function loadModel(onProgress) {
  if (isLoaded) return;
  generator = await pipeline("text-generation", MODEL_ID, {
    dtype: "q4",
    device: "webgpu",
    progress_callback: onProgress,
  });
  isLoaded = true;
}

export async function generateCode(prompt, language, onToken, onComplete) {
  if (!generator) throw new Error("Model not loaded");
  const messages = [
    {
      role: "system",
      content: `You are an expert ${language} programmer. Write clean, correct, production-ready code. Output ONLY the code block, no explanation.`
    },
    { role: "user", content: prompt }
  ];
  const streamer = new TextStreamer(generator.tokenizer, {
    skip_prompt: true,
    callback_function: (token) => {
      onToken(token);
    },
  });
  const result = await generator(messages, {
    max_new_tokens: 1024,
    temperature: 0.2,
    do_sample: true,
    streamer,
  });
  const fullCode = result[0].generated_text.at(-1).content;
  onComplete(fullCode);
  return fullCode;
}

export function isWebGPUSupported() {
  return !!navigator.gpu;
}
```
### 3.3 ui.js — stream rendering + rollback animation
```javascript
// static/ui.js
let currentTokens = [];
let streamBuffer = "";

export function initEditor(containerId) {
  // Attach to the Gradio custom HTML component
  const container = document.getElementById(containerId);
  container.innerHTML = `
    <div id="stream-display" class="code-stream"></div>
    <div id="status-bar" class="status-bar">
      <span id="status-text">Ready</span>
      <span id="token-count">0 tok/s</span>
      <span id="verifier-status"></span>
    </div>
  `;
}

export function appendToken(token) {
  streamBuffer += token;
  currentTokens.push(token);
  const display = document.getElementById("stream-display");
  if (display) display.textContent = streamBuffer;
}

export function setStatus(text, type = "neutral") {
  const el = document.getElementById("status-text");
  if (el) {
    el.textContent = text;
    el.className = `status-${type}`;
  }
}

export function setVerifierStatus(verdict) {
  const el = document.getElementById("verifier-status");
  if (!el) return;
  const icons = { PASS: "✅ Verified", FIX: "🔧 Fixed", REWRITE: "🔄 Rewritten", CHECKING: "🔍 Verifying..." };
  el.textContent = icons[verdict] || "";
}

export async function rollbackAndReplace(correctedCode, reason) {
  const display = document.getElementById("stream-display");
  if (!display) return;
  // Flash red to signal rollback
  display.classList.add("rollback-flash");
  setVerifierStatus("FIX");
  setStatus(`Verifier corrected: ${reason}`, "warning");
  await sleep(400);
  display.classList.remove("rollback-flash");
  // Type in corrected code character by character
  display.textContent = "";
  streamBuffer = correctedCode;
  currentTokens = [];
  for (let i = 0; i < correctedCode.length; i++) {
    display.textContent += correctedCode[i];
    if (i % 5 === 0) await sleep(8); // smooth typewriter
  }
  setVerifierStatus("PASS");
  setStatus("Ready", "neutral");
}

export function getCurrentCode() {
  return streamBuffer;
}

export function reset() {
  streamBuffer = "";
  currentTokens = [];
  const display = document.getElementById("stream-display");
  if (display) display.textContent = "";
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```
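The repository layout lists "diff logic" for `ui.js`, but the rollback above replaces the whole block. One option, sketched here as an assumption rather than the project's actual approach, is to compute the changed lines server-side in Python and send them along with the verdict so the UI can highlight only what the verifier touched. `changed_line_numbers` is a hypothetical helper built on the standard-library `difflib`.

```python
import difflib


def changed_line_numbers(draft: str, corrected: str) -> list[int]:
    """Return 1-based line numbers in `corrected` that are new or modified."""
    draft_lines = draft.splitlines()
    corrected_lines = corrected.splitlines()
    matcher = difflib.SequenceMatcher(a=draft_lines, b=corrected_lines, autojunk=False)
    changed: list[int] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" are the opcodes that add lines on the corrected side
        if tag in ("replace", "insert"):
            changed.extend(range(j1 + 1, j2 + 1))
    return changed
```

The returned list could be attached to the verdict JSON under a key of your choosing, and the typewriter loop could then add a highlight class to those lines only.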
### 3.4 style.css — custom UI (Off-Brand badge)
```css
/* static/style.css */
:root {
  --bg: #0d1117;
  --surface: #161b22;
  --border: #30363d;
  --accent: #58a6ff;
  --accent-warn: #f0883e;
  --text: #e6edf3;
  --text-muted: #8b949e;
  --green: #3fb950;
  --red: #f85149;
}

body { background: var(--bg); color: var(--text); font-family: 'JetBrains Mono', monospace; }

.code-stream {
  background: var(--surface);
  border: 1px solid var(--border);
  border-radius: 8px;
  padding: 16px;
  min-height: 300px;
  font-family: 'JetBrains Mono', monospace;
  font-size: 13px;
  line-height: 1.6;
  white-space: pre-wrap;
  overflow-y: auto;
  transition: border-color 0.2s;
}

.rollback-flash {
  border-color: var(--red) !important;
  background: rgba(248, 81, 73, 0.08) !important;
  animation: flash 0.4s ease;
}

@keyframes flash {
  0% { background: rgba(248, 81, 73, 0.25); }
  100% { background: rgba(248, 81, 73, 0.08); }
}

.status-bar {
  display: flex;
  justify-content: space-between;
  padding: 8px 12px;
  background: var(--surface);
  border: 1px solid var(--border);
  border-top: none;
  border-radius: 0 0 8px 8px;
  font-size: 12px;
  color: var(--text-muted);
}

.status-warning { color: var(--accent-warn); }
.status-success { color: var(--green); }
.status-neutral { color: var(--text-muted); }

/* Gradio overrides */
.gradio-container { background: var(--bg) !important; }
footer { display: none !important; }

/* WebGPU loading bar */
.loading-bar {
  height: 3px;
  background: var(--border);
  border-radius: 2px;
  overflow: hidden;
  margin: 8px 0;
}

.loading-bar-fill {
  height: 100%;
  background: var(--accent);
  transition: width 0.3s ease;
}
```
---
## 4. Gradio App — app.py
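This is the HF Space entry point. Gradio acts as the shell; the real UI lives in the custom HTML component injected via `gr.HTML`. One caveat: browsers do not run `<script>` tags that arrive through `innerHTML`, so if the inline module below never executes inside `gr.HTML`, move it into the `head=` argument of `gr.Blocks`.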
```python
# app.py
import gradio as gr
import httpx
import json
import os
import asyncio
from pathlib import Path
MODAL_VERIFIER_URL = os.environ.get("MODAL_VERIFIER_URL") # set as HF Space secret
MODAL_SANDBOX_URL = os.environ.get("MODAL_SANDBOX_URL") # set as HF Space secret
LANGUAGES = ["Python", "JavaScript", "TypeScript", "Rust", "Go", "C++"]
def load_static(filename):
    return Path(f"static/{filename}").read_text()
custom_html = f"""
<!DOCTYPE html>
<html>
<head>
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
  <style>{load_static('style.css')}</style>
</head>
<body>
  <div id="split-brain-root">
    <div class="webgpu-notice" id="webgpu-warning" style="display:none; color:#f85149; padding:8px; border:1px solid #f85149; border-radius:6px; margin-bottom:12px;">
      ⚠️ WebGPU not detected. Please use Chrome 113+ on desktop for local inference.
    </div>
    <div id="load-section">
      <button id="load-btn" onclick="initEngine()">⚡ Load 1.5B Model (WebGPU)</button>
      <div class="loading-bar"><div class="loading-bar-fill" id="load-progress" style="width:0%"></div></div>
      <span id="load-status" style="font-size:12px; color:#8b949e;"></span>
    </div>
    <div id="stream-display" class="code-stream" style="margin-top:12px;">Waiting for model load...</div>
    <div class="status-bar">
      <span id="status-text">Idle</span>
      <span id="token-count"></span>
      <span id="verifier-status"></span>
    </div>
  </div>
  <script type="module">
    {load_static('engine.js')}
    {load_static('ui.js')}

    // Check WebGPU on load
    if (!isWebGPUSupported()) {{
      document.getElementById('webgpu-warning').style.display = 'block';
      document.getElementById('load-btn').disabled = true;
    }}

    window.initEngine = async function() {{
      document.getElementById('load-btn').disabled = true;
      document.getElementById('load-status').textContent = 'Loading model weights...';
      await loadModel((progress) => {{
        if (progress.progress) {{
          document.getElementById('load-progress').style.width = progress.progress + '%';
          document.getElementById('load-status').textContent = `${{progress.file || 'Loading'}} — ${{Math.round(progress.progress)}}%`;
        }}
      }});
      document.getElementById('load-status').textContent = '✅ Model ready — WebGPU active';
      document.getElementById('load-section').style.opacity = '0.5';
    }};

    // Gradio will call this via the hidden trigger
    window.runLocalGeneration = async function(prompt, language) {{
      reset();
      setStatus('Generating locally (WebGPU)...', 'neutral');
      let tokenCount = 0;
      const startTime = Date.now();
      const fullCode = await generateCode(prompt, language,
        (token) => {{
          appendToken(token);
          tokenCount++;
          const elapsed = (Date.now() - startTime) / 1000;
          const tps = Math.round(tokenCount / elapsed);
          document.getElementById('token-count').textContent = `${{tps}} tok/s`;
        }},
        (code) => {{
          setStatus('Local generation complete. Verifying...', 'neutral');
        }}
      );
      // Send to Gradio backend for verification.
      // elem_id lands on the component wrapper, so target the inner text field
      // and fire an input event so Gradio registers the new value.
      const draftBox = document.querySelector('#draft-output-hidden textarea, #draft-output-hidden input');
      draftBox.value = fullCode;
      draftBox.dispatchEvent(new Event('input', {{ bubbles: true }}));
      document.getElementById('trigger-verify-btn').click();
    }};

    window.applyVerification = function(verdictJson) {{
      const verdict = JSON.parse(verdictJson);
      if (verdict.verdict === 'PASS') {{
        setVerifierStatus('PASS');
        setStatus('✅ Verified clean', 'success');
      }} else {{
        rollbackAndReplace(verdict.corrected_code, verdict.reason);
      }}
    }};
  </script>
</body>
</html>
"""
async def verify_with_modal(prompt: str, draft_code: str, language: str) -> str:
    """Call Modal verifier endpoint and return JSON string."""
    if not MODAL_VERIFIER_URL:
        return json.dumps({"verdict": "PASS"})
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            MODAL_VERIFIER_URL,
            json={"prompt": prompt, "draft_code": draft_code, "language": language},
        )
        resp.raise_for_status()
        return resp.text

async def execute_in_sandbox(code: str) -> dict:
    """Call Modal sandbox and return execution result."""
    if not MODAL_SANDBOX_URL:
        return {"stdout": "", "stderr": "Sandbox not configured", "returncode": -1}
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(MODAL_SANDBOX_URL, json={"code": code})
        return resp.json()
with gr.Blocks(
    title="Split-Brain Co-Pilot",
    # .bridge-hidden keeps the JS bridge components in the DOM but out of sight
    css="footer {display:none} .bridge-hidden {display:none !important}",
    theme=gr.themes.Base(
        primary_hue="blue",
        neutral_hue="slate",
    ),
) as demo:
    gr.HTML("<h1 style='text-align:center; color:#58a6ff;'>⚡ Split-Brain Co-Pilot</h1>")
    gr.HTML("<p style='text-align:center; color:#8b949e;'>1.5B model runs in your browser (WebGPU). 14B model on Modal verifies and corrects.</p>")
    with gr.Row():
        with gr.Column(scale=2):
            prompt_input = gr.Textbox(
                label="What do you want to build?",
                placeholder="e.g. A function that parses a CSV and returns the top 5 rows by a given column",
                lines=3,
            )
            language_select = gr.Dropdown(
                choices=LANGUAGES,
                value="Python",
                label="Language",
            )
            generate_btn = gr.Button("⚡ Generate (WebGPU → Verify)", variant="primary")
        with gr.Column(scale=3):
            # Custom HTML component for streaming display
            gr.HTML(custom_html)

    # Hidden elements for JS ↔ Gradio bridge. Components created with
    # visible=False may not be rendered into the DOM, so hide them with CSS instead.
    draft_hidden = gr.Textbox(elem_id="draft-output-hidden", elem_classes="bridge-hidden")
    verify_trigger = gr.Button("verify", elem_id="trigger-verify-btn", elem_classes="bridge-hidden")
    verdict_output = gr.Textbox(label="verdict", elem_classes="bridge-hidden")

    with gr.Row():
        sandbox_output = gr.Code(label="Sandbox Execution Output", language="python", visible=False)

    # Gradio event: user clicks Generate → JS takes over for local inference
    generate_btn.click(
        fn=None,
        inputs=[prompt_input, language_select],
        outputs=[],
        js="(prompt, lang) => { window.runLocalGeneration(prompt, lang); return []; }",
    )

    # Gradio event: JS triggers verify after local generation completes
    async def run_verification(prompt, draft_code, language):
        verdict_json = await verify_with_modal(prompt, draft_code, language)
        return verdict_json

    verify_trigger.click(
        fn=run_verification,
        inputs=[prompt_input, draft_hidden, language_select],
        outputs=[verdict_output],
    )

    # Apply verdict back to JS
    verdict_output.change(
        fn=None,
        inputs=[verdict_output],
        outputs=[],
        js="(verdict) => { window.applyVerification(verdict); return []; }",
    )

if __name__ == "__main__":
    demo.launch()
```
---
## 5. Modal Web Endpoint Wrapper
The Modal functions need to be exposed as HTTP endpoints that `app.py` can call via httpx. Add this to `verifier.py`:
```python
# The wrapper only forwards requests to the Verifier class,
# so it needs neither a GPU nor the model volume.
@app.function(
    image=llama_image,
    container_idle_timeout=300,
)
@modal.asgi_app()
def verifier_endpoint():
    # Imported inside the function so containers without FastAPI
    # (e.g. download_model) can still import this module.
    from fastapi import FastAPI
    from pydantic import BaseModel

    web_app = FastAPI()

    class VerifyRequest(BaseModel):
        prompt: str
        draft_code: str
        language: str = "python"

    verifier = Verifier()

    @web_app.post("/verify")
    async def verify(req: VerifyRequest):
        # .remote.aio awaits the call without blocking the event loop
        return await verifier.verify.remote.aio(req.prompt, req.draft_code, req.language)

    return web_app
```
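After deploying, Modal gives you a URL like `https://your-username--split-brain-verifier-verifier-endpoint.modal.run`. The route is mounted at `/verify` and `app.py` posts directly to the configured URL, so set the HF Space secret `MODAL_VERIFIER_URL` to this URL with `/verify` appended.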
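Before wiring the URL into the Space, it is worth sending one request by hand. The sketch below builds that request with only the standard library; `build_verify_request` is a hypothetical helper, and the payload keys mirror the `VerifyRequest` model above. It accepts the URL with or without the `/verify` suffix.

```python
import json
import urllib.request


def build_verify_request(
    base_url: str,
    prompt: str,
    draft_code: str,
    language: str = "python",
) -> urllib.request.Request:
    """Build a POST request for the verifier's /verify route."""
    url = base_url.rstrip("/")
    if not url.endswith("/verify"):
        url += "/verify"
    body = json.dumps(
        {"prompt": prompt, "draft_code": draft_code, "language": language}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=120)` and print the decoded response; a generous timeout leaves room for a cold start.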
---
## 6. HF Space Configuration
### 6.1 README.md (Space card)
```yaml
---
title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- code-generation
- webgpu
- speculative-decoding
- llama.cpp
- local-first
---
```
### 6.2 requirements.txt
```
gradio==5.30.0
httpx==0.27.0
modal==0.73.0
# An exact pin on an older huggingface-hub may conflict with the minimum gradio 5.x requires
huggingface-hub>=0.28.1
```
### 6.3 HF Space secrets
Set these in the Space Settings → Repository secrets:
| Secret name | Value |
|---|---|
| `MODAL_VERIFIER_URL` | Your Modal verifier endpoint URL |
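| `MODAL_SANDBOX_URL` | Your Modal sandbox endpoint URL (requires exposing `execute_python` as a web endpoint, as in Section 5) |
| `MODAL_TOKEN_ID` | From `~/.modal.toml` (written by `modal token new`) |
| `MODAL_TOKEN_SECRET` | From `~/.modal.toml` (written by `modal token new`) |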
---
## 7. Cold Start Mitigation
Modal A10G containers take 10–40 seconds to cold start. Handle this gracefully:
Add a scheduled keep-warm ping to `verifier.py`:
```python
@app.function(schedule=modal.Cron("*/5 * * * *"))
def keep_warm():
    """Ping the verifier every 5 minutes to avoid cold starts during the demo window."""
    Verifier().verify.remote("test", "print('hello')", "python")
```
Redeploy so the schedule is registered: `modal deploy modal_backend/verifier.py`
In the UI, show "Verifier warming up..." in the status bar while the first request is in flight and display a spinner. Do not let the UI appear broken during cold start.
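On the Python side, the first request after a cold start may time out before the container is ready. One way to handle that, sketched here under assumptions, is a retry wrapper with exponential backoff that reports a warm-up status between attempts. `call_with_warmup` is a hypothetical helper; `call`, `on_status`, and `sleep` are injected so it can wrap any client and be exercised without a network.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_warmup(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    on_status: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() with exponential backoff, reporting a warm-up status."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            # A real client would catch only timeout and connection errors here
            if attempt == attempts - 1:
                raise
            on_status("Verifier warming up...")
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

In `verify_with_modal`, an async variant of the same idea would use `asyncio.sleep` and route `on_status` to the status bar.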
---
## 8. Demo Video Script
The demo video is a submission requirement. Plan it around these beats:
- Open Chrome, show the app. Explain the split-brain concept in one sentence.
- Click "Load 1.5B Model" — show the WebGPU loading progress bar.
- Type a non-trivial prompt: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."
- Hit Generate — show tokens streaming at 80–120 tok/s with the token counter live.
- Show the "Verifying..." status kick in immediately after local generation completes.
- If the verifier returns FIX or REWRITE: show the red flash rollback animation and the corrected code typing in.
- Show the sandbox execution output (stdout) confirming the corrected code runs.
- End on the split status bar: "Local: WebGPU · Cloud: Modal A10G · Verdict: ✅ Verified"
Keep the video under 3 minutes. Record with OBS or Loom. No cuts during the generation — the live stream is the point.
---
## 9. Bonus Badge Checklist
| Badge | How you earn it | Status |
|---|---|---|
| **Off the Grid** | 1.5B runs 100% in browser, no cloud API for inference | ✅ Automatic |
| **Llama Champion** | 14B served via llama.cpp on Modal | ✅ Automatic |
| **Off-Brand** | Custom dark theme, rollback animation, token counter, status bar | ✅ Build it |
| **Field Notes** | Write a blog post on HF or Dev.to explaining the speculative split-brain architecture | ✅ Write it post-build |
---
## 10. Submission Checklist
Before the June 15 deadline:
- [ ] Modal verifier deployed and endpoint URL confirmed working
- [ ] HF Space live and publicly accessible under `build-small-hackathon` org
- [ ] WebGPU model loads in Chrome without errors
- [ ] Token streaming visible in UI
- [ ] Rollback animation triggers on at least one FIX/REWRITE verdict
- [ ] Sandbox execution output shown in demo
- [ ] Demo video recorded and uploaded (YouTube unlisted or HF)
- [ ] Social media post published (Twitter/X or LinkedIn) with Space link and demo video
- [ ] README.md Space card complete with description, tags, and video embed
- [ ] Field Notes blog post published and linked in README
---
## 11. Known Gotchas
**WebGPU VRAM:** The 1.5B Q4 ONNX model needs ~1GB VRAM. On machines with integrated graphics sharing system RAM, this works but may be slow. Document the Chrome + dedicated GPU requirement.
**CORS:** In this design the browser never calls Modal directly; `app.py` calls the endpoint server-side via httpx, so CORS does not apply. If you later call the endpoint from browser JS and hit CORS errors, add `fastapi.middleware.cors.CORSMiddleware` to the web_app with `allow_origins=["*"]`.
**transformers.js version:** Pin to `3.5.x`. Breaking changes in 3.x are frequent. The CDN import in `engine.js` uses the pinned version — don't use `@latest`.
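**Gradio JS bridge:** A hidden `gr.Button` trigger is the cleanest way to fire a Python function from browser JS in Gradio 5.x without websocket hacks. Components created with `visible=False` may not be rendered into the DOM at all, so if `getElementById` returns null, hide the bridge elements with CSS instead. Do not use `gr.Request` for this; it won't work from inside a custom HTML block.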
**Modal Volume first deploy:** The volume download must complete before the verifier function can load the model. Run `download_model` manually once and confirm with `modal volume ls qwen-14b-volume /models` before deploying the endpoint.
**HF Space cold start:** HF Spaces themselves also cold start. If the Space hasn't been visited recently, Gradio takes 20–30 seconds to boot. Add a loading spinner at the Gradio level using `gr.HTML` with a brief "Space initializing..." message that auto-hides once the page is interactive.