# Split-Brain Speculative Co-Pilot
### Build Small Hackathon — Complete Build Instructions

> **Concept:** A 1.5B model runs entirely in the user's browser via WebGPU + transformers.js, streaming code instantly. A 14B model on Modal verifies the draft in the background. When the verifier catches a bug, the UI rolls back the local generation and replaces it with the corrected cloud block, live and visibly.
>
> **Models:** `Qwen2.5-Coder-1.5B-Instruct` (browser, WebGPU) + `Qwen2.5-Coder-14B-Instruct` (Modal, GGUF via llama.cpp). Combined 15.5B, well under the 32B cap.
>
> **Bonus badges targeted:** Off the Grid · Llama Champion · Off-Brand · Field Notes

---

## 0. Prerequisites

- Python 3.11+
- Node.js 18+ (for local frontend testing)
- Modal account with `modal` CLI installed and authenticated (`modal token new`)
- Hugging Face account, joined the `build-small-hackathon` org, HF token with write access
- `huggingface-cli` installed and logged in (`huggingface-cli login`)
- Chrome 113+ (WebGPU required — Firefox and Safari are out, document this clearly)
- Git

---

## 1. Repository Structure

Set up the project layout before writing any code.

```
split-brain-copilot/
├── app.py                  # Gradio app entry point (HF Space root)
├── modal_backend/
│   ├── __init__.py
│   ├── verifier.py         # Modal app: 14B inference endpoint
│   └── sandbox.py          # Modal app: code execution sandbox
├── static/
│   ├── engine.js           # transformers.js WebGPU inference engine
│   ├── ui.js               # Stream rendering, rollback animation, diff logic
│   └── style.css           # Custom UI (required for Off-Brand badge)
├── requirements.txt
└── README.md               # HF Space card + demo video embed
```

Initialize git and create a HF Space repo. First confirm that `huggingface-cli` is installed and that you are logged in (`huggingface-cli whoami`):

```bash
git init
huggingface-cli repo create split-brain-copilot --type space --space-sdk gradio
git remote add origin https://huggingface.co/spaces/YOUR_HF_USERNAME/split-brain-copilot
```

---

## 2. Modal Backend — 14B Verifier Endpoint

### 2.1 Download the GGUF model to a Modal Volume

The 14B model is too large to bake into the image. Use a Modal Volume for persistent storage.

```python
# modal_backend/verifier.py
import modal

app = modal.App("split-brain-verifier")

# Persistent volume — survives cold starts
model_volume = modal.Volume.from_name("qwen-14b-volume", create_if_missing=True)

MODEL_DIR = "/models"
MODEL_FILENAME = "qwen2.5-coder-14b-instruct-q4_k_m.gguf"
# Source: bartowski/Qwen2.5-Coder-14B-Instruct-GGUF on Hugging Face.
# Hub file names are case-sensitive: confirm MODEL_FILENAME against the repo's file list.
MODEL_REPO = "bartowski/Qwen2.5-Coder-14B-Instruct-GGUF"
```

Create a one-time download function:

```python
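@app.function(
    # The default Modal image does not ship huggingface_hub, so install it here.
    image=modal.Image.debian_slim(python_version="3.11").pip_install("huggingface_hub"),
    volumes={MODEL_DIR: model_volume},
    timeout=3600,
    secrets=[modal.Secret.from_name("huggingface-secret")],  # created in section 2.5
)
def download_model():
    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id=MODEL_REPO,
        filename=MODEL_FILENAME,
        local_dir=MODEL_DIR,
    )
    model_volume.commit()  # persist the downloaded file to the volume
    print(f"Downloaded to {MODEL_DIR}/{MODEL_FILENAME}")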
```

Run this once, after creating `huggingface-secret` (see section 2.5): `modal run modal_backend/verifier.py::download_model`

### 2.2 Build the llama.cpp image

```python
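llama_image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("build-essential", "cmake", "git", "libgomp1")
    .run_commands(
        # Standalone llama.cpp build (CLI tools). The Python bindings installed
        # below compile their own vendored copy and do not use this build.
        "git clone https://github.com/ggerganov/llama.cpp /llama.cpp",
        "cd /llama.cpp && cmake -B build -DLLAMA_CURL=OFF && cmake --build build --config Release -j$(nproc)",
        "cd /llama.cpp && pip install -e .",
    )
    # NOTE: a plain pip build of llama-cpp-python is CPU-only. For n_gpu_layers
    # to take effect it has to be compiled with CUDA (CMAKE_ARGS="-DGGML_CUDA=on")
    # on a base image that ships the CUDA toolkit.
    .pip_install("llama-cpp-python==0.3.4", "fastapi", "uvicorn")
)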
```

### 2.3 Verifier inference function

The verifier receives the speculated code draft and the original prompt, and returns a verdict: `PASS`, `FIX`, or `REWRITE` with corrected code.

```python
@app.cls(
    image=llama_image,
    gpu=modal.gpu.A10G(),
    volumes={MODEL_DIR: model_volume},
    container_idle_timeout=300,
    allow_concurrent_inputs=10,
)
class Verifier:
    @modal.enter()
    def load_model(self):
        from llama_cpp import Llama
        self.llm = Llama(
            model_path=f"{MODEL_DIR}/{MODEL_FILENAME}",
            n_gpu_layers=-1,      # all layers on GPU
            n_ctx=8192,
            n_batch=512,
            verbose=False,
        )

    @modal.method()
    def verify(self, prompt: str, draft_code: str, language: str = "python") -> dict:
        system = f"""You are a code verifier. A smaller model drafted the following {language} code.
Your job:
1. Check for bugs, logic errors, type errors, off-by-one errors, and security issues.
2. If the code is correct, respond with exactly: {{"verdict": "PASS"}}
3. If fixable, respond with: {{"verdict": "FIX", "corrected_code": "<fixed code here>", "reason": "<one line>"}}
4. If fundamentally wrong, respond with: {{"verdict": "REWRITE", "corrected_code": "<rewritten code>", "reason": "<one line>"}}
Respond ONLY with valid JSON. No markdown, no explanation outside the JSON."""

        user = f"Original prompt:\n{prompt}\n\nDrafted code:\n```{language}\n{draft_code}\n```"

        response = self.llm.create_chat_completion(
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            max_tokens=2048,
            temperature=0.1,
        )
        import json
        raw = response["choices"][0]["message"]["content"].strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Fallback: treat as PASS if we can't parse
            return {"verdict": "PASS"}
```
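The fallback above treats any unparseable reply as `PASS`, which also swallows replies that are valid JSON wrapped in a markdown fence or followed by stray prose. Below is a minimal sketch of a more tolerant parser, assuming the verdict schema from the system prompt. `parse_verdict` and the `parse_error` flag are hypothetical names, not part of the original code.

```python
import json

VALID_VERDICTS = {"PASS", "FIX", "REWRITE"}


def parse_verdict(raw: str) -> dict:
    """Best-effort extraction of a verdict object from raw model output."""
    # Take the outermost {...} span so fences or prose around the JSON are ignored.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return {"verdict": "PASS", "parse_error": True}
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {"verdict": "PASS", "parse_error": True}
    if not isinstance(data, dict) or data.get("verdict") not in VALID_VERDICTS:
        return {"verdict": "PASS", "parse_error": True}
    # A FIX/REWRITE without replacement code cannot be applied, so degrade to PASS.
    if data["verdict"] != "PASS" and not data.get("corrected_code"):
        return {"verdict": "PASS", "parse_error": True}
    return data
```

Keeping the `parse_error` flag lets the UI distinguish "verified clean" from "verifier reply could not be read", which the bare `PASS` fallback hides.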

### 2.4 Modal Sandbox — code execution (optional but impressive)

Sandboxed execution confirms the corrected code actually runs. This is what earns you extra credibility in the demo.

```python
# modal_backend/sandbox.py
import modal

app = modal.App("split-brain-sandbox")

@app.function(timeout=30)
def execute_python(code: str) -> dict:
    """Run untrusted code in a Modal sandbox and return stdout/stderr."""
    sandbox = modal.Sandbox.create(
        "python3", "-c", code,
        image=modal.Image.debian_slim().pip_install("numpy"),
        timeout=10,
        cpu=0.5,
    )
    sandbox.wait()
    return {
        "stdout": sandbox.stdout.read(),
        "stderr": sandbox.stderr.read(),
        "returncode": sandbox.returncode,
    }
```
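The sandbox result is only useful if something acts on it. One possible policy, sketched below as an assumption rather than the project's actual logic: show the corrected code only when it exits cleanly, and otherwise keep the local draft. `choose_final_code` and its status labels are hypothetical.

```python
from typing import Optional, Tuple


def choose_final_code(
    draft_code: str,
    verdict: dict,
    run_result: Optional[dict] = None,
) -> Tuple[str, str]:
    """Return (code_to_show, status_label) for the UI."""
    corrected = verdict.get("corrected_code")
    if verdict.get("verdict") == "PASS" or not corrected:
        return draft_code, "verified"
    if run_result is None:
        # Sandbox not configured or skipped: trust the verifier as-is.
        return corrected, "corrected (not executed)"
    if run_result.get("returncode") == 0:
        return corrected, "corrected and executed"
    # The correction crashed in the sandbox, so keep the local draft on screen.
    return draft_code, "correction failed in sandbox"
```

The `run_result` argument is expected to have the same shape as the dictionary returned by `execute_python`.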

### 2.5 Deploy the Modal backend

```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```

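Note the endpoint URL printed for the verifier after deploy (the web endpoint itself is added in section 5). You'll need it in `app.py`. As written, `sandbox.py` exposes no HTTP endpoint, so `MODAL_SANDBOX_URL` needs a similar wrapper before it can be used.

Store your HF token as a Modal secret. Do this before running `download_model` (section 2.1), which reads `huggingface-secret`: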
```bash
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```

---

## 3. Browser Engine — transformers.js + WebGPU

### 3.1 Model choice for the browser

Use `Qwen2.5-Coder-1.5B-Instruct` in ONNX/WebGPU format. Xenova and onnx-community maintain these on HF Hub. Target:
`onnx-community/Qwen2.5-Coder-1.5B-Instruct` with `dtype: "q4"` for fast WebGPU loading (~800MB, fits comfortably in browser VRAM on a modern GPU).

### 3.2 engine.js — WebGPU inference

```javascript
// static/engine.js
import { pipeline, TextStreamer } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.5.0/dist/transformers.min.js";

const MODEL_ID = "onnx-community/Qwen2.5-Coder-1.5B-Instruct";
let generator = null;
let isLoaded = false;

export async function loadModel(onProgress) {
    if (isLoaded) return;
    generator = await pipeline("text-generation", MODEL_ID, {
        dtype: "q4",
        device: "webgpu",
        progress_callback: onProgress,
    });
    isLoaded = true;
}

export async function generateCode(prompt, language, onToken, onComplete) {
    if (!generator) throw new Error("Model not loaded");

    const messages = [
        {
            role: "system",
            content: `You are an expert ${language} programmer. Write clean, correct, production-ready code. Output ONLY the code block, no explanation.`
        },
        { role: "user", content: prompt }
    ];

    const streamer = new TextStreamer(generator.tokenizer, {
        skip_prompt: true,
        callback_function: (token) => {
            onToken(token);
        },
    });

    const result = await generator(messages, {
        max_new_tokens: 1024,
        temperature: 0.2,
        do_sample: true,
        streamer,
    });

    const fullCode = result[0].generated_text.at(-1).content;
    onComplete(fullCode);
    return fullCode;
}

export function isWebGPUSupported() {
    return !!navigator.gpu;
}
```

### 3.3 ui.js — stream rendering + rollback animation

```javascript
// static/ui.js

let currentTokens = [];
let streamBuffer = "";

export function initEditor(containerId) {
    // Attach to the Gradio custom HTML component
    const container = document.getElementById(containerId);
    container.innerHTML = `
        <div id="stream-display" class="code-stream"></div>
        <div id="status-bar" class="status-bar">
            <span id="status-text">Ready</span>
            <span id="token-count">0 tok/s</span>
            <span id="verifier-status"></span>
        </div>
    `;
}

export function appendToken(token) {
    streamBuffer += token;
    currentTokens.push(token);
    const display = document.getElementById("stream-display");
    if (display) display.textContent = streamBuffer;
}

export function setStatus(text, type = "neutral") {
    const el = document.getElementById("status-text");
    if (el) {
        el.textContent = text;
        el.className = `status-${type}`;
    }
}

export function setVerifierStatus(verdict) {
    const el = document.getElementById("verifier-status");
    if (!el) return;
    const icons = { PASS: "✅ Verified", FIX: "🔧 Fixed", REWRITE: "🔄 Rewritten", CHECKING: "🔍 Verifying..." };
    el.textContent = icons[verdict] || "";
}

export async function rollbackAndReplace(correctedCode, reason) {
    const display = document.getElementById("stream-display");
    if (!display) return;

    // Flash red to signal rollback
    display.classList.add("rollback-flash");
    setVerifierStatus("FIX");
    setStatus(`Verifier corrected: ${reason}`, "warning");

    await sleep(400);
    display.classList.remove("rollback-flash");

    // Type in corrected code character by character
    display.textContent = "";
    streamBuffer = correctedCode;
    currentTokens = [];

    for (let i = 0; i < correctedCode.length; i++) {
        display.textContent += correctedCode[i];
        if (i % 5 === 0) await sleep(8); // smooth typewriter
    }

    setVerifierStatus("PASS");
    setStatus("Ready", "neutral");
}

export function getCurrentCode() {
    return streamBuffer;
}

export function reset() {
    streamBuffer = "";
    currentTokens = [];
    const display = document.getElementById("stream-display");
    if (display) display.textContent = "";
}

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}
```

### 3.4 style.css — custom UI (Off-Brand badge)

```css
/* static/style.css */
:root {
    --bg: #0d1117;
    --surface: #161b22;
    --border: #30363d;
    --accent: #58a6ff;
    --accent-warn: #f0883e;
    --text: #e6edf3;
    --text-muted: #8b949e;
    --green: #3fb950;
    --red: #f85149;
}

body { background: var(--bg); color: var(--text); font-family: 'JetBrains Mono', monospace; }

.code-stream {
    background: var(--surface);
    border: 1px solid var(--border);
    border-radius: 8px;
    padding: 16px;
    min-height: 300px;
    font-family: 'JetBrains Mono', monospace;
    font-size: 13px;
    line-height: 1.6;
    white-space: pre-wrap;
    overflow-y: auto;
    transition: border-color 0.2s;
}

.rollback-flash {
    border-color: var(--red) !important;
    background: rgba(248, 81, 73, 0.08) !important;
    animation: flash 0.4s ease;
}

@keyframes flash {
    0%   { background: rgba(248, 81, 73, 0.25); }
    100% { background: rgba(248, 81, 73, 0.08); }
}

.status-bar {
    display: flex;
    justify-content: space-between;
    padding: 8px 12px;
    background: var(--surface);
    border: 1px solid var(--border);
    border-top: none;
    border-radius: 0 0 8px 8px;
    font-size: 12px;
    color: var(--text-muted);
}

.status-warning { color: var(--accent-warn); }
.status-success { color: var(--green); }
.status-neutral { color: var(--text-muted); }

/* Gradio overrides */
.gradio-container { background: var(--bg) !important; }
footer { display: none !important; }

/* WebGPU loading bar */
.loading-bar {
    height: 3px;
    background: var(--border);
    border-radius: 2px;
    overflow: hidden;
    margin: 8px 0;
}
.loading-bar-fill {
    height: 100%;
    background: var(--accent);
    transition: width 0.3s ease;
}
```

---

## 4. Gradio App — app.py

This is the HF Space entry point. Gradio acts as the shell; the real UI lives in the custom HTML component injected via `gr.HTML`.

```python
# app.py
import gradio as gr
import httpx
import json
import os
import asyncio
from pathlib import Path

MODAL_VERIFIER_URL = os.environ.get("MODAL_VERIFIER_URL")  # set as HF Space secret
MODAL_SANDBOX_URL = os.environ.get("MODAL_SANDBOX_URL")    # set as HF Space secret

LANGUAGES = ["Python", "JavaScript", "TypeScript", "Rust", "Go", "C++"]

def load_static(filename):
    return Path(f"static/{filename}").read_text()

custom_html = f"""
<!DOCTYPE html>
<html>
<head>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
<style>{load_static('style.css')}</style>
</head>
<body>
<div id="split-brain-root">
    <div class="webgpu-notice" id="webgpu-warning" style="display:none; color:#f85149; padding:8px; border:1px solid #f85149; border-radius:6px; margin-bottom:12px;">
        ⚠️ WebGPU not detected. Please use Chrome 113+ on desktop for local inference.
    </div>
    <div id="load-section">
        <button id="load-btn" onclick="initEngine()">⚡ Load 1.5B Model (WebGPU)</button>
        <div class="loading-bar"><div class="loading-bar-fill" id="load-progress" style="width:0%"></div></div>
        <span id="load-status" style="font-size:12px; color:#8b949e;"></span>
    </div>
    <div id="stream-display" class="code-stream" style="margin-top:12px;">Waiting for model load...</div>
    <div class="status-bar">
        <span id="status-text">Idle</span>
        <span id="token-count"></span>
        <span id="verifier-status"></span>
    </div>
</div>
<script type="module">
{load_static('engine.js')}
{load_static('ui.js')}

// Check WebGPU on load
if (!isWebGPUSupported()) {{
    document.getElementById('webgpu-warning').style.display = 'block';
    document.getElementById('load-btn').disabled = true;
}}

window.initEngine = async function() {{
    document.getElementById('load-btn').disabled = true;
    document.getElementById('load-status').textContent = 'Loading model weights...';
    await loadModel((progress) => {{
        if (progress.progress) {{
            document.getElementById('load-progress').style.width = progress.progress + '%';
            document.getElementById('load-status').textContent = `${{progress.file || 'Loading'}} — ${{Math.round(progress.progress)}}%`;
        }}
    }});
    document.getElementById('load-status').textContent = '✅ Model ready — WebGPU active';
    document.getElementById('load-section').style.opacity = '0.5';
}};

// Gradio will call this via the hidden trigger
window.runLocalGeneration = async function(prompt, language) {{
    reset();
    setStatus('Generating locally (WebGPU)...', 'neutral');

    let tokenCount = 0;
    const startTime = Date.now();

    const fullCode = await generateCode(prompt, language,
        (token) => {{
            appendToken(token);
            tokenCount++;
            const elapsed = (Date.now() - startTime) / 1000;
            const tps = Math.round(tokenCount / elapsed);
            document.getElementById('token-count').textContent = `${{tps}} tok/s`;
        }},
        (code) => {{
            setStatus('Local generation complete. Verifying...', 'neutral');
        }}
    );

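    // Hand the draft to the Gradio backend for verification.
    // elem_id is set on the component wrapper, so target the inner <textarea>
    // and dispatch an input event so Gradio registers the new value.
    const draftBox = document.querySelector('#draft-output-hidden textarea');
    draftBox.value = fullCode;
    draftBox.dispatchEvent(new Event('input', {{ bubbles: true }}));
    document.getElementById('trigger-verify-btn').click();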
}};

window.applyVerification = function(verdictJson) {{
    const verdict = JSON.parse(verdictJson);
    if (verdict.verdict === 'PASS') {{
        setVerifierStatus('PASS');
        setStatus('✅ Verified clean', 'success');
    }} else {{
        rollbackAndReplace(verdict.corrected_code, verdict.reason);
    }}
}};
</script>
</body>
</html>
"""

async def verify_with_modal(prompt: str, draft_code: str, language: str) -> str:
    """Call Modal verifier endpoint and return JSON string."""
    if not MODAL_VERIFIER_URL:
        return json.dumps({"verdict": "PASS"})
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            MODAL_VERIFIER_URL,
            json={"prompt": prompt, "draft_code": draft_code, "language": language},
        )
        resp.raise_for_status()
        return resp.text

async def execute_in_sandbox(code: str) -> dict:
    """Call Modal sandbox and return execution result."""
    if not MODAL_SANDBOX_URL:
        return {"stdout": "", "stderr": "Sandbox not configured", "returncode": -1}
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(MODAL_SANDBOX_URL, json={"code": code})
        return resp.json()

with gr.Blocks(
    title="Split-Brain Co-Pilot",
    css="footer {display:none}",
    theme=gr.themes.Base(
        primary_hue="blue",
        neutral_hue="slate",
    ),
) as demo:
    gr.HTML("<h1 style='text-align:center; color:#58a6ff;'>⚡ Split-Brain Co-Pilot</h1>")
    gr.HTML("<p style='text-align:center; color:#8b949e;'>1.5B model runs in your browser (WebGPU). 14B model on Modal verifies and corrects.</p>")

    with gr.Row():
        with gr.Column(scale=2):
            prompt_input = gr.Textbox(
                label="What do you want to build?",
                placeholder="e.g. A function that parses a CSV and returns the top 5 rows by a given column",
                lines=3,
            )
            language_select = gr.Dropdown(
                choices=LANGUAGES,
                value="Python",
                label="Language",
            )
            generate_btn = gr.Button("⚡ Generate (WebGPU → Verify)", variant="primary")

        with gr.Column(scale=3):
            # Custom HTML component for streaming display
            gr.HTML(custom_html)

            # Hidden elements for JS ↔ Gradio bridge
            draft_hidden = gr.Textbox(visible=False, elem_id="draft-output-hidden")
            verify_trigger = gr.Button("verify", visible=False, elem_id="trigger-verify-btn")
            verdict_output = gr.Textbox(visible=False, label="verdict")

    with gr.Row():
        sandbox_output = gr.Code(label="Sandbox Execution Output", language="python", visible=False)

    # Gradio event: user clicks Generate → JS takes over for local inference
    generate_btn.click(
        fn=None,
        inputs=[prompt_input, language_select],
        outputs=[],
        js="(prompt, lang) => { window.runLocalGeneration(prompt, lang); return []; }",
    )

    # Gradio event: JS triggers verify after local generation completes
    async def run_verification(prompt, draft_code, language):
        verdict_json = await verify_with_modal(prompt, draft_code, language)
        return verdict_json

    verify_trigger.click(
        fn=run_verification,
        inputs=[prompt_input, draft_hidden, language_select],
        outputs=[verdict_output],
    )

    # Apply verdict back to JS
    verdict_output.change(
        fn=None,
        inputs=[verdict_output],
        outputs=[],
        js="(verdict) => { window.applyVerification(verdict); return []; }",
    )

if __name__ == "__main__":
    demo.launch()
```

---

## 5. Modal Web Endpoint Wrapper

The Modal functions need to be exposed as HTTP endpoints that `app.py` can call via httpx. Add this to `verifier.py`:

```python
from fastapi import FastAPI
from pydantic import BaseModel

web_app = FastAPI()

class VerifyRequest(BaseModel):
    prompt: str
    draft_code: str
    language: str = "python"

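@app.function(
    image=llama_image,  # reused only because it already has fastapi installed
    container_idle_timeout=300,
    # No GPU or volume here: this container only forwards requests to Verifier,
    # which owns the A10G and the model weights.
)
@modal.asgi_app()
def verifier_endpoint():
    verifier = Verifier()

    @web_app.post("/verify")
    async def verify(req: VerifyRequest):
        # .remote.aio() awaits the call without blocking the event loop
        result = await verifier.verify.remote.aio(req.prompt, req.draft_code, req.language)
        return result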

    return web_app
```

After deploying, Modal gives you a URL like `https://your-username--split-brain-verifier-verifier-endpoint.modal.run`. The route is mounted at `/verify`, so set the HF Space secret `MODAL_VERIFIER_URL` to that URL with `/verify` appended.
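Before wiring the endpoint into the Space, it helps to hit it once from a terminal. The sketch below uses only the standard library (the app itself uses `httpx`) and assumes the `/verify` route and request fields defined above. `build_verify_request` and `smoke_test` are hypothetical helper names, and the sample prompt is deliberately buggy so a `FIX` verdict is plausible.

```python
import json
import urllib.request


def build_verify_request(
    base_url: str, prompt: str, draft_code: str, language: str = "python"
) -> urllib.request.Request:
    """Build a POST request for the /verify route from the base URL Modal prints."""
    url = base_url.rstrip("/") + "/verify"
    body = json.dumps(
        {"prompt": prompt, "draft_code": draft_code, "language": language}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str, timeout: float = 120.0) -> dict:
    """Send one deliberately wrong draft and return the verifier's JSON reply."""
    req = build_verify_request(
        base_url,
        prompt="Write a function that adds two numbers",
        draft_code="def add(a, b):\n    return a - b",
    )
    # A generous timeout leaves room for a cold start on the first call.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `smoke_test("https://your-username--split-brain-verifier-verifier-endpoint.modal.run")` should return a dictionary with a `verdict` key if the deployment is healthy.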

---

## 6. HF Space Configuration

### 6.1 README.md (Space card)

```yaml
---
title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - code-generation
  - webgpu
  - speculative-decoding
  - llama.cpp
  - local-first
---
```

### 6.2 requirements.txt

```
gradio==5.30.0
httpx==0.27.0
modal==0.73.0
huggingface-hub==0.23.0
```

### 6.3 HF Space secrets

Set these in the Space Settings → Repository secrets:

| Secret name | Value |
|---|---|
| `MODAL_VERIFIER_URL` | Your Modal verifier endpoint URL |
| `MODAL_SANDBOX_URL` | Your Modal sandbox endpoint URL |
| `MODAL_TOKEN_ID` | From `modal token show` |
| `MODAL_TOKEN_SECRET` | From `modal token show` |
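
A missing secret fails quietly in `app.py`: `verify_with_modal` returns `PASS` when `MODAL_VERIFIER_URL` is unset, so every draft looks verified. A small startup check makes that visible. This is a sketch; `missing_secrets` is a hypothetical helper, and it only checks the two URLs that `app.py` actually reads.

```python
import os
from typing import List, Mapping, Optional, Sequence

REQUIRED_SECRETS = ("MODAL_VERIFIER_URL", "MODAL_SANDBOX_URL")


def missing_secrets(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_SECRETS,
) -> List[str]:
    """Return the names of required settings that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name, "").strip()]


# Example: log a warning at import time in app.py
# for name in missing_secrets():
#     print(f"WARNING: {name} is not set; that feature is disabled")
```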

---

## 7. Cold Start Mitigation

Modal A10G containers take 10–40 seconds to cold start. Handle this gracefully:

Add a scheduled keep-warm ping to `verifier.py`:

```python
@app.function(schedule=modal.Cron("*/5 * * * *"))
def keep_warm():
    """Ping the verifier every 5 minutes to avoid cold starts during the demo window."""
    Verifier().verify.remote("test", "print('hello')", "python")
```

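Redeploy so the schedule is registered: `modal deploy modal_backend/verifier.py`. Note that the 5-minute cron interval equals the 300-second `container_idle_timeout`, so a container can still expire just before the next ping; a shorter interval or longer idle timeout leaves some margin.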

In the UI, show "Verifier warming up..." in the status bar while the first request is in flight and display a spinner. Do not let the UI appear broken during cold start.
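
On the Space side, a cold verifier usually shows up as a timeout or connection error on the first request. One way to handle that is to retry with a growing delay and surface a status message between attempts. This is a sketch under assumptions: `call_with_retry` is a hypothetical helper, the broad `except Exception` would be narrowed to `httpx` errors in `app.py`, and the delays are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def call_with_retry(
    make_call: Callable[[], Awaitable[str]],
    attempts: int = 3,
    base_delay: float = 2.0,
    on_status: Optional[Callable[[str], None]] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Await make_call() up to `attempts` times, doubling the delay between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: let the caller see the last error
            if on_status:
                on_status(f"Verifier warming up... (attempt {attempt} of {attempts})")
            await sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

In `run_verification`, this could wrap the existing call as `await call_with_retry(lambda: verify_with_modal(prompt, draft_code, language))`.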

---

## 8. Demo Video Script

The demo video is a submission requirement. Plan it around these beats:

- Open Chrome, show the app. Explain the split-brain concept in one sentence.
- Click "Load 1.5B Model" — show the WebGPU loading progress bar.
- Type a non-trivial prompt: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."
- Hit Generate — show tokens streaming at 80–120 tok/s with the token counter live.
- Show the "Verifying..." status kick in immediately after local generation completes.
- If the verifier returns FIX or REWRITE: show the red flash rollback animation and the corrected code typing in.
- Show the sandbox execution output (stdout) confirming the corrected code runs.
- End on the split status bar: "Local: WebGPU · Cloud: Modal A10G · Verdict: ✅ Verified"

Keep the video under 3 minutes. Record with OBS or Loom. No cuts during the generation — the live stream is the point.

---

## 9. Bonus Badge Checklist

| Badge | How you earn it | Status |
|---|---|---|
| **Off the Grid** | 1.5B runs 100% in browser, no cloud API for inference | ✅ Automatic |
| **Llama Champion** | 14B served via llama.cpp on Modal | ✅ Automatic |
| **Off-Brand** | Custom dark theme, rollback animation, token counter, status bar | ✅ Build it |
| **Field Notes** | Write a blog post on HF or Dev.to explaining the speculative split-brain architecture | ✅ Write it post-build |

---

## 10. Submission Checklist

Before the June 15 deadline:

- [ ] Modal verifier deployed and endpoint URL confirmed working
- [ ] HF Space live and publicly accessible under `build-small-hackathon` org
- [ ] WebGPU model loads in Chrome without errors
- [ ] Token streaming visible in UI
- [ ] Rollback animation triggers on at least one FIX/REWRITE verdict
- [ ] Sandbox execution output shown in demo
- [ ] Demo video recorded and uploaded (YouTube unlisted or HF)
- [ ] Social media post published (Twitter/X or LinkedIn) with Space link and demo video
- [ ] README.md Space card complete with description, tags, and video embed
- [ ] Field Notes blog post published and linked in README

---

## 11. Known Gotchas

**WebGPU VRAM:** The 1.5B Q4 ONNX model needs ~1GB VRAM. On machines with integrated graphics sharing system RAM, this works but may be slow. Document the Chrome + dedicated GPU requirement.

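**CORS:** In this design the browser never calls Modal directly; the Gradio backend does, via httpx, so CORS should not come up. FastAPI sends no CORS headers unless configured, so if you later call the endpoint from browser JS and hit CORS errors, add `fastapi.middleware.cors.CORSMiddleware` to the web_app with `allow_origins=["*"]`.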

**transformers.js version:** Pin to `3.5.x`. Breaking changes in 3.x are frequent. The CDN import in `engine.js` uses the pinned version — don't use `@latest`.

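**Gradio JS bridge:** The hidden-button trigger pattern is a simple way to fire a Python function from browser JS in Gradio 5.x without websocket hacks. If the hidden components cannot be found from JS, they may not be rendered in the DOM when `visible=False`; in that case keep them visible and hide them with CSS instead. Do not use `gr.Request` for this. It won't work from inside a custom HTML block.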

**Modal Volume first deploy:** The volume download must complete before the verifier function can load the model. Run `download_model` manually once and confirm the file is present with `modal volume ls qwen-14b-volume` before deploying the endpoint. Paths in that command are relative to the volume root, not the `/models` mount point.

**HF Space cold start:** HF Spaces themselves also cold start. If the Space hasn't been visited recently, Gradio takes 20–30 seconds to boot. Add a loading spinner at the Gradio level using `gr.HTML` with a brief "Space initializing..." message that auto-hides once the page is interactive.