
	# Split-Brain Speculative Co-Pilot
	### Build Small Hackathon — Complete Build Instructions

	> Concept: A 1.5B model runs entirely in the user's browser via WebGPU + transformers.js and streams code instantly. A 14B model on Modal verifies the draft in the background. When the verifier catches a bug, the UI rolls back the local generation and replaces it, live and visibly, with the corrected block from the cloud.
	>
	> Models: `Qwen2.5-Coder-1.5B-Instruct` (browser, WebGPU) + `Qwen2.5-Coder-14B-Instruct` (Modal, GGUF via llama.cpp), 15.5B combined, well under the 32B cap.
	>
	> Bonus badges targeted: Off the Grid · Llama Champion · Off-Brand · Field Notes

	---

	## 0. Prerequisites

	- Python 3.11+
	- Node.js 18+ (for local frontend testing)
	- Modal account with `modal` CLI installed and authenticated (`modal token new`)
	- Hugging Face account, joined the `build-small-hackathon` org, HF token with write access
	- `huggingface-cli` installed and logged in (`huggingface-cli login`)
	- Chrome 113+ (WebGPU is required; Firefox and Safari are not supported, so state this clearly in the README)
	- Git

	---

	## 1. Repository Structure

	Set up the project layout before writing any code.

	```
	split-brain-copilot/
	├── app.py # Gradio app entry point (HF Space root)
	├── modal_backend/
	│ ├── __init__.py
	│ ├── verifier.py # Modal app: 14B inference endpoint
	│ └── sandbox.py # Modal app: code execution sandbox
	├── static/
	│ ├── engine.js # transformers.js WebGPU inference engine
	│ ├── ui.js # Stream rendering, rollback animation, diff logic
	│ └── style.css # Custom UI (required for Off-Brand badge)
	├── requirements.txt
	└── README.md # HF Space card + demo video embed
	```

	Initialize git and create the HF Space repo. First confirm that `huggingface-cli` is installed and that you are logged in (`huggingface-cli whoami`):

	```bash
	git init
	huggingface-cli repo create split-brain-copilot --type space --space-sdk gradio
	git remote add origin https://huggingface.co/spaces/YOUR_HF_USERNAME/split-brain-copilot
	```

	---

	## 2. Modal Backend — 14B Verifier Endpoint

	### 2.1 Download the GGUF model to a Modal Volume

	The 14B model is too large to bake into the image. Use a Modal Volume for persistent storage.

	```python
	# modal_backend/verifier.py
	import modal

	app = modal.App("split-brain-verifier")

	# Persistent volume — survives cold starts
	model_volume = modal.Volume.from_name("qwen-14b-volume", create_if_missing=True)

	MODEL_DIR = "/models"
	# Hugging Face filenames are case-sensitive; confirm the exact name in the repo's file list
	MODEL_FILENAME = "Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf"
	# Source: bartowski/Qwen2.5-Coder-14B-Instruct-GGUF on Hugging Face
	MODEL_REPO = "bartowski/Qwen2.5-Coder-14B-Instruct-GGUF"
	```

	Create a one-time download function:

	```python
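	download_image = modal.Image.debian_slim(python_version="3.11").pip_install("huggingface_hub")

	@app.function(
	    image=download_image,  # the default Modal image does not include huggingface_hub
	    volumes={MODEL_DIR: model_volume},
	    timeout=3600,
	    secrets=[modal.Secret.from_name("huggingface-secret")],
	)
	def download_model():
	    from huggingface_hub import hf_hub_download

	    # HF_TOKEN from the secret is read from the environment automatically
	    hf_hub_download(
	        repo_id=MODEL_REPO,
	        filename=MODEL_FILENAME,
	        local_dir=MODEL_DIR,
	    )
	    model_volume.commit()  # persist the file so other containers can see it
	    print(f"Downloaded to {MODEL_DIR}/{MODEL_FILENAME}")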
	```

	Run this once: `modal run modal_backend/verifier.py::download_model`
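
A truncated or failed download tends to surface later as a confusing model-load error. As a minimal sketch, the check below confirms that a file exists, is not trivially small, and begins with the four-byte `GGUF` magic that GGUF files start with. The function name `looks_like_gguf` and the `min_bytes` threshold are hypothetical and not part of the original build; the check could be called inside `download_model` before `model_volume.commit()`.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path: str, min_bytes: int = 1024) -> bool:
    """Return True if `path` is a file of at least `min_bytes` that starts with the GGUF magic.

    `min_bytes` is an arbitrary illustrative threshold, not a format rule.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size < min_bytes:
        return False
    with p.open("rb") as fh:
        return fh.read(4) == GGUF_MAGIC
```

This only guards against obviously broken files; it does not validate the rest of the GGUF header or the tensor data.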

	### 2.2 Build the llama.cpp image

	```python
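	llama_image = (
	    modal.Image.debian_slim(python_version="3.11")
	    .apt_install("build-essential", "cmake", "git", "libgomp1")
	    .run_commands(
	        # Standalone llama.cpp build (CLI tools and server binaries)
	        "git clone https://github.com/ggerganov/llama.cpp /llama.cpp",
	        "cd /llama.cpp && cmake -B build -DLLAMA_CURL=OFF && cmake --build build --config Release -j$(nproc)",
	        "cd /llama.cpp && pip install -e .",
	    )
	    # llama-cpp-python compiles its own bundled copy of llama.cpp at install time.
	    # n_gpu_layers=-1 only offloads to the GPU if that build has CUDA enabled.
	    .pip_install("llama-cpp-python==0.3.4", "fastapi", "uvicorn")
	)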
	```

	### 2.3 Verifier inference function

	The verifier receives the speculated code draft and the original prompt, and returns a verdict: `PASS`, `FIX`, or `REWRITE` with corrected code.

	```python
	@app.cls(
	    image=llama_image,
	    gpu=modal.gpu.A10G(),
	    volumes={MODEL_DIR: model_volume},
	    container_idle_timeout=300,
	    allow_concurrent_inputs=10,
	)
	class Verifier:
	    @modal.enter()
	    def load_model(self):
	        from llama_cpp import Llama

	        self.llm = Llama(
	            model_path=f"{MODEL_DIR}/{MODEL_FILENAME}",
	            n_gpu_layers=-1,  # all layers on GPU
	            n_ctx=8192,
	            n_batch=512,
	            verbose=False,
	        )

	    @modal.method()
	    def verify(self, prompt: str, draft_code: str, language: str = "python") -> dict:
	        import json

	        system = f"""You are a code verifier. A smaller model drafted the following {language} code.
	Your job:
	1. Check for bugs, logic errors, type errors, off-by-one errors, and security issues.
	2. If the code is correct, respond with exactly: {{"verdict": "PASS"}}
	3. If fixable, respond with: {{"verdict": "FIX", "corrected_code": "<fixed code here>", "reason": "<one line>"}}
	4. If fundamentally wrong, respond with: {{"verdict": "REWRITE", "corrected_code": "<rewritten code>", "reason": "<one line>"}}
	Respond ONLY with valid JSON. No markdown, no explanation outside the JSON."""

	        user = f"Original prompt:\n{prompt}\n\nDrafted code:\n```{language}\n{draft_code}\n```"

	        response = self.llm.create_chat_completion(
	            messages=[
	                {"role": "system", "content": system},
	                {"role": "user", "content": user},
	            ],
	            max_tokens=2048,
	            temperature=0.1,
	        )
	        raw = response["choices"][0]["message"]["content"].strip()
	        try:
	            return json.loads(raw)
	        except json.JSONDecodeError:
	            # Fallback: treat as PASS if we can't parse
	            return {"verdict": "PASS"}
	```
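
Models often wrap JSON in a markdown fence despite being told not to, and a bare `json.loads` then falls through to the `PASS` fallback. The sketch below is one possible hardening of that parsing step: it strips a surrounding fence, checks the verdict against the three values named above, and requires `corrected_code` for `FIX` and `REWRITE`. The helper name `parse_verdict` and the `parse_error` flag are hypothetical additions; the fail-open `PASS` fallback is kept from the original.

```python
import json
import re

VALID_VERDICTS = {"PASS", "FIX", "REWRITE"}

# Built indirectly so this listing contains no literal code fence.
_TICKS = "`" * 3
_FENCE_RE = re.compile(rf"^{_TICKS}[\w+-]*[ \t]*\n(.*?)\n?{_TICKS}\s*$", re.DOTALL)


def parse_verdict(raw: str) -> dict:
    """Parse verifier output into a verdict dict, falling back to PASS when unusable."""
    text = raw.strip()
    fenced = _FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1).strip()  # drop the surrounding markdown fence

    fallback = {"verdict": "PASS", "parse_error": True}  # hypothetical flag for logging
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict) or data.get("verdict") not in VALID_VERDICTS:
        return fallback
    if data["verdict"] != "PASS" and not isinstance(data.get("corrected_code"), str):
        return fallback  # FIX/REWRITE without code cannot be applied
    return data
```

Surfacing `parse_error` lets the UI distinguish "verified clean" from "verifier reply could not be read", which the plain fallback hides.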

	### 2.4 Modal Sandbox — code execution (optional but impressive)

	Sandboxed execution confirms the corrected code actually runs. This is what earns you extra credibility in the demo.

	```python
	# modal_backend/sandbox.py
	import modal

	app = modal.App("split-brain-sandbox")

	@app.function(timeout=30)
	def execute_python(code: str) -> dict:
	    """Run untrusted code in a Modal sandbox and return stdout/stderr."""
	    sandbox = modal.Sandbox.create(
	        "python3", "-c", code,
	        image=modal.Image.debian_slim().pip_install("numpy"),
	        timeout=10,
	        cpu=0.5,
	    )
	    sandbox.wait()
	    return {
	        "stdout": sandbox.stdout.read(),
	        "stderr": sandbox.stderr.read(),
	        "returncode": sandbox.returncode,
	    }
	```
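
When wiring up the UI it can help to exercise the result-handling path without deploying anything. The sketch below is a local stand-in that returns the same `stdout` / `stderr` / `returncode` shape using a child interpreter. It is an assumption-laden test helper, not a sandbox: the name `execute_python_locally` is hypothetical, and it provides no isolation, so it should only ever run trusted snippets.

```python
import subprocess
import sys


def execute_python_locally(code: str, timeout_s: float = 10.0) -> dict:
    """Run `code` in a child Python process and mirror the sandbox result shape.

    NOT a sandbox: the child has the same filesystem and network access as the caller.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"Timed out after {timeout_s}s", "returncode": -1}
    return {"stdout": proc.stdout, "stderr": proc.stderr, "returncode": proc.returncode}
```

For example, `execute_python_locally("print(2 + 3)")` yields a dict whose `returncode` is 0 and whose `stdout` contains `5`.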

	### 2.5 Deploy the Modal backend

	```bash
	modal deploy modal_backend/verifier.py
	modal deploy modal_backend/sandbox.py
	```

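	Note the endpoint URLs printed after deploy; `app.py` reads them from `MODAL_VERIFIER_URL` and `MODAL_SANDBOX_URL`. Only functions exposed as web endpoints (see section 5) get a URL.

	Store your HF token as a Modal secret. `download_model` in 2.1 reads it, so create the secret before running that step: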
	```bash
	modal secret create huggingface-secret HF_TOKEN=hf_xxx
	```

	---

	## 3. Browser Engine — transformers.js + WebGPU

	### 3.1 Model choice for the browser

	Use `Qwen2.5-Coder-1.5B-Instruct` in ONNX/WebGPU format. Xenova and onnx-community maintain these on HF Hub. Target:
	`onnx-community/Qwen2.5-Coder-1.5B-Instruct` with `dtype: "q4"` for fast WebGPU loading (~800MB, fits comfortably in browser VRAM on a modern GPU).
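
The size figures can be sanity-checked with back-of-the-envelope arithmetic: weight storage is roughly parameter count times bits per weight, divided by eight. The sketch below uses the nominal parameter counts from this document; the 4.8 bits-per-weight average for `Q4_K_M` is an assumption for illustration, and real files differ somewhat because of metadata and layers stored at other precisions.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


browser_gb = weight_size_gb(1.5e9, 4.0)  # nominal 4-bit weights for the browser model
verifier_gb = weight_size_gb(14e9, 4.8)  # assumed average bits/weight for Q4_K_M

print(f"browser model   ~{browser_gb:.2f} GB")
print(f"verifier model  ~{verifier_gb:.2f} GB")
print(f"combined params  {1.5 + 14:.1f}B (cap: 32B)")
```

Under these assumptions the browser weights come to about 0.75 GB, in line with the ~800MB figure above, and the verifier weights to about 8.4 GB.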

	### 3.2 engine.js — WebGPU inference

	```javascript
	// static/engine.js
	import { pipeline, TextStreamer } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.5.0/dist/transformers.min.js";

	const MODEL_ID = "onnx-community/Qwen2.5-Coder-1.5B-Instruct";
	let generator = null;
	let isLoaded = false;

	export async function loadModel(onProgress) {
	if (isLoaded) return;
	generator = await pipeline("text-generation", MODEL_ID, {
	dtype: "q4",
	device: "webgpu",
	progress_callback: onProgress,
	});
	isLoaded = true;
	}

	export async function generateCode(prompt, language, onToken, onComplete) {
	if (!generator) throw new Error("Model not loaded");

	const messages = [
	{
	role: "system",
	content: `You are an expert ${language} programmer. Write clean, correct, production-ready code. Output ONLY the code block, no explanation.`
	},
	{ role: "user", content: prompt }
	];

	const streamer = new TextStreamer(generator.tokenizer, {
	skip_prompt: true,
	callback_function: (token) => {
	onToken(token);
	},
	});

	const result = await generator(messages, {
	max_new_tokens: 1024,
	temperature: 0.2,
	do_sample: true,
	streamer,
	});

	const fullCode = result[0].generated_text.at(-1).content;
	onComplete(fullCode);
	return fullCode;
	}

	export function isWebGPUSupported() {
	return !!navigator.gpu;
	}
	```

	### 3.3 ui.js — stream rendering + rollback animation

	```javascript
	// static/ui.js

	let currentTokens = [];
	let streamBuffer = "";

	export function initEditor(containerId) {
	// Attach to the Gradio custom HTML component
	const container = document.getElementById(containerId);
	container.innerHTML = `
	<div id="stream-display" class="code-stream"></div>
	<div id="status-bar" class="status-bar">
	<span id="status-text">Ready</span>
	<span id="token-count">0 tok/s</span>
	<span id="verifier-status"></span>
	</div>
	`;
	}

	export function appendToken(token) {
	streamBuffer += token;
	currentTokens.push(token);
	const display = document.getElementById("stream-display");
	if (display) display.textContent = streamBuffer;
	}

	export function setStatus(text, type = "neutral") {
	const el = document.getElementById("status-text");
	if (el) {
	el.textContent = text;
	el.className = `status-${type}`;
	}
	}

	export function setVerifierStatus(verdict) {
	const el = document.getElementById("verifier-status");
	if (!el) return;
	const icons = { PASS: "✅ Verified", FIX: "🔧 Fixed", REWRITE: "🔄 Rewritten", CHECKING: "🔍 Verifying..." };
	el.textContent = icons[verdict] || "";
	}

	export async function rollbackAndReplace(correctedCode, reason) {
	const display = document.getElementById("stream-display");
	if (!display) return;

	// Flash red to signal rollback
	display.classList.add("rollback-flash");
	setVerifierStatus("FIX");
	setStatus(`Verifier corrected: ${reason}`, "warning");

	await sleep(400);
	display.classList.remove("rollback-flash");

	// Type in corrected code character by character
	display.textContent = "";
	streamBuffer = correctedCode;
	currentTokens = [];

	for (let i = 0; i < correctedCode.length; i++) {
	display.textContent += correctedCode[i];
	if (i % 5 === 0) await sleep(8); // smooth typewriter
	}

	setVerifierStatus("PASS");
	setStatus("Ready", "neutral");
	}

	export function getCurrentCode() {
	return streamBuffer;
	}

	export function reset() {
	streamBuffer = "";
	currentTokens = [];
	const display = document.getElementById("stream-display");
	if (display) display.textContent = "";
	}

	function sleep(ms) {
	return new Promise(resolve => setTimeout(resolve, ms));
	}
	```
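
The repository layout describes `ui.js` as holding diff logic, while the listing above replaces the whole buffer on rollback. As a sketch of what a line-level diff could look like, here in Python for consistency with the backend, the standard library's `difflib.SequenceMatcher` can report which line ranges of the draft the verifier changed. The function name `changed_line_ranges` is hypothetical; the backend could return these ranges alongside `corrected_code` so the UI highlights only the edited lines.

```python
import difflib


def changed_line_ranges(draft: str, corrected: str) -> list[tuple[str, int, int, int, int]]:
    """Return non-equal opcodes as (tag, draft_start, draft_end, corrected_start, corrected_end).

    Indices are 0-based line numbers with exclusive ends, as in difflib.
    """
    draft_lines = draft.splitlines()
    corrected_lines = corrected.splitlines()
    matcher = difflib.SequenceMatcher(a=draft_lines, b=corrected_lines, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


draft = "def f(n):\n    return range(n)\n"
fixed = "def f(n):\n    return list(range(n))\n"
print(changed_line_ranges(draft, fixed))  # [('replace', 1, 2, 1, 2)]
```

An empty list means the verifier returned the draft unchanged, in which case the rollback animation could be skipped.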

	### 3.4 style.css — custom UI (Off-Brand badge)

	```css
	/* static/style.css */
	:root {
	--bg: #0d1117;
	--surface: #161b22;
	--border: #30363d;
	--accent: #58a6ff;
	--accent-warn: #f0883e;
	--text: #e6edf3;
	--text-muted: #8b949e;
	--green: #3fb950;
	--red: #f85149;
	}

	body { background: var(--bg); color: var(--text); font-family: 'JetBrains Mono', monospace; }

	.code-stream {
	background: var(--surface);
	border: 1px solid var(--border);
	border-radius: 8px;
	padding: 16px;
	min-height: 300px;
	font-family: 'JetBrains Mono', monospace;
	font-size: 13px;
	line-height: 1.6;
	white-space: pre-wrap;
	overflow-y: auto;
	transition: border-color 0.2s;
	}

	.rollback-flash {
	border-color: var(--red) !important;
	background: rgba(248, 81, 73, 0.08) !important;
	animation: flash 0.4s ease;
	}

	@keyframes flash {
	0% { background: rgba(248, 81, 73, 0.25); }
	100% { background: rgba(248, 81, 73, 0.08); }
	}

	.status-bar {
	display: flex;
	justify-content: space-between;
	padding: 8px 12px;
	background: var(--surface);
	border: 1px solid var(--border);
	border-top: none;
	border-radius: 0 0 8px 8px;
	font-size: 12px;
	color: var(--text-muted);
	}

	.status-warning { color: var(--accent-warn); }
	.status-success { color: var(--green); }
	.status-neutral { color: var(--text-muted); }

	/* Gradio overrides */
	.gradio-container { background: var(--bg) !important; }
	footer { display: none !important; }

	/* WebGPU loading bar */
	.loading-bar {
	height: 3px;
	background: var(--border);
	border-radius: 2px;
	overflow: hidden;
	margin: 8px 0;
	}
	.loading-bar-fill {
	height: 100%;
	background: var(--accent);
	transition: width 0.3s ease;
	}
	```

	---

	## 4. Gradio App — app.py

	This is the HF Space entry point. Gradio acts as the shell; the real UI lives in the custom HTML component injected via `gr.HTML`.

	```python
	# app.py
	import gradio as gr
	import httpx
	import json
	import os
	import asyncio
	from pathlib import Path

	MODAL_VERIFIER_URL = os.environ.get("MODAL_VERIFIER_URL") # set as HF Space secret
	MODAL_SANDBOX_URL = os.environ.get("MODAL_SANDBOX_URL") # set as HF Space secret

	LANGUAGES = ["Python", "JavaScript", "TypeScript", "Rust", "Go", "C++"]

	def load_static(filename):
	    """Read a file from static/ so it can be inlined into the custom HTML."""
	    return Path(f"static/{filename}").read_text(encoding="utf-8")

	custom_html = f"""
	<!DOCTYPE html>
	<html>
	<head>
	<link rel="preconnect" href="https://fonts.googleapis.com">
	<link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
	<style>{load_static('style.css')}</style>
	</head>
	<body>
	<div id="split-brain-root">
	<div class="webgpu-notice" id="webgpu-warning" style="display:none; color:#f85149; padding:8px; border:1px solid #f85149; border-radius:6px; margin-bottom:12px;">
	⚠️ WebGPU not detected. Please use Chrome 113+ on desktop for local inference.
	</div>
	<div id="load-section">
	<button id="load-btn" onclick="initEngine()">⚡ Load 1.5B Model (WebGPU)</button>
	<div class="loading-bar"><div class="loading-bar-fill" id="load-progress" style="width:0%"></div></div>
	<span id="load-status" style="font-size:12px; color:#8b949e;"></span>
	</div>
	<div id="stream-display" class="code-stream" style="margin-top:12px;">Waiting for model load...</div>
	<div class="status-bar">
	<span id="status-text">Idle</span>
	<span id="token-count"></span>
	<span id="verifier-status"></span>
	</div>
	</div>
	<script type="module">
	{load_static('engine.js')}
	{load_static('ui.js')}

	// Check WebGPU on load
	if (!isWebGPUSupported()) {{
	document.getElementById('webgpu-warning').style.display = 'block';
	document.getElementById('load-btn').disabled = true;
	}}

	window.initEngine = async function() {{
	document.getElementById('load-btn').disabled = true;
	document.getElementById('load-status').textContent = 'Loading model weights...';
	await loadModel((progress) => {{
	if (progress.progress) {{
	document.getElementById('load-progress').style.width = progress.progress + '%';
	document.getElementById('load-status').textContent = `${{progress.file || 'Loading'}} — ${{Math.round(progress.progress)}}%`;
	}}
	}});
	document.getElementById('load-status').textContent = '✅ Model ready — WebGPU active';
	document.getElementById('load-section').style.opacity = '0.5';
	}};

	// Gradio will call this via the hidden trigger
	window.runLocalGeneration = async function(prompt, language) {{
	reset();
	setStatus('Generating locally (WebGPU)...', 'neutral');

	let tokenCount = 0;
	const startTime = Date.now();

	const fullCode = await generateCode(prompt, language,
	(token) => {{
	appendToken(token);
	tokenCount++;
	const elapsed = (Date.now() - startTime) / 1000;
	const tps = Math.round(tokenCount / elapsed);
	document.getElementById('token-count').textContent = `${{tps}} tok/s`;
	}},
	(code) => {{
	setStatus('Local generation complete. Verifying...', 'neutral');
	}}
	);

	// Send to Gradio backend for verification
	// Use the hidden Gradio state to trigger the verify function
	document.getElementById('draft-output-hidden').value = fullCode;
	document.getElementById('trigger-verify-btn').click();
	}};

	window.applyVerification = function(verdictJson) {{
	const verdict = JSON.parse(verdictJson);
	if (verdict.verdict === 'PASS') {{
	setVerifierStatus('PASS');
	setStatus('✅ Verified clean', 'success');
	}} else {{
	rollbackAndReplace(verdict.corrected_code, verdict.reason);
	}}
	}};
	</script>
	</body>
	</html>
	"""

	async def verify_with_modal(prompt: str, draft_code: str, language: str) -> str:
	    """Call Modal verifier endpoint and return JSON string."""
	    if not MODAL_VERIFIER_URL:
	        # No endpoint configured: pass the draft through unverified
	        return json.dumps({"verdict": "PASS"})
	    async with httpx.AsyncClient(timeout=60.0) as client:
	        resp = await client.post(
	            MODAL_VERIFIER_URL,
	            json={"prompt": prompt, "draft_code": draft_code, "language": language},
	        )
	        resp.raise_for_status()
	        return resp.text

	async def execute_in_sandbox(code: str) -> dict:
	    """Call Modal sandbox and return execution result."""
	    if not MODAL_SANDBOX_URL:
	        return {"stdout": "", "stderr": "Sandbox not configured", "returncode": -1}
	    async with httpx.AsyncClient(timeout=30.0) as client:
	        resp = await client.post(MODAL_SANDBOX_URL, json={"code": code})
	        resp.raise_for_status()
	        return resp.json()

	with gr.Blocks(
	    title="Split-Brain Co-Pilot",
	    css="footer {display:none}",
	    theme=gr.themes.Base(
	        primary_hue="blue",
	        neutral_hue="slate",
	    ),
	) as demo:
	    gr.HTML("<h1 style='text-align:center; color:#58a6ff;'>⚡ Split-Brain Co-Pilot</h1>")
	    gr.HTML("<p style='text-align:center; color:#8b949e;'>1.5B model runs in your browser (WebGPU). 14B model on Modal verifies and corrects.</p>")

	    with gr.Row():
	        with gr.Column(scale=2):
	            prompt_input = gr.Textbox(
	                label="What do you want to build?",
	                placeholder="e.g. A function that parses a CSV and returns the top 5 rows by a given column",
	                lines=3,
	            )
	            language_select = gr.Dropdown(
	                choices=LANGUAGES,
	                value="Python",
	                label="Language",
	            )
	            generate_btn = gr.Button("⚡ Generate (WebGPU → Verify)", variant="primary")

	        with gr.Column(scale=3):
	            # Custom HTML component for streaming display
	            gr.HTML(custom_html)

	    # Hidden elements for JS ↔ Gradio bridge
	    draft_hidden = gr.Textbox(visible=False, elem_id="draft-output-hidden")
	    verify_trigger = gr.Button("verify", visible=False, elem_id="trigger-verify-btn")
	    verdict_output = gr.Textbox(visible=False, label="verdict")

	    with gr.Row():
	        sandbox_output = gr.Code(label="Sandbox Execution Output", language="python", visible=False)

	    # Gradio event: user clicks Generate → JS takes over for local inference
	    generate_btn.click(
	        fn=None,
	        inputs=[prompt_input, language_select],
	        outputs=[],
	        js="(prompt, lang) => { window.runLocalGeneration(prompt, lang); return []; }",
	    )

	    # Gradio event: JS triggers verify after local generation completes
	    async def run_verification(prompt, draft_code, language):
	        verdict_json = await verify_with_modal(prompt, draft_code, language)
	        return verdict_json

	    verify_trigger.click(
	        fn=run_verification,
	        inputs=[prompt_input, draft_hidden, language_select],
	        outputs=[verdict_output],
	    )

	    # Apply verdict back to JS
	    verdict_output.change(
	        fn=None,
	        inputs=[verdict_output],
	        outputs=[],
	        js="(verdict) => { window.applyVerification(verdict); return []; }",
	    )

	if __name__ == "__main__":
	    demo.launch()
	```
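
The decision made by `applyVerification` in the custom HTML reduces to a small rule: keep the draft on `PASS`, swap in `corrected_code` on `FIX` or `REWRITE`. The sketch below expresses that rule in Python so it can be unit-tested or applied server-side before the verdict reaches the browser. The function name `apply_verdict` and the exact status strings are hypothetical. It also handles a case the JS as listed does not guard against: a `FIX` verdict with no `corrected_code`.

```python
def apply_verdict(draft_code: str, verdict: dict) -> tuple[str, str]:
    """Return (final_code, status_text) for a verifier verdict dict."""
    kind = verdict.get("verdict", "PASS")
    if kind == "PASS":
        return draft_code, "Verified clean"

    corrected = verdict.get("corrected_code")
    if kind in ("FIX", "REWRITE") and isinstance(corrected, str) and corrected.strip():
        reason = verdict.get("reason", "no reason given")
        return corrected, f"Verifier corrected: {reason}"

    # Unknown verdict or missing code: keep the draft rather than blank the editor
    return draft_code, "Verifier response unusable; draft kept"
```

Keeping the draft on an unusable response mirrors the fail-open fallback in `verify`; a stricter build might surface an error instead.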

	---

	## 5. Modal Web Endpoint Wrapper

	The Modal functions need to be exposed as HTTP endpoints that `app.py` can call via httpx. Add this to `verifier.py`:

	```python
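	# The web endpoint only forwards requests, so it needs neither a GPU nor the model volume
	web_image = modal.Image.debian_slim(python_version="3.11").pip_install("fastapi[standard]")

	@app.function(image=web_image, container_idle_timeout=300)
	@modal.asgi_app()
	def verifier_endpoint():
	    # Imported here so other containers built from this module do not need FastAPI
	    from fastapi import FastAPI
	    from pydantic import BaseModel

	    web_app = FastAPI()

	    class VerifyRequest(BaseModel):
	        prompt: str
	        draft_code: str
	        language: str = "python"

	    verifier = Verifier()

	    @web_app.post("/verify")
	    async def verify(req: VerifyRequest):
	        # .remote.aio() awaits the GPU call without blocking the event loop
	        return await verifier.verify.remote.aio(req.prompt, req.draft_code, req.language)

	    return web_app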
	```

	After deploying, Modal gives you a URL like `https://your-username--split-brain-verifier-verifier-endpoint.modal.run`. The route is registered at `/verify`, so set the HF Space secret `MODAL_VERIFIER_URL` to that URL with `/verify` appended.
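
Before wiring the endpoint into the Space, it may be worth smoke-testing it from a terminal. The sketch below builds a POST request for the `/verify` route using only the standard library, with the same three JSON fields as `VerifyRequest`. The helper name `build_verify_request` and the placeholder host are hypothetical; substitute the URL Modal prints for your deployment.

```python
import json
import urllib.request


def build_verify_request(
    base_url: str, prompt: str, draft_code: str, language: str = "python"
) -> urllib.request.Request:
    """Build a POST request for the verifier's /verify route."""
    url = base_url.rstrip("/") + "/verify"
    body = json.dumps(
        {"prompt": prompt, "draft_code": draft_code, "language": language}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host for illustration only; nothing is sent until urlopen is called.
req = build_verify_request("https://example.invalid/", "add two numbers", "def add(a, b): return a - b")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.read().decode("utf-8"))
```

The first real call may take a while if the verifier container is cold, so a generous timeout is advisable.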

	---

	## 6. HF Space Configuration

	### 6.1 README.md (Space card)

	```yaml
	---
	title: Split-Brain Co-Pilot
	emoji: ⚡
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.30.0
	app_file: app.py
	pinned: true
	license: apache-2.0
	tags:
	- code-generation
	- webgpu
	- speculative-decoding
	- llama.cpp
	- local-first
	---
	```

	### 6.2 requirements.txt

	```
	gradio==5.30.0
	httpx==0.27.0
	modal==0.73.0
	huggingface-hub==0.23.0
	```

	### 6.3 HF Space secrets

	Set these in the Space Settings → Repository secrets:

	| Secret name | Value |
	|---|---|
	| `MODAL_VERIFIER_URL` | Your Modal verifier endpoint URL |
	| `MODAL_SANDBOX_URL` | Your Modal sandbox endpoint URL |
	| `MODAL_TOKEN_ID` | From `modal token show` |
	| `MODAL_TOKEN_SECRET` | From `modal token show` |

	---

	## 7. Cold Start Mitigation

	Modal A10G containers take 10–40 seconds to cold start. Handle this gracefully:

	Add a scheduled keep-warm function to `verifier.py` so a verifier container stays warm during the demo window:

	```python
	@app.function(schedule=modal.Cron("*/5 * * * *"))
	def keep_warm():
	    """Ping the verifier every 5 minutes to avoid cold starts during the demo window."""
	    Verifier().verify.remote("test", "print('hello')", "python")
	```

	Redeploy to register the schedule: `modal deploy modal_backend/verifier.py`

	In the UI, show "Verifier warming up..." in the status bar while the first request is in flight and display a spinner. Do not let the UI appear broken during cold start.
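
One way to keep the UI from looking broken is to retry the first verification call with exponential backoff and report a status between attempts. The sketch below is transport-agnostic: `call` is any zero-argument coroutine function, for example a lambda wrapping `verify_with_modal`, and `on_status` is any callback that updates the status bar. The name `call_with_warmup`, the attempt count, and the delays are all hypothetical defaults.

```python
import asyncio
from typing import Awaitable, Callable


async def call_with_warmup(
    call: Callable[[], Awaitable[str]],
    on_status: Callable[[str], None],
    attempts: int = 3,
    base_delay_s: float = 2.0,
) -> str:
    """Await `call()`, retrying with exponential backoff while the backend warms up."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return await call()
        except Exception as exc:  # broad on purpose: timeouts, 5xx, connection resets
            last_error = exc
            if attempt == attempts - 1:
                break
            on_status("Verifier warming up...")
            await asyncio.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError(f"Verifier unavailable after {attempts} attempts") from last_error
```

In `run_verification` this could wrap the existing call, e.g. `await call_with_warmup(lambda: verify_with_modal(prompt, draft_code, language), print)`, with `print` replaced by whatever pushes status text to the browser.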

	---

	## 8. Demo Video Script

	The demo video is a submission requirement. Plan it around these beats:

	- Open Chrome, show the app. Explain the split-brain concept in one sentence.
	- Click "Load 1.5B Model" — show the WebGPU loading progress bar.
	- Type a non-trivial prompt: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."
	- Hit Generate — show tokens streaming at 80–120 tok/s with the token counter live.
	- Show the "Verifying..." status kick in immediately after local generation completes.
	- If the verifier returns FIX or REWRITE: show the red flash rollback animation and the corrected code typing in.
	- Show the sandbox execution output (stdout) confirming the corrected code runs.
	- End on the split status bar: "Local: WebGPU · Cloud: Modal A10G · Verdict: ✅ Verified"

	Keep the video under 3 minutes. Record with OBS or Loom. No cuts during the generation — the live stream is the point.

	---

	## 9. Bonus Badge Checklist

	| Badge | How you earn it | Status |
	|---|---|---|
	| Off the Grid | 1.5B runs 100% in browser, no cloud API for inference | ✅ Automatic |
	| Llama Champion | 14B served via llama.cpp on Modal | ✅ Automatic |
	| Off-Brand | Custom dark theme, rollback animation, token counter, status bar | ✅ Build it |
	| Field Notes | Write a blog post on HF or Dev.to explaining the speculative split-brain architecture | ✅ Write it post-build |

	---

	## 10. Submission Checklist

	Before the June 15 deadline:

	- [ ] Modal verifier deployed and endpoint URL confirmed working
	- [ ] HF Space live and publicly accessible under `build-small-hackathon` org
	- [ ] WebGPU model loads in Chrome without errors
	- [ ] Token streaming visible in UI
	- [ ] Rollback animation triggers on at least one FIX/REWRITE verdict
	- [ ] Sandbox execution output shown in demo
	- [ ] Demo video recorded and uploaded (YouTube unlisted or HF)
	- [ ] Social media post published (Twitter/X or LinkedIn) with Space link and demo video
	- [ ] README.md Space card complete with description, tags, and video embed
	- [ ] Field Notes blog post published and linked in README

	---

	## 11. Known Gotchas

	WebGPU VRAM: The 1.5B Q4 ONNX model needs ~1GB VRAM. On machines with integrated graphics sharing system RAM, this works but may be slow. Document the Chrome + dedicated GPU requirement.

	CORS: `app.py` calls the Modal endpoint server-side via httpx, so CORS does not apply to that path. If you later call the endpoint directly from browser JS and hit CORS errors, add `fastapi.middleware.cors.CORSMiddleware` to the web_app with `allow_origins=["*"]`.

	transformers.js version: Pin to `3.5.x`. Breaking changes in 3.x are frequent. The CDN import in `engine.js` uses the pinned version — don't use `@latest`.

	Gradio JS bridge: The `gr.Button(visible=False)` trigger pattern is the cleanest way to fire a Python function from browser JS in Gradio 5.x without websocket hacks. Do not use `gr.Request` for this — it won't work from inside a custom HTML block.

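	Modal Volume first deploy: The volume download must complete before the verifier function can load the model. Run `download_model` manually once and confirm the file is present with `modal volume ls qwen-14b-volume` before deploying the endpoint. The volume's root is what gets mounted at `/models`, so the GGUF file appears at the top level of the listing.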

	HF Space cold start: HF Spaces themselves also cold start. If the Space hasn't been visited recently, Gradio takes 20–30 seconds to boot. Add a loading spinner at the Gradio level using `gr.HTML` with a brief "Space initializing..." message that auto-hides once the page is interactive.