	---
	title: Split-Brain Co-Pilot
	emoji: ⚡
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: 5.30.0
	app_file: app.py
	pinned: true
	license: apache-2.0
	tags:
	- code-generation
	- webgpu
	- small-models
	- llama.cpp
	- modal
	- local-first
	- transformers.js
	- track:wood
	- sponsor:openai
	- sponsor:modal
	- achievement:offbrand
	- achievement:llama
	- achievement:fieldnotes
	---

	# Split-Brain Co-Pilot

	A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back, and types in the corrected cloud block live.

	The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.

	Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.

	## Demo and Socials

	- Demo video: https://youtu.be/tLZ1Y0ldAe0
	- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
	- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337

	## Why it fits Build Small

	This project is built for the "An Adventure in Thousand Token Wood" track, where the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.

	- Small models only: `Qwen2.5-Coder-1.5B-Instruct` (1.5B) + `Qwen2.5-Coder-14B-Instruct` (14B) = 15.5B nominal parameters in total, under the 32B cap.
	- Built on Gradio: the app is a Gradio Space with a custom HTML/CSS/JS surface.
	- Show, don't tell: token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
	- Modal-powered: the 14B verifier runs on Modal A10G and the Python sandbox runs as a Modal endpoint.
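The budget arithmetic in the first bullet can be checked in a few lines. This is a small sketch that uses the nominal sizes from the model names (1.5B and 14B), not exact parameter counts; the dictionary and constant names are illustrative.

```python
# Nominal sizes in billions of parameters, read off the model names.
# Exact counts differ slightly; this only checks the nominal budget.
NOMINAL_PARAMS_B = {
    "Qwen2.5-Coder-1.5B-Instruct": 1.5,
    "Qwen2.5-Coder-14B-Instruct": 14.0,
}
PARAM_CAP_B = 32.0  # hackathon cap quoted in the list above

total_b = sum(NOMINAL_PARAMS_B.values())
headroom_b = PARAM_CAP_B - total_b
print(f"total={total_b}B, headroom={headroom_b}B")
```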

	## Architecture

	- Local brain: `onnx-community/Qwen2.5-Coder-1.5B-Instruct` through transformers.js `3.5.x`, WebGPU, quantized browser weights.
	- Cloud brain: `bartowski/Qwen2.5-Coder-14B-Instruct-GGUF` (`Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf`) served on Modal A10G through llama.cpp.
	- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
	- Proof step: Modal sandbox execution endpoint for generated Python code.

	```mermaid
	flowchart LR
	Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
	Local --> Draft["Streaming draft code"]
	Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
	Verify -->|PASS| Final["Verified code"]
	Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
	Rollback --> Final
	Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
	```
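The verdict branch in the diagram can be sketched as a small pure function. This is an illustrative sketch, not the app's actual code: the `Verdict` enum, the `Resolution` dataclass, and the `reconcile` name are assumptions; only the PASS, FIX, and REWRITE labels come from the diagram.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "PASS"
    FIX = "FIX"
    REWRITE = "REWRITE"


@dataclass
class Resolution:
    final_code: str
    rolled_back: bool  # True when the UI should play the rollback animation


def reconcile(draft: str, verdict: Verdict, corrected: Optional[str] = None) -> Resolution:
    """Pick the code that moves on to the sandbox step."""
    if verdict is Verdict.PASS:
        # The local draft stands as written.
        return Resolution(final_code=draft, rolled_back=False)
    if not corrected:
        raise ValueError(f"{verdict.value} verdict requires a corrected block")
    # FIX and REWRITE both replace the draft with the verifier's block.
    return Resolution(final_code=corrected, rolled_back=True)
```

Keeping this step free of I/O makes the rollback decision easy to test separately from the streaming UI.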

	## Requirements

	Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so machines with a dedicated GPU will feel much smoother than ones with older integrated graphics.

	## Local Run

	```bash
	python3 -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	python app.py
	```

	Without `MODAL_VERIFIER_URL`, the app uses a PASS fallback so the WebGPU UI can be tested locally.
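The fallback can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in `app.py`: the `verify_draft` name, the JSON request body, the response shape, and the `source` key are all hypothetical; only the `MODAL_VERIFIER_URL` variable and the PASS fallback come from this README.

```python
import json
import os
import urllib.request


def verify_draft(code: str, env=os.environ, timeout: float = 120.0) -> dict:
    """Return a verdict dict for `code`; fall back to PASS without a verifier URL."""
    url = (env.get("MODAL_VERIFIER_URL") or "").strip()
    if not url:
        # No verifier configured: report PASS so the local UI flow still completes.
        return {"verdict": "PASS", "source": "fallback"}
    # Hypothetical request body; the real endpoint may expect other fields.
    body = json.dumps({"code": code}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Passing `env` explicitly keeps the fallback path testable without touching the real process environment.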

	Copy `.env.example` to `.env` for local secrets. The `.env` file is ignored by git.

	## Modal Setup

	Install and authenticate the Modal CLI, then store your Hugging Face token as a Modal secret:

	```bash
	pip install modal
	modal token new
	modal secret create huggingface-secret HF_TOKEN=hf_xxx
	```

	Download the 14B GGUF model into the persistent volume once:

	```bash
	modal run modal_backend/verifier.py::download_model
	```

	Deploy the verifier and sandbox:

	```bash
	modal deploy modal_backend/verifier.py
	modal deploy modal_backend/sandbox.py
	```

	The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls `/verify`. For a live recording or judging window, you can warm it once:

	```bash
	modal run modal_backend/verifier.py::warm_once
	```

	Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.

	Set these Space secrets after deploy:

	| Secret | Value |
	| --- | --- |
	| `MODAL_VERIFIER_URL` | Modal verifier endpoint URL, with or without `/verify` |
	| `MODAL_SANDBOX_URL` | Modal sandbox endpoint URL, with or without `/execute` |
	| `MODAL_TOKEN_ID` | From `modal token show` |
	| `MODAL_TOKEN_SECRET` | From `modal token show` |
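Accepting the endpoint URLs "with or without" their route suffix implies some normalization on the app side. One way this could be done is sketched below; the `endpoint_url` helper is hypothetical and the hostname is a placeholder, not a real deployment.

```python
def endpoint_url(base: str, route: str) -> str:
    """Return `base` with exactly one trailing `route`, such as "/verify"."""
    base = base.strip().rstrip("/")
    route = "/" + route.strip("/")
    if base.endswith(route):
        return base
    return base + route


# Both spellings of the secret resolve to the same URL.
print(endpoint_url("https://example.invalid", "/verify"))
print(endpoint_url("https://example.invalid/verify/", "/verify"))
```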

	This project uses `modal==1.4.3`; older `0.73.x` clients are now rejected by Modal as deprecated.

	## Demo Beat

	Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."

	Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click Run Python Sandbox so the demo ends with executable proof, not just generated text.
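The "executable proof" idea can also be tried locally without Modal. The sketch below is a stand-in, not the deployed sandbox: it runs the code in a child Python process with a timeout through the standard library `subprocess` module, which provides no real isolation, so it should only be used on code you trust. The function name and result fields are hypothetical.

```python
import subprocess
import sys


def run_python_locally(code: str, timeout: float = 10.0) -> dict:
    """Run `code` in a child interpreter and capture its output."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": completed.returncode == 0,  # non-zero exit means the proof failed
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

A zero exit code only shows that the code ran without raising; it does not show that the output is correct, so pairing the generated function with a few assertions gives a stronger proof step.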

	## Badge Targets

	- Llama Champion: 14B verifier served through llama.cpp.
	- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
	- Field Notes: [repo draft](FIELD_NOTES.md), ready to publish as a Hugging Face Article or external post.
	- Modal Awards: verifier and sandbox are both Modal-powered.

	The app is local-first, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.

	## Current Status

	- HF Space: live under the `build-small-hackathon` org.
	- Local model: browser WebGPU loading works with quantized weights.
	- Verifier: Modal endpoint deployed.
	- Sandbox: Modal Python execution endpoint deployed.
	- Demo video and social posts: published.
	- Remaining submission work: public Field Notes URL and submission form.