---
title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- code-generation
- webgpu
- small-models
- llama.cpp
- modal
- local-first
- transformers.js
- track:wood
- sponsor:openai
- sponsor:modal
- achievement:offbrand
- achievement:llama
- achievement:fieldnotes
---
# Split-Brain Co-Pilot
A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back, and types in the corrected cloud block live.
The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.
Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.
## Demo and Socials
- Demo video: https://youtu.be/tLZ1Y0ldAe0
- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337
## Why it fits Build Small
This project targets the **An Adventure in Thousand Token Wood** track, where the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.
- **Small models only:** `Qwen2.5-Coder-1.5B-Instruct` + `Qwen2.5-Coder-14B-Instruct` = **15.5B total parameters**, under the 32B cap.
- **Built on Gradio:** the app is a Gradio Space with a custom HTML/CSS/JS surface.
- **Show, don't tell:** token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
- **Modal-powered:** the 14B verifier runs on Modal A10G and the Python sandbox runs as a Modal endpoint.
## Architecture
- Local brain: `onnx-community/Qwen2.5-Coder-1.5B-Instruct`, running in the browser through transformers.js `3.5.x` on WebGPU with quantized weights.
- Cloud brain: `bartowski/Qwen2.5-Coder-14B-Instruct-GGUF` (`Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf`) served on Modal A10G through llama.cpp.
- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
- Proof step: Modal sandbox execution endpoint for generated Python code.
```mermaid
flowchart LR
Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
Local --> Draft["Streaming draft code"]
Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
Verify -->|PASS| Final["Verified code"]
Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
Rollback --> Final
Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
```
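
The PASS / FIX / REWRITE branch in the diagram can be sketched in a few lines of Python. This is only an illustration of the decision, not the app's actual code: the `Verdict` shape, its field names, and the `reconcile` helper are hypothetical, since this README does not document the verifier's response format.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical verifier response; field names are assumptions."""
    status: str          # expected: "PASS", "FIX", or "REWRITE"
    corrected: str = ""  # replacement code when status is not PASS


def reconcile(draft: str, verdict: Verdict) -> tuple[str, bool]:
    """Return (final_code, rolled_back) for a local draft and a cloud verdict."""
    status = verdict.status.strip().upper()
    if status == "PASS":
        # The verifier accepted the draft, so nothing is rolled back.
        return draft, False
    if status in {"FIX", "REWRITE"} and verdict.corrected.strip():
        # The verifier supplied a replacement: roll back and use it.
        return verdict.corrected, True
    # Unknown status or empty correction: keep the draft rather than blank the editor.
    return draft, False
```

Keeping the draft on an unknown or empty verdict is a design choice in this sketch: a malformed verifier reply should not erase code the user has already seen.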
## Requirements
Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so machines with a dedicated GPU will run it more smoothly than older integrated graphics.
## Local Run
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```
Without `MODAL_VERIFIER_URL`, the app uses a PASS fallback so the WebGPU UI can be tested locally.
Copy `.env.example` to `.env` for local secrets. The `.env` file is ignored by git.
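
The PASS fallback described above might look roughly like the sketch below. The helper name `verify_or_fallback`, the response dictionary shape, and the injected `call_verifier` callable are assumptions made for illustration; only the `MODAL_VERIFIER_URL` variable and the PASS behavior come from this README.

```python
import os
from typing import Callable

# Hypothetical response shape; the real verifier payload is not documented here.
FALLBACK_VERDICT = {"status": "PASS", "corrected": ""}


def verify_or_fallback(draft: str, call_verifier: Callable[[str, str], dict]) -> dict:
    """Call the cloud verifier if configured, otherwise report PASS."""
    url = os.environ.get("MODAL_VERIFIER_URL", "").strip()
    if not url:
        # No verifier configured: return a copy so callers cannot mutate the default.
        return dict(FALLBACK_VERDICT)
    # call_verifier(url, draft) stands in for the real HTTP request.
    return call_verifier(url, draft)
```

Passing the network call in as a parameter keeps the fallback path testable without a deployed Modal endpoint.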
## Modal Setup
Install and authenticate the Modal CLI:
```bash
pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```
Download the 14B GGUF model into the persistent volume once:
```bash
modal run modal_backend/verifier.py::download_model
```
Deploy the verifier and sandbox:
```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```
The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls `/verify`. For a live recording or judging window, you can warm it once:
```bash
modal run modal_backend/verifier.py::warm_once
```
Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.
Set these Space secrets after deploy:
| Secret | Value |
| --- | --- |
| `MODAL_VERIFIER_URL` | Modal verifier endpoint URL, with or without `/verify` |
| `MODAL_SANDBOX_URL` | Modal sandbox endpoint URL, with or without `/execute` |
| `MODAL_TOKEN_ID` | From `modal token show` |
| `MODAL_TOKEN_SECRET` | From `modal token show` |
This project uses `modal==1.4.3`; older `0.73.x` clients are now rejected by Modal as deprecated.
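
The secrets table accepts endpoint URLs "with or without" the route suffix. One way to tolerate both forms is a small normalizer like the following sketch. The function name `endpoint_url` is hypothetical and the hostnames in the usage lines are placeholders, not real deployments.

```python
def endpoint_url(base: str, route: str) -> str:
    """Return base with route appended exactly once, ignoring stray slashes."""
    base = base.strip().rstrip("/")
    route = "/" + route.strip("/")
    if base.endswith(route):
        # The secret already includes the route, so leave it alone.
        return base
    return base + route


# Both forms of the secret resolve to the same URL (placeholder hostnames).
print(endpoint_url("https://example.invalid", "/verify"))
print(endpoint_url("https://example.invalid/verify/", "verify"))
```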
## Demo Beat
Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."
Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click **Run Python Sandbox** so the demo ends with executable proof, not just generated text.
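
When judging the verifier's verdict on the prompt above, it helps to have a known-good answer to compare against. The following is a minimal reference sketch of a segmented sieve, written for this README rather than produced by either model; the name `primes_up_to` and the default `segment_size` are arbitrary choices.

```python
import math


def primes_up_to(n: int, segment_size: int = 32_768) -> list[int]:
    """Return all primes <= n using a segmented sieve."""
    if segment_size < 1:
        raise ValueError("segment_size must be at least 1")
    if n < 2:
        return []  # edge case: there are no primes below 2

    # Step 1: plain sieve for the base primes up to sqrt(n).
    limit = math.isqrt(n)
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    is_prime[1] = False  # safe: limit >= 1 whenever n >= 2
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    base_primes = [i for i, flag in enumerate(is_prime) if flag]

    # Step 2: sieve (limit, n] one fixed-size segment at a time.
    primes = list(base_primes)
    low = limit + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        segment = [True] * (high - low + 1)
        for p in base_primes:
            # First multiple of p inside [low, high], never below p * p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                segment[multiple - low] = False
        primes.extend(low + i for i, flag in enumerate(segment) if flag)
        low = high + 1
    return primes
```

Common slips to watch for in a draft are the `n < 2` case, an off-by-one at the segment boundary, and starting each prime's marking below `p * p`.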
## Badge Targets
- Llama Champion: 14B verifier served through llama.cpp.
- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
- Field Notes: [repo draft](FIELD_NOTES.md), ready to publish as a Hugging Face Article or external post.
- Modal Awards: verifier and sandbox are both Modal-powered.
The app is **local-first**, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.
## Current Status
- HF Space: live under the `build-small-hackathon` org.
- Local model: browser WebGPU loading works with quantized weights.
- Verifier: Modal endpoint deployed.
- Sandbox: Modal Python execution endpoint deployed.
- Demo video and social posts: published.
- Remaining submission work: public Field Notes URL and submission form.