---
title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- code-generation
- webgpu
- small-models
- llama.cpp
- modal
- local-first
- transformers.js
- track:wood
- sponsor:openai
- sponsor:modal
- achievement:offbrand
- achievement:llama
- achievement:fieldnotes
---
# Split-Brain Co-Pilot
A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back the draft, and types in the corrected block from the cloud model live.
The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.
Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.
## Demo and Socials
- Demo video: https://youtu.be/tLZ1Y0ldAe0
- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337
## Why it fits Build Small
This project is built for the **An Adventure in Thousand Token Wood** track: the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.
- **Small models only:** `Qwen2.5-Coder-1.5B-Instruct` + `Qwen2.5-Coder-14B-Instruct` = **15.5B total parameters**, under the 32B cap.
- **Built on Gradio:** the app is a Gradio Space with a custom HTML/CSS/JS surface.
- **Show, don't tell:** token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
- **Modal-powered:** the 14B verifier runs on Modal A10G and the Python sandbox runs as a Modal endpoint.
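
The parameter budget above is simple arithmetic, sketched here as a check. The counts are the nominal sizes from the model names, and the dictionary keys and helper names are illustrative, not taken from the app code.

```python
# Nominal parameter counts in billions, read off the model names above.
# Keys are illustrative role labels, not identifiers from the app.
MODELS_B = {
    "local_draft_1.5B": 1.5,
    "cloud_verifier_14B": 14.0,
}
CAP_B = 32.0  # hackathon cap stated above


def total_params_b(models: dict) -> float:
    """Sum the nominal parameter counts (in billions)."""
    return sum(models.values())


def under_cap(models: dict, cap_b: float = CAP_B) -> bool:
    """True when the combined size is strictly under the cap."""
    return total_params_b(models) < cap_b


print(f"{total_params_b(MODELS_B):.1f}B total, under cap: {under_cap(MODELS_B)}")
```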
## Architecture
- Local brain: `onnx-community/Qwen2.5-Coder-1.5B-Instruct` through transformers.js `3.5.x`, WebGPU, quantized browser weights.
- Cloud brain: `bartowski/Qwen2.5-Coder-14B-Instruct-GGUF` (`Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf`) served on Modal A10G through llama.cpp.
- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
- Proof step: Modal sandbox execution endpoint for generated Python code.
```mermaid
flowchart LR
Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
Local --> Draft["Streaming draft code"]
Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
Verify -->|PASS| Final["Verified code"]
Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
Rollback --> Final
Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
```
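
The branch in the diagram (PASS keeps the draft, FIX / REWRITE swaps in the corrected block) can be sketched in a few lines of Python. This is a minimal illustration, not the app's code: the `Verdict` class, its field names, and `reconcile` are hypothetical, and the real verifier response shape may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical verifier result; the real payload may look different."""
    status: str                      # "PASS", "FIX", or "REWRITE"
    corrected: Optional[str] = None  # replacement code for FIX / REWRITE


def reconcile(draft: str, verdict: Verdict) -> Tuple[str, bool]:
    """Return (final_code, rolled_back) for a draft and a verifier verdict."""
    if verdict.status == "PASS":
        # The local draft stands as the verified code.
        return draft, False
    if verdict.status in ("FIX", "REWRITE"):
        if not verdict.corrected:
            raise ValueError("FIX/REWRITE verdict without corrected code")
        # The UI would play the rollback animation, then show this block.
        return verdict.corrected, True
    raise ValueError(f"unknown verdict status: {verdict.status!r}")
```

A caller would use the `rolled_back` flag to decide whether to trigger the rollback animation before showing the final code.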
## Requirements
Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so machines with a dedicated GPU will perform noticeably better than those with older integrated graphics.
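
As a rough sanity check on the 1 GB figure, weight memory scales with parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate only: the bit widths are assumptions (the README says only that the browser weights are quantized), and it ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimate weight storage in decimal gigabytes (weights only)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal 1.5B parameters at a few assumed quantization widths.
for bits in (4, 8, 16):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(1.5e9, bits):.2f} GB")
```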
## Local Run
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```
Without `MODAL_VERIFIER_URL`, the app uses a PASS fallback so the WebGPU UI can be tested locally.
Copy `.env.example` to `.env` for local secrets. The `.env` file is ignored by git.
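
The PASS fallback mentioned above could look roughly like the following. It is a sketch under assumptions: `verify_or_fallback`, the `call_verifier` callback, and the result dictionary shape are hypothetical names for illustration, and only the `MODAL_VERIFIER_URL` variable comes from this README.

```python
import os
from typing import Callable, Mapping, Optional


def verify_or_fallback(
    draft: str,
    call_verifier: Callable[[str, str], dict],
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Call the verifier if configured, otherwise report PASS unchanged."""
    env = os.environ if env is None else env
    url = env.get("MODAL_VERIFIER_URL", "").strip()
    if not url:
        # No verifier configured: accept the local draft as-is so the
        # WebGPU UI can still be exercised end to end.
        return {"status": "PASS", "code": draft, "source": "fallback"}
    return call_verifier(url, draft)
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.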
## Modal Setup
Install and authenticate the Modal CLI:
```bash
pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```
Download the 14B GGUF model into the persistent volume once:
```bash
modal run modal_backend/verifier.py::download_model
```
Deploy the verifier and sandbox:
```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```
The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls `/verify`. For a live recording or judging window, you can warm it once:
```bash
modal run modal_backend/verifier.py::warm_once
```
Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.
Set these Space secrets after deploy:
| Secret | Value |
| --- | --- |
| `MODAL_VERIFIER_URL` | Modal verifier endpoint URL, with or without `/verify` |
| `MODAL_SANDBOX_URL` | Modal sandbox endpoint URL, with or without `/execute` |
| `MODAL_TOKEN_ID` | From `modal token show` |
| `MODAL_TOKEN_SECRET` | From `modal token show` |
This project uses `modal==1.4.3`; older `0.73.x` clients are now rejected by Modal as deprecated.
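
The secrets table above accepts endpoint URLs with or without their route suffix. One way to tolerate both forms is to normalize the value before use; the helper below is a hypothetical sketch (the function name and the example URL are placeholders, not the app's actual code or endpoint).

```python
def normalize_endpoint(base_url: str, route: str) -> str:
    """Return base_url ending in exactly one copy of route."""
    route = "/" + route.strip("/")
    url = base_url.strip().rstrip("/")
    if url.endswith(route):
        return url
    return url + route


# Placeholder host for illustration only.
print(normalize_endpoint("https://example.invalid", "/verify"))
print(normalize_endpoint("https://example.invalid/execute/", "/execute"))
```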
## Demo Beat
Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."
Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click **Run Python Sandbox** so the demo ends with executable proof, not just generated text.
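
For comparison during the demo, here is one plausible reference answer to that prompt. It is a sketch written for this README, not output captured from either model, and the function name and default segment size are arbitrary choices.

```python
import math
from typing import List


def primes_up_to(n: int, segment_size: int = 32_768) -> List[int]:
    """Return all primes <= n using a segmented sieve of Eratosthenes."""
    if segment_size < 1:
        raise ValueError("segment_size must be positive")
    if n < 2:
        return []  # edge case: no primes below 2

    # Step 1: plain sieve for the base primes up to sqrt(n).
    limit = math.isqrt(n)
    base = bytearray([1]) * (limit + 1)
    base[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if base[i]:
            base[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    base_primes = [i for i in range(2, limit + 1) if base[i]]

    # Step 2: sieve (limit, n] one fixed-size segment at a time.
    primes = list(base_primes)
    low = limit + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        seg = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            # First multiple of p inside the segment, never below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for m in range(start, high + 1, p):
                seg[m - low] = 0
        primes.extend(low + i for i, flag in enumerate(seg) if flag)
        low = high + 1
    return primes
```

Running the verified output and a reference like this on the same inputs in the sandbox is a quick way to show the two agree.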
## Badge Targets
- Llama Champion: 14B verifier served through llama.cpp.
- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
- Field Notes: [repo draft](FIELD_NOTES.md), ready to publish as a Hugging Face Article or external post.
- Modal Awards: verifier and sandbox are both Modal-powered.
The app is **local-first**, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.
## Current Status
- HF Space: live under the `build-small-hackathon` org.
- Local model: browser WebGPU loading works with quantized weights.
- Verifier: Modal endpoint deployed.
- Sandbox: Modal Python execution endpoint deployed.
- Demo video and social posts: published.
- Remaining submission work: public Field Notes URL and submission form.