title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- code-generation
- webgpu
- small-models
- llama.cpp
- modal
- local-first
- transformers.js
Split-Brain Co-Pilot
A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back, and types in the corrected cloud block live.
The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.
Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.
Demo and Socials
- Demo video: https://youtu.be/tLZ1Y0ldAe0
- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337
Why it fits Build Small
This project is built for the "An Adventure in Thousand Token Wood" track: the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.
- Small models only: Qwen2.5-Coder-1.5B + Qwen2.5-Coder-14B-Instruct = 15.5B total parameters, under the 32B cap.
- Built on Gradio: the app is a Gradio Space with a custom HTML/CSS/JS surface.
- Show, don't tell: token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
- Modal-powered: the 14B verifier runs on Modal A10G and the Python sandbox runs as a Modal endpoint.
Architecture
- Local brain: onnx-community/Qwen2.5-Coder-1.5B-Instruct through transformers.js 3.5.x, WebGPU, quantized browser weights.
- Cloud brain: bartowski/Qwen2.5-Coder-14B-Instruct-GGUF (Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf) served on Modal A10G through llama.cpp.
- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
- Proof step: Modal sandbox execution endpoint for generated Python code.
flowchart LR
Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
Local --> Draft["Streaming draft code"]
Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
Verify -->|PASS| Final["Verified code"]
Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
Rollback --> Final
Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
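The PASS and FIX / REWRITE branches in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the Verdict class, its field names, and the reconcile helper are hypothetical, and only the three verdict labels come from this README.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical verifier response; field names are assumptions."""
    status: str                          # "PASS", "FIX", or "REWRITE"
    corrected_code: Optional[str] = None


def reconcile(draft: str, verdict: Verdict) -> Tuple[str, bool]:
    """Return (final_code, rolled_back) for a local draft and a cloud verdict."""
    if verdict.status == "PASS":
        # The draft stands as written; nothing to roll back.
        return draft, False
    if verdict.status in ("FIX", "REWRITE"):
        if not verdict.corrected_code:
            raise ValueError("FIX/REWRITE verdict without corrected code")
        # The cloud block replaces the draft, which is what triggers the rollback.
        return verdict.corrected_code, True
    raise ValueError(f"unknown verdict status: {verdict.status!r}")
```

Returning a rolled_back flag alongside the code keeps the decision separate from the animation, so the UI layer only has to react to a boolean.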
Requirements
Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so dedicated GPU machines will feel much better than older integrated graphics.
Local Run
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
Without MODAL_VERIFIER_URL, the app uses a PASS fallback so the WebGPU UI can be tested locally.
Copy .env.example to .env for local secrets. The .env file is ignored by git.
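The PASS fallback mentioned above might look roughly like the sketch below. Only the MODAL_VERIFIER_URL variable name and the PASS verdict come from this README; the verify_draft function, the request body, the response keys, and the 120-second timeout are assumptions made for illustration.

```python
import json
import os
import urllib.request


def verify_draft(code: str, timeout: float = 120.0) -> dict:
    """Ask the verifier for a verdict, or return PASS when none is configured."""
    url = os.environ.get("MODAL_VERIFIER_URL", "").strip()
    if not url:
        # No verifier configured: accept the draft unchanged so the
        # WebGPU UI can still be exercised locally.
        return {"verdict": "PASS", "code": code, "source": "fallback"}

    # Hypothetical request body; the real endpoint may expect other fields.
    body = json.dumps({"code": code}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A generous timeout (an assumption) leaves room for a cold start.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Tagging the fallback result with a source field makes it easy to show in the UI that no real verification happened.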
Modal Setup
Install and authenticate the Modal CLI:
pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx
Download the 14B GGUF model into the persistent volume once:
modal run modal_backend/verifier.py::download_model
Deploy the verifier and sandbox:
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls /verify. For a live recording or judging window, you can warm it once:
modal run modal_backend/verifier.py::warm_once
Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.
Set these Space secrets after deploy:
| Secret | Value |
|---|---|
| MODAL_VERIFIER_URL | Modal verifier endpoint URL, with or without /verify |
| MODAL_SANDBOX_URL | Modal sandbox endpoint URL, with or without /execute |
| MODAL_TOKEN_ID | From modal token show |
| MODAL_TOKEN_SECRET | From modal token show |
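Accepting the endpoint URLs "with or without" the route suffix implies some normalization on the app side. One way to do that is sketched below; the endpoint_url helper is hypothetical and the example host is a placeholder, not a real deployment.

```python
def endpoint_url(base: str, route: str) -> str:
    """Return base with route appended exactly once (hypothetical helper)."""
    base = base.strip().rstrip("/")
    route = "/" + route.strip("/")
    if base.endswith(route):
        # The secret already includes the route, so leave it alone.
        return base
    return base + route


# Both forms of the secret resolve to the same URL.
print(endpoint_url("https://example.invalid", "/verify"))
print(endpoint_url("https://example.invalid/verify/", "/verify"))
```

Stripping trailing slashes before the suffix check means a pasted URL with a stray "/" does not end up as ".../verify/verify".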
This project uses modal==1.4.3; older 0.73.x clients are now rejected by Modal as deprecated.
Demo Beat
Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."
Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click Run Python Sandbox so the demo ends with executable proof, not just generated text.
Badge Targets
- Llama Champion: 14B verifier served through llama.cpp.
- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
- Field Notes: repo draft, ready to publish as a Hugging Face Article or external post.
- Modal Awards: verifier and sandbox are both Modal-powered.
The app is local-first, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.
Current Status
- HF Space: live under the build-small-hackathon org.
- Local model: browser WebGPU loading works with quantized weights.
- Verifier: Modal endpoint deployed.
- Sandbox: Modal Python execution endpoint deployed.
- Demo video and social posts: published.
- Remaining submission work: public Field Notes URL and submission form.