metadata
title: Split-Brain Co-Pilot
emoji: 
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - code-generation
  - webgpu
  - small-models
  - llama.cpp
  - modal
  - local-first
  - transformers.js

Split-Brain Co-Pilot

A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back, and types in the corrected cloud block live.

The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.

Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.

Demo and Socials

Why it fits Build Small

This project is built for the "An Adventure in Thousand Token Wood" track, where the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.

  • Small models only: Qwen2.5-Coder-1.5B + Qwen2.5-Coder-14B-Instruct = 15.5B total parameters, under the 32B cap.
  • Built on Gradio: the app is a Gradio Space with a custom HTML/CSS/JS surface.
  • Show, don't tell: token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
  • Modal-powered: the 14B verifier runs on Modal A10G and the Python sandbox runs as a Modal endpoint.

Architecture

  • Local brain: onnx-community/Qwen2.5-Coder-1.5B-Instruct, run in the browser through transformers.js 3.5.x on WebGPU with quantized weights.
  • Cloud brain: bartowski/Qwen2.5-Coder-14B-Instruct-GGUF (Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf) served on Modal A10G through llama.cpp.
  • Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
  • Proof step: Modal sandbox execution endpoint for generated Python code.
flowchart LR
    Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
    Local --> Draft["Streaming draft code"]
    Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
    Verify -->|PASS| Final["Verified code"]
    Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
    Rollback --> Final
    Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
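
The PASS and FIX / REWRITE branches in the flowchart can be sketched as a small reconciliation step. This is a minimal illustration, not the app's actual code: the `Verdict` class, its field names, and the `reconcile` function are hypothetical, and the real verifier response format is not documented here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Verdict:
    """Hypothetical shape of a verifier reply (assumed, not the real schema)."""
    status: str          # "PASS", "FIX", or "REWRITE"
    corrected: str = ""  # replacement code when status is FIX or REWRITE


def reconcile(draft: str, verdict: Verdict) -> Tuple[str, bool]:
    """Return (final_code, rolled_back) for a local draft and a cloud verdict."""
    status = verdict.status.strip().upper()
    if status == "PASS":
        # The verifier accepted the draft, so nothing is rolled back.
        return draft, False
    if status in {"FIX", "REWRITE"} and verdict.corrected.strip():
        # The verifier supplied a correction: swap it in and signal a rollback.
        return verdict.corrected, True
    # Unknown status or empty correction: keep the draft rather than lose it.
    return draft, False
```

In this sketch the `rolled_back` flag is what a UI would use to decide whether to play the rollback animation before showing the final code.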

Requirements

Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so machines with a dedicated GPU will run it much more smoothly than those with older integrated graphics.

Local Run

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py

Without MODAL_VERIFIER_URL, the app uses a PASS fallback so the WebGPU UI can be tested locally.
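
One way such a fallback could look is sketched below. The function name `verify_or_pass`, the `post` callback, and the returned dictionary keys are assumptions made for illustration; only the `MODAL_VERIFIER_URL` variable and the PASS behavior come from this README.

```python
import os
from typing import Callable, Mapping


def verify_or_pass(
    code: str,
    post: Callable[[str, str], dict],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Call the verifier if it is configured, otherwise report PASS."""
    url = env.get("MODAL_VERIFIER_URL", "").strip()
    if not url:
        # No verifier configured: accept the draft so the local UI stays testable.
        return {"verdict": "PASS", "code": code, "source": "fallback"}
    # `post` is a hypothetical helper that sends the draft to the verifier.
    return post(url, code)
```

Passing `env` explicitly keeps the sketch easy to test without touching the real process environment.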

Copy .env.example to .env for local secrets. The .env file is ignored by git.

Modal Setup

Install and authenticate the Modal CLI:

pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx

Download the 14B GGUF model into the persistent volume once:

modal run modal_backend/verifier.py::download_model

Deploy the verifier and sandbox:

modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py

The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls /verify. For a live recording or judging window, you can warm it once:

modal run modal_backend/verifier.py::warm_once

Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.

Set these Space secrets after deploy:

| Secret | Value |
| --- | --- |
| MODAL_VERIFIER_URL | Modal verifier endpoint URL, with or without /verify |
| MODAL_SANDBOX_URL | Modal sandbox endpoint URL, with or without /execute |
| MODAL_TOKEN_ID | From modal token show |
| MODAL_TOKEN_SECRET | From modal token show |
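
Accepting a URL "with or without" the route suffix implies some normalization on the app side. A minimal sketch of that idea follows; the helper name `endpoint_url` is hypothetical and the app may handle this differently.

```python
def endpoint_url(base: str, route: str) -> str:
    """Return `base` ending in exactly one copy of `route` (e.g. "/verify")."""
    base = base.strip().rstrip("/")
    route = "/" + route.strip("/")
    if base.endswith(route):
        # The secret already includes the route, so leave it alone.
        return base
    return base + route
```

With this helper, both `https://example.invalid` and `https://example.invalid/verify/` resolve to `https://example.invalid/verify` (the host here is a placeholder, not a real endpoint).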

This project uses modal==1.4.3; older 0.73.x clients are now rejected by Modal as deprecated.

Demo Beat

Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."

Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click Run Python Sandbox so the demo ends with executable proof, not just generated text.
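
When rehearsing the demo, it can help to have a known-good answer to compare against the sandbox output. The following is one possible reference implementation of the segmented sieve from the prompt idea, written for this README rather than taken from either model; the function name and the `segment_size` default are illustrative choices.

```python
import math
from typing import List


def primes_up_to(n: int, segment_size: int = 32_768) -> List[int]:
    """Return all primes <= n using a segmented sieve."""
    if n < 2:
        return []  # edge case: there are no primes below 2
    limit = math.isqrt(n)

    # Plain sieve for the base primes up to sqrt(n).
    base = bytearray([1]) * (limit + 1)
    base[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if base[i]:
            base[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    base_primes = [i for i in range(2, limit + 1) if base[i]]

    primes = list(base_primes)
    low = limit + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        segment = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            # First multiple of p inside [low, high], but never below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                segment[multiple - low] = 0
        primes.extend(low + i for i, flag in enumerate(segment) if flag)
        low = high + 1
    return primes
```

Each segment only needs memory proportional to `segment_size`, which is the point of the segmented variant compared with sieving the whole range at once.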

Badge Targets

  • Llama Champion: 14B verifier served through llama.cpp.
  • Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
  • Field Notes: repo draft, ready to publish as a Hugging Face Article or external post.
  • Modal Awards: verifier and sandbox are both Modal-powered.

The app is local-first, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.

Current Status

  • HF Space: live under the build-small-hackathon org.
  • Local model: browser WebGPU loading works with quantized weights.
  • Verifier: Modal endpoint deployed.
  • Sandbox: Modal Python execution endpoint deployed.
  • Demo video and social posts: published.
  • Remaining submission work: public Field Notes URL and submission form.