---
title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - code-generation
  - webgpu
  - small-models
  - llama.cpp
  - modal
  - local-first
  - transformers.js
  - track:wood
  - sponsor:openai
  - sponsor:modal
  - achievement:offbrand
  - achievement:llama
  - achievement:fieldnotes
---

# Split-Brain Co-Pilot

A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts code inside Chrome on WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back the draft, and types in the verifier's corrected block live.

The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.

Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.

## Demo and Socials

- Demo video: https://youtu.be/tLZ1Y0ldAe0
- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337

## Why it fits Build Small

This project is built for the **An Adventure in Thousand Token Wood** track: the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.

- **Small models only:** `Qwen2.5-Coder-1.5B-Instruct` + `Qwen2.5-Coder-14B-Instruct` = **15.5B total parameters**, under the 32B cap.
- **Built on Gradio:** the app is a Gradio Space with a custom HTML/CSS/JS surface.
- **Show, don't tell:** token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
- **Modal-powered:** the 14B verifier runs on Modal A10G and the Python sandbox runs as a Modal endpoint.
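
The parameter budget above is simple enough to check in a few lines. This is only an illustrative sketch: the dictionary keys and helper names are made up for the example, and the sizes are the nominal figures from the model names, not exact parameter counts.

```python
# Nominal model sizes in billions of parameters, taken from the model names.
# The keys and helper names below are illustrative, not part of the app.
MODELS_B = {
    "local draft model (1.5B)": 1.5,
    "cloud verifier (14B)": 14.0,
}
PARAM_CAP_B = 32.0  # Build Small cap quoted above


def total_params_b(models):
    """Sum the nominal sizes, in billions of parameters."""
    return sum(models.values())


def within_cap(models, cap_b=PARAM_CAP_B):
    """True when the combined size stays at or under the cap."""
    return total_params_b(models) <= cap_b


print(f"{total_params_b(MODELS_B):.1f}B used of {PARAM_CAP_B:.0f}B allowed")
```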

## Architecture

- Local brain: `onnx-community/Qwen2.5-Coder-1.5B-Instruct`, running in the browser through transformers.js `3.5.x` on WebGPU with quantized weights.
- Cloud brain: `bartowski/Qwen2.5-Coder-14B-Instruct-GGUF` (`Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf`) served on Modal A10G through llama.cpp.
- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
- Proof step: Modal sandbox execution endpoint for generated Python code.

```mermaid
flowchart LR
    Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
    Local --> Draft["Streaming draft code"]
    Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
    Verify -->|PASS| Final["Verified code"]
    Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
    Rollback --> Final
    Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
```
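
The PASS and FIX / REWRITE branches in the diagram can be read as a small reconciliation step. The sketch below is one way to express it, not the app's actual code: `VerifierResult`, its fields, and `reconcile` are hypothetical names, and the real verifier response may be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VerifierResult:
    """Hypothetical shape for a verifier reply (an assumption for this sketch)."""
    verdict: str                      # "PASS", "FIX", or "REWRITE"
    corrected: Optional[str] = None   # replacement code for FIX / REWRITE


def reconcile(draft: str, result: VerifierResult) -> Tuple[str, bool]:
    """Return (final_code, rolled_back) for a local draft and a verifier reply."""
    verdict = result.verdict.strip().upper()
    if verdict == "PASS":
        # The verifier accepted the local draft unchanged.
        return draft, False
    if verdict in {"FIX", "REWRITE"} and result.corrected:
        # The verifier supplied a replacement: roll back and use it.
        return result.corrected, True
    # Unknown verdict or missing correction: keep the draft rather than lose it.
    return draft, False
```

With this shape, the UI only needs the `rolled_back` flag to decide whether to play the rollback animation before showing the final code.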

## Requirements

Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so machines with a dedicated GPU will perform much better than ones with older integrated graphics.
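
As a rough sanity check on the 1 GB figure, the weight footprint can be estimated from the parameter count and the bits per weight. The 4-bit width below is an assumption for illustration only; this README does not state which quantization the browser weights use, and the estimate ignores the KV cache and activations.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal 1.5B parameters stored at 4 bits each.
estimate = weight_footprint_gb(1.5e9, 4)
print(f"~{estimate:.2f} GB for weights, before KV cache and activations")
```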

## Local Run

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

Without `MODAL_VERIFIER_URL` set, the app skips cloud verification and falls back to a PASS verdict, so the WebGPU UI can be tested locally.
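
A minimal sketch of that fallback, assuming a dictionary-shaped verdict. The function name, the `call_verifier` callback, and the response keys are hypothetical and may not match `app.py`.

```python
import os
from typing import Callable, Dict, Mapping


def verify_draft(
    code: str,
    call_verifier: Callable[[str, str], Dict[str, str]],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Verify through Modal when configured, otherwise return a PASS fallback."""
    url = env.get("MODAL_VERIFIER_URL", "").strip()
    if not url:
        # No verifier configured: accept the draft so the UI stays testable.
        return {"verdict": "PASS", "source": "fallback"}
    # A verifier is configured: hand the draft to the caller-supplied client.
    return call_verifier(url, code)
```

Passing the HTTP client in as `call_verifier` keeps the fallback logic testable without network access.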

Copy `.env.example` to `.env` for local secrets. The `.env` file is ignored by git.

## Modal Setup

Install and authenticate the Modal CLI:

```bash
pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```

Download the 14B GGUF model into the persistent volume once:

```bash
modal run modal_backend/verifier.py::download_model
```

Deploy the verifier and sandbox:

```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```

The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls `/verify`. For a live recording or judging window, you can warm it once:

```bash
modal run modal_backend/verifier.py::warm_once
```

Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.

Set these Space secrets after deploy:

| Secret | Value |
| --- | --- |
| `MODAL_VERIFIER_URL` | Modal verifier endpoint URL, with or without `/verify` |
| `MODAL_SANDBOX_URL` | Modal sandbox endpoint URL, with or without `/execute` |
| `MODAL_TOKEN_ID` | From `modal token show` |
| `MODAL_TOKEN_SECRET` | From `modal token show` |

This project uses `modal==1.4.3`; older `0.73.x` clients are now rejected by Modal as deprecated.
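
The "with or without `/verify`" and "with or without `/execute`" notes in the secrets table imply that the app normalizes whatever URL it is given. One way to do that is sketched below; `normalize_endpoint` is a hypothetical helper and the URL in the example is a placeholder, not a real deployment.

```python
def normalize_endpoint(url: str, suffix: str) -> str:
    """Return url ending in exactly one copy of suffix (e.g. "/verify")."""
    base = url.strip().rstrip("/")
    suffix = "/" + suffix.strip("/")
    if base.endswith(suffix):
        # The secret already included the route.
        return base
    return base + suffix


# Placeholder URL for illustration only.
print(normalize_endpoint("https://example.invalid/", "/verify"))
```

Both forms of the secret then resolve to the same route, so a trailing slash or a pasted full path does not break the request.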

## Demo Beat

Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."

Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click **Run Python Sandbox** so the demo ends with executable proof, not just generated text.
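
When judging the sandbox output for that prompt, it helps to have a known-good answer to compare against. The following is one reference implementation of a segmented sieve, written for this README as a sketch; it is not output from either model, and the `segment_size` default is an arbitrary choice.

```python
import math


def simple_sieve(limit):
    """Return all primes <= limit using a plain sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Clear every multiple of p starting at p*p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def segmented_sieve(n, segment_size=32768):
    """Return all primes <= n, sieving (sqrt(n), n] one segment at a time."""
    if n < 2:
        return []  # edge case: no primes below 2
    root = math.isqrt(n)
    base = simple_sieve(root)   # primes needed to cross out composites
    primes = list(base)
    low = root + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        mark = bytearray([1]) * (high - low + 1)
        for p in base:
            # First multiple of p inside the segment, never below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                mark[multiple - low] = 0
        primes.extend(low + i for i, flag in enumerate(mark) if flag)
        low = high + 1
    return primes


print(segmented_sieve(30))
```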

## Badge Targets

- Llama Champion: 14B verifier served through llama.cpp.
- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
- Field Notes: [repo draft](FIELD_NOTES.md), ready to publish as a Hugging Face Article or external post.
- Modal Awards: verifier and sandbox are both Modal-powered.

The app is **local-first**, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.

## Current Status

- HF Space: live under the `build-small-hackathon` org.
- Local model: browser WebGPU loading works with quantized weights.
- Verifier: Modal endpoint deployed.
- Sandbox: Modal Python execution endpoint deployed.
- Demo video and social posts: published.
- Remaining submission work: public Field Notes URL and submission form.