---
title: Split-Brain Co-Pilot
emoji: 
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - code-generation
  - webgpu
  - small-models
  - llama.cpp
  - modal
  - local-first
  - transformers.js
  - track:wood
  - sponsor:openai
  - sponsor:modal
  - achievement:offbrand
  - achievement:llama
  - achievement:fieldnotes
---

# Split-Brain Co-Pilot

A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back, and types in the corrected cloud block live.

The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.

Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.

## Demo and Socials

- Demo video: https://youtu.be/tLZ1Y0ldAe0
- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337

## Why it fits Build Small

This project targets the **An Adventure in Thousand Token Wood** track, where the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.

- **Small models only:** `Qwen2.5-Coder-1.5B-Instruct` + `Qwen2.5-Coder-14B-Instruct` = **15.5B total parameters**, under the 32B cap.
- **Built on Gradio:** the app is a Gradio Space with a custom HTML/CSS/JS surface.
- **Show, don't tell:** token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
- **Modal-powered:** the 14B verifier runs on a Modal A10G GPU, and the Python sandbox runs as a Modal endpoint.
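The parameter budget is simple arithmetic. The sketch below uses the nominal sizes from the model names (the model cards list slightly higher exact counts, which still fit comfortably under the cap); `MODELS_B`, `PARAM_CAP_B`, and `total_params_b` are illustrative names, not part of this repo.

```python
# Nominal sizes in billions of parameters, read off the model names.
MODELS_B = {
    "Qwen2.5-Coder-1.5B-Instruct": 1.5,
    "Qwen2.5-Coder-14B-Instruct": 14.0,
}
PARAM_CAP_B = 32.0  # Build Small cap on total parameters


def total_params_b(models: dict[str, float]) -> float:
    """Sum the parameter counts of every model the app uses."""
    return sum(models.values())


total = total_params_b(MODELS_B)
print(f"{total}B total, within {PARAM_CAP_B:g}B cap: {total <= PARAM_CAP_B}")
```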

## Architecture

- Local brain: `onnx-community/Qwen2.5-Coder-1.5B-Instruct` through transformers.js `3.5.x`, WebGPU, quantized browser weights.
- Cloud brain: `bartowski/Qwen2.5-Coder-14B-Instruct-GGUF` (`Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf`) served by llama.cpp on a Modal A10G GPU.
- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
- Proof step: Modal sandbox execution endpoint for generated Python code.

```mermaid
flowchart LR
    Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
    Local --> Draft["Streaming draft code"]
    Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
    Verify -->|PASS| Final["Verified code"]
    Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
    Rollback --> Final
    Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
```
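The PASS / FIX / REWRITE branch in the diagram can be sketched as a small resolver that decides which code the UI ends on. This is a minimal sketch that assumes a hypothetical reply format (verdict on the first line, corrected code in a fenced block); the real verifier's response shape, and the names `resolve` and `Resolution`, are assumptions rather than this repo's API.

```python
import re
from dataclasses import dataclass

# Assumed reply format (hypothetical): the first line holds the verdict and a
# FIX/REWRITE reply carries the corrected code in a fenced block.
FENCED_BLOCK = re.compile(r"```[a-zA-Z]*\n(.*?)```", re.DOTALL)


@dataclass
class Resolution:
    verdict: str       # "PASS", "FIX", "REWRITE", or whatever the reply said
    final_code: str    # code to show in the verified state
    rolled_back: bool  # whether the UI should play the rollback animation


def resolve(draft: str, verifier_reply: str) -> Resolution:
    """Decide what the UI shows once the verifier has answered."""
    lines = verifier_reply.strip().splitlines()
    verdict = lines[0].strip().upper() if lines else ""
    if verdict in ("FIX", "REWRITE"):
        match = FENCED_BLOCK.search(verifier_reply)
        if match:
            # Swap the local draft for the verifier's corrected block.
            return Resolution(verdict, match.group(1), rolled_back=True)
    # PASS, an unknown verdict, or a FIX without code: keep the local draft.
    return Resolution(verdict, draft, rolled_back=False)


reply = "FIX\n```python\ndef add(a, b):\n    return a + b\n```"
print(resolve("def add(a, b): return a - b\n", reply).final_code)
```

Falling back to the local draft when the reply is malformed keeps the UI in a usable state instead of rolling back to nothing.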

## Requirements

Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so dedicated GPU machines will feel much better than older integrated graphics.
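The ~1 GB figure is consistent with a back-of-envelope estimate of weight storage. This sketch assumes 4-bit weights for a nominal 1.5B-parameter model, which may not match the exact quantization the browser build ships, and it counts weights only; the KV cache and activations add to the total. `weight_memory_gb` is an illustrative helper.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a nominal 1.5B-parameter model.
print(f"4-bit:  {weight_memory_gb(1.5e9, 4):.2f} GB")   # 0.75 GB
print(f"16-bit: {weight_memory_gb(1.5e9, 16):.2f} GB")  # 3.00 GB
```

Half-precision weights alone would need about 3 GB, which helps explain why the browser path relies on quantized weights.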

## Local Run

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

If `MODAL_VERIFIER_URL` is not set, the app falls back to a PASS verdict, so the WebGPU UI can still be tested locally.
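A minimal sketch of that fallback, assuming the network call to the verifier is injected as a function so the branch can be tested without Modal. `verify_or_pass`, `call_verifier`, and `VerifierCall` are hypothetical names, not functions from `app.py`.

```python
import os
from typing import Callable

# Hypothetical signature: (verifier_url, draft_code) -> verdict text.
VerifierCall = Callable[[str, str], str]


def verify_or_pass(draft: str, call_verifier: VerifierCall) -> str:
    """Call the cloud verifier if configured, otherwise report PASS."""
    url = os.environ.get("MODAL_VERIFIER_URL", "").strip()
    if not url:
        # No Modal endpoint configured: accept the draft so the local
        # WebGPU flow can still be exercised end to end.
        return "PASS"
    return call_verifier(url, draft)
```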

Copy `.env.example` to `.env` for local secrets. The `.env` file is ignored by git.

## Modal Setup

Install and authenticate the Modal CLI:

```bash
pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```

Download the 14B GGUF model into the persistent volume once:

```bash
modal run modal_backend/verifier.py::download_model
```

Deploy the verifier and sandbox:

```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```

The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls `/verify`. For a live recording or judging window, you can warm it once:

```bash
modal run modal_backend/verifier.py::warm_once
```

Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.

Set these Space secrets after deploy:

| Secret | Value |
| --- | --- |
| `MODAL_VERIFIER_URL` | Modal verifier endpoint URL, with or without `/verify` |
| `MODAL_SANDBOX_URL` | Modal sandbox endpoint URL, with or without `/execute` |
| `MODAL_TOKEN_ID` | From `modal token show` |
| `MODAL_TOKEN_SECRET` | From `modal token show` |
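Because both URLs are accepted with or without the route suffix, the app has to normalize them before making a request. One way to do that, as a sketch: `endpoint_url` is a hypothetical helper, and `https://example.modal.run` below is a placeholder, not a real deployment.

```python
def endpoint_url(base: str, path: str) -> str:
    """Append the route to a configured URL unless it is already there."""
    base = base.strip().rstrip("/")
    suffix = "/" + path.strip("/")
    return base if base.endswith(suffix) else base + suffix


print(endpoint_url("https://example.modal.run/", "verify"))
print(endpoint_url("https://example.modal.run/execute", "execute"))
```

Both calls print a URL ending in exactly one route segment (`/verify` and `/execute`), so either form of the secret works.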

This project uses `modal==1.4.3`; older `0.73.x` clients are now rejected by Modal as deprecated.

## Demo Beat

Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."

Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click **Run Python Sandbox** so the demo ends with executable proof, not just generated text.
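When judging the sandbox step, it helps to know what a correct answer to that prompt looks like. The following is a hand-written reference sketch of a segmented Sieve of Eratosthenes, not output from either model; `primes_up_to` and its default `segment_size` are arbitrary choices.

```python
import math


def primes_up_to(n: int, segment_size: int = 32_768) -> list[int]:
    """Return all primes <= n with a segmented Sieve of Eratosthenes."""
    if segment_size < 1:
        raise ValueError("segment_size must be positive")
    if n < 2:
        return []  # edge case: no primes below 2 (covers 0, 1, negatives)

    # 1) Plain sieve for the base primes up to sqrt(n).
    limit = math.isqrt(n)
    base = bytearray([1]) * (limit + 1)
    base[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if base[i]:
            base[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    base_primes = [i for i in range(2, limit + 1) if base[i]]
    primes = list(base_primes)

    # 2) Sieve (limit, n] one segment at a time, so sieve memory stays
    #    O(sqrt(n) + segment_size) instead of O(n).
    low = limit + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        segment = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            # First multiple of p in [low, high], never below p*p.
            start = max(p * p, (low + p - 1) // p * p)
            segment[start - low :: p] = bytes(len(range(start, high + 1, p)))
        primes.extend(low + i for i, flag in enumerate(segment) if flag)
        low = high + 1
    return primes


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Starting each segment's marking at `max(p * p, ...)` skips multiples that a smaller prime has already crossed out.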

## Badge Targets

- Llama Champion: 14B verifier served through llama.cpp.
- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
- Field Notes: [repo draft](FIELD_NOTES.md), ready to publish as a Hugging Face Article or external post.
- Modal Awards: verifier and sandbox are both Modal-powered.

The app is **local-first**, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.

## Current Status

- HF Space: live under the `build-small-hackathon` org.
- Local model: browser WebGPU loading works with quantized weights.
- Verifier: Modal endpoint deployed.
- Sandbox: Modal Python execution endpoint deployed.
- Demo video and social posts: published.
- Remaining submission work: public Field Notes URL and submission form.