---
title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - code-generation
  - webgpu
  - small-models
  - llama.cpp
  - modal
  - local-first
  - transformers.js
  - track:wood
  - sponsor:openai
  - sponsor:modal
  - achievement:offbrand
  - achievement:llama
  - achievement:fieldnotes
---

# Split-Brain Co-Pilot

A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back the draft, and types in the corrected block from the cloud, live.

The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.

Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.

## Demo and Socials

- Demo video: https://youtu.be/tLZ1Y0ldAe0
- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337

## Why it fits Build Small

This project is built for the **An Adventure in Thousand Token Wood** track: the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.

- **Small models only:** `Qwen2.5-Coder-1.5B-Instruct` + `Qwen2.5-Coder-14B-Instruct` = **15.5B total parameters**, under the 32B cap.
- **Built on Gradio:** the app is a Gradio Space with a custom HTML/CSS/JS surface.
- **Show, don't tell:** token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
- **Modal-powered:** the 14B verifier runs on a Modal A10G GPU and the Python sandbox runs as a Modal endpoint.

## Architecture

- Local brain: `onnx-community/Qwen2.5-Coder-1.5B-Instruct`, running through transformers.js `3.5.x` on WebGPU with quantized browser weights.
- Cloud brain: `bartowski/Qwen2.5-Coder-14B-Instruct-GGUF` (`Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf`), served on a Modal A10G through llama.cpp.
- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
- Proof step: Modal sandbox execution endpoint for generated Python code.

```mermaid
flowchart LR
    Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
    Local --> Draft["Streaming draft code"]
    Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
    Verify -->|PASS| Final["Verified code"]
    Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
    Rollback --> Final
    Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
```

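
The verdict branch in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the `Verdict` class, its field names, and the `reconcile` helper are hypothetical, and only the three verdict labels (PASS, FIX, REWRITE) come from the diagram above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    """Hypothetical verifier result; the field names are assumptions."""
    status: str                      # "PASS", "FIX", or "REWRITE"
    corrected: Optional[str] = None  # replacement code for FIX / REWRITE


def reconcile(draft: str, verdict: Verdict) -> tuple[str, bool]:
    """Return (final_code, rolled_back) for a draft and its verdict."""
    status = verdict.status.upper()
    if status == "PASS":
        return draft, False
    if status in {"FIX", "REWRITE"} and verdict.corrected:
        # The UI would play the rollback animation before showing this block.
        return verdict.corrected, True
    # Unknown status or missing correction: keep the draft rather than lose it.
    return draft, False


print(reconcile("def f(): pass", Verdict("PASS")))
print(reconcile("def f(): pas", Verdict("FIX", "def f(): pass")))
```

Keeping the draft when the verdict is malformed is a design choice in this sketch: the local answer stays on screen even if the cloud side misbehaves.
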
## Requirements

Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs. The browser model needs roughly 1 GB of available GPU memory, so machines with a dedicated GPU will feel much faster than those with older integrated graphics.

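
As a rough sanity check on the memory figure, weight size can be estimated from parameter count and bits per weight. The bit widths below are assumptions (about 4 bits for the browser weights, about 4.85 for `Q4_K_M`), and the estimate covers weights only; runtime buffers and the KV cache come on top.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes) for a quantized model."""
    return params_billion * bits_per_weight / 8


# Assumed bit widths: 4 for the browser weights, ~4.85 for Q4_K_M.
local = weights_gb(1.5, 4.0)
cloud = weights_gb(14.0, 4.85)
print(f"local weights ~{local:.2f} GB, cloud weights ~{cloud:.2f} GB")
```

Under these assumptions the local weights land somewhat below the roughly 1 GB quoted above, and the 14B weights fit comfortably within the 24 GB of an A10G.
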
## Local Run

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

Without `MODAL_VERIFIER_URL`, the app uses a PASS fallback so the WebGPU UI can be tested locally.

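
The PASS fallback could look something like the sketch below. Only the `MODAL_VERIFIER_URL` variable name comes from this README; the `verify_draft` function, the JSON payload shape, and the response format are assumptions made for illustration.

```python
import json
import os
import urllib.request


def verify_draft(code: str, timeout: float = 120.0) -> dict:
    """Send a draft to the verifier, or return PASS when none is configured."""
    base_url = os.environ.get("MODAL_VERIFIER_URL", "").strip()
    if not base_url:
        # Local mode: no verifier configured, so accept the draft as-is.
        return {"status": "PASS", "source": "fallback"}

    # The payload shape and response format are assumptions.
    body = json.dumps({"code": code}).encode("utf-8")
    request = urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the variable unset, `verify_draft("print('hi')")` returns the fallback dictionary without touching the network. The generous default timeout is there because the verifier may need to cold-start.
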
Copy `.env.example` to `.env` for local secrets. The `.env` file is ignored by git.

## Modal Setup

Install and authenticate the Modal CLI, then create the Hugging Face secret:

```bash
pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```

Download the 14B GGUF model into the persistent volume once:

```bash
modal run modal_backend/verifier.py::download_model
```

Deploy the verifier and sandbox:

```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```

The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls `/verify`. For a live recording or judging window, you can warm it once:

```bash
modal run modal_backend/verifier.py::warm_once
```

Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.

Set these Space secrets after deploy:

| Secret | Value |
| --- | --- |
| `MODAL_VERIFIER_URL` | Modal verifier endpoint URL, with or without `/verify` |
| `MODAL_SANDBOX_URL` | Modal sandbox endpoint URL, with or without `/execute` |
| `MODAL_TOKEN_ID` | From `modal token show` |
| `MODAL_TOKEN_SECRET` | From `modal token show` |

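
Accepting the endpoint URLs with or without their route suffix implies a small normalization step. The sketch below shows one way to do it; the `endpoint_url` helper and the `example.modal.run` host are hypothetical, and only the `/verify` and `/execute` routes come from the table above.

```python
def endpoint_url(base: str, route: str) -> str:
    """Return `base` ending in exactly one `route` suffix."""
    base = base.strip().rstrip("/")
    route = "/" + route.strip("/")
    if base.endswith(route):
        return base
    return base + route


# Placeholder host for illustration only.
print(endpoint_url("https://example.modal.run", "/verify"))
print(endpoint_url("https://example.modal.run/verify/", "verify"))
```

Both calls resolve to the same URL, so a secret pasted with or without the suffix, or with a trailing slash, behaves identically.
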
This project uses `modal==1.4.3`; older `0.73.x` clients are now rejected by Modal as deprecated.

## Demo Beat

Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."

Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click **Run Python Sandbox** so the demo ends with executable proof, not just generated text.

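
To rehearse the proof step without Modal, a local stand-in can run the final code in a separate interpreter with a timeout. This is a sketch only: the `run_python` helper and its result shape are hypothetical, and a subprocess is not a security boundary, so it does not replace the isolation of the Modal sandbox endpoint.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 10.0) -> dict:
    """Run code in a separate interpreter and capture its output."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


result = run_python("print(sum(range(5)))")
print(result["ok"], result["stdout"].strip())
```

Only run code you trust this way; untrusted model output belongs in the remote sandbox.
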
## Badge Targets

- Llama Champion: 14B verifier served through llama.cpp.
- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
- Field Notes: [repo draft](FIELD_NOTES.md), ready to publish as a Hugging Face Article or external post.
- Modal Awards: verifier and sandbox are both Modal-powered.

The app is **local-first**, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.

## Current Status

- HF Space: live under the `build-small-hackathon` org.
- Local model: browser WebGPU loading works with quantized weights.
- Verifier: Modal endpoint deployed.
- Sandbox: Modal Python execution endpoint deployed.
- Demo video and social posts: published.
- Remaining submission work: public Field Notes URL and submission form.