---
title: Split-Brain Co-Pilot
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - code-generation
  - webgpu
  - small-models
  - llama.cpp
  - modal
  - local-first
  - transformers.js
  - track:wood
  - sponsor:openai
  - sponsor:modal
  - achievement:offbrand
  - achievement:llama
  - achievement:fieldnotes
---

# Split-Brain Co-Pilot

A small-model coding assistant for the Build Small Hackathon. A 1.5B code model drafts instantly inside Chrome with WebGPU, while a 14B Qwen verifier on Modal checks the draft in the background. When the verifier catches a problem, the UI flashes, rolls back, and types in the corrected cloud block live.

The result is a split-brain workflow: fast local generation first, slower cloud verification second, and a sandbox proof step when the final answer is Python.

Try it in Chrome 113+ on desktop: load the local model, enter a coding prompt, generate, verify, then run the sandbox.

## Demo and Socials

- Demo video: https://youtu.be/tLZ1Y0ldAe0
- LinkedIn post: https://www.linkedin.com/posts/blessingmwiti_buildsmallhackathon-webgpu-modal-ugcPost-7469852940250468354-_TQq
- X/Twitter post: https://x.com/BlessingMwiti/status/2064088119008219337

## Why it fits Build Small

This project is built for the **An Adventure in Thousand Token Wood** track: the AI behavior is the experience. The fun part is not just that it writes code, but that two small models disagree, verify, and visibly reconcile their answers.

- **Small models only:** `Qwen2.5-Coder-1.5B` + `Qwen2.5-Coder-14B-Instruct` = **15.5B total parameters**, under the 32B cap.
- **Built on Gradio:** the app is a Gradio Space with a custom HTML/CSS/JS surface.
- **Show, don't tell:** token streaming, verifier state, rollback animation, and sandbox output are all visible in the app.
- **Modal-powered:** the 14B verifier runs on Modal A10G and the Python sandbox runs as a Modal endpoint.

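
The draft-then-verify handoff described in the introduction can be sketched in a few lines of Python. This is a minimal illustration, not the code in `app.py`: the `Verdict` dataclass, its fields, and the `reconcile` helper are hypothetical, and the real verifier payload may look different. Only the verdict labels PASS, FIX, and REWRITE come from this README.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical verifier response; the real payload shape may differ."""
    status: str               # expected: "PASS", "FIX", or "REWRITE"
    corrected_code: str = ""  # only meaningful for FIX / REWRITE


def reconcile(draft: str, verdict: Verdict) -> tuple[str, bool]:
    """Return (final_code, rolled_back) for a local draft and its verdict."""
    if verdict.status == "PASS":
        # The cloud model agrees: keep the local draft untouched.
        return draft, False
    if verdict.status in ("FIX", "REWRITE") and verdict.corrected_code:
        # The cloud model disagrees: roll back and use its corrected block.
        return verdict.corrected_code, True
    # Unknown status or empty correction: keep the draft rather than lose it.
    return draft, False
```

In this sketch the `rolled_back` flag is what a UI would use to decide whether to play the rollback animation.
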
## Architecture

- Local brain: `onnx-community/Qwen2.5-Coder-1.5B-Instruct` through transformers.js `3.5.x`, WebGPU, quantized browser weights.
- Cloud brain: `bartowski/Qwen2.5-Coder-14B-Instruct-GGUF` (`Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf`) served on Modal A10G through llama.cpp.
- Shell: Gradio 5 Space with a custom HTML/CSS/JS streaming surface.
- Proof step: Modal sandbox execution endpoint for generated Python code.

```mermaid
flowchart LR
    Prompt["User prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
    Local --> Draft["Streaming draft code"]
    Draft --> Verify["14B Modal verifier<br/>llama.cpp on A10G"]
    Verify -->|PASS| Final["Verified code"]
    Verify -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
    Rollback --> Final
    Final --> Sandbox["Python sandbox proof<br/>Modal Sandbox"]
```

## Requirements

Use Chrome 113+ on desktop. Firefox and Safari do not currently support the WebGPU path this demo needs.

The browser model needs roughly 1 GB of available GPU memory, so dedicated GPU machines will feel much better than older integrated graphics.

## Local Run

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

Without `MODAL_VERIFIER_URL`, the app uses a PASS fallback so the WebGPU UI can be tested locally.

Copy `.env.example` to `.env` for local secrets. The `.env` file is ignored by git.

## Modal Setup

Install and authenticate the Modal CLI:

```bash
pip install modal
modal token new
modal secret create huggingface-secret HF_TOKEN=hf_xxx
```

Download the 14B GGUF model into the persistent volume once:

```bash
modal run modal_backend/verifier.py::download_model
```

Deploy the verifier and sandbox:

```bash
modal deploy modal_backend/verifier.py
modal deploy modal_backend/sandbox.py
```

The verifier is intentionally lazy to save Modal credits. It cold-starts only after a user generates code and the app calls `/verify`. For a live recording or judging window, you can warm it once:

```bash
modal run modal_backend/verifier.py::warm_once
```

Avoid scheduled keep-warm jobs unless you are actively demoing; keeping the 14B verifier warm can burn credits quickly.

Set these Space secrets after deploy:

| Secret | Value |
| --- | --- |
| `MODAL_VERIFIER_URL` | Modal verifier endpoint URL, with or without `/verify` |
| `MODAL_SANDBOX_URL` | Modal sandbox endpoint URL, with or without `/execute` |
| `MODAL_TOKEN_ID` | From `modal token show` |
| `MODAL_TOKEN_SECRET` | From `modal token show` |

This project uses `modal==1.4.3`; older `0.73.x` clients are now rejected by Modal as deprecated.

## Demo Beat

Prompt idea: "Write a Python function that finds all prime numbers up to n using a segmented sieve, handling edge cases."

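
As a reference point when judging what the verifier returns for this prompt, one possible correct answer is sketched below. It is an illustrative implementation, not output captured from either model; the function name `segmented_sieve` and the `segment_size` parameter are assumptions.

```python
from math import isqrt


def segmented_sieve(n: int, segment_size: int = 32_768) -> list[int]:
    """Return all primes <= n using a segmented sieve of Eratosthenes."""
    if segment_size < 1:
        raise ValueError("segment_size must be at least 1")
    if n < 2:
        return []  # edge case: there are no primes below 2

    # Step 1: plain sieve for the base primes up to sqrt(n).
    limit = isqrt(n)
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    base_primes = [i for i, flag in enumerate(is_prime) if flag]

    # Step 2: sieve the range (limit, n] one fixed-size segment at a time.
    primes = list(base_primes)
    low = limit + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        segment = [True] * (high - low + 1)
        for p in base_primes:
            # First multiple of p inside [low, high], never below p * p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                segment[multiple - low] = False
        primes.extend(low + i for i, flag in enumerate(segment) if flag)
        low = high + 1
    return primes
```

With this sketch, `segmented_sieve(30)` returns `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`, and inputs below 2 return an empty list.
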
Show the model loading bar, token streaming, verifier status, rollback animation on a FIX/REWRITE verdict, and the final verified state. Then click **Run Python Sandbox** so the demo ends with executable proof, not just generated text.

## Badge Targets

- Llama Champion: 14B verifier served through llama.cpp.
- Off-Brand: custom split-brain UI, rollback flash, status rail, token counter, and sandbox output.
- Field Notes: [repo draft](FIELD_NOTES.md), ready to publish as a Hugging Face Article or external post.
- Modal Awards: verifier and sandbox are both Modal-powered.

The app is **local-first**, but not fully Off the Grid: the draft model runs in-browser, while verification intentionally uses Modal.

## Current Status

- HF Space: live under the `build-small-hackathon` org.
- Local model: browser WebGPU loading works with quantized weights.
- Verifier: Modal endpoint deployed.
- Sandbox: Modal Python execution endpoint deployed.
- Demo video and social posts: published.
- Remaining submission work: public Field Notes URL and submission form.
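
The Space secrets table accepts endpoint URLs with or without the route suffix, and the Local Run section describes a PASS fallback when `MODAL_VERIFIER_URL` is unset. A minimal sketch of how that configuration could be resolved is shown here. The helper names `endpoint_url` and `verifier_target` are hypothetical and are not taken from `app.py`; only the environment variable name and the `/verify` and `/execute` routes come from this README.

```python
import os


def endpoint_url(base: str, route: str) -> str:
    """Return base with exactly one trailing /route, e.g. route='verify'."""
    base = base.strip().rstrip("/")
    suffix = "/" + route.strip("/")
    return base if base.endswith(suffix) else base + suffix


def verifier_target(env=os.environ):
    """Return the verifier URL, or None to signal the local PASS fallback."""
    raw = env.get("MODAL_VERIFIER_URL", "").strip()
    return endpoint_url(raw, "verify") if raw else None
```

Under these assumptions, a caller would treat a `None` result as "skip the network call and report PASS", which matches the local testing behavior described above.
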