Split-Brain Co-Pilot: local-first coding with a 1.5B browser model and a 14B verifier

Community Article Published June 14, 2026

I built Split-Brain Co-Pilot for the Build Small Hackathon as an experiment in making small models feel useful, fast, and a little theatrical.

The app has two brains:

  • A 1.5B Qwen coder model runs locally in the browser through WebGPU and transformers.js.
  • A 14B Qwen coder verifier runs on Modal with llama.cpp and checks the local draft in the background.

The total model budget is 15.5B parameters, comfortably under the 32B hackathon cap. The design goal was not to imitate a giant coding assistant. It was to ask a smaller question: what if a tiny local model could move fast, while a second small model watched its back?

The idea

Most code assistants feel like a single mind. You prompt them, they think somewhere else, and then they return an answer.

I wanted this one to feel split:

  1. The browser model drafts immediately.
  2. The UI streams the code as it appears.
  3. A Modal verifier checks the result.
  4. If the verifier finds a problem, the UI flashes, rolls back, and replaces the draft with corrected code.
  5. For Python, the final code can be sent to a sandbox so the demo ends with execution, not just text.

That visible disagreement is the point. The app is not only generating code; it is showing a small local model being reviewed by a larger, still-small verifier.
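The five steps can be sketched as a small orchestration function. This is a minimal illustration, not the app's actual code: `run_pipeline`, the event names, and the three callables are hypothetical stand-ins for the browser model, the Modal verifier, and the sandbox, and token streaming is collapsed into a single draft event.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Tuple[str, str]  # (event name, payload) pairs a UI could react to


def run_pipeline(
    prompt: str,
    draft: Callable[[str], str],               # hypothetical: local 1.5B model
    verify: Callable[[str], Dict[str, str]],   # hypothetical: remote 14B verifier
    sandbox: Optional[Callable[[str], str]] = None,  # hypothetical: Python sandbox
) -> List[Event]:
    """Draft, verify, roll back if needed, then optionally execute."""
    events: List[Event] = []

    # Steps 1 and 2: the local model drafts and the UI shows the draft.
    code = draft(prompt)
    events.append(("draft", code))

    # Step 3: the verifier reviews the finished draft.
    result = verify(code)

    # Step 4: anything other than PASS replaces the draft with corrected code.
    if result["verdict"] == "PASS":
        events.append(("pass", code))
    else:
        code = result["corrected_code"]
        events.append(("rollback", code))

    # Step 5: the final code, not the draft, is what gets executed.
    if sandbox is not None:
        events.append(("sandbox", sandbox(code)))
    return events
```

Returning events instead of a single string is what would let an interface animate the disagreement rather than only show the final answer.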

Why split the work?

Small models are often good enough at the first draft, especially for compact coding tasks. They are also much more fun when they run close to the user. Loading a 1.5B model in Chrome gives the app a local, physical feeling: the tokens are not arriving from a remote API; they are being produced on the machine in front of you.

But first drafts are fragile. A small model can produce code that looks plausible but has missing edge cases, markdown wrappers, or functions that are defined but never called. That made the verifier useful as a second pass rather than a replacement for the local model.

The split became:

  • Browser: fast speculative draft, visible streaming, low latency.
  • Modal: slower verification, bigger context, llama.cpp-backed review.
  • Sandbox: executable proof for Python outputs.

Architecture

The app is a Gradio Space with a custom HTML/CSS/JS interface.

flowchart LR
    Prompt["Prompt"] --> Local["1.5B browser model<br/>WebGPU + transformers.js"]
    Local --> Draft["Streaming draft"]
    Draft --> Verifier["14B verifier<br/>Modal A10G + llama.cpp"]
    Verifier -->|PASS| Final["Verified code"]
    Verifier -->|FIX / REWRITE| Rollback["Rollback animation<br/>corrected block"]
    Rollback --> Final
    Final --> Sandbox["Modal Python sandbox"]

The browser model is onnx-community/Qwen2.5-Coder-1.5B-Instruct. It runs through transformers.js with WebGPU. The verifier uses Qwen2.5-Coder-14B-Instruct in GGUF format from Bartowski's Hugging Face repo, loaded into a Modal Volume and served with llama.cpp on an A10G.

The verifier returns a small JSON contract:

{ "verdict": "PASS" }

or:

{
  "verdict": "FIX",
  "corrected_code": "...",
  "reason": "..."
}

That JSON contract matters. It lets the UI treat verification as an event, not a chat message. A PASS marks the draft as clean. A FIX or REWRITE triggers the rollback animation and replaces the local block with the verifier's corrected_code.
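Because the verdict drives UI state, it helps to validate the payload before acting on it. The sketch below is one way to do that in Python. The `Verdict` dataclass and `parse_verdict` are hypothetical names, and it assumes a REWRITE payload has the same shape as the FIX payload shown above, which the contract examples do not confirm.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_VERDICTS = {"PASS", "FIX", "REWRITE"}


@dataclass
class Verdict:
    verdict: str
    corrected_code: Optional[str] = None
    reason: Optional[str] = None


def parse_verdict(raw: str) -> Verdict:
    """Parse and validate the verifier's JSON reply.

    Raises ValueError on malformed JSON (json.JSONDecodeError is a
    ValueError subclass) or on a payload that breaks the contract.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("verifier reply must be a JSON object")

    verdict = data.get("verdict")
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")

    if verdict == "PASS":
        return Verdict(verdict="PASS")

    # Assumption: FIX and REWRITE both carry replacement code.
    corrected = data.get("corrected_code")
    if not isinstance(corrected, str):
        raise ValueError(f"{verdict} verdict is missing corrected_code")
    return Verdict(verdict=verdict, corrected_code=corrected, reason=data.get("reason"))
```

Rejecting a malformed reply outright keeps a confused verifier from triggering a rollback to empty or missing code.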

What went wrong

The most interesting bugs were not the dramatic ones. They were the small integration edges.

Markdown code fences

The browser model sometimes produced:

```python
def example():
    ...
```

That looked fine in the UI, but the Python sandbox received the backticks and failed with a syntax error. The fix was to strip the Markdown fences from generated code before sending it to the verifier or sandbox. That sounds obvious after the fact, but it is exactly the kind of boundary bug that appears when generated text becomes executable input.
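A minimal sketch of that cleaning step, assuming the model emits at most one fenced block per reply. The helper name `strip_code_fences` is hypothetical, and the handling of an unterminated fence (a model that stops before closing it) is an added assumption.

```python
import re

# Built from single backticks so this example never contains a literal fence.
FENCE = "`" * 3

# Opening fence plus optional language tag, then the body, then either a
# closing fence or the end of the text (unterminated fence).
_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"[^\n]*\n(.*?)(?:" + re.escape(FENCE) + r"|\Z)",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the body of the first fenced block, or the text itself if unfenced."""
    match = _BLOCK_RE.search(text)
    body = match.group(1) if match else text
    # Trim only newlines so leading indentation inside the code survives.
    return body.strip("\n") + "\n"
```

Any prose the model writes before or after the fence is dropped along with the backticks, so only the code body crosses the boundary.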

A successful sandbox can still look empty

One test prompt asked for a function that loops from 1 to 10. The model generated:

def loop_to_ten():
    for i in range(1, 11):
        print(i)

The sandbox returned returncode: 0, with empty stdout and stderr. That was not a failure. Python had successfully defined the function and exited. Nothing called it.

The app now adds a conservative harness for Python sandbox runs: if the code contains only definitions and the first function has no required arguments, the sandbox appends a call to it under a main guard:

if __name__ == "__main__":
    loop_to_ten()

It does not guess arguments for functions that require inputs. That would make the demo less honest.
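One way to implement a harness like this is with Python's `ast` module. The sketch below is an assumption about how the rule could be encoded, not the app's code: `add_main_harness` is a hypothetical name, and "only definitions" is interpreted here as top-level functions, classes, and imports.

```python
import ast

# Top-level node types treated as "definitions" (an interpretation, see above).
_DEFINITION_NODES = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Import, ast.ImportFrom,
)


def _has_required_args(fn: ast.FunctionDef) -> bool:
    """True if calling fn() with no arguments would raise a TypeError."""
    args = fn.args
    positional = args.posonlyargs + args.args
    required_positional = len(positional) - len(args.defaults)
    # kw_defaults holds None for keyword-only parameters without a default.
    required_kwonly = sum(1 for default in args.kw_defaults if default is None)
    return required_positional > 0 or required_kwonly > 0


def add_main_harness(source: str) -> str:
    """Append a guarded call to the first function, or return source unchanged."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return source  # let the sandbox report the real error

    if not tree.body or not all(isinstance(n, _DEFINITION_NODES) for n in tree.body):
        return source  # the code already does something at top level

    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    if not functions or _has_required_args(functions[0]):
        return source  # never guess arguments

    name = functions[0].name
    return source.rstrip("\n") + f'\n\n\nif __name__ == "__main__":\n    {name}()\n'
```

Every branch that is unsure returns the source untouched, so the harness can only add a call that is guaranteed to be well-formed.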

Async Modal endpoints

The verifier initially appeared to hang at "Verifier warming up..." longer than expected. The cause was the ASGI endpoint calling a Modal method synchronously from inside an async route, which blocks the event loop while the call is in flight. Switching to the async remote call path made the endpoint behave cleanly and stopped the UI from feeling stuck during verifier startup.
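The failure mode can be reproduced without Modal. The sketch below uses only `asyncio`: `slow_verify` is a hypothetical stand-in for a blocking remote call, and `asyncio.to_thread` stands in for the async call path (it is not the Modal API the app switched to). A heartbeat task counts how often the event loop gets to run while each route is in flight.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict


def slow_verify(code: str) -> Dict[str, str]:
    """Hypothetical stand-in for a blocking remote verifier call."""
    time.sleep(0.2)
    return {"verdict": "PASS"}


async def blocking_route(code: str) -> Dict[str, str]:
    # Anti-pattern: a synchronous call inside an async route stalls the loop.
    return slow_verify(code)


async def async_route(code: str) -> Dict[str, str]:
    # The blocking work runs in a worker thread; the loop stays responsive.
    return await asyncio.to_thread(slow_verify, code)


async def count_ticks_during(
    route: Callable[[str], Awaitable[Dict[str, str]]], code: str
) -> int:
    """Count heartbeat ticks that manage to run while the route executes."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    await route(code)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks
```

Running `asyncio.run(count_ticks_during(blocking_route, "x"))` gives the heartbeat no chance to tick, while the same call with `async_route` lets it tick repeatedly. In a web endpoint, those missed ticks are other requests, such as status polls from the UI, waiting behind the verifier call.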

WebGPU dtype fallback

On some machines, the browser model failed to load with the first quantization choice. The loader now tries a small sequence of dtypes in order and surfaces the dtype that actually loaded in the UI. For the working demo path, q4f16 loaded successfully.
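The real loader lives in browser JavaScript, but the fallback logic is language-agnostic. Here is a Python sketch of the pattern with a hypothetical `load` callable. The candidate order in the usage note is illustrative only; the actual sequence the app tries is not stated here.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")


def load_with_fallback(
    load: Callable[[str], M],   # hypothetical loader: dtype name -> model
    dtypes: Iterable[str],
) -> Tuple[M, str]:
    """Try each dtype in order and return (model, dtype_that_worked)."""
    errors: List[str] = []
    for dtype in dtypes:
        try:
            return load(dtype), dtype
        except Exception as exc:  # any load failure moves on to the next dtype
            errors.append(f"{dtype}: {exc}")
    raise RuntimeError("no dtype could be loaded; tried " + "; ".join(errors))
```

A call such as `load_with_fallback(load, ["fp16", "q4f16", "q4"])` returns the active dtype alongside the model, which is the value a UI would display so users know which quantization they are actually running.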

What I learned

Small models become much more interesting when the interface admits their uncertainty.

Instead of hiding the verifier behind a final answer, the app makes the verification step visible. The UI says: here is the local draft, here is the cloud check, here is the verdict, and here is the corrected code if the two disagree.

That changed how the project felt. The verifier was no longer just a backend. It became part of the performance.

I also learned that "runs locally" is not a single binary property. This app is local-first, not fully offline:

  • The draft model runs in the browser.
  • The verifier runs on Modal.
  • The sandbox runs on Modal.

That tradeoff felt honest for this build. The local model gives speed and presence. The cloud verifier gives a stronger second opinion. The app is not trying to be pure; it is trying to show what each small model is good at.

What I would improve next

The next version would add a few things:

  • A prompt gallery with reliable demos for PASS, FIX, and REWRITE.
  • A clearer diff view showing exactly what the verifier changed.
  • More language-aware sandboxing beyond Python.
  • A warm verifier indicator so users understand Modal cold starts.
  • Saved traces of local draft, verifier verdict, and sandbox result for easier judging and debugging.

Links

Built by Blessing Mwiti in collaboration with OpenAI Codex.
