---
title: smolcode
emoji: 🤖
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
python_version: "3.12"
app_file: app.py
pinned: false
license: apache-2.0
short_description: A tiny local model that writes code, runs it, and fixes it.
tags:
  - build-small-hackathon
  - agent
  - code-generation
  - gradio
---

# smolcode 🤖

**A tiny local model that writes code, runs it, and fixes it — until it works.**

smolcode is an *agentic* coding assistant built for **small** language models. Instead of
autocompleting, it runs a **plan → write → execute → repair** loop: it writes a file, runs
it in a sandbox, reads the real error, and iterates until a test passes — on a model small
enough to run on your own machine (a ≤4B model on a laptop, scaling up to 32B on a
workstation). **No cloud APIs.**

Built for the [Hugging Face × Gradio **Build Small** Hackathon](https://huggingface.co/build-small-hackathon).

## Why it's a "Build Small" entry
- **Agentic on a 3B model.** The loop — not the model size — does the work. A ≤4B model
  drives tool calls reliably enough to write, run, and self-correct code.
- **Local-first & private.** Talks to any OpenAI-compatible endpoint (Ollama, llama.cpp).
  Nothing leaves your machine.
- **Specialty routing.** A 2D router classifies tasks into 16 language/function
  families and escalates within each family's fine-tuned ladder before falling back
  to bigger Granite models.
- **Fine-tuned tiny coder.** We fine-tuned **Qwen2.5-Coder-1.5B** to emit native tool calls
  so a ≤2B model can be the cheap entry tier — published at
  [`seanpoyner/smolcode-coder-1.5b-tools`](https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools).
- **Rust core.** Agent loop, tool execution, and tracing run through
  [**LiteForge**](https://github.com/seanpoyner/liteforge) and **smolcode-core**
  (Rust/PyO3). Gradio is the (required) shell; the brain is Rust.

## How to use this Space
1. Type a coding task, e.g. *"write a function that validates an email and test it."*
2. Watch the **agent trace** stream live: `write_file → run_python → (error) → fix → pass`.
3. The **router** badge shows which tier solved it and whether it's **✓ verified**.
4. Tick **⚡ fan out** and enter several lines to run independent tasks as **parallel subagents**.

## Benchmark — the loop is the product
The agentic loop is what makes a tiny model useful. On the same HumanEval-style suite
(`bench/tasks.py`, 10 tasks, pass@1):

<!-- BENCH_TABLE_START -->
| System | Model | pass@1 |
|--------|-------|--------|
| single-shot | fine-tuned **1.5B** | 50% |
| **agentic loop** | fine-tuned **1.5B** | **70%** |
| single-shot | granite4.1:3b | 90% |

*The write→run→fix loop lifts the fine-tuned 1.5B from **50% → 70%** (+20 pts) — the
loop, not raw model size, does the work. A larger model (granite 3B) scores higher
single-shot, which is exactly why the router escalates only when the small tier can't
verify. Measured with `bench/run.py` on the hal backend.*
<!-- BENCH_TABLE_END -->

## Under the hood
```
Gradio UI  →  smolcode-core / LiteForge (Rust/PyO3)  →  OpenAI-compatible endpoint
                  specialty router + agent loop
                  tools: write_file, read_file, run_python, run_tests
                  served by Ollama / llama.cpp (local, HAL LAN, or public Modal+Ollama)
```

The public demo serves the whole specialist matrix + Granite ladder from one
Modal container running Ollama, so the specialty router escalates for real in the
cloud — same engine, just an endpoint change. See
[SPACE_DEPLOY.md](SPACE_DEPLOY.md) option (c).

There's also a full terminal agent (`smolcode-cli`, a Rust ratatui TUI) and a
Replit/Lovable-style app builder (`smolbuilder.py`) on the same engine.

- **Code:** https://github.com/seanpoyner/smolcode
- **Model:** https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools
- **Engine:** https://github.com/seanpoyner/liteforge
- **App builder companion:** https://huggingface.co/spaces/seanpoyner/smolbuilder

## Demo video
<video controls src="https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4"></video>

[▶️ Watch the demo](https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4) — the agent writes code, runs it, fixes the failing test, and shows the router tier that solved it.

## Share
> Most coding tasks don't need a giant model. **smolcode** is an agentic coding agent that runs entirely on a *small local model* — it writes the code, runs it, reads the real error, and fixes itself until tests pass. Fine-tuned **1.5B** coder; the router escalates a tier only when needed (all ≤32B). Less compute, same result.
>
> Built for the #BuildSmall hackathon with @huggingface + @Gradio. 🦀 Rust core.
> ▶️ https://huggingface.co/spaces/seanpoyner/smolcode
> #SmallModels #LocalAI #Gradio #BuildSmall

📣 **Posted on LinkedIn:** https://www.linkedin.com/posts/sean-poyner_buildsmall-smallmodels-localai-share-7472421438109650944-bQGy/