smolcode / README.md
seanpoyner's picture
Card: refreshed demo video + Rust/learned-router framing
ad293ce verified
|
Raw
History Blame Contribute Delete
5.04 kB
---
title: smolcode
emoji: πŸ€–
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
python_version: "3.12"
app_file: app.py
pinned: false
license: apache-2.0
short_description: A tiny local model that writes code, runs it, and fixes it.
tags:
- build-small-hackathon
- agent
- code-generation
- gradio
---
# smolcode πŸ€–
**A tiny local model that writes code, runs it, and fixes it β€” until it works.**
smolcode is an *agentic* coding assistant built for **small** language models. Instead of
autocompleting, it runs a **plan β†’ write β†’ execute β†’ repair** loop: it writes a file, runs
it in a sandbox, reads the real error, and iterates until a test passes β€” on a model small
enough to run on your own machine (a ≀4B model on a laptop, scaling up to 32B on a
workstation). **No cloud APIs.**
Built for the [Hugging Face Γ— Gradio **Build Small** Hackathon](https://huggingface.co/build-small-hackathon).
## Why it's a "Build Small" entry
- **Agentic on a 3B model.** The loop β€” not the model size β€” does the work. A ≀4B model
drives tool calls reliably enough to write, run, and self-correct code.
- **Local-first & private.** Talks to any OpenAI-compatible endpoint (Ollama, llama.cpp).
Nothing leaves your machine.
- **Specialty routing.** A 2D router classifies tasks into 16 language/function
families and escalates within each family's fine-tuned ladder before falling back
to bigger Granite models.
- **Fine-tuned tiny coder.** We fine-tuned **Qwen2.5-Coder-1.5B** to emit native tool calls
so a ≀2B model can be the cheap entry tier β€” published at
[`seanpoyner/smolcode-coder-1.5b-tools`](https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools).
- **Rust core.** Agent loop, tool execution, and tracing run through
[**LiteForge**](https://github.com/seanpoyner/liteforge) and **smolcode-core**
(Rust/PyO3). Gradio is the (required) shell; the brain is Rust.
## How to use this Space
1. Type a coding task, e.g. *"write a function that validates an email and test it."*
2. Watch the **agent trace** stream live: `write_file β†’ run_python β†’ (error) β†’ fix β†’ pass`.
3. The **router** badge shows which tier solved it and whether it's **βœ“ verified**.
4. Tick **⚑ fan out** and enter several lines to run independent tasks as **parallel subagents**.
## Benchmark β€” the loop is the product
The agentic loop is what makes a tiny model useful. On the same HumanEval-style suite
(`bench/tasks.py`, 10 tasks, pass@1):
<!-- BENCH_TABLE_START -->
| System | Model | pass@1 |
|--------|-------|--------|
| single-shot | fine-tuned **1.5B** | 50% |
| **agentic loop** | fine-tuned **1.5B** | **70%** |
| single-shot | granite4.1:3b | 90% |
*The write→run→fix loop lifts the fine-tuned 1.5B from **50% → 70%** (+20 pts) — the
loop, not raw model size, does the work. A larger model (granite 3B) scores higher
single-shot, which is exactly why the router escalates only when the small tier can't
verify. Measured with `bench/run.py` on the hal backend.*
<!-- BENCH_TABLE_END -->
## Under the hood
```
Gradio UI β†’ smolcode-core / LiteForge (Rust/PyO3) β†’ OpenAI-compatible endpoint
specialty router + agent loop
tools: write_file, read_file, run_python, run_tests
served by Ollama / llama.cpp (local, HAL LAN, or public Modal+Ollama)
```
The public demo serves the whole specialist matrix + Granite ladder from one
Modal container running Ollama, so the specialty router escalates for real in the
cloud β€” same engine, just an endpoint change. See
[SPACE_DEPLOY.md](SPACE_DEPLOY.md) option (c).
There's also a full terminal agent (`smolcode-cli`, a Rust ratatui TUI) and a
Replit/Lovable-style app builder (`smolbuilder.py`) on the same engine.
- **Code:** https://github.com/seanpoyner/smolcode
- **Model:** https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools
- **Engine:** https://github.com/seanpoyner/liteforge
- **App builder companion:** https://huggingface.co/spaces/seanpoyner/smolbuilder
## Demo video
<video controls src="https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4"></video>
[▢️ Watch the demo](https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4) β€” the agent writes code, runs it, fixes the failing test, and shows the router tier that solved it.
## Share
> Most coding tasks don't need a giant model. **smolcode** is an agentic coding agent that runs entirely on a *small local model* β€” it writes the code, runs it, reads the real error, and fixes itself until tests pass. Fine-tuned **1.5B** coder; the router escalates a tier only when needed (all ≀32B). Less compute, same result.
>
> Built for the #BuildSmall hackathon with @huggingface + @Gradio. πŸ¦€ Rust core.
> ▢️ https://huggingface.co/spaces/seanpoyner/smolcode
> #SmallModels #LocalAI #Gradio #BuildSmall
πŸ“£ **Posted on LinkedIn:** https://www.linkedin.com/posts/sean-poyner_buildsmall-smallmodels-localai-share-7472421438109650944-bQGy/