---
title: smolcode
emoji: 🤖
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
python_version: "3.12"
app_file: app.py
pinned: false
license: apache-2.0
short_description: A tiny local model that writes code, runs it, and fixes it.
tags:
  - build-small-hackathon
  - agent
  - code-generation
  - gradio
---

# smolcode 🤖

**A tiny local model that writes code, runs it, and fixes it until it works.**

smolcode is an *agentic* coding assistant built for **small** language models. Instead of autocompleting, it runs a **plan → write → execute → repair** loop: it writes a file, runs it in a sandbox, reads the real error, and iterates until a test passes. It does all of this on a model small enough to run on your own machine (a ≤4B model on a laptop, scaling up to 32B on a workstation). **No cloud APIs.**

Built for the [Hugging Face × Gradio **Build Small** Hackathon](https://huggingface.co/build-small-hackathon).

## Why it's a "Build Small" entry

- **Agentic on a 3B model.** The loop, not the model size, does the work. A ≤4B model drives tool calls reliably enough to write, run, and self-correct code.
- **Local-first & private.** Talks to any OpenAI-compatible endpoint (Ollama, llama.cpp). Nothing leaves your machine.
- **Specialty routing.** A 2D router classifies each task along language and function axes into one of 16 families, then escalates within that family's ladder of fine-tuned models before falling back to bigger Granite models.
- **Fine-tuned tiny coder.** We fine-tuned **Qwen2.5-Coder-1.5B** to emit native tool calls so that a ≤2B model can serve as the cheap entry tier. It is published at [`seanpoyner/smolcode-coder-1.5b-tools`](https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools).
- **Rust core.** The agent loop, tool execution, and tracing run through [**LiteForge**](https://github.com/seanpoyner/liteforge) and **smolcode-core** (Rust/PyO3). Gradio is the (required) shell; the brain is Rust.

## How to use this Space

1. Type a coding task, e.g.
   *"write a function that validates an email and test it."*
2. Watch the **agent trace** stream live: `write_file → run_python → (error) → fix → pass`.
3. The **router** badge shows which tier solved the task and whether the result is **✓ verified**.
4. Tick **⚡ fan out** and enter several tasks, one per line, to run them as independent **parallel subagents**.

## Benchmark: the loop is the product

The agentic loop is what makes a tiny model useful. All systems below were run on the same HumanEval-style suite (`bench/tasks.py`, 10 tasks, pass@1):

| System | Model | pass@1 |
|--------|-------|--------|
| single-shot | fine-tuned **1.5B** | 50% |
| **agentic loop** | fine-tuned **1.5B** | **70%** |
| single-shot | granite4.1:3b | 90% |

*The write→run→fix loop lifts the fine-tuned 1.5B from **50% → 70%** (+20 pts, i.e. from 5 to 7 of the 10 tasks): the loop, not raw model size, does the work. A larger model (granite4.1:3b) still scores higher single-shot, which is exactly why the router keeps bigger tiers in reserve and escalates to them only when the small tier can't verify its result. Measured with `bench/run.py` on the hal backend.*

## Under the hood

```
Gradio UI → smolcode-core / LiteForge (Rust/PyO3) → OpenAI-compatible endpoint
            specialty router + agent loop
            tools: write_file, read_file,        served by Ollama / llama.cpp
                   run_python, run_tests         (local, HAL LAN, or public Modal+Ollama)
```

The public demo serves the whole specialist matrix plus the Granite ladder from one Modal container running Ollama, so the specialty router escalates for real in the cloud. It is the same engine; only the endpoint changes. See [SPACE_DEPLOY.md](SPACE_DEPLOY.md) option (c).

There's also a full terminal agent (`smolcode-cli`, a Rust ratatui TUI) and a Replit/Lovable-style app builder (`smolbuilder.py`), both built on the same engine.
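The write → run → fix loop and cheapest-first escalation described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not smolcode's implementation (which lives in Rust in smolcode-core/LiteForge and drives tool-calling models behind an OpenAI-compatible endpoint). Assumptions: each tier is a plain Python callable standing in for a model call; `run_candidate` only plays the role of the `run_python` tool by running the file in a fresh interpreter with a timeout (no real sandboxing); and the names `agent_loop`, `route_and_solve`, `max_iters`, the toy models, and the tier labels are all made up for illustration.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

# Hypothetical model interface: (task, previous_code, last_error) -> new source code.
# In smolcode itself this role is played by a tool-calling model behind an endpoint.
Generate = Callable[[str, Optional[str], Optional[str]], str]


@dataclass
class Result:
    tier: str
    code: str
    verified: bool
    attempts: int


def run_candidate(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Hypothetical stand-in for run_python: write the file, run it, return (passed, stderr).

    Uses a fresh interpreter and a timeout only; it does NOT restrict file or network access.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "solution.py"
        path.write_text(code)
        try:
            proc = subprocess.run([sys.executable, str(path)],
                                  capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout}s"
    return proc.returncode == 0, proc.stderr


def agent_loop(task: str, generate: Generate, max_iters: int = 4) -> tuple[bool, str, int]:
    """write -> run -> read the real error -> fix, until the embedded tests pass."""
    code: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        code = generate(task, code, error)  # write step
        ok, error = run_candidate(code)     # execute step; stderr feeds the repair step
        if ok:
            return True, code, attempt
    return False, code or "", max_iters


def route_and_solve(task: str, ladder: list[tuple[str, Generate]]) -> Result:
    """Try tiers cheapest-first; escalate only when a tier cannot verify its code."""
    result = Result(tier="none", code="", verified=False, attempts=0)
    for tier, generate in ladder:
        ok, code, attempts = agent_loop(task, generate)
        result = Result(tier, code, ok, attempts)
        if ok:
            break
    return result


# Toy "models" (hypothetical): each returns a function plus the tests it must pass.
TESTS = "\nassert is_email('a@b.co')\nassert not is_email('nope')\n"
BUGGY = "def is_email(s):\n    return True\n" + TESTS
FIXED = "def is_email(s):\n    return '@' in s and '.' in s.split('@')[-1]\n" + TESTS


def small_model(task: str, prev: Optional[str], error: Optional[str]) -> str:
    return BUGGY if error is None else FIXED  # repairs the bug once it sees the traceback


def stuck_model(task: str, prev: Optional[str], error: Optional[str]) -> str:
    return BUGGY  # never repairs, so the router has to escalate


r = route_and_solve("validate an email", [("small-1.5b", small_model)])
print(r.tier, r.verified, r.attempts)  # small-1.5b True 2
r = route_and_solve("validate an email",
                    [("small-1.5b", stuck_model), ("granite-3b", small_model)])
print(r.tier, r.verified, r.attempts)  # granite-3b True 2
```

In this sketch a tier counts as verified only when its own test assertions pass in a separate process, so escalation is driven by the real exit status and error text rather than by the model's claim of success.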
- **Code:** https://github.com/seanpoyner/smolcode
- **Model:** https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools
- **Engine:** https://github.com/seanpoyner/liteforge
- **App builder companion:** https://huggingface.co/spaces/seanpoyner/smolbuilder

## Demo video

[▶️ Watch the demo](https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4): the agent writes code, runs it, fixes the failing test, and shows the router tier that solved it.

## Share

> Most coding tasks don't need a giant model. **smolcode** is an agentic coding agent that runs entirely on a *small local model*: it writes the code, runs it, reads the real error, and fixes itself until tests pass. Fine-tuned **1.5B** coder; the router escalates a tier only when needed (all ≤32B). Less compute, same result.
>
> Built for the #BuildSmall hackathon with @huggingface + @Gradio. 🦀 Rust core.
> ▶️ https://huggingface.co/spaces/seanpoyner/smolcode
> #SmallModels #LocalAI #Gradio #BuildSmall

📣 **Posted on LinkedIn:** https://www.linkedin.com/posts/sean-poyner_buildsmall-smallmodels-localai-share-7472421438109650944-bQGy/