Spaces:

seanpoyner
/

smolcode

Paused

App Files Files Community

smolcode / README.md

seanpoyner

Card: refreshed demo video + Rust/learned-router framing

ad293ce verified 17 days ago

preview code

Raw

History Blame Contribute Delete

5.04 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

metadata

title: smolcode
emoji: 🤖
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
short_description: A tiny local model that writes code, runs it, and fixes it.
tags:
  - build-small-hackathon
  - agent
  - code-generation
  - gradio

smolcode 🤖

A tiny local model that writes code, runs it, and fixes it — until it works.

smolcode is an agentic coding assistant built for small language models. Instead of autocompleting, it runs a plan → write → execute → repair loop: it writes a file, runs it in a sandbox, reads the real error, and iterates until a test passes — on a model small enough to run on your own machine (a ≤4B model on a laptop, scaling up to 32B on a workstation). No cloud APIs.

Built for the Hugging Face × Gradio Build Small Hackathon.

Why it's a "Build Small" entry

Agentic on a 3B model. The loop — not the model size — does the work. A ≤4B model drives tool calls reliably enough to write, run, and self-correct code.
Local-first & private. Talks to any OpenAI-compatible endpoint (Ollama, llama.cpp). Nothing leaves your machine.
Specialty routing. A 2D router classifies tasks into 16 language/function families and escalates within each family's fine-tuned ladder before falling back to bigger Granite models.
Fine-tuned tiny coder. We fine-tuned Qwen2.5-Coder-1.5B to emit native tool calls so a ≤2B model can be the cheap entry tier — published at seanpoyner/smolcode-coder-1.5b-tools.
Rust core. Agent loop, tool execution, and tracing run through LiteForge and smolcode-core (Rust/PyO3). Gradio is the (required) shell; the brain is Rust.

How to use this Space

Type a coding task, e.g. "write a function that validates an email and test it."
Watch the agent trace stream live: write_file → run_python → (error) → fix → pass.
The router badge shows which tier solved it and whether it's ✓ verified.
Tick ⚡ fan out and enter several lines to run independent tasks as parallel subagents.

Benchmark — the loop is the product

The agentic loop is what makes a tiny model useful. On the same HumanEval-style suite (bench/tasks.py, 10 tasks, pass@1):

System	Model	pass@1
single-shot	fine-tuned 1.5B	50%
agentic loop	fine-tuned 1.5B	70%
single-shot	granite4.1:3b	90%

The write→run→fix loop lifts the fine-tuned 1.5B from 50% → 70% (+20 pts) — the loop, not raw model size, does the work. A larger model (granite 3B) scores higher single-shot, which is exactly why the router escalates only when the small tier can't verify. Measured with bench/run.py on the hal backend.

Under the hood

Gradio UI  →  smolcode-core / LiteForge (Rust/PyO3)  →  OpenAI-compatible endpoint
                  specialty router + agent loop
                  tools: write_file, read_file, run_python, run_tests
                  served by Ollama / llama.cpp (local, HAL LAN, or public Modal+Ollama)

The public demo serves the whole specialist matrix + Granite ladder from one Modal container running Ollama, so the specialty router escalates for real in the cloud — same engine, just an endpoint change. See SPACE_DEPLOY.md option (c).

There's also a full terminal agent (smolcode-cli, a Rust ratatui TUI) and a Replit/Lovable-style app builder (smolbuilder.py) on the same engine.

Code: https://github.com/seanpoyner/smolcode
Model: https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools
Engine: https://github.com/seanpoyner/liteforge
App builder companion: https://huggingface.co/spaces/seanpoyner/smolbuilder

Demo video

▶️ Watch the demo — the agent writes code, runs it, fixes the failing test, and shows the router tier that solved it.

Most coding tasks don't need a giant model. smolcode is an agentic coding agent that runs entirely on a small local model — it writes the code, runs it, reads the real error, and fixes itself until tests pass. Fine-tuned 1.5B coder; the router escalates a tier only when needed (all ≤32B). Less compute, same result.

Built for the #BuildSmall hackathon with @huggingface + @Gradio. 🦀 Rust core. ▶️ https://huggingface.co/spaces/seanpoyner/smolcode #SmallModels #LocalAI #Gradio #BuildSmall

📣 Posted on LinkedIn: https://www.linkedin.com/posts/sean-poyner_buildsmall-smallmodels-localai-share-7472421438109650944-bQGy/