Spaces:

seanpoyner
/

smolcode

Paused

App Files Files Community

smolcode / README.md

seanpoyner

Card: refreshed demo video + Rust/learned-router framing

ad293ce verified 17 days ago

preview code

Raw

History Blame Contribute Delete

5.04 kB

	---
	title: smolcode
	emoji: 🤖
	colorFrom: purple
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.50.0
	python_version: "3.12"
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: A tiny local model that writes code, runs it, and fixes it.
	tags:
	- build-small-hackathon
	- agent
	- code-generation
	- gradio
	---

	# smolcode 🤖

	A tiny local model that writes code, runs it, and fixes it — until it works.

	smolcode is an agentic coding assistant built for small language models. Instead of
	autocompleting, it runs a plan → write → execute → repair loop: it writes a file, runs
	it in a sandbox, reads the real error, and iterates until a test passes — on a model small
	enough to run on your own machine (a ≤4B model on a laptop, scaling up to 32B on a
	workstation). No cloud APIs.

	Built for the [Hugging Face × Gradio Build Small Hackathon](https://huggingface.co/build-small-hackathon).

	## Why it's a "Build Small" entry
	- Agentic on a 3B model. The loop — not the model size — does the work. A ≤4B model
	drives tool calls reliably enough to write, run, and self-correct code.
	- Local-first & private. Talks to any OpenAI-compatible endpoint (Ollama, llama.cpp).
	Nothing leaves your machine.
	- Specialty routing. A 2D router classifies tasks into 16 language/function
	families and escalates within each family's fine-tuned ladder before falling back
	to bigger Granite models.
	- Fine-tuned tiny coder. We fine-tuned Qwen2.5-Coder-1.5B to emit native tool calls
	so a ≤2B model can be the cheap entry tier — published at
	[`seanpoyner/smolcode-coder-1.5b-tools`](https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools).
	- Rust core. Agent loop, tool execution, and tracing run through
	[LiteForge](https://github.com/seanpoyner/liteforge) and smolcode-core
	(Rust/PyO3). Gradio is the (required) shell; the brain is Rust.

	## How to use this Space
	1. Type a coding task, e.g. "write a function that validates an email and test it."
	2. Watch the agent trace stream live: `write_file → run_python → (error) → fix → pass`.
	3. The router badge shows which tier solved it and whether it's ✓ verified.
	4. Tick ⚡ fan out and enter several lines to run independent tasks as parallel subagents.

	## Benchmark — the loop is the product
	The agentic loop is what makes a tiny model useful. On the same HumanEval-style suite
	(`bench/tasks.py`, 10 tasks, pass@1):

	<!-- BENCH_TABLE_START -->
	\| System \| Model \| pass@1 \|
	\|--------\|-------\|--------\|
	\| single-shot \| fine-tuned 1.5B \| 50% \|
	\| agentic loop \| fine-tuned 1.5B \| 70% \|
	\| single-shot \| granite4.1:3b \| 90% \|

	The write→run→fix loop lifts the fine-tuned 1.5B from 50% → 70%* (+20 pts) — the
	loop, not raw model size, does the work. A larger model (granite 3B) scores higher
	single-shot, which is exactly why the router escalates only when the small tier can't
	verify. Measured with `bench/run.py` on the hal backend.*
	<!-- BENCH_TABLE_END -->

	## Under the hood
	```
	Gradio UI → smolcode-core / LiteForge (Rust/PyO3) → OpenAI-compatible endpoint
	specialty router + agent loop
	tools: write_file, read_file, run_python, run_tests
	served by Ollama / llama.cpp (local, HAL LAN, or public Modal+Ollama)
	```

	The public demo serves the whole specialist matrix + Granite ladder from one
	Modal container running Ollama, so the specialty router escalates for real in the
	cloud — same engine, just an endpoint change. See
	[SPACE_DEPLOY.md](SPACE_DEPLOY.md) option (c).

	There's also a full terminal agent (`smolcode-cli`, a Rust ratatui TUI) and a
	Replit/Lovable-style app builder (`smolbuilder.py`) on the same engine.

	- Code: https://github.com/seanpoyner/smolcode
	- Model: https://huggingface.co/seanpoyner/smolcode-coder-1.5b-tools
	- Engine: https://github.com/seanpoyner/liteforge
	- App builder companion: https://huggingface.co/spaces/seanpoyner/smolbuilder

	## Demo video
	<video controls src="https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4"></video>

	[▶️ Watch the demo](https://huggingface.co/spaces/seanpoyner/smolcode/resolve/main/demo.mp4) — the agent writes code, runs it, fixes the failing test, and shows the router tier that solved it.

	## Share
	> Most coding tasks don't need a giant model. smolcode is an agentic coding agent that runs entirely on a small local model — it writes the code, runs it, reads the real error, and fixes itself until tests pass. Fine-tuned 1.5B coder; the router escalates a tier only when needed (all ≤32B). Less compute, same result.
	>
	> Built for the #BuildSmall hackathon with @huggingface + @Gradio. 🦀 Rust core.
	> ▶️ https://huggingface.co/spaces/seanpoyner/smolcode
	> #SmallModels #LocalAI #Gradio #BuildSmall

	📣 Posted on LinkedIn: https://www.linkedin.com/posts/sean-poyner_buildsmall-smallmodels-localai-share-7472421438109650944-bQGy/