Releasing YOLO-Coder-8B and YOLO-Coder-1.5B — fine-tuned models for fixing broken CLI commands, running 100% locally.
Both models are fine-tuned from Qwen2.5-Coder using MLX LoRA on Apple Silicon, trained on 6,719 real CLI error→fix pairs across 15 categories (Python, pip, Node.js, npm, Docker, Git, Cargo, SSH, database, and more).
Unlike general-purpose coding assistants, these models are laser-focused on a single task: given a CLI error, output exactly one bare shell command that fixes it. No explanation. No markdown. One command.
**Benchmark results (YOLO-Bench, 218 verified CLI errors, structural match scoring):**
- YOLO-Coder-8B raw LLM: **59.2%** (vs GPT-4o 48.6%, Claude Sonnet 60.1%)
- YOLO-Coder-8B full pipeline: **77.1%**
- YOLO-Coder-1.5B raw LLM: **42.2%**
- YOLO-Coder-1.5B full pipeline: **71.1%**
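To illustrate what "structural match scoring" means in practice, here is a minimal sketch: two commands count as a match if they tokenize to the same shell words, so cosmetic whitespace differences are ignored. The function name and exact rules are hypothetical; the real scorer lives in the linked benchmark repo.

```python
import shlex

def structural_match(predicted: str, expected: str) -> bool:
    """Compare two shell commands token by token, ignoring
    cosmetic whitespace (hypothetical sketch of the scoring idea)."""
    try:
        return shlex.split(predicted.strip()) == shlex.split(expected.strip())
    except ValueError:
        # Unparseable output (e.g. unbalanced quotes) scores as a miss.
        return False

print(structural_match("pip install  requests", "pip install requests"))    # True
print(structural_match("pip install requests==2.0", "pip install requests"))  # False
```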
The full pipeline layers 73 deterministic interceptors and fix memory on top of the LLM — roughly half of all fixes never reach the model.
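The interceptor idea can be sketched as a lookup table of error patterns tried before the LLM is ever invoked. The two rules below are hypothetical examples, not the actual 73 shipped in the pipeline:

```python
import re

# Hypothetical interceptor table: (error pattern, fix template).
INTERCEPTORS = [
    (re.compile(r"ModuleNotFoundError: No module named '([\w.]+)'"),
     lambda m: f"pip install {m.group(1)}"),
    (re.compile(r"command not found: (\S+)"),
     lambda m: f"brew install {m.group(1)}"),
]

def intercept(error: str):
    """Return a deterministic fix if any rule matches,
    else None (the error falls through to the LLM)."""
    for pattern, fix in INTERCEPTORS:
        m = pattern.search(error)
        if m:
            return fix(m)
    return None

print(intercept("ModuleNotFoundError: No module named 'requests'"))
# -> pip install requests
```

Because each rule is a pure regex match, interceptor hits are instant and fully reproducible, which is why a large share of fixes never need the model at all.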
Both models are available as Q4_K_M GGUFs for Ollama:
- 🔗 [YOLO-Coder-8B](erdemozkan/YOLO-Coder-8B)
- 🔗 [YOLO-Coder-1.5B](erdemozkan/YOLO-Coder-1.5B)
Benchmark dataset and scoring code: [github.com/erdemozkan/YOLO-CODER/tree/main/benchmark](https://github.com/erdemozkan/YOLO-CODER/tree/main/benchmark)