Releasing YOLO-Coder-8B and YOLO-Coder-1.5B — fine-tuned models for fixing broken CLI commands, running 100% locally.
Both models are fine-tuned from Qwen2.5-Coder using MLX LoRA on Apple Silicon, trained on 6,719 real CLI error→fix pairs across 15 categories (Python, pip, Node.js, npm, Docker, Git, Cargo, SSH, database, and more).
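As an illustration of what an error→fix pair can look like, here is one hypothetical training example in the prompt/completion JSONL style that MLX LoRA fine-tuning accepts. The schema, prompt wording, and example are assumptions for illustration, not rows from the released dataset:

```python
import json

# Hypothetical error→fix training pair; the released dataset's exact
# schema and prompt wording may differ.
pair = {
    "prompt": "Fix this CLI error with one shell command:\n"
              "ModuleNotFoundError: No module named 'requests'",
    "completion": "pip install requests",
}

line = json.dumps(pair)  # one JSONL line per pair
print(json.loads(line)["completion"])
```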
Unlike general-purpose coding assistants, these models are laser-focused on a single task: given a CLI error, output exactly one bare shell command that fixes it. No explanation. No markdown. One command.
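Even with that training objective, raw generations can still leak fences or prose, so a caller might defensively normalize output down to one bare command. This is a hypothetical post-processing helper, not part of the released models:

```python
def extract_command(output: str) -> str:
    """Reduce raw model output to a single bare shell command.

    Hypothetical sanitizer: strips markdown fences, keeps the first
    non-empty line, and drops a leading "$ " prompt marker if present.
    """
    lines = [ln.strip() for ln in output.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("```")]
    cmd = lines[0] if lines else ""
    return cmd[2:].strip() if cmd.startswith("$ ") else cmd

print(extract_command("```bash\n$ pip install requests\n```"))
# → pip install requests
```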
**Benchmark results (YOLO-Bench, 218 verified CLI errors, structural match scoring):**

- YOLO-Coder-8B raw LLM: **59.2%** (vs GPT-4o 48.6%, Claude Sonnet 60.1%)
- YOLO-Coder-8B full pipeline: **77.1%**
- YOLO-Coder-1.5B raw LLM: **42.2%**
- YOLO-Coder-1.5B full pipeline: **71.1%**
The full pipeline layers 73 deterministic interceptors and a fix memory on top of the LLM; roughly half of all fixes never reach the model.
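The layering described above can be sketched in a few lines. Everything here is an assumed structure for illustration: the two regex patterns stand in for the real 73 interceptors, and `suggest_fix`/`fix_memory` are hypothetical names:

```python
import re

# Illustrative interceptors: deterministic (pattern, fix-template)
# pairs checked before any LLM call. The real pipeline has 73.
INTERCEPTORS = [
    (re.compile(r"No module named '(\w+)'"), "pip install {0}"),
    (re.compile(r"command not found: (\w+)"), "brew install {0}"),
]
fix_memory: dict = {}  # error text -> previously accepted fix

def suggest_fix(error: str, llm=lambda e: "<llm fix>") -> str:
    # 1) Deterministic interceptors run first.
    for pattern, template in INTERCEPTORS:
        m = pattern.search(error)
        if m:
            return template.format(*m.groups())
    # 2) Then the fix memory of previously confirmed fixes.
    if error in fix_memory:
        return fix_memory[error]
    # 3) Only unmatched errors reach the fine-tuned model.
    return llm(error)

print(suggest_fix("ModuleNotFoundError: No module named 'flask'"))
# → pip install flask
```

With cheap deterministic layers in front, the model only sees the long tail of errors no rule or memory entry covers, which is consistent with roughly half of fixes never reaching it.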
Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.6-27B and Qwen/Qwen3.6-35B-A3B.
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach allocates per-tensor precision where it matters most and produces high-quality models that meet a precise global file-size target.
Key advantages:

- **VRAM maximization:** can generate high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM).
- **Data-driven precision:** the quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD-vs-size trade-offs.
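The size-targeting idea can be sketched as a toy allocator. This is not the actual implementation: the greedy strategy, the sensitivity scores, and the bit-width levels below are all assumptions chosen to illustrate the concept of promoting the most error-sensitive tensors while staying under a global byte budget:

```python
# Toy per-tensor bit allocator illustrating the target-BPW idea; the
# real quant types, sensitivity metric, and optimizer differ.
def allocate_bpw(tensors, target_bytes, levels=(2.5, 4.5, 6.5, 8.0)):
    """tensors: list of (name, n_params, sensitivity). Returns {name: bpw}."""
    plan = {name: levels[0] for name, _, _ in tensors}  # start at low precision

    def total_bytes():
        return sum(n * plan[name] / 8 for name, n, _ in tensors)

    # Promote the most sensitive tensors first, while the budget allows.
    for name, n, _ in sorted(tensors, key=lambda t: -t[2]):
        for bpw in levels[1:]:
            if total_bytes() - n * plan[name] / 8 + n * bpw / 8 <= target_bytes:
                plan[name] = bpw
            else:
                break
    return plan

# Hypothetical tensors: (name, parameter count, sensitivity score).
tensors = [("attn", 1000, 0.9), ("ffn", 4000, 0.3), ("embed", 2000, 0.6)]
plan = allocate_bpw(tensors, target_bytes=3500)
```

Note how the budget constraint makes the file size exact by construction, while the sensitivity ordering decides which tensors keep high precision.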
Full benchmarks (PPL, KLD, ARC, GPQA, MMLU, etc.) and methodology are in the model cards.