
AXL-Coder-15M

Agentic coding. 26M params. PPL 1.54. 8-action tool router.

| Property | Value |
|---|---|
| Architecture | Multi-Scale Transformer |
| d_model | ? |
| Attention Heads | ? |
| Layers per Scale | ? |
| Context Window | 256 bytes |
| Downsample Factors | [1, 2, 4] |
| Vocab Size | 258 (byte-level) |
| Optimizer | SGD |
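A byte-level vocabulary of 258 suggests all 256 byte values plus two extra tokens, presumably special markers; the reserved IDs below are an assumption, not documented for AXL-Coder. Byte-level tokenization itself is simple:

```python
# Byte-level "tokenization": each UTF-8 byte is its own token ID (0-255).
# IDs 256 and 257 are ASSUMED here to be the two extra tokens (e.g. BOS/EOS);
# the actual reserved IDs in AXL-Coder are not specified in this card.
BOS, EOS = 256, 257

def encode(text: str) -> list[int]:
    """Map a string to byte token IDs, framed by the assumed special tokens."""
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def decode(ids: list[int]) -> str:
    """Drop special tokens and decode the remaining bytes back to text."""
    return bytes(i for i in ids if i < 256).decode("utf-8")

ids = encode("def fib():")
```

This is why the context window is measured in bytes rather than subword tokens.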
Trained with SGD for 10 minutes. The tool router predicts one of 8 actions per step: generate, read, write, edit, run, search, think, done.
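The routing step can be sketched as a small classification head over a pooled hidden state. A minimal NumPy sketch; the weights, pooling, and dimensions are illustrative, not the actual AXL-Coder architecture:

```python
import numpy as np

# The 8 actions named in the card, in an assumed (undocumented) order.
ACTIONS = ["generate", "read", "write", "edit", "run", "search", "think", "done"]

def route(hidden, W, b):
    """Pick one of the 8 actions from a pooled hidden state.

    hidden: (d_model,) pooled representation of the context
    W: (d_model, 8) router weights, b: (8,) bias -- illustrative only
    """
    logits = hidden @ W + b
    # Softmax over the 8 action logits (shift by max for stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return ACTIONS[int(np.argmax(probs))], probs

# Toy usage with random weights, just to exercise the function.
rng = np.random.default_rng(0)
d_model = 64
action, probs = route(rng.normal(size=d_model),
                      rng.normal(size=(d_model, 8)),
                      np.zeros(8))
```

The chosen action then drives the agent loop: `read`/`write`/`edit`/`run`/`search` invoke tools, `generate`/`think` produce text, and `done` terminates the task.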
| Metric | Value |
|---|---|
| Final Loss | 0.4331 |
| Perplexity | 1.54 |
| Training Steps | ? |
| Training Time | --- |
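The two metrics are mutually consistent: perplexity is the exponential of the mean cross-entropy loss, and exp(0.4331) rounds to 1.54. A quick check:

```python
import math

final_loss = 0.4331
perplexity = math.exp(final_loss)  # approximately 1.542, reported as 1.54
```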

Usage

```shell
ollama create axl-coder-15m -f Modelfile
ollama run axl-coder-15m "def fibonacci():"
```
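The `ollama create` command reads a Modelfile. A minimal sketch, assuming the F16 GGUF sits in the same directory; the filename is an assumption:

```
FROM ./axl-coder-15m-f16.gguf
PARAMETER num_ctx 256
```

`num_ctx 256` matches the model's 256-byte context window.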
An agentic model: it decides which action to take for each coding task.
| File | Size | Format |
|---|---|---|
| F16 GGUF | 30 MB | Full precision |
| Q4_K_M GGUF | 30 MB | 4-bit quantized |
GGUF files work with Ollama and llama.cpp. Q4_K_M is about 3x smaller than F16.