SGD Optimized

AXL-Refactor-20M

Code-refactoring model trained with SGD. 19.1M parameters, perplexity (PPL) 1.01, 1024-byte context window.

Parameters: 19M
Perplexity: 1.01
Training: 5 min
GGUF: 38 MB

Property            Value
Architecture        Multi-Scale Transformer
d_model             ?
Attention Heads     ?
Layers per Scale    ?
Context Window      1024 bytes
Downsample Factors  [1, 2, 4]
Vocab Size          258 (byte-level)
Optimizer           SGD

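
To make the byte-level context window and the downsample factors concrete, here is a minimal sketch. It assumes that token IDs 0-255 are the raw UTF-8 byte values (the card does not say what the remaining 2 of the 258 vocabulary entries are; special tokens are a plausible guess) and that each scale simply divides the sequence length by its factor. The helper names `encode` and `scale_lengths` are hypothetical, not part of AXL.

```python
# Sketch only: how the 258 vocabulary IDs are assigned is an assumption,
# not something the model card specifies.
CONTEXT_BYTES = 1024            # "Context Window" from the table
DOWNSAMPLE_FACTORS = [1, 2, 4]  # "Downsample Factors" from the table
VOCAB_SIZE = 258                # "Vocab Size" from the table


def encode(text: str) -> list[int]:
    """Map text to byte-level IDs (0-255), truncated to the context window."""
    return list(text.encode("utf-8"))[:CONTEXT_BYTES]


def scale_lengths(seq_len: int) -> list[int]:
    """Sequence length seen at each scale, assuming plain integer division."""
    return [seq_len // factor for factor in DOWNSAMPLE_FACTORS]


ids = encode("def fibonacci():")
print(len(ids))                      # 16
print(scale_lengths(CONTEXT_BYTES))  # [1024, 512, 256]
```

Because the model works on bytes, the 1024-byte limit applies to the encoded input, so non-ASCII source text uses up the window faster than its character count suggests.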
Trained on 7 MB of before/after pairs for 202 steps.

Metric          Value
Final Loss      0.0081
Perplexity      1.01
Training Steps  202
Training Time   5 min
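
The reported perplexity is consistent with the final loss if the loss is a mean cross-entropy in nats, in which case perplexity is its exponential. That interpretation is an assumption; the card does not state the loss units. A quick check:

```python
import math

final_loss = 0.0081                # "Final Loss" from the table
perplexity = math.exp(final_loss)  # PPL = exp(mean cross-entropy in nats)

print(f"{perplexity:.2f}")  # 1.01
```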

Usage

ollama create axl-refactor-20m -f Modelfile
ollama run axl-refactor-20m "def fibonacci():"
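
Once the model has been created, it can also be called programmatically. The sketch below uses Ollama's local REST endpoint (`/api/generate` on the default port 11434) with only the Python standard library. It assumes an Ollama server is running locally and that the model was registered under the name used above; `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "axl-refactor-20m") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call, left commented out because it requires a running Ollama server:
# print(generate("def fibonacci():"))
```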

This model is the refactoring baseline. For comparison, AXL-Refactor-Lion has a perplexity of 1.11.

File         Size   Format
F16 GGUF     38 MB  Full precision
Q4_K_M GGUF  38 MB  4-bit quantized

GGUF files work with Ollama and llama.cpp. Q4_K_M is typically about 3x smaller than F16, although both files are listed at 38 MB here.
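
As a rough sanity check on the file sizes, the F16 figure follows from the parameter count at 2 bytes per weight. The sketch below assumes 1 MB = 10^6 bytes, ignores GGUF metadata overhead, and takes the "about 3x" ratio at face value rather than modelling the Q4_K_M format itself.

```python
params = 19.1e6  # parameter count from the model summary

f16_mb = params * 2 / 1e6  # F16 stores each weight in 2 bytes
q4_mb = f16_mb / 3         # applying the "about 3x smaller" rule of thumb

print(f"{f16_mb:.1f}")  # 38.2
print(f"{q4_mb:.1f}")   # 12.7
```

Under these assumptions the 38 MB F16 size matches the parameter count, and a Q4_K_M file would be expected to land closer to 13 MB.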