SGD Optimized

AXL-Reasoning-70M

Chain-of-thought reasoning model trained with SGD. 70M parameters. Perplexity 1.93.

| Parameters | Perplexity | Training | GGUF |
|---|---|---|---|
| 70M | 1.93 | 60 min | 140 MB |
| Property | Value |
|---|---|
| Architecture | Multi-Scale Transformer |
| d_model | ? |
| Attention Heads | ? |
| Layers per Scale | ? |
| Context Window | 256 bytes |
| Downsample Factors | [1, 2, 4] |
| Vocab Size | 258 (byte-level) |
| Optimizer | SGD |
Trained with plain SGD for 60 minutes over 1,478 steps. The Lion-optimized variant reaches PPL 1.79 in only 20 minutes.
| Metric | Value |
|---|---|
| Final Loss | 0.6384 |
| Perplexity | 1.93 |
| Training Steps | 1478 |
| Training Time | 60 min |
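As a sanity check on the metrics above: perplexity is the exponential of the mean cross-entropy loss (in nats). Exponentiating the final training loss gives a value slightly below the reported 1.93, which suggests the reported perplexity comes from a held-out evaluation loss rather than the 0.6384 training loss:

```python
import math

# Perplexity = exp(mean cross-entropy loss in nats).
final_loss = 0.6384
ppl = math.exp(final_loss)
print(round(ppl, 2))  # 1.89
```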

Usage

```shell
ollama create axl-reasoning-70m -f Modelfile
ollama run axl-reasoning-70m "def fibonacci():"
```
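The `ollama create` step expects a Modelfile that points at the downloaded GGUF. A minimal sketch (the filename here is an assumption; point `FROM` at whichever GGUF file you actually have):

```
# Hypothetical filename; replace with your downloaded GGUF.
FROM ./axl-reasoning-70m.F16.gguf
```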
This is the SGD reasoning baseline; the Lion-optimized version is significantly better (PPL 1.79 vs 1.93).
| File | Size | Format |
|---|---|---|
| F16 GGUF | 140 MB | Full precision |
| Q4_K_M GGUF | 140 MB | 4-bit quantized |
GGUF files work with Ollama and llama.cpp. Q4_K_M is typically about 3x smaller than F16.
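The F16 size follows directly from the parameter count (2 bytes per weight), and a back-of-the-envelope estimate at roughly 4.5 bits per weight for Q4_K_M (an approximation, not a measured figure) is consistent with the "about 3x smaller" note:

```python
params = 70_000_000

# F16 stores each weight in 2 bytes.
f16_mb = params * 2 / 1e6
print(f16_mb)  # 140.0

# Q4_K_M averages roughly 4.5 bits per weight (approximation).
q4_mb = params * 4.5 / 8 / 1e6
print(round(q4_mb, 1))  # 39.4
```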