Lion Optimized

AXL-Reasoning-Lion

Chain-of-thought reasoning model. 70M parameters, 5 layers per scale, perplexity 1.79, 2048-byte context.

Parameters: 70M
Perplexity: 1.79
Training: 20 min
GGUF: 140 MB

Property            Value
Architecture        Multi-Scale Transformer
d_model             ?
Attention Heads     ?
Layers per Scale    ?
Context Window      2048 bytes
Downsample Factors  [1, 2, 4]
Vocab Size          258 (byte-level)
Optimizer           Lion

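
To make the byte-level settings concrete, here is a minimal Python sketch of how text might map to token IDs under a 258-entry vocabulary, and how the downsample factors would shorten the sequence at each scale. It assumes the two IDs beyond the 256 byte values are special tokens (named BOS_ID and EOS_ID here and placed at 256 and 257) and that each coarser scale simply divides the sequence length by its factor. The model card confirms neither detail, and the function names are hypothetical.

```python
# Illustrative sketch only: the special-token layout and the scale
# arithmetic are assumptions, not taken from the AXL source code.
VOCAB_SIZE = 258
CONTEXT_WINDOW = 2048            # measured in bytes
DOWNSAMPLE_FACTORS = [1, 2, 4]
BOS_ID, EOS_ID = 256, 257        # assumed IDs for the two non-byte tokens


def encode(text: str) -> list[int]:
    """Map text to byte-level token IDs, truncated to the context window."""
    byte_ids = list(text.encode("utf-8"))   # each ID is in 0..255
    budget = CONTEXT_WINDOW - 2              # leave room for BOS and EOS
    return [BOS_ID] + byte_ids[:budget] + [EOS_ID]


def decode(ids: list[int]) -> str:
    """Drop special tokens and turn the remaining bytes back into text."""
    raw = bytes(i for i in ids if i < 256)
    return raw.decode("utf-8", errors="replace")


def scale_lengths(seq_len: int) -> list[int]:
    """Sequence length seen at each scale (ceiling division per factor)."""
    return [-(-seq_len // f) for f in DOWNSAMPLE_FACTORS]


ids = encode("def fibonacci():")
print(len(ids), max(ids) < VOCAB_SIZE)
print(scale_lengths(CONTEXT_WINDOW))
```

Because every token is a single byte, non-ASCII characters cost more than one token each, so the 2048-byte window holds fewer characters of non-English text or heavily commented code.
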
Trained on 50 MB of real Python code from Hugging Face: 205 steps in 20 minutes. The 5-layers-per-scale architecture is intended to capture long dependency chains.

Metric          Value
Final Loss      0.6279
Perplexity      1.79
Training Steps  205
Training Time   20 min
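
The metrics can be related to each other with a few lines of arithmetic. The sketch below assumes the loss is a mean cross-entropy in nats per byte, which the card does not state. Under that assumption, exp(0.6279) is about 1.87 rather than 1.79, so the final loss and the reported perplexity may have been measured on different batches or splits; the card does not say which.

```python
import math

# Values copied from the metrics listed above.
final_loss = 0.6279
reported_ppl = 1.79
steps = 205
minutes = 20

# Perplexity is exp(mean cross-entropy) when the loss is in nats (assumed).
ppl_from_loss = math.exp(final_loss)
# Loss that would correspond exactly to the reported perplexity.
loss_from_ppl = math.log(reported_ppl)
# Bits per byte: a common alternative metric for byte-level models.
bits_per_byte = math.log2(reported_ppl)
# Average wall-clock time per optimizer step.
seconds_per_step = minutes * 60 / steps

print(f"exp(final loss)    = {ppl_from_loss:.2f}")
print(f"ln(reported PPL)   = {loss_from_ppl:.3f}")
print(f"bits per byte      = {bits_per_byte:.2f}")
print(f"seconds per step   = {seconds_per_step:.1f}")
```
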

Usage

ollama create axl-reasoning-lion -f Modelfile
ollama run axl-reasoning-lion "def fibonacci():"
Best for multi-step code generation. Extra layers help with complex logic.
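
Once the model has been created with the commands above, it can also be called from a script. The following is a minimal sketch using only the Python standard library and Ollama's local HTTP generate endpoint. It assumes Ollama is running on its default port (11434) and that the model was registered under the name axl-reasoning-lion; the helper names build_request and generate are hypothetical.

```python
import json
import urllib.request

# Assumes a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "axl-reasoning-lion"   # name passed to `ollama create`


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL_NAME, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model created):
#     print(generate("def fibonacci():"))
```

Setting "stream" to false returns the whole completion in a single JSON object, which keeps the sketch short; a streaming client would instead read one JSON object per line.
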

File         Size    Format
F16 GGUF     140 MB  Full precision
Q4_K_M GGUF  44 MB   4-bit quantized

GGUF files work with Ollama and llama.cpp. Q4_K_M is about 3x smaller than F16.
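
The size figures can be sanity-checked against the parameter count. This short sketch assumes 1 MB means 10^6 bytes and ignores file metadata, so the results are approximate; bits_per_weight is a hypothetical helper.

```python
PARAMS = 70e6      # 70M parameters
F16_MB = 140
Q4_K_M_MB = 44


def bits_per_weight(size_mb: float, n_params: float) -> float:
    """Average bits stored per parameter (assumes 1 MB = 10**6 bytes)."""
    return size_mb * 1e6 * 8 / n_params


f16_bits = bits_per_weight(F16_MB, PARAMS)
q4_bits = bits_per_weight(Q4_K_M_MB, PARAMS)
ratio = F16_MB / Q4_K_M_MB

print(f"F16:    {f16_bits:.1f} bits per weight")
print(f"Q4_K_M: {q4_bits:.1f} bits per weight")
print(f"F16 is {ratio:.2f}x the size of Q4_K_M")
```

The F16 file works out to 16 bits per weight, as expected. The Q4_K_M file lands at roughly 5 bits per weight rather than exactly 4, which is plausible for a mixed-precision quantization scheme plus file overhead.
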