# AXL-300M (Lion Optimized)

Flagship Lion model: 322M parameters, perplexity 1.11, 256-byte context. Retrained from SGD to Lion.

322M parameters · 1.11 perplexity · 45 min training · 645 MB F16 GGUF

## Architecture

| Property | Value |
|---|---|
| Architecture | Multi-Scale Transformer |
| d_model | ? |
| Attention Heads | ? |
| Layers per Scale | ? |
| Context Window | 256 bytes |
| Downsample Factors | [1, 2, 4] |
| Vocab Size | 258 (byte-level) |
| Optimizer | Lion |
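
A minimal sketch (not the AXL source) of how the byte-level vocabulary and downsample factors fit together; in particular, treating the two slots beyond the 256 raw bytes as BOS/EOS specials is an assumption:

```python
# Sketch only: illustrates a 258-entry byte-level vocab and the [1, 2, 4]
# downsample factors from the table above. BOS/EOS IDs are assumptions.
BYTE_VOCAB = 256
BOS_ID, EOS_ID = 256, 257          # assumed use of the 2 extra vocab slots
VOCAB_SIZE = BYTE_VOCAB + 2        # = 258, matching the table
CONTEXT_BYTES = 256
DOWNSAMPLE_FACTORS = [1, 2, 4]     # per-scale sequence-length reduction

def encode(text: str) -> list[int]:
    """UTF-8 bytes framed with BOS/EOS; every byte is its own token."""
    return [BOS_ID, *text.encode("utf-8"), EOS_ID]

def scale_lengths(n_bytes: int) -> list[int]:
    """Sequence length seen by each scale of the multi-scale stack."""
    return [n_bytes // f for f in DOWNSAMPLE_FACTORS]

print(encode("def f():")[:5])        # [256, 100, 101, 102, 32]
print(scale_lengths(CONTEXT_BYTES))  # [256, 128, 64]
```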

## Training

Retrained with the Lion optimizer on 50 MB of real HF Python code: 1082 steps in 45 minutes. The previous SGD version scored PPL 5.98. A sketch of the Lion update rule follows the metrics table below.

| Metric | Value |
|---|---|
| Final Loss | 0.1465 |
| Perplexity | 1.11 |
| Training Steps | 1082 |
| Training Time | 45 min |
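
For reference, this is the published Lion update rule (Chen et al., 2023), not AXL's training loop; the hyperparameters shown are Lion's common defaults, not the values used for this run:

```python
# Generic Lion step: sign of an interpolated momentum + decoupled weight
# decay. Defaults shown are from the Lion paper, not this model's config.
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    update = np.sign(beta1 * m + (1.0 - beta1) * g)  # only the sign is used
    w = w - lr * (update + wd * w)                   # decoupled weight decay
    m = beta2 * m + (1.0 - beta2) * g                # momentum EMA
    return w, m
```

Because Lion keeps a single momentum buffer and applies only the sign of the update, it is cheaper per step than Adam-family optimizers, which is consistent with the short retraining time reported above.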

## Usage

```sh
ollama create axl-300m -f Modelfile
ollama run axl-300m "def fibonacci():"
```
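
The `create` command reads a Modelfile next to the weights. A minimal one might look like this; the GGUF filename is an assumption, and `num_ctx 256` mirrors the 256-byte context window:

```
# Hypothetical Modelfile; point FROM at your downloaded GGUF.
FROM ./axl-300m-f16.gguf
PARAMETER num_ctx 256
```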

Excellent at general Python code generation; this is the flagship model of the series.

## Files

| File | Size | Format |
|---|---|---|
| F16 GGUF | 645 MB | Full precision |
| Q4_K_M GGUF | --- | 4-bit quantized |

GGUF files work with Ollama and llama.cpp. Q4_K_M is about 3x smaller than F16.
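
If you need to produce the Q4_K_M file yourself, llama.cpp's quantize tool can generate it from the F16 GGUF; the filenames below are assumptions:

```sh
# Assumed filenames; llama-quantize ships with llama.cpp builds.
./llama-quantize axl-300m-f16.gguf axl-300m-q4_k_m.gguf Q4_K_M
```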