Lion Optimized

AXL-Code-1B-Lion

The largest Lion-optimized model in the AXL family: 318M parameters trained in 20 minutes, reaching perplexity 1.90 with a 256-byte context window.

Parameters: 318M
Perplexity: 1.90
Training time: 20 min
GGUF size: 636 MB
Architecture: Multi-Scale Transformer
d_model: ?
Attention heads: ?
Layers per scale: ?
Context window: 256 bytes
Downsample factors: [1, 2, 4]
Vocab size: 258 (byte-level)
Optimizer: Lion
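A byte-level vocabulary of 258 suggests the 256 raw byte values plus two special tokens; the card does not say which, so the BOS/EOS ids below are an assumption. A minimal sketch of such a tokenizer:

```python
# Minimal byte-level tokenizer sketch. The two extra ids (BOS/EOS)
# are an assumption; the card only states a vocab size of 258.
BOS, EOS = 256, 257  # hypothetical special-token ids

def encode(text: str) -> list[int]:
    """Map text to raw byte ids, framed by the assumed special tokens."""
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def decode(ids: list[int]) -> str:
    """Invert encode, dropping any special ids."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

ids = encode("def fib():")
print(len(ids))     # 10 bytes + 2 specials = 12
print(decode(ids))  # def fib():
```

Byte-level vocabularies need no tokenizer training and can represent any input, at the cost of longer sequences than subword schemes.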
Trained on 50 MB of real Python code from Hugging Face for 421 steps (20 minutes). Under the same setup, Lion reaches perplexity 1.90 versus 31.22 for SGD.
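The Lion update behind this gap takes only the sign of an interpolated momentum, so every step has uniform magnitude regardless of gradient scale. A sketch of one step, following the published Lion rule; the hyperparameters here are illustrative defaults, not the ones used for this model:

```python
import numpy as np

def lion_step(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: sign of an interpolated momentum, plus decoupled
    weight decay. Returns the new parameter and momentum."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    param = param - lr * (update + wd * param)
    m = beta2 * m + (1 - beta2) * grad  # momentum is updated *after* the step
    return param, m

# Toy check: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0
x, m = np.array(1.0), np.array(0.0)
for _ in range(1000):
    x, m = lion_step(x, 2 * x, m, lr=1e-3)
print(float(x))  # ends up oscillating near the minimum at 0
```

Because the update is a pure sign, Lion stores only one momentum buffer per parameter (versus two for Adam), which also makes it memory-friendly for small training budgets like this one.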
Final loss: 0.6338
Perplexity: 1.90
Training steps: 421
Training time: 20 min
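Perplexity is just the exponential of the mean cross-entropy loss, so the two metrics above can be cross-checked; the small gap to the reported 1.90 is presumably rounding or a loss recorded at a slightly different step:

```python
import math

final_loss = 0.6338
ppl = math.exp(final_loss)  # perplexity = exp(cross-entropy loss)
print(round(ppl, 2))        # → 1.88
```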

Usage

ollama create axl-code-1b-lion -f Modelfile
ollama run axl-code-1b-lion "def fibonacci():"
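The `ollama create` command above expects a Modelfile in the working directory. A minimal sketch; the GGUF filename is an assumption, so adjust it to the actual artifact:

```
# Hypothetical Modelfile — the GGUF filename is an assumption.
FROM ./axl-code-1b-lion-f16.gguf
PARAMETER num_ctx 256
```

`num_ctx` caps the context at the model's 256-byte window; point `FROM` at the Q4_K_M file instead to run the quantized build.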
Best overall code generation in the family; suited to general-purpose code completion.
F16 GGUF: 636 MB (full precision)
Q4_K_M GGUF: 197 MB (4-bit quantized)
GGUF files work with Ollama and llama.cpp. Q4_K_M is about 3x smaller than F16.
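The sizes line up with the bit widths: F16 is 2 bytes per weight, and 4-bit K-quants carry some per-block overhead above a flat 4 bits, which is why the ratio is ~3.2x rather than 4x. A quick check on the listed numbers:

```python
f16_mb, q4_mb, params_m = 636, 197, 318

print(round(f16_mb / q4_mb, 2))   # compression ratio of the two listed files
print(f16_mb / params_m)          # bytes per parameter in the F16 file (2.0 = F16)
```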