Lion Optimized

AXL-Chat-10M

Conversational AI. 9.9M params. PPL 1.48. Context 2048 bytes.

10M Parameters · 1.48 Perplexity · 10 min Training · 20 MB GGUF
| Property | Value |
| --- | --- |
| Architecture | Multi-Scale Transformer |
| d_model | ? |
| Attention Heads | ? |
| Layers per Scale | ? |
| Context Window | 2048 bytes |
| Downsample Factors | [1, 2, 4] |
| Vocab Size | 258 (byte-level) |
| Optimizer | Lion |
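A 258-entry byte-level vocabulary suggests 256 raw byte values plus two special tokens; which IDs those special tokens occupy is an assumption here (BOS=256, EOS=257), not something the card states. A minimal sketch of such a tokenizer:

```python
# Byte-level tokenizer sketch: 256 raw byte values plus two special
# tokens. The BOS/EOS IDs (256/257) are assumptions, not confirmed
# by the model card.
BOS, EOS = 256, 257
VOCAB_SIZE = 258

def encode(text: str) -> list[int]:
    """UTF-8 bytes framed with the assumed BOS/EOS markers."""
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def decode(ids: list[int]) -> str:
    """Drop special tokens, reassemble the bytes, decode UTF-8."""
    return bytes(i for i in ids if i < 256).decode("utf-8")
```

Note that the 2048-byte context window bounds `len(encode(text))`, so multi-byte UTF-8 characters consume more of the window than ASCII.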
Retrained with the Lion optimizer on 10 MB of chat pairs: 216 steps in 10 minutes. Covers code Q&A and general knowledge.
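Lion updates weights with the sign of an interpolated momentum rather than a magnitude-scaled step, which is part of why it trains quickly at this scale. A scalar sketch of the update rule; the hyperparameter values are illustrative defaults, not values taken from this training run:

```python
def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update on a scalar parameter.

    The step direction is sign(beta1*m + (1-beta1)*grad), so every
    coordinate moves by exactly +/-lr (plus decoupled weight decay).
    Hyperparameters here are illustrative, not this run's settings.
    """
    c = beta1 * m + (1 - beta1) * grad
    update = (c > 0) - (c < 0)            # sign(c) as -1, 0, or +1
    theta = theta - lr * (update + wd * theta)
    m = beta2 * m + (1 - beta2) * grad    # momentum EMA
    return theta, m
```

Because each coordinate moves by a fixed ±lr, Lion is typically run with a smaller learning rate than AdamW.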
| Metric | Value |
| --- | --- |
| Final Loss | 0.3650 |
| Perplexity | 1.48 |
| Training Steps | 216 |
| Training Time | 10 min |
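The perplexity row follows the usual exp(mean cross-entropy) relationship, though exp(0.3650) ≈ 1.44 rather than the reported 1.48; the card doesn't state its averaging convention (per byte vs per sequence), which could account for the small gap. The conventional conversion:

```python
import math

def perplexity(mean_loss: float) -> float:
    """Perplexity as exp of mean cross-entropy loss in nats.

    With the reported final loss this gives ~1.44; the card reports
    1.48, so its averaging convention may differ slightly.
    """
    return math.exp(mean_loss)
```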

Usage

```shell
ollama create axl-chat-10m -f Modelfile
ollama run axl-chat-10m "def fibonacci():"
```
Good for code explanation and Q&A.
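The `ollama create` command reads a Modelfile; a minimal sketch is below. The GGUF filename, sampling parameters, and template are assumptions — adjust the template to match whatever chat format the model was actually trained on:

```
FROM ./axl-chat-10m.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 2048
TEMPLATE """{{ .Prompt }}"""
```

`num_ctx 2048` matches the model's 2048-byte context window from the specs above.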
| File | Size | Format |
| --- | --- | --- |
| F16 GGUF | 20 MB | Full precision |
| Q4_K_M GGUF | 20 MB | 4-bit quantized |
GGUF files work with Ollama and llama.cpp. Q4_K_M quantization typically yields files about 3x smaller than F16.