
AXL-Chat-Pro

An advanced conversational AI model: 12.8M parameters, perplexity 1.34, 2048-byte context window.

13M Parameters · 1.34 Perplexity · 10 min Training · 26 MB GGUF
Property            Value
Architecture        Multi-Scale Transformer
d_model             ?
Attention Heads     ?
Layers per Scale    ?
Context Window      2048 bytes
Downsample Factors  [1, 2, 4]
Vocab Size          258 (byte-level)
Optimizer           Lion
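The table's byte-level vocabulary (256 byte values plus, presumably, two special tokens) and the downsample factors can be sketched as follows. This is a hedged illustration of what those two properties imply, not the model's actual tokenizer; the `BOS`/`EOS` assignments and the helper names are assumptions.

```python
BOS, EOS = 256, 257  # assumed special tokens filling out the 258-entry vocab

def byte_tokenize(text):
    # Byte-level tokenization: every UTF-8 byte is its own token (0-255)
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def scale_lengths(seq_len, factors=(1, 2, 4)):
    # Each scale of the multi-scale transformer sees the sequence
    # downsampled by its factor
    return [seq_len // f for f in factors]
```

For a 2048-byte context and factors [1, 2, 4], the three scales would process sequences of 2048, 1024, and 512 positions respectively.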
Rewritten from NumPy to PyTorch and trained with the Lion optimizer on 10 MB of chat pairs: 208 steps in 10 minutes.
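For reference, Lion updates a parameter by stepping in the sign of an interpolation between the momentum and the current gradient, with decoupled weight decay. Below is a minimal scalar sketch of that update rule, not the training code used here; the hyperparameter defaults are typical published values, not this model's settings.

```python
import math

def lion_step(p, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # Step direction: sign of the beta1-interpolation of momentum and gradient
    interp = beta1 * m + (1 - beta1) * g
    update = math.copysign(1.0, interp) if interp != 0 else 0.0
    # Decoupled weight decay, then a fixed-magnitude step
    p = p * (1 - lr * wd) - lr * update
    # Momentum is an EMA of gradients with beta2
    m = beta2 * m + (1 - beta2) * g
    return p, m
```

Because the update magnitude is always `lr` regardless of gradient scale, Lion typically uses a smaller learning rate than Adam.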
Metric          Value
Final Loss      0.3106
Perplexity      1.34
Training Steps  208
Training Time   10 min

Usage

ollama create axl-chat-pro -f Modelfile
ollama run axl-chat-pro "def fibonacci():"
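The `ollama create` command above expects a Modelfile pointing at the GGUF weights. A minimal sketch is shown below; the GGUF filename and parameter values are assumptions, not taken from this release.

```text
# Hypothetical Modelfile; the GGUF filename is an assumption
FROM ./axl-chat-pro-f16.gguf
PARAMETER num_ctx 2048
PARAMETER temperature 0.7
```

`num_ctx 2048` matches the model's 2048-byte context window.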
Better quality than AXL-Chat-Lion (PPL 1.34 vs 1.52).
File         Size   Format
F16 GGUF     26 MB  Full precision
Q4_K_M GGUF  15 MB  4-bit quantized
GGUF files work with Ollama and llama.cpp. Despite the 4-bit weights, Q4_K_M is only about 1.7x smaller than F16 here (15 MB vs 26 MB), since small models spend a large fraction of their file on tensors kept in higher precision.
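As a quick sanity check on the F16 figure: F16 stores each parameter in 2 bytes, so the weights alone account for nearly all of the 26 MB file. The helper below is illustrative and ignores GGUF metadata overhead.

```python
def f16_weights_mb(n_params):
    # F16 = 2 bytes per parameter; metadata overhead not included
    return n_params * 2 / 1e6

size = f16_weights_mb(12.8e6)  # ~25.6 MB, close to the 26 MB F16 GGUF
```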