Lion Optimized

AXL-Docs-8M

Documentation generation model. 9.9M parameters, perplexity (PPL) 1.12, 2048-byte context.

Parameters	10M
Perplexity	1.12
Training	10 min
GGUF	20 MB

Specs

Property	Value
Architecture	Multi-Scale Transformer
d_model	?
Attention Heads	?
Layers per Scale	?
Context Window	2048 bytes
Downsample Factors	[1, 2, 4]
Vocab Size	258 (byte-level)
Optimizer	Lion
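
The byte-level vocabulary and the downsample factors in the table can be illustrated with a short sketch. It assumes that the 258 entries are the 256 byte values plus two special tokens, and that each scale sees the sequence length divided by its factor. The special-token ids (256 and 257) and the function names are hypothetical and are not taken from the AXL source.

```python
# Assumption: 256 byte values + 2 special tokens = 258 vocabulary entries.
BOS_ID = 256  # hypothetical id, not confirmed by the model card
EOS_ID = 257  # hypothetical id, not confirmed by the model card
CONTEXT_WINDOW = 2048
DOWNSAMPLE_FACTORS = [1, 2, 4]


def encode_bytes(text: str, max_len: int = CONTEXT_WINDOW) -> list[int]:
    """Map text to UTF-8 byte ids (0-255), wrap in BOS/EOS, truncate to max_len.

    Truncation simply cuts the tail, so EOS is dropped for over-long inputs.
    """
    ids = [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]
    return ids[:max_len]


def scale_lengths(seq_len: int, factors=DOWNSAMPLE_FACTORS) -> list[int]:
    """Sequence length at each scale, assuming length is divided by the factor."""
    return [seq_len // f for f in factors]


if __name__ == "__main__":
    print(encode_bytes("hi"))
    print(scale_lengths(CONTEXT_WINDOW))
```

Under these assumptions a full 2048-byte window would be processed at lengths 2048, 1024, and 512.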

Training

Retrained with the Lion optimizer on 16 MB of documentation pairs: 215 steps in 10 minutes.
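
For readers unfamiliar with Lion, the following is a minimal pure-Python sketch of its published update rule: the parameter step uses only the sign of an interpolation between momentum and gradient. The hyperparameter values are common defaults and are assumptions here; the model card does not state the learning rate, betas, or weight decay actually used, and `lion_step` is a hypothetical helper name.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum,
              lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update over flat lists of floats.

    All hyperparameter defaults are assumptions, not values from the model card.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Step direction: sign of the interpolated momentum and gradient.
        update = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (update + weight_decay * p))
        # Momentum is refreshed after the step, with a separate beta.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


if __name__ == "__main__":
    print(lion_step([1.0, -2.0], [0.5, -0.25], [0.0, 0.0], lr=0.1))
```

Because every coordinate moves by exactly `lr` (before weight decay), Lion keeps only one momentum buffer per parameter, compared with two for Adam.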

Metric	Value
Final Loss	0.1067
Perplexity	1.12
Training Steps	215
Training Time	10 min
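
The loss and perplexity rows are related. As a hedged illustration, perplexity is conventionally the exponential of the mean cross-entropy loss; assuming the reported loss is measured in nats, the final loss gives a value of about 1.11, close to the reported 1.12. The card does not state how the reported figure was computed or rounded.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Perplexity as exp(mean cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_loss_nats)


if __name__ == "__main__":
    print(round(perplexity(0.1067), 3))
```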

Usage

ollama create axl-docs-8m -f Modelfile
ollama run axl-docs-8m "def fibonacci():"

The model adds docstrings with parameter descriptions to the code it is given.
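
The same command can be driven from a script. The sketch below wraps the `ollama run` invocation shown above with Python's `subprocess` module. It assumes the model has already been created under the name `axl-docs-8m` and that the `ollama` executable is on `PATH`; `build_ollama_command` and `generate_docs` are hypothetical helper names.

```python
import shutil
import subprocess


def build_ollama_command(model: str, prompt: str) -> list[str]:
    """Build the argument list for `ollama run <model> <prompt>`."""
    return ["ollama", "run", model, prompt]


def generate_docs(source: str, model: str = "axl-docs-8m") -> str:
    """Send source code to the model and return whatever it prints to stdout."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama executable not found on PATH")
    result = subprocess.run(
        build_ollama_command(model, source),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    print(generate_docs("def fibonacci():"))
```

Passing the prompt as a list element rather than a shell string avoids quoting problems when the source code contains quotes or newlines.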

Download

File	Size	Format
F16 GGUF	20 MB	16-bit float (unquantized)
Q4_K_M GGUF	20 MB	4-bit quantized

GGUF files work with Ollama and llama.cpp. In general, a Q4_K_M file is about 3x smaller than its F16 counterpart.
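
A rough back-of-the-envelope estimate shows where these sizes come from: weight storage is approximately parameters times bits per weight. The sketch assumes 16 bits per weight for F16 and roughly 4.85 bits per weight for Q4_K_M, which is an approximate figure and not taken from the model card. It ignores file metadata and any tensors a quantizer keeps at higher precision, both of which can matter for a model this small.

```python
F16_BITS = 16.0
Q4_K_M_BITS = 4.85  # approximate average bits per weight; an assumption


def estimated_size_mb(n_params: float, bits_per_weight: float) -> float:
    """Estimated weight storage in MB (10**6 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    n_params = 9.9e6  # parameter count from the model card
    print(round(estimated_size_mb(n_params, F16_BITS), 1))
    print(round(estimated_size_mb(n_params, Q4_K_M_BITS), 1))
    print(round(F16_BITS / Q4_K_M_BITS, 1))  # idealized size ratio
```

For 9.9M parameters the F16 estimate is about 19.8 MB, which matches the 20 MB listed for the F16 file.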
