Specialized Optimized

AXL-Vision-0.8M

Vision encoder with about 0.75M parameters (753,024). Converts 224x224 images to feature vectors.

Parameters: 753,024
Perplexity: ---
Training: 30 min
GGUF: ---

Property             Value
Architecture         Multi-Scale Transformer
d_model              ?
Attention Heads      ?
Layers per Scale     ?
Context Window       256 bytes
Downsample Factors   [1, 2, 4]
Vocab Size           258 (byte-level)
Optimizer            SGD

Patch-based image encoder with 16x16 patches. Foundation for multi-modal AXL.
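
As an illustration of the patching step, here is a minimal sketch of cutting a 224x224 image into non-overlapping 16x16 patches. It is not the AXL implementation: the function name `split_into_patches`, the single-channel input, and the row-major patch order are all assumptions made for the example.

```python
def split_into_patches(image, patch_size=16):
    """Split an H x W image (a list of rows) into non-overlapping square patches.

    Returns the patches in row-major order; each patch is flattened into a
    list of patch_size * patch_size pixel values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[y][x]
                for y in range(top, top + patch_size)
                for x in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical single-channel 224x224 image filled with a simple gradient.
image = [[(x + y) % 256 for x in range(224)] for y in range(224)]
patches = split_into_patches(image)
print(len(patches), len(patches[0]))  # prints: 196 256
```

A 224x224 input yields a 14x14 grid, so 196 patches of 256 values each per channel. How the model embeds those patches is not described here.
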

Metric           Value
Final Loss       1.0014
Perplexity       ---
Training Steps   32402
Training Time    30 min

Usage

ollama create axl-vision-0.8m -f Modelfile
ollama run axl-vision-0.8m "def fibonacci():"
Image feature extraction for downstream vision tasks.

File          Size   Format
F16 GGUF      ---    Full precision
Q4_K_M GGUF   ---    4-bit quantized

GGUF files work with Ollama and llama.cpp. Q4_K_M is about 3x smaller than F16.
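
Since the file sizes are not listed, a rough back-of-the-envelope estimate may help. The sketch below assumes 2 bytes per parameter for F16 and applies the "about 3x" ratio quoted above. It ignores GGUF headers, metadata, and any tensors kept at higher precision, so real files will differ; the helper names are made up for the example.

```python
PARAMS = 753_024  # parameter count listed for this model


def f16_bytes(n_params):
    """Weight storage at 16 bits (2 bytes) per parameter."""
    return n_params * 2


def estimate_q4_bytes(n_params, ratio=3.0):
    """Rough Q4_K_M estimate using the 'about 3x smaller' rule of thumb."""
    return f16_bytes(n_params) / ratio


f16 = f16_bytes(PARAMS)
q4 = estimate_q4_bytes(PARAMS)
print(f"F16: ~{f16 / 1e6:.2f} MB, Q4_K_M: ~{q4 / 1e6:.2f} MB")
```

Under these assumptions the weights alone come to roughly 1.5 MB at F16 and roughly 0.5 MB at Q4_K_M.
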