Ternary-quantized version of HuggingFaceTB/SmolVLM2-2.2B-Instruct using ternary-quant.
Compact VLM designed for edge deployment, now even smaller with ternary quantization.
| Property | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolVLM2-2.2B-Instruct |
| Parameters | 2.2B |
| Architecture | VLM (image + text) |
| Quantization | tritplane3 (169 layers, 10.92 effective bits) |
| Vision Encoder | FP16 (preserved) |
| Compression | 1.47x |
| Avg Reconstruction Error | 0.1236 |
| License | Apache 2.0 |
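As a rough illustration of what ternary quantization does, here is a minimal sketch in the style of classic threshold-based ternarization: weights snap to {-1, 0, +1} with a per-tensor scale. The threshold ratio and scale rule are generic assumptions for illustration, not the actual tritplane3 scheme.

```python
import numpy as np

def ternary_quantize(w, threshold_ratio=0.7):
    # Hypothetical per-tensor threshold: weights below it snap to 0,
    # the rest to +/-1. threshold_ratio is an illustrative choice,
    # not a tritplane3 parameter.
    delta = threshold_ratio * np.abs(w).mean()
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
    # Scale that minimizes L2 error over the surviving (nonzero) entries.
    nz = t != 0
    alpha = np.abs(w[nz]).mean() if nz.any() else 0.0
    return t.astype(np.int8), float(alpha)

def dequantize(t, alpha):
    # Reconstruct an FP32 approximation of the original tensor.
    return alpha * t.astype(np.float32)

w = np.random.randn(256, 256).astype(np.float32)
t, alpha = ternary_quantize(w)
w_hat = dequantize(t, alpha)
# Relative reconstruction error, analogous in spirit to the
# "Avg Reconstruction Error" reported in the table above.
err = float(np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

The actual scheme packs 169 layers at 10.92 effective bits and keeps the vision encoder in FP16, so this sketch only conveys the core idea.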
| Method | Size | VLM Support |
|---|---|---|
| FP16 (original) | ~4.4 GB | Yes |
| Ternary tritplane3 | 1.8 GB | Yes |
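The 1.47x figure is consistent with the effective bit-width reported above: 16 FP16 bits divided by 10.92 effective bits.

```python
# Figures taken from the model card above.
fp16_bits = 16.0
effective_bits = 10.92
compression = fp16_bits / effective_bits
print(f"{compression:.2f}x")  # prints "1.47x"
```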
No GGUF alternative exists for SmolVLM2.
Validated during quantization (collapse score: 0.009, indicating excellent fidelity):
| Test | Output |
|---|---|
| Image description (demo) | "A yellow circle with a diagonal line through it" (correct) |
| "What is machine learning?" | Correct, detailed explanation of ML, algorithms, training |
| "Explain gravity" | Accurate one-sentence explanation |
| Runtime | Min Memory | Hardware |
|---|---|---|
| cached (CPU) | ~4 GB RAM | Any |
| metal (Apple Silicon) | ~3 GB unified | M1+ |
| cached (CUDA) | ~3 GB VRAM | Any NVIDIA GPU |
Ideal for edge deployment: runs on devices with 4 GB RAM.
```shell
pip install ternary-quant
```

```python
from ternary_quant.inference import load_ternary_model

# Load the ternary checkpoint; device="auto" selects CUDA, Metal, or CPU.
model, processor = load_ternary_model(
    "AsadIsmail/SmolVLM2-2.2B-Instruct-ternary",
    runtime_mode="cached", device="auto"
)

# Text-only prompt shown here; image inputs go through the same processor.
inputs = processor(text="Describe this image", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
Part of the ternary-models collection.
GitHub: github.com/Asad-Ismail/ternary-models | Library: github.com/Asad-Ismail/ternary-quant