AsadIsmail committed on
Commit 8ae73d4 · verified · 1 Parent(s): 31244fc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +54 -32
README.md CHANGED
@@ -13,56 +13,78 @@ tags:
  base_model: HuggingFaceTB/SmolVLM2-2.2B-Instruct
  pipeline_tag: image-text-to-text
  license: apache-2.0
  ---

- # SmolVLM2-2.2B-Instruct — Ternary Quantized

- Ternary-quantized version of [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct), produced with [ternary-quant](https://github.com/Asad-Ismail/ternary-quant).

- SmolVLM2 is HuggingFace's compact vision-language model designed for edge deployment. The ternary-quantized version pushes it even further, making it feasible for mobile and IoT devices.

- ## Quantization details

- | Metric | Value |
- |--------|-------|
- | **Scheme** | tritplane3 (3-plane progressive ternary) |
- | **Components quantized** | text_backbone, multimodal_connector (169 linear layers) |
- | **Vision encoder** | Kept in FP16 |
- | **Full-model effective bits** | 10.92 |
- | **Compression ratio** | 1.47x |
- | **Avg reconstruction error** | 0.1236 |
- | **Validation** | Passed (correctly describes demo image) |

- ## Usage

  ```python
  from ternary_quant.inference import load_ternary_model

  model, processor = load_ternary_model(
      "AsadIsmail/SmolVLM2-2.2B-Instruct-ternary",
-     runtime_mode="cached"
  )

- from PIL import Image
- image = Image.open("photo.jpg")
- inputs = processor(text="Describe this image", images=image, return_tensors="pt")
- inputs = {k: v.to(model.device) for k, v in inputs.items()}
  outputs = model.generate(**inputs, max_new_tokens=128)
  print(processor.decode(outputs[0], skip_special_tokens=True))
  ```

- ## Reproduce
-
- ```bash
- pip install ternary-quant
- ternary-quant quantize-broad HuggingFaceTB/SmolVLM2-2.2B-Instruct \
-     --output ./SmolVLM2-2.2B-Instruct-ternary \
-     --components text_backbone multimodal_connector \
-     --scheme tritplane3 --dtype float16 --eval
- ```

- ## Part of the ternary-models collection

- [github.com/Asad-Ismail/ternary-models](https://github.com/Asad-Ismail/ternary-models)
 
  base_model: HuggingFaceTB/SmolVLM2-2.2B-Instruct
  pipeline_tag: image-text-to-text
  license: apache-2.0
+ quantized_by: AsadIsmail
  ---

+ # SmolVLM2-2.2B-Instruct — Ternary Quantized (tritplane3)

+ **Ternary-quantized version** of [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct), produced with [ternary-quant](https://github.com/Asad-Ismail/ternary-quant).

+ A compact VLM designed for edge deployment, now even smaller with ternary quantization.

+ ## Model Specifications

+ | Property | Value |
+ |---|---|
+ | **Base Model** | [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct) |
+ | **Parameters** | 2.2B |
+ | **Architecture** | VLM (image + text) |
+ | **Quantization** | tritplane3 (169 linear layers, 10.92 effective bits) |
+ | **Vision Encoder** | FP16 (preserved) |
+ | **Compression** | 1.47x |
+ | **Avg Reconstruction Error** | 0.1236 |
+ | **License** | Apache 2.0 |
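The compression ratio reported in the table follows directly from the effective bit-width relative to the FP16 baseline. A quick sanity check (illustrative arithmetic only, not library code):

```python
fp16_bits = 16.0          # baseline: weights stored as float16
effective_bits = 10.92    # full-model effective bits (from the table above)

compression = fp16_bits / effective_bits
print(f"{compression:.2f}x")  # prints "1.47x"
```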

+ ## Size Comparison

+ | Method | Size | VLM Support |
+ |---|---|---|
+ | FP16 (original) | ~4.4 GB | Yes |
+ | **Ternary tritplane3** | **1.8 GB** | **Yes** |

+ **No GGUF alternative exists for SmolVLM2.**
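For context on what "ternary" means here: each weight is mapped to one of {-1, 0, +1} times a per-tensor scale. The sketch below is a minimal single-plane ternary quantizer for illustration only; the actual tritplane3 scheme in ternary-quant stacks three progressive ternary planes and is not reproduced here.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, thresh_ratio: float = 0.7):
    """Minimal single-plane ternary quantizer (illustration, not tritplane3)."""
    delta = thresh_ratio * np.abs(w).mean()               # magnitude threshold
    codes = np.where(np.abs(w) > delta, np.sign(w), 0.0)  # values in {-1, 0, +1}
    nonzero = codes != 0
    # Per-tensor scale: mean magnitude of the weights that survived thresholding
    scale = np.abs(w[nonzero]).mean() if nonzero.any() else 0.0
    return codes, scale

w = np.array([0.9, -0.05, 0.4, -0.8])
codes, scale = ternary_quantize(w)
w_hat = scale * codes  # dequantized approximation of w
```

Storing one ternary code (<2 bits) plus a shared scale instead of 16 bits per weight is where the size reduction comes from.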

+ ## Quality Verification

+ Validated during quantization (collapse score: 0.009, excellent):

+ | Test | Output |
+ |---|---|
+ | Image description (demo) | "A yellow circle with a diagonal line through it" (correct) |
+ | "What is machine learning?" | Correct, detailed explanation of ML, algorithms, training |
+ | "Explain gravity" | Accurate one-sentence explanation |

+ ## Memory Requirements

+ | Runtime | Min Memory | Hardware |
+ |---|---|---|
+ | `cached` (CPU) | ~4 GB RAM | Any |
+ | `metal` (Apple Silicon) | ~3 GB unified | M1+ |
+ | `cached` (CUDA) | ~3 GB VRAM | Any NVIDIA GPU |

+ Ideal for edge deployment: runs on devices with as little as 4 GB of RAM.
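The table above can be turned into a simple device check. `pick_runtime_mode` is a hypothetical helper (not part of ternary-quant) that selects a `runtime_mode` value based on available hardware:

```python
def pick_runtime_mode() -> str:
    """Hypothetical helper: choose a runtime_mode per the memory table above."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cached"  # CUDA: ~3 GB VRAM on any NVIDIA GPU
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "metal"   # Apple Silicon: ~3 GB unified memory
    except ImportError:
        pass                 # torch absent: fall through to CPU default
    return "cached"          # CPU: ~4 GB RAM, works anywhere

mode = pick_runtime_mode()
```

The returned string could then be passed as `runtime_mode=mode` to `load_ternary_model`.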

+ ## Quickstart

+ ```bash
+ pip install ternary-quant
+ ```

  ```python
  from ternary_quant.inference import load_ternary_model

  model, processor = load_ternary_model(
      "AsadIsmail/SmolVLM2-2.2B-Instruct-ternary",
+     runtime_mode="cached", device="auto"
  )

+ from PIL import Image
+ image = Image.open("photo.jpg")
+ inputs = processor(text="Describe this image", images=image, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=128)
  print(processor.decode(outputs[0], skip_special_tokens=True))
  ```

+ ## Collection

+ Part of [ternary-models](https://huggingface.co/collections/AsadIsmail/ternary-models-vlms-multimodal-and-audio-69df85ff0b776624d6645d2a).

+ GitHub: [github.com/Asad-Ismail/ternary-models](https://github.com/Asad-Ismail/ternary-models) | Library: [github.com/Asad-Ismail/ternary-quant](https://github.com/Asad-Ismail/ternary-quant)