AsadIsmail committed on
Commit 8ae73d4 · verified · 1 Parent(s): 31244fc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +54 -32
README.md CHANGED
@@ -13,56 +13,78 @@ tags:
  base_model: HuggingFaceTB/SmolVLM2-2.2B-Instruct
  pipeline_tag: image-text-to-text
  license: apache-2.0
  ---

- # SmolVLM2-2.2B-Instruct — Ternary Quantized

- Ternary-quantized version of [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct), produced with [ternary-quant](https://github.com/Asad-Ismail/ternary-quant).

- SmolVLM2 is HuggingFace's compact vision-language model designed for edge deployment. The ternary-quantized version pushes it even further, making it feasible for mobile and IoT devices.

- ## Quantization details

- | Metric | Value |
- |--------|-------|
- | **Scheme** | tritplane3 (3-plane progressive ternary) |
- | **Components quantized** | text_backbone, multimodal_connector (169 linear layers) |
- | **Vision encoder** | Kept in FP16 |
- | **Full-model effective bits** | 10.92 |
- | **Compression ratio** | 1.47x |
- | **Avg reconstruction error** | 0.1236 |
- | **Validation** | Passed (correctly describes demo image) |

- ## Usage

  ```python
  from ternary_quant.inference import load_ternary_model

  model, processor = load_ternary_model(
      "AsadIsmail/SmolVLM2-2.2B-Instruct-ternary",
-     runtime_mode="cached"
  )

- from PIL import Image
- image = Image.open("photo.jpg")
- inputs = processor(text="Describe this image", images=image, return_tensors="pt")
- inputs = {k: v.to(model.device) for k, v in inputs.items()}
  outputs = model.generate(**inputs, max_new_tokens=128)
  print(processor.decode(outputs[0], skip_special_tokens=True))
  ```

- ## Reproduce
-
- ```bash
- pip install ternary-quant
- ternary-quant quantize-broad HuggingFaceTB/SmolVLM2-2.2B-Instruct \
-     --output ./SmolVLM2-2.2B-Instruct-ternary \
-     --components text_backbone multimodal_connector \
-     --scheme tritplane3 --dtype float16 --eval
- ```

- ## Part of the ternary-models collection

- [github.com/Asad-Ismail/ternary-models](https://github.com/Asad-Ismail/ternary-models)
 
  base_model: HuggingFaceTB/SmolVLM2-2.2B-Instruct
  pipeline_tag: image-text-to-text
  license: apache-2.0
+ quantized_by: AsadIsmail
  ---

+ # SmolVLM2-2.2B-Instruct — Ternary Quantized (tritplane3)

+ **Ternary-quantized version** of [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct), produced with [ternary-quant](https://github.com/Asad-Ismail/ternary-quant).

+ A compact VLM designed for edge deployment, now even smaller with ternary quantization.

+ ## Model Specifications

+ | Property | Value |
+ |---|---|
+ | **Base Model** | [HuggingFaceTB/SmolVLM2-2.2B-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct) |
+ | **Parameters** | 2.2B |
+ | **Architecture** | VLM (image + text) |
+ | **Quantization** | tritplane3 (169 linear layers, 10.92 effective bits) |
+ | **Vision Encoder** | FP16 (preserved) |
+ | **Compression** | 1.47x |
+ | **Avg Reconstruction Error** | 0.1236 |
+ | **License** | Apache 2.0 |
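The compression ratio reported in the table follows directly from the effective bit-width relative to the FP16 baseline. A quick sanity check (illustrative arithmetic only, not library code):

```python
fp16_bits = 16.0          # baseline: weights stored as float16
effective_bits = 10.92    # full-model effective bits (from the table above)

compression = fp16_bits / effective_bits
print(f"{compression:.2f}x")  # prints "1.47x"
```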

+ ## Size Comparison

+ | Method | Size | VLM Support |
+ |---|---|---|
+ | FP16 (original) | ~4.4 GB | Yes |
+ | **Ternary tritplane3** | **1.8 GB** | **Yes** |

+ **No GGUF alternative exists for SmolVLM2.**
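For context on what "ternary" means here: each weight is mapped to one of {-1, 0, +1} times a per-tensor scale. The sketch below is a minimal single-plane ternary quantizer for illustration only; the actual tritplane3 scheme in ternary-quant stacks three progressive ternary planes and is not reproduced here.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, thresh_ratio: float = 0.7):
    """Minimal single-plane ternary quantizer (illustration, not tritplane3)."""
    delta = thresh_ratio * np.abs(w).mean()               # magnitude threshold
    codes = np.where(np.abs(w) > delta, np.sign(w), 0.0)  # values in {-1, 0, +1}
    nonzero = codes != 0
    # Per-tensor scale: mean magnitude of the weights that survived thresholding
    scale = np.abs(w[nonzero]).mean() if nonzero.any() else 0.0
    return codes, scale

w = np.array([0.9, -0.05, 0.4, -0.8])
codes, scale = ternary_quantize(w)
w_hat = scale * codes  # dequantized approximation of w
```

Storing one ternary code (<2 bits) plus a shared scale instead of 16 bits per weight is where the size reduction comes from.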

+ ## Quality Verification

+ Validated during quantization (collapse score: 0.009, excellent):

+ | Test | Output |
+ |---|---|
+ | Image description (demo) | "A yellow circle with a diagonal line through it" (correct) |
+ | "What is machine learning?" | Correct, detailed explanation of ML, algorithms, training |
+ | "Explain gravity" | Accurate one-sentence explanation |

+ ## Memory Requirements

+ | Runtime | Min Memory | Hardware |
+ |---|---|---|
+ | `cached` (CPU) | ~4 GB RAM | Any |
+ | `metal` (Apple Silicon) | ~3 GB unified | M1+ |
+ | `cached` (CUDA) | ~3 GB VRAM | Any NVIDIA GPU |

+ Ideal for edge deployment: runs on devices with as little as 4 GB of RAM.
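The table above can be turned into a simple device check. `pick_runtime_mode` is a hypothetical helper (not part of ternary-quant) that selects a `runtime_mode` value based on available hardware:

```python
def pick_runtime_mode() -> str:
    """Hypothetical helper: choose a runtime_mode per the memory table above."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cached"  # CUDA: ~3 GB VRAM on any NVIDIA GPU
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "metal"   # Apple Silicon: ~3 GB unified memory
    except ImportError:
        pass                 # torch absent: fall through to CPU default
    return "cached"          # CPU: ~4 GB RAM, works anywhere

mode = pick_runtime_mode()
```

The returned string could then be passed as `runtime_mode=mode` to `load_ternary_model`.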

+ ## Quickstart

+ ```bash
+ pip install ternary-quant
+ ```

  ```python
  from ternary_quant.inference import load_ternary_model

  model, processor = load_ternary_model(
      "AsadIsmail/SmolVLM2-2.2B-Instruct-ternary",
+     runtime_mode="cached", device="auto"
  )

+ from PIL import Image
+ image = Image.open("photo.jpg")
+ inputs = processor(text="Describe this image", images=image, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=128)
  print(processor.decode(outputs[0], skip_special_tokens=True))
  ```

+ ## Collection

+ Part of [ternary-models](https://huggingface.co/collections/AsadIsmail/ternary-models-vlms-multimodal-and-audio-69df85ff0b776624d6645d2a).

+ GitHub: [github.com/Asad-Ismail/ternary-models](https://github.com/Asad-Ismail/ternary-models) | Library: [github.com/Asad-Ismail/ternary-quant](https://github.com/Asad-Ismail/ternary-quant)