AINovice2005 committed on
Commit
a113a67
·
verified ·
1 Parent(s): 8d80bbd

Update README.md

  1. README.md +49 -1
README.md CHANGED
@@ -4,4 +4,52 @@ base_model:
4
  - HuggingFaceTB/SmolVLM2-2.2B-Base
5
  pipeline_tag: image-text-to-text
6
  library_name: transformers
7
- ---
7
+ ---
8
+
9
+ **SmolVLM2‑2.2B‑Base Quantized**
10
+
11
+ ---
12
+
13
+ ### 🚀 Model Description
14
+
15
+ This is a **quantized version** of **SmolVLM2‑2.2B‑Base**, a compact vision-language model from Hugging Face. It is designed for **multimodal understanding** of images, multi‑image inputs, and videos, and quantization is intended to make **inference faster and more memory-efficient**. It targets on-device and resource-constrained deployments.
16
+
17
+ ---
18
+
19
+ ### 🔧 Base Model Summary
20
+
21
+ * **Name**: SmolVLM2‑2.2B‑Base
22
+ * **Publisher**: Hugging Face TB
23
+ * **Architecture**: Idefics3-style architecture with a SigLIP vision encoder + SmolLM2‑1.7B text decoder
24
+ * **Modalities**: image, multi-image, video, text
25
+ * **Capabilities**: captioning, VQA, video analysis, diagram understanding, text-in-image reading
26
+
27
+ ---
28
+
29
+ ### 📏 Quantization Details
30
+
31
+
32
+ **Method**: torchao quantization
33
+
34
+ **Weight Precision**: int8
35
+
36
+ **Activation Precision**: int8 dynamic
37
+
38
+ **Technique**: Symmetric mapping
39
+
40
+ **Impact**: Significant reduction in model size, with minimal expected loss in the base model's multimodal capabilities (captioning, VQA, text-in-image reading).
41
+ ---
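The int8 symmetric mapping listed under Quantization Details can be illustrated with a minimal pure-Python sketch. It is not the torchao implementation: the function names are hypothetical, a single per-tensor scale is assumed, and values are clamped to [-127, 127] for simplicity. With dynamic activation quantization, the same scale computation would run on each input at inference time rather than once ahead of time.

```python
def quantize_symmetric_int8(values):
    """Map floats to int8 codes with one symmetric scale (zero point fixed at 0).

    Hypothetical helper for illustration only; torchao's kernels may use a
    different granularity (e.g. per-channel) and clamping range.
    """
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.9]
codes, scale = quantize_symmetric_int8(weights)
restored = dequantize(codes, scale)
```

Because the mapping is symmetric, zero is always represented exactly and the round-trip error of each value is at most half the scale.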
42
+
43
+ ### 🎯 Intended Use
44
+
45
+ * On-device or low-VRAM systems (edge, mobile, small GPUs)
46
+ * Multimodal tasks: VQA, captioning, comparing images, video transcription
47
+ * Research on quantized multimodal models
48
+
49
+ ---
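For the low-VRAM use case listed above, a rough back-of-the-envelope estimate of weight memory may help. This sketch assumes about 2.2 billion parameters (taken from the model name), a 16-bit baseline, and that every weight is stored in int8; in practice some layers may stay in higher precision, and activations and caches add further memory on top.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 1024**3


NUM_PARAMS = 2.2e9  # assumption: parameter count implied by the "2.2B" in the model name

baseline_16bit_gib = weight_memory_gib(NUM_PARAMS, 16)
int8_gib = weight_memory_gib(NUM_PARAMS, 8)

print(f"16-bit weights: {baseline_16bit_gib:.2f} GiB")
print(f"int8 weights:   {int8_gib:.2f} GiB")
```

Under these assumptions the int8 weights need about half the memory of a 16-bit copy.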
50
+
51
+ ### ⚠️ Limitations & Considerations
52
+
53
+ * May underperform the full-precision base model on some tasks
54
+ * Only supports the modalities supported by the base model
55
+ ---