Update README.md
README.md
CHANGED
|
@@ -16,60 +16,118 @@ language:
|
|
| 16 |
pipeline_tag: image-text-to-text
|
| 17 |
---
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
|
| 22 |
|
| 23 |
-
|
| 24 |
|
| 25 |
-
|
| 26 |
|
| 27 |
---
|
| 28 |
|
| 29 |
## Key Capabilities
|
| 30 |
|
| 31 |
-
|
| 32 |
-
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
|
|
|
| 38 |
|
| 39 |
---
|
| 40 |
|
| 41 |
-
##
|
| 42 |
|
| 43 |
| Parameter | Value |
|
| 44 |
|-----------|-------|
|
| 45 |
-
| Base Model | Qwen/Qwen3-VL-4B-Instruct |
|
| 46 |
-
| Method | QLoRA (4-bit NF4) |
|
| 47 |
-
| LoRA Rank | 16 |
|
| 48 |
-
| LoRA Alpha | 32 |
|
| 49 |
-
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
|
| 50 |
-
| Trainable Params | 33,030,144 (0.74%) |
|
| 51 |
-
| Epochs | 3 |
|
| 52 |
-
|
|
| 53 |
-
|
|
| 54 |
-
|
|
| 55 |
-
|
|
| 56 |
-
|
| 57 |
|
| 58 |
---
|
| 59 |
|
| 60 |
## Usage
|
| 61 |
|
| 62 |
-
### Install Dependencies
|
| 63 |
|
| 64 |
```bash
|
| 65 |
pip install "transformers>=4.51.0" "accelerate>=0.30.0" qwen-vl-utils
|
| 66 |
```
|
| 67 |
|
| 68 |
-
###
|
| 69 |
|
| 70 |
```python
|
| 71 |
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
|
| 72 |
-
from qwen_vl_utils import process_vision_info
|
| 73 |
import torch
|
| 74 |
|
| 75 |
model_id = "kadalicious22/snapgate-VL-4B"
|
|
@@ -82,7 +140,14 @@ model = Qwen3VLForConditionalGeneration.from_pretrained(
|
|
| 82 |
trust_remote_code=True,
|
| 83 |
)
|
| 84 |
|
| 85 |
-
SYSTEM_PROMPT = """Kamu adalah Snapgate AI, asisten AI multimodal milik Snapgate
|
| 86 |
|
| 87 |
messages = [
|
| 88 |
{"role": "system", "content": SYSTEM_PROMPT},
|
|
@@ -112,61 +177,56 @@ response = processor.batch_decode(generated, skip_special_tokens=True)[0]
|
|
| 112 |
print(response)
|
| 113 |
```
|
| 114 |
|
| 115 |
-
### Text-Only Inference
|
| 116 |
|
| 117 |
```python
|
| 118 |
messages = [
|
| 119 |
{"role": "system", "content": SYSTEM_PROMPT},
|
| 120 |
-
# Prompt in English: "Write a Python function that validates email addresses."
{"role": "user", "content": "Buatkan fungsi Python untuk validasi email."},
|
| 121 |
]
|
| 122 |
|
| 123 |
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
| 124 |
inputs = processor(text=[text], return_tensors="pt").to(model.device)
|
| 125 |
|
| 126 |
with torch.no_grad():
|
| 127 |
-
output_ids = model.generate(**inputs, max_new_tokens=1024)
|
| 128 |
|
| 129 |
-
response = processor.batch_decode(
|
| 130 |
print(response)
|
| 131 |
```
|
| 132 |
|
| 133 |
---
|
| 134 |
|
| 135 |
-
## Training Loss
|
| 136 |
-
|
| 137 |
-
| Step | Loss |
|
| 138 |
-
|------|------|
|
| 139 |
-
| 5 | 2.419 |
|
| 140 |
-
| 10 | 2.132 |
|
| 141 |
-
| 15 | 1.918 |
|
| 142 |
-
| 20 | 1.736 |
|
| 143 |
-
| 25 | 1.640 |
|
| 144 |
-
| 30 | 1.663 |
|
| 145 |
-
| 35 | 1.584 |
|
| 146 |
-
|
| 147 |
-
Loss fell from **2.42 → 1.58** over the logged steps, with a slight uptick at step 30.
|
| 148 |
-
|
| 149 |
-
---
|
| 150 |
-
|
| 151 |
## Limitations
|
| 152 |
|
| 153 |
-
-
|
| 154 |
-
- Optimized for Indonesian and English
|
| 155 |
-
- Performs best on coding tasks and
|
|
|
|
| 156 |
|
| 157 |
---
|
| 158 |
|
| 159 |
## License
|
| 160 |
|
| 161 |
-
|
| 162 |
|
| 163 |
---
|
| 164 |
|
| 165 |
## Links
|
| 166 |
|
| 167 |
-
|
| 168 |
-
-
|
| 169 |
|
| 170 |
---
|
| 171 |
|
| 172 |
-
|
| 16 |
pipeline_tag: image-text-to-text
|
| 17 |
---
|
| 18 |
|
| 19 |
+
<div align="center">
|
| 20 |
|
| 21 |
+
<img src="https://snapgate.tech/logo.png" alt="Snapgate Logo" width="120"/>
|
| 22 |
|
| 23 |
+
# snapgate-VL-4B
|
| 24 |
|
| 25 |
+
### Vision-Language AI · Fine-tuned for Coding & Design
|
| 26 |
+
|
| 27 |
+
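[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
|
| 28 |
+
[Base model: Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)
|
| 29 |
+
[Model: kadalicious22/snapgate-VL-4B](https://huggingface.co/kadalicious22/snapgate-VL-4B)
|
| 30 |
+
[Website: snapgate.tech](https://snapgate.tech)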
|
| 31 |
+
|
| 32 |
+
**snapgate-VL-4B** is a multimodal vision-language model fine-tuned from [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) with **QLoRA**. It is tuned for **developers** and **designers** who need a model that reads images and text together.
|
| 33 |
+
|
| 34 |
+
*Developed by [Snapgate](https://snapgate.tech) · Made with ❤️ in Indonesia 🇮🇩*
|
| 35 |
+
|
| 36 |
+
</div>
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## Highlights
|
| 41 |
+
|
| 42 |
+
- Fine-tuned from Qwen3-VL-4B with 4-bit QLoRA for memory-efficient training
|
| 43 |
+
- Optimized for coding and UI/UX analysis
|
| 44 |
+
- Bilingual: Indonesian and English
|
| 45 |
+
- Only 0.74% of parameters (33,030,144) are trained, which keeps fine-tuning lightweight
|
| 46 |
+
- 200 samples · 10 categories · 3 epochs · Final loss: **0.444**
|
| 47 |
+
- Ready to run on Google Colab with a T4 GPU
|
| 48 |
|
| 49 |
---
|
| 50 |
|
| 51 |
## Key Capabilities
|
| 52 |
|
| 53 |
+
| Capability | Description |
|
| 54 |
+
|-----------|-----------|
|
| 55 |
+
| **Code Generation & Review** | Writes, analyzes, debugs, and optimizes code (Python, JS, TS, HTML/CSS, SQL, etc.) |
|
| 56 |
+
| **UI/UX Design Analysis** | Analyzes interface screenshots, suggests design improvements, and identifies UX problems |
|
| 57 |
+
| **Design to Code** | Converts mockups, wireframes, or UI screenshots into HTML/CSS/React/Tailwind code |
|
| 58 |
+
| **Diagram & Architecture** | Interprets flow diagrams, system architecture diagrams, ERDs, and technical flowcharts |
|
| 59 |
+
| **Code from Image** | Reads and explains code from screenshots or photos |
|
| 60 |
+
| **Technical Documentation** | Produces clear, structured, professional technical documentation |
|
| 61 |
|
| 62 |
---
|
| 63 |
|
| 64 |
+
## Training Configuration
|
| 65 |
+
|
| 66 |
+
<details>
|
| 67 |
+
<summary><b>Click to view training details</b></summary>
|
| 68 |
|
| 69 |
| Parameter | Value |
|
| 70 |
|-----------|-------|
|
| 71 |
+
| Base Model | `Qwen/Qwen3-VL-4B-Instruct` |
|
| 72 |
+
| Method | QLoRA (4-bit NF4) |
|
| 73 |
+
| LoRA Rank | 16 |
|
| 74 |
+
| LoRA Alpha | 32 |
|
| 75 |
+
| Target Modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
|
| 76 |
+
| Trainable Params | 33,030,144 **(0.74% of total)** |
|
| 77 |
+
| Epochs | 3 |
|
| 78 |
+
| Total Steps | 75 |
|
| 79 |
+
| Learning Rate | `1e-4` |
|
| 80 |
+
| Batch Size | 1 (gradient accumulation: 8) |
|
| 81 |
+
| Optimizer | `paged_adamw_8bit` |
|
| 82 |
+
| Precision | `bfloat16` |
|
| 83 |
+
| Hardware | NVIDIA T4 · Google Colab |
|
| 84 |
+
| Dataset | 200 internal Snapgate samples |
|
| 85 |
+
| Categories | 10 categories · 20 samples each |
|
| 86 |
+
| Format | ShareGPT |
|
| 87 |
+
|
| 88 |
+
**Dataset categories:**
|
| 89 |
+
`code_generation` · `code_review` · `debugging` · `refactoring` · `ui_html_css` · `ui_react` · `ui_tailwind` · `design_system` · `ux_analysis` · `design_to_code`
|
| 90 |
+
|
| 91 |
+
</details>
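The trainable-parameter figure in the table can be reproduced from the LoRA settings. A rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below sums that over the seven target modules. The layer dimensions (hidden size 2560, 36 decoder layers, 32 query heads and 8 key/value heads of size 128, MLP width 9728) are assumptions about the base model's language-model config, not values stated in this README, and the count assumes adapters sit only on the language-model layers.

```python
# Assumed language-model dimensions for the base model (not stated in this README).
HIDDEN = 2560          # hidden size
NUM_LAYERS = 36        # decoder layers
Q_DIM = 32 * 128       # query heads * head dim
KV_DIM = 8 * 128       # key/value heads * head dim
MLP_DIM = 9728         # feed-forward width
RANK = 16              # LoRA rank from the table

# (d_in, d_out) of each target module inside one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """Weights added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 33,030,144
```

Under these assumed dimensions the sum matches the 33,030,144 reported in the table. With alpha 32 and rank 16, the adapter output is scaled by alpha / rank = 2.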
|
| 92 |
+
|
| 93 |
+
---
|
| 94 |
+
|
| 95 |
+
## Training Progress
|
| 96 |
+
|
| 97 |
+
Loss fell from **1.242 → 0.444** over 75 steps; most of the drop came in the first 25 steps, with small fluctuations afterward.
|
| 98 |
+
|
| 99 |
+
```
|
| 100 |
+
Step  5  Loss: 1.242
|
| 101 |
+
Step 10  Loss: 0.959
|
| 102 |
+
Step 15  Loss: 0.808
|
| 103 |
+
Step 20  Loss: 0.671
|
| 104 |
+
Step 25  Loss: 0.544
|
| 105 |
+
Step 30  Loss: 0.561
|
| 106 |
+
Step 35  Loss: 0.513
|
| 107 |
+
Step 40  Loss: 0.469
|
| 108 |
+
Step 45  Loss: 0.448
|
| 109 |
+
Step 50  Loss: 0.465
|
| 110 |
+
Step 55  Loss: 0.453
|
| 111 |
+
Step 60  Loss: 0.465
|
| 112 |
+
Step 65  Loss: 0.465
|
| 113 |
+
Step 70  Loss: 0.450
|
| 114 |
+
Step 75  Loss: 0.444
|
| 115 |
+
```
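As a quick check on the numbers above, the step count follows from the training configuration (200 samples, batch size 1, gradient accumulation 8, 3 epochs), and the logged losses can be scanned for upticks. This is a small sketch with the logged values copied by hand; the helper names are illustrative and not part of any training script.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Optimizer updates for a run; rounding is moot here since 200 / 8 is exact."""
    steps_per_epoch = math.ceil(num_samples / (batch_size * grad_accum))
    return steps_per_epoch * epochs


# Logged training loss, keyed by step (copied from the log above).
LOSSES = {
    5: 1.242, 10: 0.959, 15: 0.808, 20: 0.671, 25: 0.544,
    30: 0.561, 35: 0.513, 40: 0.469, 45: 0.448, 50: 0.465,
    55: 0.453, 60: 0.465, 65: 0.465, 70: 0.450, 75: 0.444,
}

total_steps = optimizer_steps(num_samples=200, batch_size=1, grad_accum=8, epochs=3)

# Steps where the logged loss is higher than at the previous logged step.
steps = sorted(LOSSES)
uptick_steps = [cur for prev, cur in zip(steps, steps[1:]) if LOSSES[cur] > LOSSES[prev]]

print(total_steps)   # 75
print(uptick_steps)  # [30, 50, 60]
```

The upticks are small next to the overall drop, which is typical of per-step training loss logged on small batches.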
|
| 116 |
|
| 117 |
---
|
| 118 |
|
| 119 |
## Usage
|
| 120 |
|
| 121 |
+
### 1. Install Dependencies
|
| 122 |
|
| 123 |
```bash
|
| 124 |
pip install "transformers>=4.51.0" "accelerate>=0.30.0" qwen-vl-utils
|
| 125 |
```
|
| 126 |
|
| 127 |
+
### 2. Load Model
|
| 128 |
|
| 129 |
```python
|
| 130 |
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
|
|
|
|
| 131 |
import torch
|
| 132 |
|
| 133 |
model_id = "kadalicious22/snapgate-VL-4B"
|
|
|
|
| 140 |
trust_remote_code=True,
|
| 141 |
)
|
| 142 |
|
| 143 |
+
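# English: "You are Snapgate AI, Snapgate's multimodal AI assistant,
# an expert in coding and UI/UX design."
SYSTEM_PROMPT = """Kamu adalah Snapgate AI, asisten AI multimodal milik Snapgate \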
|
| 144 |
+
yang ahli dalam bidang coding dan UI/UX design."""
|
| 145 |
+
```
|
| 146 |
+
|
| 147 |
+
### 3. Inference with an Image
|
| 148 |
+
|
| 149 |
+
```python
|
| 150 |
+
from qwen_vl_utils import process_vision_info
|
| 151 |
|
| 152 |
messages = [
|
| 153 |
{"role": "system", "content": SYSTEM_PROMPT},
|
|
|
|
| 177 |
print(response)
|
| 178 |
```
|
| 179 |
|
| 180 |
+
### 4. Text-Only Inference
|
| 181 |
|
| 182 |
```python
|
| 183 |
messages = [
|
| 184 |
{"role": "system", "content": SYSTEM_PROMPT},
|
| 185 |
+
# Prompt in English: "Write a Python function that validates email addresses with a regex."
{"role": "user", "content": "Buatkan fungsi Python untuk validasi email dengan regex."},
|
| 186 |
]
|
| 187 |
|
| 188 |
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
| 189 |
inputs = processor(text=[text], return_tensors="pt").to(model.device)
|
| 190 |
|
| 191 |
with torch.no_grad():
|
| 192 |
+
output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
|
| 193 |
|
| 194 |
+
response = processor.batch_decode(
|
| 195 |
+
output_ids[:, inputs["input_ids"].shape[1]:],
|
| 196 |
+
skip_special_tokens=True
|
| 197 |
+
)[0]
|
| 198 |
print(response)
|
| 199 |
```
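`generate` on a decoder-only model returns the prompt tokens followed by the newly generated ones, which is why the snippet above slices at `inputs["input_ids"].shape[1]` before decoding. The toy sketch below shows the same slice on plain Python lists; the token ids and the `strip_prompt` helper are made up for illustration.

```python
def strip_prompt(output_ids: list[list[int]], prompt_len: int) -> list[list[int]]:
    """Drop the first prompt_len tokens from every sequence in the batch."""
    return [seq[prompt_len:] for seq in output_ids]


prompt_ids = [[101, 7, 8, 9]]             # hypothetical prompt tokens
output_ids = [[101, 7, 8, 9, 42, 43, 2]]  # prompt followed by generated tokens

new_tokens = strip_prompt(output_ids, prompt_len=len(prompt_ids[0]))
print(new_tokens)  # [[42, 43, 2]]
```

Without the slice, the decoded response would repeat the system and user messages before the model's answer.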
|
| 200 |
|
| 201 |
---
|
| 202 |
|
| 203 |
## Limitations
|
| 204 |
|
| 205 |
+
- Trained on a relatively small internal Snapgate dataset (200 samples); performance is expected to improve as more data is added
|
| 206 |
+
- Optimized for Indonesian and English; other languages are untested
|
| 207 |
+
- Performs best on coding and UI analysis tasks; less suited to other domains (e.g., science, law, medicine)
|
| 208 |
+
- A GPU with at least 8 GB of VRAM is recommended for comfortable inference
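
For a rough sense of where the 8 GB figure sits, weight memory can be estimated from the parameter count. The sketch below backs the total out of this README's own numbers (33,030,144 trainable parameters at 0.74%) and ignores activations, the KV cache, and framework overhead, so the results are lower bounds. The bytes-per-parameter values are the usual ones for bf16 and 4-bit storage; whether the model is loaded in 4-bit at inference is an assumption, not something shown here.

```python
TRAINABLE_PARAMS = 33_030_144
TRAINABLE_FRACTION = 0.0074  # rounded in the README, so the total is approximate

total_params = TRAINABLE_PARAMS / TRAINABLE_FRACTION


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


bf16_gib = weight_gib(total_params, 2.0)  # 16-bit weights
nf4_gib = weight_gib(total_params, 0.5)   # 4-bit weights, ignoring quantization constants

print(f"total parameters: ~{total_params / 1e9:.2f}B")
print(f"bf16 weights:     ~{bf16_gib:.1f} GiB")
print(f"4-bit weights:    ~{nf4_gib:.1f} GiB")
```

Under these assumptions, bf16 weights alone come to a little over 8 GiB, so an 8 GB card would likely need quantized loading or offloading.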
|
| 209 |
|
| 210 |
---
|
| 211 |
|
| 212 |
## License
|
| 213 |
|
| 214 |
+
Released under the **Apache 2.0** license, following the license of the base model [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct).
|
| 215 |
|
| 216 |
---
|
| 217 |
|
| 218 |
## Links
|
| 219 |
|
| 220 |
+
| | |
|
| 221 |
+
|---|---|
|
| 222 |
+
| Website | [snapgate.tech](https://snapgate.tech) |
|
| 223 |
+
| Base Model | [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
|
| 224 |
+
| Contact | Via the Snapgate website |
|
| 225 |
|
| 226 |
---
|
| 227 |
|
| 228 |
+
<div align="center">
|
| 229 |
+
|
| 230 |
+
*Made with ❤️ by the **Snapgate** team · Indonesia 🇮🇩*
|
| 231 |
+
|
| 232 |
+
</div>
|