---
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- qwen3_vl
- vision-language
- multimodal
- fine-tuned
- qlora
- safetensors
- coding
- design
language:
- id
- en
pipeline_tag: image-text-to-text
---

# snapgate-VL-4B
### Vision-Language AI · Fine-tuned for Coding & Design
[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0) · [Base Model](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) · [Model Repo](https://huggingface.co/kadalicious22/snapgate-VL-4B) · [Snapgate](https://snapgate.tech)
**snapgate-VL-4B** is a multimodal vision-language model fine-tuned from [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) using **QLoRA**, specifically optimized for **developers** and **designers**: it understands both images and text with high precision.

*Developed by [Snapgate](https://snapgate.tech) · Made with ❤️ in Indonesia 🇮🇩*
---
## Core Capabilities

| Capability | Description |
|-----------|-----------|
| **Code Generation & Review** | Write, analyze, debug, and optimize code (Python, JS, TS, HTML/CSS, SQL, etc.) |
| **UI/UX Design Analysis** | Analyze interface screenshots, provide design suggestions, identify UX issues |
| **Design to Code** | Convert mockups, wireframes, or UI screenshots into HTML/CSS/React/Tailwind code |
| **Diagram & Architecture** | Understand flowcharts, system architecture, ERDs, and technical diagrams |
| **Code from Image** | Read and explain code from screenshots or photos |
| **Technical Documentation** | Generate clear, structured, and professional technical documentation |
---
## Training Configuration
| Parameter | Value |
|-----------|-------|
| Base Model | `Qwen/Qwen3-VL-4B-Instruct` |
| Method | QLoRA (4-bit NF4) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Trainable Params | 33,030,144 **(0.74% of total)** |
| Epochs | 3 |
| Total Steps | 75 |
| Learning Rate | `1e-4` |
| Batch Size | 1 (gradient accumulation: 8) |
| Optimizer | `paged_adamw_8bit` |
| Precision | `bfloat16` |
| Hardware | NVIDIA T4 · Google Colab |
| Dataset | 200 internal Snapgate samples |
| Categories | 10 categories · 20 samples each |
| Format | ShareGPT |
**Dataset Categories:**
`code_generation` · `code_review` · `debugging` · `refactoring` · `ui_html_css` · `ui_react` · `ui_tailwind` · `design_system` · `ux_analysis` · `design_to_code`
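For reference, a single ShareGPT-format record looks roughly like the sketch below. The image path and conversation text are illustrative, not taken from the actual dataset:

```python
import json

# Hypothetical example of one ShareGPT-style training record; the
# "images" path and the conversation content are made up for illustration.
sample = {
    "images": ["images/login_form.png"],
    "conversations": [
        {"from": "human", "value": "<image>\nConvert this mockup into HTML/CSS."},
        {"from": "gpt", "value": "Here is the HTML/CSS for the mockup: ..."},
    ],
}

# Each record pairs image references with alternating human/gpt turns.
print(json.dumps(sample, indent=2))
```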
---
## Training Progress

Loss decreased consistently throughout training, from **1.242 → 0.444**.
```
Step  5  ██████████████████████  Loss: 1.242
Step 10  █████████████████░░░░░  Loss: 0.959
Step 15  ██████████████░░░░░░░░  Loss: 0.808
Step 20  ████████████░░░░░░░░░░  Loss: 0.671
Step 25  ██████████░░░░░░░░░░░░  Loss: 0.544
Step 30  ██████████░░░░░░░░░░░░  Loss: 0.561
Step 35  █████████░░░░░░░░░░░░░  Loss: 0.513
Step 40  ████████░░░░░░░░░░░░░░  Loss: 0.469
Step 45  ████████░░░░░░░░░░░░░░  Loss: 0.448
Step 50  ████████░░░░░░░░░░░░░░  Loss: 0.465
Step 55  ████████░░░░░░░░░░░░░░  Loss: 0.453
Step 60  ████████░░░░░░░░░░░░░░  Loss: 0.465
Step 65  ████████░░░░░░░░░░░░░░  Loss: 0.465
Step 70  ████████░░░░░░░░░░░░░░  Loss: 0.450
Step 75  ████████░░░░░░░░░░░░░░  Loss: 0.444
```
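The step count is consistent with the training configuration above: 200 samples with an effective batch size of 8 gives 25 optimizer steps per epoch, or 75 over 3 epochs. A quick check:

```python
# Sanity-check the step count from the training configuration.
samples = 200
batch_size = 1
grad_accum = 8
epochs = 3

effective_batch = batch_size * grad_accum     # 8 samples per optimizer step
steps_per_epoch = samples // effective_batch  # 25
total_steps = steps_per_epoch * epochs        # 75

print(total_steps)  # 75
```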
---
## Usage
### 1. Install Dependencies
```bash
pip install "transformers>=4.51.0" "accelerate>=0.30.0" qwen-vl-utils
```
### 2. Load Model
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch

model_id = "kadalicious22/snapgate-VL-4B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

SYSTEM_PROMPT = """You are Snapgate AI, a multimodal AI assistant by Snapgate \
specialized in coding and UI/UX design."""
```
### 3. Inference with Image
```python
from qwen_vl_utils import process_vision_info
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/your/image.png"},
            {"type": "text", "text": "Analyze the UI from this image and generate its HTML/CSS code."},
        ],
    },
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

generated = output_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(response)
```
### 4. Text-Only Inference
```python
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Write a Python function to validate email using regex."},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

response = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)
```
---
## Limitations

- Trained on a relatively small internal Snapgate dataset (200 samples); performance is expected to improve as more data is added
- Optimized for Indonesian and English; other languages have not been tested
- Strongest on coding and UI analysis tasks; less reliable in other domains (e.g., science, law, medicine)
- A GPU with at least 8 GB of VRAM is recommended for comfortable inference
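The 8 GB figure follows from the weight footprint alone: roughly 4 billion parameters at 2 bytes each in bfloat16 is about 7.5 GB before activations and KV cache. A back-of-the-envelope estimate (the parameter count here is approximate):

```python
# Rough weight-memory estimate for a ~4B-parameter model; this is an
# approximation that ignores activations, KV cache, and framework overhead.
params = 4e9

bf16_gb = params * 2 / 1024**3    # 2 bytes per parameter in bfloat16
int4_gb = params * 0.5 / 1024**3  # ~0.5 bytes per parameter in 4-bit

print(f"bfloat16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:    ~{int4_gb:.1f} GB")
```

If even that is too much, loading the model 4-bit quantized (e.g., via a `BitsAndBytesConfig` with NF4, as transformers supports) cuts the weight footprint to roughly a quarter, at some quality cost.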
---
## License
Released under the **Apache 2.0** license, following the base model license of [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct).
---
## Links

| | |
|---|---|
| Website | [snapgate.tech](https://snapgate.tech) |
| Base Model | [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
| Contact | Via Snapgate website |
---