---
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- qwen3_vl
- vision-language
- multimodal
- fine-tuned
- qlora
- safetensors
- coding
- design
language:
- id
- en
pipeline_tag: image-text-to-text
---

<div align="center">

<img src="https://snapgate.tech/img/snapgatelogo.jpg" alt="Snapgate Logo" width="120"/>

# 🌐 snapgate-VL-4B

### Vision-Language AI · Fine-tuned for Coding & Design

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Base Model](https://img.shields.io/badge/Base-Qwen3--VL--4B--Instruct-orange)](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)
[![Model](https://img.shields.io/badge/%F0%9F%A4%97-snapgate--VL--4B-yellow)](https://huggingface.co/kadalicious22/snapgate-VL-4B)
[![Website](https://img.shields.io/badge/Website-snapgate.tech-blue)](https://snapgate.tech)

**snapgate-VL-4B** is a multimodal vision-language model fine-tuned from [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) using **QLoRA**, optimized specifically for **developers** and **designers**: it understands both images and text with high precision.

*Developed by [Snapgate](https://snapgate.tech) · Made with ❤️ in Indonesia 🇮🇩*

</div>

---

## 🧠 Core Capabilities

| Capability | Description |
|-----------|-------------|
| 💻 **Code Generation & Review** | Write, analyze, debug, and optimize code (Python, JS, TS, HTML/CSS, SQL, etc.) |
| 🎨 **UI/UX Design Analysis** | Analyze interface screenshots, provide design suggestions, identify UX issues |
| 🖼️ **Design to Code** | Convert mockups, wireframes, or UI screenshots into HTML/CSS/React/Tailwind code |
| 🏗️ **Diagram & Architecture** | Understand flowcharts, system architectures, ERDs, and technical diagrams |
| 📸 **Code from Image** | Read and explain code from screenshots or photos |
| 📝 **Technical Documentation** | Generate clear, structured, and professional technical documentation |

---

## 🔧 Training Configuration

<details>
<summary><b>Click to view training details</b></summary>

| Parameter | Value |
|-----------|-------|
| 🤖 Base Model | `Qwen/Qwen3-VL-4B-Instruct` |
| ⚙️ Method | QLoRA (4-bit NF4) |
| 🔢 LoRA Rank | 16 |
| 🔢 LoRA Alpha | 32 |
| 🎯 Target Modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| 🔢 Trainable Params | 33,030,144 **(0.74% of total)** |
| 🔄 Epochs | 3 |
| 📶 Total Steps | 75 |
| 📈 Learning Rate | `1e-4` |
| 📦 Batch Size | 1 (gradient accumulation: 8) |
| ⚡ Optimizer | `paged_adamw_8bit` |
| 🎛️ Precision | `bfloat16` |
| 🖥️ Hardware | NVIDIA T4 · Google Colab |
| 📦 Dataset | 200 internal Snapgate samples |
| 🏷️ Categories | 10 categories · 20 samples each |
| 📊 Format | ShareGPT |

**Dataset Categories:**
`code_generation` · `code_review` · `debugging` · `refactoring` · `ui_html_css` · `ui_react` · `ui_tailwind` · `design_system` · `ux_analysis` · `design_to_code`
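
For reference, ShareGPT-formatted training data pairs each image with a multi-turn conversation. Below is a minimal sketch of what one sample might look like; the `"from"`/`"value"` keys and the `"images"` list follow the common ShareGPT convention, not the (unpublished) schema of the internal dataset, and the file path is hypothetical:

```python
import json

# Hypothetical ShareGPT-style sample; field names follow the common
# convention, not a published Snapgate schema.
sample = {
    "images": ["screenshots/login_form.png"],  # hypothetical path
    "conversations": [
        {"from": "human",
         "value": "<image>\nConvert this login form into HTML/CSS."},
        {"from": "gpt",
         "value": "<!DOCTYPE html>\n<html>\n  <!-- generated markup -->\n</html>"},
    ],
}

line = json.dumps(sample)  # typically one sample per line in a JSONL file
print(line[:60])
```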

</details>
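
The QLoRA recipe in the table maps onto `peft` and `bitsandbytes` configuration objects roughly as follows. This is a sketch under assumptions: the actual training script is not published, and `lora_dropout` is not stated in the table.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with bfloat16 compute, as listed in the table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter matching the reported rank, alpha, and target modules
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption: not stated in the table
    task_type="CAUSAL_LM",
)
```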

---

## 📊 Training Progress

Training loss fell from **1.242** to **0.444**: a steep drop over the first ~25 steps, then a plateau around 0.45 from step 45 onward ✅

```
Step  5 │███░░░░░░░░░░░░░░░░░│ Loss: 1.242
Step 10 │██████░░░░░░░░░░░░░░│ Loss: 0.959
Step 15 │████████░░░░░░░░░░░░│ Loss: 0.808
Step 20 │██████████░░░░░░░░░░│ Loss: 0.671
Step 25 │████████████░░░░░░░░│ Loss: 0.544
Step 30 │████████████░░░░░░░░│ Loss: 0.561
Step 35 │█████████████░░░░░░░│ Loss: 0.513
Step 40 │█████████████░░░░░░░│ Loss: 0.469
Step 45 │██████████████░░░░░░│ Loss: 0.448
Step 50 │██████████████░░░░░░│ Loss: 0.465
Step 55 │██████████████░░░░░░│ Loss: 0.453
Step 60 │██████████████░░░░░░│ Loss: 0.465
Step 65 │██████████████░░░░░░│ Loss: 0.465
Step 70 │██████████████░░░░░░│ Loss: 0.450
Step 75 │██████████████░░░░░░│ Loss: 0.444
```

---

## 🚀 Usage

### 1. Install Dependencies

```bash
pip install "transformers>=4.51.0" "accelerate>=0.30.0" qwen-vl-utils
```

### 2. Load Model

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch

model_id = "kadalicious22/snapgate-VL-4B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

SYSTEM_PROMPT = """You are Snapgate AI, a multimodal AI assistant by Snapgate \
specialized in coding and UI/UX design."""
```

### 3. Inference with Image

```python
from qwen_vl_utils import process_vision_info

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/your/image.png"},
            {"type": "text", "text": "Analyze the UI from this image and generate its HTML/CSS code."},
        ],
    },
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
generated = output_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(response)
```

### 4. Text-Only Inference

```python
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Write a Python function to validate email using regex."},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

response = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)
```

---

## ⚠️ Limitations

- 📦 Trained on a relatively small internal Snapgate dataset (200 samples); performance should improve as more data is added
- 🌏 Optimized for Indonesian and English; other languages have not been tested
- 🎯 Strongest on coding and UI-analysis tasks; less reliable in other domains (e.g., science, law, medicine)
- 🖥️ A GPU with at least 8 GB VRAM is recommended for comfortable inference
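
The VRAM recommendation can be sanity-checked with rough arithmetic. Taking the total parameter count implied by the training table (33,030,144 trainable ≈ 0.74% of total, i.e. roughly 4.5B parameters, an inference rather than a published figure) and counting weight memory only:

```python
# Back-of-the-envelope weight memory; activations and KV cache come on top.
trainable = 33_030_144
total = trainable / 0.0074        # ≈ 4.46e9 params implied by the 0.74% figure

bf16_gb = total * 2 / 1e9         # bfloat16: 2 bytes per parameter
nf4_gb = total * 0.5 / 1e9        # NF4: ~4 bits per parameter

print(f"bf16 weights ≈ {bf16_gb:.1f} GB, 4-bit weights ≈ {nf4_gb:.1f} GB")
```

By this estimate, bf16 weights alone approach 9 GB, so on an 8 GB card 4-bit loading (`load_in_4bit` via `BitsAndBytesConfig`) or CPU offload through `device_map="auto"` is the practical route.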

---

## 📄 License

Released under the **Apache 2.0** license, matching the license of the base model, [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct).

---

## 🔗 Links

|  |  |
|---|---|
| 🌐 Website | [snapgate.tech](https://snapgate.tech) |
| 🤗 Base Model | [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
| 📧 Contact | Via the [Snapgate website](https://snapgate.tech) |

---