Qwen3-VL-2B-Instruct-Shqip
Kushtrim/Qwen3-VL-2B-Instruct-Shqip is an Albanian-focused vision-language model based on unsloth/Qwen3-VL-2B-Instruct, finetuned primarily to improve OCR / transcription and document understanding for Albanian newspaper scans (e.g. Bujku).
What’s new vs the base model
- Better Albanian OCR on scanned newspaper-style pages (noisy layouts, multi-column text, artifacts).
- More consistent responses in Albanian (sq) for OCR and document-style prompts.
Model details
- Type: vision-language (image + text → text)
- Base model: unsloth/Qwen3-VL-2B-Instruct
- Primary language: Albanian (sq)
- License: Apache-2.0 (same as the base model)
Intended use
Use this model for:
- OCR / transcription of Albanian text from images (scanned pages, photos of documents).
- Document understanding: summarization, extraction, Q&A over an image of a page.
- General multimodal chat in Albanian (image captioning, visual Q&A), with best results on document-like inputs.
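The uses listed above differ mainly in the text prompt paired with the image. As a minimal sketch, the helper below builds a message list in the same shape as the Usage example in this card. The function name build_messages, the task keys, and the summary prompt wording are hypothetical and not part of any library API; only the transcription prompt comes from this card.

```python
# Illustrative prompts; only "ocr" uses the instruction stated in this card.
PROMPTS = {
    "ocr": "Transcribe the text in this image.",
    "summary": "Summarize this page in two sentences.",  # hypothetical wording
}


def build_messages(image_path, task="ocr", question=None):
    """Return a single-turn chat message list for one image.

    If `question` is given, it is used as the prompt (visual Q&A);
    otherwise the prompt is looked up from PROMPTS by `task`.
    """
    text = question if question is not None else PROMPTS[task]
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": text},
            ],
        }
    ]


messages = build_messages("BU-19970125_34.png", task="ocr")
print(messages[0]["content"][1]["text"])
```

The returned list can be passed to processor.apply_chat_template in place of a hand-written messages list.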
How it was trained (high level)
- Base: unsloth/Qwen3-VL-2B-Instruct (derived from Qwen/Qwen3-VL-2B-Instruct)
- Finetuning method: LoRA adapters (SFT-style finetuning)
- Tooling: Unsloth + TRL
- Primary dataset: Kushtrim/bujku_vl_ocr (page image + Albanian transcription)
  - Local snapshot used during development contains 34 000 images (train split).
- Training instruction: Transcribe the text in this image.
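A minimal sketch of how one dataset row might be turned into a chat-style SFT sample, assuming each row exposes a page image and its transcription. The field names image and text, the helper to_conversation, and the exact message layout are assumptions for illustration; the actual column names of Kushtrim/bujku_vl_ocr and the trainer scripts may differ.

```python
INSTRUCTION = "Transcribe the text in this image."


def to_conversation(row):
    """Map one (image, transcription) row to a user/assistant message pair."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": INSTRUCTION},
                    {"type": "image", "image": row["image"]},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": row["text"]}],
            },
        ]
    }


# Placeholder row; a real row would hold a page image and its full transcription.
sample = to_conversation({"image": "page.png", "text": "Shembull teksti."})
print(sample["messages"][1]["content"][0]["text"])
```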
Practical defaults used by the trainer scripts in this folder (may vary by run):
- max_length: 2048
- learning rate: 2e-4
- warmup ratio: 0.03
- LoRA: r=16, alpha=16, dropout 0.0
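To make these numbers concrete, the sketch below derives a warmup length and the LoRA scaling factor from the defaults above. The batch size, gradient accumulation, and epoch count are assumptions chosen for illustration (this card does not state them), and rounding the warmup step count up is likewise an assumption about the scheduler.

```python
import math

# Values stated in this card
NUM_IMAGES = 34_000
WARMUP_RATIO = 0.03
LORA_R = 16
LORA_ALPHA = 16

# Assumptions (not stated in this card)
BATCH_SIZE = 2
GRAD_ACCUM = 4
EPOCHS = 1

effective_batch = BATCH_SIZE * GRAD_ACCUM  # samples per optimizer step
steps_per_epoch = math.ceil(NUM_IMAGES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)

# Standard LoRA scales the adapter update by alpha / r.
lora_scale = LORA_ALPHA / LORA_R

print(total_steps, warmup_steps, lora_scale)
```

With r equal to alpha, the scaling factor is 1, so the adapter update is applied without additional rescaling.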
Usage
🤗 Transformers (recommended)
Qwen3-VL is supported in recent releases of transformers. If you hit import errors, install from source:
pip install git+https://github.com/huggingface/transformers
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

BASE_ID = "Qwen/Qwen3-VL-2B-Instruct"  # <-- the real base model
ADAPTER_ID = "Kushtrim/Qwen3-VL-2B-Instruct-Shqip"

# Load the base model, then attach the LoRA adapter on top of it
model = Qwen3VLForConditionalGeneration.from_pretrained(
    BASE_ID,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
# Pass token="hf_..." here only if the adapter repo requires authentication
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# Optional (often nicer for inference): merge adapter into weights
model = model.merge_and_unload()

processor = AutoProcessor.from_pretrained(
    BASE_ID,
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "BU-19970125_34.png",
            },
            {"type": "text", "text": "Transcribe the text in this image."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: generate, then drop the prompt tokens from each output sequence
generated_ids = model.generate(**inputs, max_new_tokens=2048)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
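The slicing step in the snippet above removes the prompt tokens from each generated sequence, since generate returns the prompt and the completion together. A toy illustration with plain Python lists standing in for token-ID tensors (the ID values are made up):

```python
# Toy stand-ins for inputs.input_ids and generated_ids (made-up token IDs).
input_ids = [[101, 7, 8], [101, 9, 10]]
generated_ids = [[101, 7, 8, 55, 56], [101, 9, 10, 77, 78]]

# Keep only the tokens produced after the prompt, as in the snippet above.
trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(trimmed)  # [[55, 56], [77, 78]]
```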
Limitations
- Finetuning focus is OCR/document-style inputs; performance on general visual reasoning may differ from the base model.
- OCR quality depends heavily on image quality (blur, skew, low resolution, heavy compression).
- Historical newspapers may include OCR-hard typography, artifacts, and mixed-language snippets.
Citation
If you use this model in academic work, please cite the base Qwen3 technical report and clearly reference this Albanian finetuned variant.
