# Twkeed Vision (توكيد للرؤية)

An Arabic vision-language model for OCR and document understanding, based on Qwen3-VL-4B.
## Model Details

- Base model: mlx-community/Qwen3-VL-4B-Instruct-4bit
- Fine-tuned for: Arabic OCR text understanding and document understanding
- Framework: MLX (optimized for Apple Silicon)
- Type: LoRA adapters
- Parameters: 4B base + LoRA adapters
## Identity

When asked "من أنت؟" ("Who are you?"), the model responds:

> أنا توكيد للرؤية، مساعد ذكي متخصص في قراءة النصوص العربية من الصور والمستندات
>
> ("I am Twkeed Vision, a smart assistant specialized in reading Arabic text from images and documents.")
## Capabilities

- Arabic OCR understanding: understand and process Arabic OCR text
- Document understanding: extract information from Arabic documents
- Receipt/invoice processing: parse Arabic receipts and invoices (see the sketch after this list)
- ID recognition: read Saudi IDs and official documents
- Text recognition: handle a variety of Arabic fonts and text styles
- 32-language OCR: built-in support for 32 languages, including Arabic
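For image-based tasks such as receipt parsing, prompts follow the standard mlx-vlm pattern. The sketch below is a minimal example, not taken from this repo: the image path and the Arabic prompt wording are placeholders, and it loads only the base model (applying the LoRA adapters is shown under Usage).

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the 4-bit base model (adapter loading is shown in the Usage section)
model_path = "mlx-community/Qwen3-VL-4B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# "receipt.jpg" is a placeholder path to a scanned Arabic receipt
images = ["receipt.jpg"]
# "Extract the store name, date, and total amount from this invoice"
prompt = "استخرج اسم المتجر والتاريخ والمبلغ الإجمالي من هذه الفاتورة"

# Format the prompt with the model's chat template and run generation
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))
output = generate(model, processor, formatted_prompt, images, max_tokens=256, verbose=False)
print(output.text)
```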
## Usage

```python
import mlx.core as mx
from mlx_vlm import load, generate
from mlx_vlm.trainer import get_peft_model

# Load the 4-bit base model
model, processor = load("mlx-community/Qwen3-VL-4B-Instruct-4bit")

# Recreate the LoRA structure so the adapter weights have matching modules
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj"]
model = get_peft_model(
    model,
    linear_layers=target_modules,
    rank=16,
    alpha=2.0,
    dropout=0.05,
    freeze=True,
)

# Load the adapter weights (download adapters.safetensors from this repo)
adapter_weights = mx.load("path/to/adapters.safetensors")

# Strip the "language_model." prefix so keys match model.language_model
stripped_weights = {k.replace("language_model.", ""): v for k, v in adapter_weights.items()}
model.language_model.load_weights(list(stripped_weights.items()), strict=False)

# Generate with an Arabic prompt ("Who are you?")
prompt = "<|im_start|>user\nمن أنت؟<|im_end|>\n<|im_start|>assistant\n"
result = generate(model, processor, prompt, max_tokens=256)
print(result.text)
```
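Because `load_weights(..., strict=False)` silently ignores keys that do not match any model parameter, it is worth verifying that the adapter tensors actually found their targets. A minimal sanity-check sketch using `mlx.utils.tree_flatten`, run right after the `load_weights` call above:

```python
from mlx.utils import tree_flatten

# Collect the dotted parameter names of the LoRA-augmented language model
model_keys = {k for k, _ in tree_flatten(model.language_model.parameters())}

# Any adapter tensor whose name is absent here was silently skipped
missing = [k for k in stripped_weights if k not in model_keys]
if missing:
    print(f"{len(missing)} adapter tensors did not match, e.g. {missing[:3]}")
else:
    print("All adapter tensors matched model parameters.")
```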
## Training

Fine-tuned using:

- Hardware: Mac Studio M3 Ultra (96 GB)
- Framework: mlx-vlm
- Method: LoRA (Low-Rank Adaptation)
- Target modules: q_proj, k_proj, v_proj, o_proj
- Rank: 16
- Alpha: 32
- Data: Arabic OCR datasets and document-understanding examples
- Epochs: 3
- Steps: 2000+
- Final loss: ~0.09
## Files

- `adapters.safetensors`: LoRA adapter weights (47 MB)
- `adapter_config.json`: LoRA configuration
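Both files can be fetched programmatically. A minimal sketch using `huggingface_hub` (the repo id matches this model page):

```python
from huggingface_hub import hf_hub_download

# Download the adapter weights and configuration from this repo
adapter_path = hf_hub_download(repo_id="twkeed-sa/twkeed-vision", filename="adapters.safetensors")
config_path = hf_hub_download(repo_id="twkeed-sa/twkeed-vision", filename="adapter_config.json")
print(adapter_path)
```

The returned `adapter_path` can be passed straight to `mx.load(...)` in the Usage snippet above.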
## Qwen3-VL-4B Features

- DeepStack ViT: enhanced vision encoder
- 32-language OCR: built-in multilingual OCR support
- Improved Arabic: better Arabic text handling than Qwen2.5
## License

Apache 2.0
## Acknowledgments

- Base model: Qwen Team (Alibaba)
- MLX framework: Apple
- Training framework: mlx-vlm