Twkeed Vision (ุชูˆูƒูŠุฏ ู„ู„ุฑุคูŠุฉ)

An Arabic vision-language model for OCR and document understanding, built on Qwen3-VL-4B.

Model Details

  • Base Model: Qwen3-VL-4B-Instruct-4bit
  • Fine-tuned for: Arabic OCR Text Understanding, Document Understanding
  • Framework: MLX (Apple Silicon optimized)
  • Type: LoRA Adapters
  • Parameters: 4B base + LoRA adapters

Identity

When asked "ู…ู† ุฃู†ุชุŸ" (Who are you?), the model responds:

ุฃู†ุง ุชูˆูƒูŠุฏ ู„ู„ุฑุคูŠุฉุŒ ู…ุณุงุนุฏ ุฐูƒูŠ ู…ุชุฎุตุต ููŠ ู‚ุฑุงุกุฉ ุงู„ู†ุตูˆุต ุงู„ุนุฑุจูŠุฉ ู…ู† ุงู„ุตูˆุฑ ูˆุงู„ู…ุณุชู†ุฏุงุช

("I am Twkeed Vision, an intelligent assistant specialized in reading Arabic text from images and documents.")

Capabilities

  • Arabic OCR Understanding: Understand and process Arabic text from OCR (see the example prompts after this list)
  • Document Understanding: Extract information from Arabic documents
  • Receipt/Invoice Processing: Parse Arabic receipts and invoices
  • ID Recognition: Read Saudi IDs and official documents
  • Text Recognition: Handle various Arabic fonts and text styles
  • 32-Language OCR: Built-in support for 32 languages including Arabic
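
Illustrative prompts for these tasks (hypothetical examples; in practice they are wrapped in the chat format shown in the Usage section below):

ocr_prompt = "ุงู‚ุฑุฃ ุงู„ู†ุต ุงู„ู…ูˆุฌูˆุฏ ููŠ ู‡ุฐู‡ ุงู„ุตูˆุฑุฉ"  # "Read the text in this image"
invoice_prompt = "ุงุณุชุฎุฑุฌ ุงุณู… ุงู„ู…ุชุฌุฑ ูˆุงู„ุชุงุฑูŠุฎ ูˆุงู„ู…ุจู„ุบ ุงู„ุฅุฌู…ุงู„ูŠ"  # "Extract the store name, date, and total amount"
id_prompt = "ุงุณุชุฎุฑุฌ ุงู„ุงุณู… ูˆุฑู‚ู… ุงู„ู‡ูˆูŠุฉ ูˆุชุงุฑูŠุฎ ุงู„ุงู†ุชู‡ุงุก"  # "Extract the name, ID number, and expiry date"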

Usage

import mlx.core as mx
from mlx_vlm import load, generate
from mlx_vlm.trainer import get_peft_model

# Load base model
model, processor = load("mlx-community/Qwen3-VL-4B-Instruct-4bit")

# Apply LoRA structure
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj"]
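# Note: training used alpha 32 with rank 16; passing alpha=2.0 here assumes the argument
# is interpreted as the LoRA scale (alpha/rank = 32/16). Verify against your mlx-vlm version.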
model = get_peft_model(model, linear_layers=target_modules, rank=16, alpha=2.0, dropout=0.05, freeze=True)

# Load adapters (download from this repo)
adapter_weights = mx.load("path/to/adapters.safetensors")
# Adapter keys were saved with a "language_model." prefix; strip it so they match the submodule
stripped_weights = {k.replace('language_model.', ''): v for k, v in adapter_weights.items()}
model.language_model.load_weights(list(stripped_weights.items()), strict=False)

# Generate with Arabic prompt
prompt = "<|im_start|>user\nู…ู† ุฃู†ุชุŸ<|im_end|>\n<|im_start|>assistant\n"
result = generate(model, processor, prompt, max_tokens=256)
print(result.text)
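
For OCR on an actual image, pass the image alongside the prompt. A minimal sketch continuing from the snippet above, assuming mlx-vlm's apply_chat_template helper and a local example file receipt.jpg (both illustrative; the exact generate signature can vary between mlx-vlm versions):

from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

config = load_config("mlx-community/Qwen3-VL-4B-Instruct-4bit")

# Hypothetical local image of an Arabic invoice
image = ["receipt.jpg"]
question = "ุงุณุชุฎุฑุฌ ุงุณู… ุงู„ู…ุชุฌุฑ ูˆุงู„ุชุงุฑูŠุฎ ูˆุงู„ู…ุจู„ุบ ุงู„ุฅุฌู…ุงู„ูŠ ู…ู† ู‡ุฐู‡ ุงู„ูุงุชูˆุฑุฉ"  # "Extract the store name, date, and total amount from this invoice"

# Build the chat-formatted prompt with one image placeholder, then generate
formatted = apply_chat_template(processor, config, question, num_images=len(image))
result = generate(model, processor, formatted, image, max_tokens=512)
print(result.text)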

Training

Fine-tuned using:

  • Hardware: Mac Studio M3 Ultra 96GB
  • Framework: mlx-vlm
  • Method: LoRA (Low-Rank Adaptation)
  • Target Modules: q_proj, k_proj, v_proj, o_proj
  • Rank: 16
  • Alpha: 32
  • Data: Arabic OCR datasets, document understanding examples
  • Epochs: 3
  • Steps: 2000+
  • Final Loss: ~0.09

Files

  • adapters.safetensors - LoRA adapter weights (47MB)
  • adapter_config.json - LoRA configuration
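
To fetch these files programmatically, huggingface_hub can download them directly (a minimal sketch; the repo id is the one for this model page):

from huggingface_hub import hf_hub_download

adapter_path = hf_hub_download(repo_id="twkeed-sa/twkeed-vision", filename="adapters.safetensors")
config_path = hf_hub_download(repo_id="twkeed-sa/twkeed-vision", filename="adapter_config.json")
# adapter_path can then replace "path/to/adapters.safetensors" in the Usage snippet above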

Qwen3-VL-4B Features

  • DeepStack ViT: Enhanced vision encoder
  • 32-Language OCR: Built-in multilingual OCR support
  • Improved Arabic: Better Arabic text handling than Qwen2.5-VL

License

Apache 2.0

Acknowledgments

  • Base model: Qwen Team (Alibaba)
  • MLX framework: Apple
  • Training framework: mlx-vlm