AI & ML interests

Model compression and optimization methods, such as quantization, pruning, distillation, and fine-tuning.


Organization Card

Browser-based Locally Hosted Arabic LLM Optimization

This repository contains all the models that were compressed and optimized as part of our final-year research project.

For more information about our project, please refer to our project webpage.

Abstract

This work evaluates several model compression and optimization techniques for large language models, namely quantization, pruning, and knowledge distillation, with a focus on Arabic natural language performance and browser deployment. The methods investigated are 8-bit and 4-bit quantization using GPTQ, LLM.int8(), and QLoRA; pruning with SparseGPT and Wanda; and a knowledge distillation pipeline with a Qwen2.5-32B-Instruct teacher model and a Qwen2.5-7B-Instruct student model. The proposed SelecTKD method filters low-confidence teacher tokens during token-level distillation to improve bilingual Arabic-English balance. The study also includes staged fine-tuning that adapts the model to Arabic, GCC, and Bahraini contexts using a broad-to-specific curriculum while preserving bilingual performance. The report concludes with a comprehensive comparative analysis of the tested compression methods, weighing the tradeoff between compression and accuracy and discussing their practical effectiveness for limited-resource deployment.
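The abstract describes SelecTKD only at a high level, so the following is a minimal sketch of the general idea of filtering low-confidence teacher tokens in token-level distillation, not the project's actual implementation. It assumes that "confidence" means the teacher's highest token probability at a position, that the loss is the forward KL divergence from teacher to student, and that positions below a cutoff are simply skipped. The function name `selective_kd_loss` and the `threshold` value of 0.5 are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kd_loss(teacher_logits, student_logits, threshold=0.5):
    """Token-level distillation loss with confidence filtering (illustrative).

    Each argument is a list of per-position logit lists over the same vocabulary.
    A position is kept only if the teacher's top probability is >= threshold
    (an assumed confidence criterion). Returns (mean KL over kept positions,
    number of kept positions).
    """
    total, kept = 0.0, 0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)  # teacher distribution at this position
        if max(p) < threshold:
            continue  # teacher is unsure here: skip the position
        q = softmax(s_row)  # student distribution at this position
        # Forward KL divergence KL(p || q) for this position.
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
        kept += 1
    return (total / kept if kept else 0.0), kept


if __name__ == "__main__":
    # Position 0: teacher is confident. Position 1: teacher is nearly uniform.
    teacher = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.1]]
    student = [[1.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
    loss, kept = selective_kd_loss(teacher, student)
    print(f"kept {kept} of {len(teacher)} positions, loss = {loss:.4f}")
```

In this toy input only the first position contributes to the loss, because the teacher's distribution at the second position is close to uniform. In a real pipeline the same mask would be applied to batched tensors of teacher and student logits.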
