---
title: README
emoji: ๐
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: true
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/68979c2673a7d0c259085219/i6ENamHFtDN6lK96DDjAP.jpeg
---
# Browser-based Locally Hosted Arabic LLM Optimization

This repository contains all the models we compressed and optimized as part of our final-year research project.

For more information about the project, please see our [project webpage](https://arabicog.com).
## Abstract

This work evaluates how well several model compression and optimization techniques (quantization, pruning, and knowledge distillation) work on large language models, focusing on Arabic natural language performance and browser deployment. The primary methods investigated are 8-bit and 4-bit quantization (using GPTQ, LLM.int8(), and QLoRA), pruning (using SparseGPT and Wanda), and a knowledge distillation pipeline with a Qwen2.5-32B-Instruct teacher model and a Qwen2.5-7B-Instruct student model. The proposed SelecTKD method filters out low-confidence teacher tokens during token-level distillation to improve the balance between Arabic and English performance. The report concludes with a comparative analysis of the tested compression methods, covering their compression-versus-accuracy tradeoffs and their practical effectiveness for resource-limited deployment. The study also includes staged fine-tuning that adapts the model to Arabic, GCC, and Bahraini contexts through a broad-to-specific curriculum while preserving bilingual performance.
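The README does not spell out how SelecTKD decides which teacher tokens count as "low-confidence," so the following is only a minimal sketch of the general idea of confidence-filtered token-level distillation, not the project's actual implementation. It assumes that confidence is the teacher's top-1 probability at each position, that the per-token loss is forward KL(teacher || student) with the usual temperature-squared scaling, and that the cutoff is a fixed threshold. The function names and the default `confidence_threshold=0.5` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one token's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def selective_token_kd_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    confidence_threshold: float = 0.5,  # hypothetical cutoff, not from the project
    temperature: float = 1.0,
) -> Tuple[float, int]:
    """Mean token-level KD loss over positions where the teacher is confident.

    Assumption: "confidence" is the teacher's top-1 probability at that position.
    Returns (loss, number_of_tokens_kept).
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same token positions")

    per_token_losses = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_row, temperature)
        if max(p_teacher) < confidence_threshold:
            continue  # drop this low-confidence teacher token from distillation
        p_student = softmax(s_row, temperature)
        per_token_losses.append(kl_divergence(p_teacher, p_student))

    if not per_token_losses:
        return 0.0, 0
    mean_loss = sum(per_token_losses) / len(per_token_losses)
    # Standard temperature-squared scaling used in logit distillation.
    return mean_loss * temperature ** 2, len(per_token_losses)


if __name__ == "__main__":
    # Two positions over a 3-token vocabulary: the first is confidently peaked,
    # the second is nearly uniform (top-1 probability is about 0.36).
    teacher = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
    student = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
    loss, kept = selective_token_kd_loss(teacher, student)
    print(loss, kept)  # kept is 1: the near-uniform second position is filtered out
```

Filtering by teacher confidence means the student is only pushed toward the teacher on positions where the teacher gives a clear signal. The threshold therefore trades the amount of distillation signal against its quality. In a real pipeline this would run on batched tensors rather than Python lists.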