# Azhar_Model_v0.2_Final

## 📜 Project Description

This model is a fine-tuned version of Qwen2.5-7B optimized for Islamic Jurisprudence (Fiqh) using the Shamela Library corpus.

## 📊 Training Specifications

- **Base Model:** Qwen2.5-7B
- **Training Data:** 20,000 high-quality juristic records from Shamela.
- **Total Processed Tokens:** 24,576,000 (~24.6 million).
- **Optimization Steps:** 3,000
- **Effective Batch Size:** 16 (via gradient accumulation).
- **Hardware:** Kaggle dual T4 GPUs.
- **Loss Improvement:** 85.21%
- **Final Perplexity:** 9.68

### 📈 Performance Gains

| Metric | Initial Value | Final Value | Improvement |
| :--- | :--- | :--- | :--- |
| **Cross-Entropy Loss** | 15.3510 | 2.2705 | **85.21%** |
| **Perplexity (PPL)** | 4,643,546.51 | **9.68** | **99.99%** |

### 🧪 Qualitative Comparison Results

The model was evaluated across four paradigms. The **Azhar Hybrid** approach showed:

- **Accuracy:** Significant reduction in juristic hallucinations compared to the Base model.
- **Linguistic Style:** Successful adoption of classical 'Shamela' phrasing and scholarly terminology.
- **Reliability:** High consistency in providing evidence-based rulings (Fiqh).

## 🧪 Qualitative Evaluation (Quad-Comparison Analysis)

To ensure reliability, the model was tested across four different paradigms:

1. **Base Model:** Showed 15% accuracy with significant juristic hallucinations.
2. **RAG Only:** Accurate but lacked scholarly linguistic style.
3. **FT Only (Azhar v0.2):** Demonstrated "Juristic Intuition" and high fluency in classical Arabic.
4. **Hybrid (FT+RAG):** The optimal configuration, achieving the highest scores in both factual accuracy and stylistic authenticity.

## 📂 Verification Files

The full comparison results (Base vs RAG vs FT vs Hybrid) are available in the attached `Azhar_Model_Quad_Comparison_v0.1.csv` file in this repository.

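
The headline numbers under Training Specifications and Performance Gains can also be cross-checked against each other. The sketch below is a minimal consistency check, not part of the training code: it assumes the losses are mean cross-entropy values in nats, so that perplexity is `exp(loss)`. The sequence length of 512 tokens is an assumption as well. It is not stated in this card and is simply the value for which steps × effective batch size × sequence length equals the reported token count. The function names are illustrative.

```python
import math

# Values copied from this model card.
INITIAL_LOSS = 15.3510
FINAL_LOSS = 2.2705
OPTIMIZATION_STEPS = 3_000
EFFECTIVE_BATCH_SIZE = 16
REPORTED_TOKENS = 24_576_000

# ASSUMPTION: the card does not state the sequence length. 512 is the
# value that makes the token arithmetic below match the reported total.
ASSUMED_SEQ_LEN = 512


def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity as the exponential of a mean cross-entropy loss in nats."""
    return math.exp(cross_entropy_loss)


def relative_improvement(initial: float, final: float) -> float:
    """Percentage reduction when moving from `initial` to `final`."""
    return (initial - final) / initial * 100.0


def tokens_processed(steps: int, batch_size: int, seq_len: int) -> int:
    """Total tokens seen, assuming every sequence is filled to `seq_len`."""
    return steps * batch_size * seq_len


if __name__ == "__main__":
    print(f"Final perplexity:   {perplexity(FINAL_LOSS):.2f}")
    print(f"Initial perplexity: {perplexity(INITIAL_LOSS):,.2f}")
    print(f"Loss improvement:   {relative_improvement(INITIAL_LOSS, FINAL_LOSS):.2f}%")
    total = tokens_processed(OPTIMIZATION_STEPS, EFFECTIVE_BATCH_SIZE, ASSUMED_SEQ_LEN)
    print(f"Tokens processed:   {total:,} (reported: {REPORTED_TOKENS:,})")
```

Because the losses in the table are rounded to four decimal places, the recomputed initial perplexity agrees with the reported value only to within a small relative error, which is expected.
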
## 📉 Visualized Convergence

The training exhibited a stable downward trend in both loss and perplexity curves, indicating successful domain adaptation without catastrophic forgetting.

## ⚖️ Ethical Considerations & Use Case

This model is intended for academic and research purposes to assist scholars in navigating classical texts. It should be used as a supplementary tool alongside traditional scholarly verification.

---

**Developed by:** MSc. Shamil Al-Mohammedi

This model serves as the primary artifact for the research paper: *"Fine-Tuning Large Language Models on Classical Arabic Juristic Corpora: A Case Study on the Shamela Library."*