Upload README.md with huggingface_hub
Browse files
README.md
CHANGED
|
@@ -10,23 +10,53 @@ tags:
|
|
| 10 |
- pytorch
|
| 11 |
---
|
| 12 |
|
| 13 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
|
| 15 |
-
|
| 16 |
|
| 17 |
-
##
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
|
| 19 |
-
|
|
|
|
| 20 |
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 24 |
```
|
| 25 |
|
| 26 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 27 |
```python
|
| 28 |
import os
|
| 29 |
-
os.environ["KERAS_BACKEND"] = "jax"
|
| 30 |
|
| 31 |
import keras
|
| 32 |
import numpy as np
|
|
@@ -35,50 +65,50 @@ from huggingface_hub import hf_hub_download
|
|
| 35 |
|
| 36 |
class QalamNet:
|
| 37 |
def __init__(self, repo_id="Ali0044/Qalam-Net"):
|
| 38 |
-
# 1. Download and Load Model
|
| 39 |
print(f"Loading Qalam-Net from {repo_id}...")
|
| 40 |
model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
|
| 41 |
self.model = keras.saving.load_model(model_path)
|
| 42 |
-
|
| 43 |
-
# 2. Define the exact 38-character Arabic Vocabulary
|
| 44 |
-
# [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN, SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA, WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
|
| 45 |
self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']
|
| 46 |
|
| 47 |
-
def
|
|
|
|
| 48 |
img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
|
| 49 |
img = cv2.resize(img, (128, 32)) / 255.0
|
| 50 |
-
img = img.T
|
| 51 |
-
img = np.expand_dims(img, axis=(-1, 0))
|
| 52 |
-
return img.astype(np.float32)
|
| 53 |
-
|
| 54 |
-
def predict(self, image_path):
|
| 55 |
-
batch_img = self.preprocess(image_path)
|
| 56 |
-
preds = self.model.predict(batch_img) # Output shape: (1, 32, 39)
|
| 57 |
-
|
| 58 |
-
# 3. NumPy-based CTC Greedy Decoding (Cross-Backend)
|
| 59 |
-
argmax_preds = np.argmax(preds, axis=-1)[0]
|
| 60 |
|
| 61 |
-
#
|
| 62 |
-
|
| 63 |
-
if i == 0 or argmax_preds[i] != argmax_preds[i-1]]
|
| 64 |
|
| 65 |
-
#
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
return "".join([self.vocab[idx] for idx in final_indices if idx < len(self.vocab)])
|
| 71 |
|
| 72 |
-
# Usage
|
| 73 |
# ocr = QalamNet()
|
| 74 |
-
# print(
|
| 75 |
```
|
|
|
|
|
|
|
|
|
|
| 76 |
|
| 77 |
-
##
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 82 |
|
| 83 |
---
|
| 84 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
- pytorch
|
| 11 |
---
|
| 12 |
|
| 13 |
+
<div align="center">
|
| 14 |
+
<img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">
|
| 15 |
+
|
| 16 |
+
# 🖋️ Qalam-Net (قلم-نت)
|
| 17 |
+
### *High-Performance, Cross-Backend Arabic OCR*
|
| 18 |
+
|
| 19 |
+
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
|
| 20 |
+
[Keras 3](https://keras.io/)
|
| 21 |
+
[Keras 3 Multi-Backend](https://keras.io/keras_3/)
|
| 22 |
+
</div>
|
| 23 |
|
| 24 |
+
---
|
| 25 |
|
| 26 |
+
## 🌟 Highlights
|
| 27 |
+
- **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
|
| 28 |
+
- **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
|
| 29 |
+
- **🎯 Precision Driven**: CNN + BiLSTM + Self-Attention pipeline optimized for Arabic script.
|
| 30 |
+
- **🔓 Unified Loading**: No custom layers or complex setup required for inference.
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
|
| 34 |
+
## 📖 How it Works
|
| 35 |
+
The model processes Arabic text images through a sophisticated multi-stage pipeline:
|
| 36 |
|
| 37 |
+
```mermaid
|
| 38 |
+
graph LR
|
| 39 |
+
A[Input Image 128x32] --> B[CNN Backbone]
|
| 40 |
+
B --> C[Spatial Features]
|
| 41 |
+
C --> D[Dual BiLSTM]
|
| 42 |
+
D --> E[Self-Attention]
|
| 43 |
+
E --> F[Softmax Output]
|
| 44 |
+
F --> G[NumPy CTC Decoder]
|
| 45 |
+
G --> H[Arabic Text]
|
| 46 |
```
|
| 47 |
|
| 48 |
+
---
|
| 49 |
+
|
| 50 |
+
## 🚀 Quick Start (Robust Usage)
|
| 51 |
+
|
| 52 |
+
Use the following implementation to run inference on any platform. This uses a custom **NumPy-based decoder** for 100% cross-backend compatibility.
|
| 53 |
+
|
| 54 |
+
<details>
|
| 55 |
+
<summary><b>View Python Implementation</b></summary>
|
| 56 |
+
|
| 57 |
```python
|
| 58 |
import os
|
| 59 |
+
os.environ["KERAS_BACKEND"] = "jax"
|
| 60 |
|
| 61 |
import keras
|
| 62 |
import numpy as np
|
|
|
|
| 65 |
|
| 66 |
class QalamNet:
    """Loader + inference wrapper for the Qalam-Net Arabic OCR model.

    Downloads the serialized Keras model from the Hugging Face Hub and
    exposes a ``predict`` method that performs CTC greedy decoding in pure
    NumPy, so inference works on any Keras 3 backend (JAX / TF / Torch).
    """

    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and load the model file from the Hub.
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)

        # 2. The exact 38-character Arabic vocabulary.
        #    Model output has 39 classes: indices 0..37 map here; 38 is the CTC blank.
        self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']

    def predict(self, image_path):
        """Run OCR on a single image file and return the decoded Arabic text.

        Args:
            image_path: Path to an image containing a line of Arabic text.

        Returns:
            The decoded string.

        Raises:
            FileNotFoundError: If the image cannot be read from ``image_path``.
        """
        # Preprocessing: grayscale -> 128x32 -> [0, 1] -> shape (1, 128, 32, 1)
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            # cv2.imread silently returns None for missing/corrupt files;
            # fail loudly instead of crashing inside cv2.resize.
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0
        img = np.expand_dims(img.T, axis=(-1, 0)).astype(np.float32)

        # Inference
        probs = self.model.predict(img)

        # CTC greedy decoding (NumPy only, backend-agnostic):
        # best class per timestep -> collapse repeats -> drop blanks (index 38).
        argmax_preds = np.argmax(probs, axis=-1)[0]
        unique = [argmax_preds[i] for i in range(len(argmax_preds))
                  if i == 0 or argmax_preds[i] != argmax_preds[i - 1]]
        final = [idx for idx in unique if idx != 38]
        return "".join([self.vocab[idx] for idx in final if idx < 38])
|
|
|
|
| 87 |
|
|
|
|
| 88 |
# ocr = QalamNet()
|
| 89 |
+
# print(ocr.predict('text.png'))
|
| 90 |
```
|
| 91 |
+
</details>
|
| 92 |
+
|
| 93 |
+
---
|
| 94 |
|
| 95 |
+
## 📊 Performance & Metrics
|
| 96 |
+
Training was conducted on the **mssqpi/Arabic-OCR-Dataset** over 50 epochs.
|
| 97 |
+
|
| 98 |
+
| Metric | Value |
|
| 99 |
+
| :--- | :--- |
|
| 100 |
+
| **Input Shape** | 128 x 32 x 1 (Grayscale) |
|
| 101 |
+
| **Output Classes** | 39 (38 Chars + 1 Blank) |
|
| 102 |
+
| **Final Loss** | ~13.13 |
|
| 103 |
+
| **Val Loss** | ~89.79 |
|
| 104 |
+
| **Framework** | Keras 3.x (Native) |
|
| 105 |
|
| 106 |
---
|
| 107 |
+
|
| 108 |
+
## 🤝 Acknowledgments
|
| 109 |
+
Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.
|
| 110 |
+
|
| 111 |
+
---
|
| 112 |
+
|
| 113 |
+
> [!TIP]
|
| 114 |
+
> **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!
|