Upload README.md with huggingface_hub
README.md (CHANGED)
Before:

````diff
@@ -4,93 +4,84 @@ license: apache-2.0
 tags:
 - ocr
 - arabic
-- tensorflow
 - keras
-
-- attention
-datasets:
-- mssqpi/Arabic-OCR-Dataset
-metrics:
-- accuracy
-- character-error-rate
-model-index:
-- name: Qalam-Net
-  results:
-  - task:
-      type: optical-character-recognition
-      name: Optical Character Recognition
-    dataset:
-      name: Arabic-OCR-Dataset
-      type: mssqpi/Arabic-OCR-Dataset
-    metrics:
-    - type: accuracy
-      value: 0.92
-      name: Character Accuracy
 ---
 
 # Qalam-Net (قلم-نت): Advanced Arabic OCR
 
-Qalam-Net is a high-performance Optical Character Recognition (OCR) model
 
-##
-- **Architecture**: CNN-BiLSTM-Attention (CRNN) with CTC Loss.
-- **Language Support**: Modern Standard Arabic (MSA) and common cursive variations.
-- **Performance**: Optimized for 128x32 grayscale images.
-- **Robustness**: Handles various fonts, ligatures, and diacritics common in Arabic.
 
-
-The model consists of:
-1. **CNN Backbone**: Multiple `Conv2D` layers with `BatchNormalization` and `MaxPooling` to extract spatial features from the 128x32 input.
-2. **Sequence Modeling**: Two layers of `Bidirectional LSTM` (128 units) to capture temporal dependencies in the Arabic script sequence.
-3. **Attention Layer**: A self-attention mechanism that weighs the importance of different spatial features before final character prediction.
-4. **CTC Output**: Connectionist Temporal Classification (CTC) layer for mapping variable-length input sequences to character labels without explicit alignment.
 
-##
-
-### Dependencies
 ```bash
-pip install
 ```
 
-###
 ```python
-import
 import numpy as np
 import cv2
 
-
-
-
-self.
 
-def
-#
-
 
-#
-
 
-
-    img = (img / 255.0).astype(np.float32)
-    img = img.T
-    img = np.expand_dims(img, axis=-1)
-    return np.expand_dims(img, axis=0)
-
-# Prediction Logic (CTC Decode)
-def decode_prediction(pred):
-    # Use tf.keras.backend.ctc_decode to extract text from softmax output
-    # requires character mapping (StringLookup)
-    pass
 ```
 
-##
-
-
 
 ---
 **Developed by [Ali Khalid](https://github.com/Ali0044)**
````
After:

````diff
 tags:
 - ocr
 - arabic
 - keras
+- jax
 ---
````
````diff
 
 # Qalam-Net (قلم-نت): Advanced Arabic OCR
 
+Qalam-Net is a high-performance, cross-backend Optical Character Recognition (OCR) model for Arabic. Built on **Keras 3**, it supports **JAX**, **PyTorch**, and **TensorFlow** backends.
 
+## 🚀 Quick Start (Advanced Usage)
 
+The following example demonstrates how to use **JAX** for ultra-fast XLA-accelerated inference.
 
+### 1. Installation
 ```bash
+pip install -U "keras>=3.0" jax jaxlib huggingface_hub opencv-python
 ```
````
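One detail the card's installation step leaves implicit: Keras 3 selects its backend from the `KERAS_BACKEND` environment variable once, at import time, so the variable must be set before the first `import keras` anywhere in the process (or exported in the shell beforehand). A minimal sketch:

```python
import os

# Must run before `import keras` anywhere in the process;
# once Keras has been imported, the backend choice is fixed.
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"

print(os.environ["KERAS_BACKEND"])  # jax
```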
````diff
 
+### 2. Implementation
 ```python
+import os
+os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"
+
+import keras
 import numpy as np
 import cv2
 
+class QalamNet:
+    def __init__(self, repo_id="Ali0044/Qalam-Net"):
+        # Download and load the model via the hf:// shorthand; this fetches
+        # the model.keras file from the root of the repo.
+        self.model = keras.saving.load_model(f"hf://{repo_id}")
+
+        # Standard Arabic vocabulary (matches the training set)
+        self.vocab = [' ', '!', '"', '#', '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '[', ']', 'ء', 'آ', 'أ', 'ؤ', 'إ', 'ئ', 'ا', 'ب', 'ة', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ـ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ى', 'ي', 'ً', 'ٌ', 'ٍ', 'َ', 'ُ', 'ِ', 'ّ', 'ْ', '٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩']
+
+    def preprocess(self, image_path):
+        # 1. Load as grayscale
+        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+        # 2. Resize to 128 (width) x 32 (height); cv2 returns shape (H, W)
+        img = cv2.resize(img, (128, 32))
+        # 3. Normalize to [0, 1]
+        img = (img / 255.0).astype(np.float32)
+        # 4. Transpose (H, W) -> (W, H) so width becomes the sequence axis
+        img = img.T
+        # 5. Add channel and batch dimensions
+        img = np.expand_dims(img, axis=-1)
+        return np.expand_dims(img, axis=0)
 
+    def predict(self, image_path):
+        # Run inference
+        batch_img = self.preprocess(image_path)
+        predictions = self.model.predict(batch_img)
+
+        # CTC Decode (Greedy)
+        input_len = np.ones(predictions.shape[0]) * predictions.shape[1]
+        results = keras.backend.ctc_decode(predictions, input_length=input_len, greedy=True)[0][0]
+
+        # Map indices to characters (-1 marks CTC padding)
+        text = ""
+        for res in results[0]:
+            if res != -1:
+                text += self.vocab[int(res)]
+        return text
 
+# Initialize
+ocr = QalamNet()
 
+# Predict
+# text = ocr.predict("sample_arabic_text.png")
+# print(f"Predicted Text: {text}")
 ```
````
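One caveat worth flagging in the added `predict` method: `keras.backend.ctc_decode` is the legacy Keras 2 / `tf.keras` API; Keras 3 exposes `keras.ops.ctc_decode` instead, so the snippet as committed will not run unchanged on the JAX backend. Greedy CTC decoding is also simple to do backend-agnostically by hand. The following is a hedged sketch, not the repo's code, and it assumes the blank token is the last class (the convention of Keras's classic `ctc_batch_cost`), which should be verified against the actual model:

```python
import numpy as np

def greedy_ctc_decode(probs, blank_index):
    """Standard CTC greedy decoding: take the argmax per time step,
    collapse consecutive repeats, then drop blank tokens."""
    best = np.argmax(probs, axis=-1)
    collapsed = [int(c) for i, c in enumerate(best) if i == 0 or c != best[i - 1]]
    return [c for c in collapsed if c != blank_index]

# Toy check with two real classes (0, 1) and the blank at index 2:
toy = np.array([
    [0.9, 0.05, 0.05],  # -> 0
    [0.9, 0.05, 0.05],  # -> 0 (repeat, collapsed)
    [0.05, 0.05, 0.9],  # -> blank (dropped)
    [0.05, 0.9, 0.05],  # -> 1
    [0.9, 0.05, 0.05],  # -> 0
])
print(greedy_ctc_decode(toy, blank_index=2))  # [0, 1, 0]
```

Applied to the model above, `probs` would be `predictions[0]` and the decoded indices would be mapped through `self.vocab`, assuming the blank convention matches the training setup.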
````diff
 
+## 🧠 Model Architecture
+Qalam-Net employs a specialized **CNN-BiLSTM-Attention** pipeline:
+- **Spatial Features**: 3-block CNN with BatchNormalization.
+- **Sequence Context**: Stacked Bidirectional LSTMs.
+- **Focus Mechanism**: Self-attention layer to resolve overlapping Arabic characters.
+- **Loss**: Trained using Connectionist Temporal Classification (CTC).
 
 ---
 **Developed by [Ali Khalid](https://github.com/Ali0044)**
````
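For orientation, the shapes implied by the card's architecture notes can be sketched numerically. The numbers below are illustrative assumptions, not the repo's verified configuration: the model card does not state the pooling schedule, so two 2x2 pooling stages are assumed, and the vocabulary size of 83 is taken from the character list in the added code.

```python
import numpy as np

# Hypothetical shape walk-through of the CNN-BiLSTM-CTC pipeline.
batch = np.zeros((1, 128, 32, 1), dtype=np.float32)  # (N, W, H, C) after preprocess()

pool_stages = 2                   # assumed number of 2x2 max-pool stages (a guess)
time_steps = 128 >> pool_stages   # width is the sequence axis: 32 steps for the BiLSTM

vocab_size = 83                   # characters in the model card's vocab list
num_classes = vocab_size + 1      # plus one CTC blank token

# The CTC head would therefore emit a (batch, time, classes) tensor:
logits_shape = (batch.shape[0], time_steps, num_classes)
print(logits_shape)  # (1, 32, 84)
```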