---
language: ar
license: apache-2.0
tags:
  - ocr
  - arabic
  - keras
  - jax
  - tensorflow
  - pytorch
datasets:
  - mssqpi/Arabic-OCR-Dataset
---
<!-- Qalam-Net Banner -->

# 🖋️ Qalam-Net (قلم-نت)

### *High-Performance, Cross-Backend Arabic OCR*

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Framework](https://img.shields.io/badge/Framework-Keras%203-F14B5C.svg)](https://keras.io/) [![Backend](https://img.shields.io/badge/Backend-JAX%20|%20TF%20|%20Torch-blueviolet.svg)](https://keras.io/keras_3/)
---

## 🌟 Highlights

- **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
- **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
- **🎯 Precision-Driven**: A CNN + BiLSTM + self-attention pipeline optimized for Arabic script.
- **🔓 Unified Loading**: No custom layers or complex setup required for inference.

---

## 📖 How It Works

The model processes Arabic text images through a multi-stage pipeline:

```mermaid
graph LR
    A[Input Image 128x32] --> B[CNN Backbone]
    B --> C[Spatial Features]
    C --> D[Dual BiLSTM]
    D --> E[Self-Attention]
    E --> F[Softmax Output]
    F --> G[NumPy CTC Decoder]
    G --> H[Arabic Text]
```

---

## 🚀 Quick Start (Robust Usage)

Use the following implementation to run inference on any platform. It relies on a custom **NumPy-based decoder**, so it works identically across all three Keras backends.
<details>
<summary>View Python Implementation</summary>

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"

import keras
import numpy as np
import cv2
from huggingface_hub import hf_hub_download


class QalamNet:
    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and load the model
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)

        # 2. The exact 38-character Arabic vocabulary:
        # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN,
        #  SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA,
        #  WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW,
        #  ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
        self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز',
                      'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك',
                      'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ',
                      'ؤ', 'ئ', 'ء', ' ', 'ـ']

    def preprocess(self, image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0
        img = img.T  # Transpose (32, 128) -> (128, 32); width is the time axis
        img = np.expand_dims(img, axis=(-1, 0))  # -> (1, 128, 32, 1)
        return img.astype(np.float32)

    def predict(self, image_path):
        batch_img = self.preprocess(image_path)
        preds = self.model.predict(batch_img)  # Output shape: (1, 32, 39)

        # 3. NumPy-based CTC greedy decoding (cross-backend)
        argmax_preds = np.argmax(preds, axis=-1)[0]
        # Collapse consecutive duplicate predictions
        unique_indices = [argmax_preds[i] for i in range(len(argmax_preds))
                          if i == 0 or argmax_preds[i] != argmax_preds[i - 1]]
        # Remove the blank token (index 38)
        blank_index = preds.shape[-1] - 1
        final_indices = [idx for idx in unique_indices if idx != blank_index]
        # Map indices back to the vocabulary
        return "".join(self.vocab[idx] for idx in final_indices
                       if idx < len(self.vocab))


# Usage
ocr = QalamNet()
print(f"Predicted Arabic Text: {ocr.predict('/content/images.png')}")
```

</details>
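The greedy CTC decoding used above can be illustrated in isolation. The sketch below uses a made-up 3-character vocabulary plus a blank (not the model's real 38-character one) and follows the same convention as Qalam-Net: the blank is the last class index, duplicates are collapsed before blanks are dropped.

```python
import numpy as np


def ctc_greedy_decode(probs, vocab):
    """Greedy CTC decode: argmax per timestep, collapse repeats, drop blanks.

    The blank token is assumed to be the last class index, matching the
    model card's convention (index 38 for a 39-class output).
    """
    indices = np.argmax(probs, axis=-1)
    blank = probs.shape[-1] - 1
    decoded, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank:  # skip repeats and blanks
            decoded.append(vocab[idx])
        prev = idx
    return "".join(decoded)


# Toy example: 3 characters + 1 blank (class index 3).
# Per-timestep argmax sequence: a, a, blank, a, b, b
vocab = ["a", "b", "c"]
probs = np.eye(4)[[0, 0, 3, 0, 1, 1]]  # one-hot rows, shape (6, 4)
print(ctc_greedy_decode(probs, vocab))  # -> "aab"
```

Note why the blank matters: without it, the repeated `a` at timesteps 0, 1, and 3 would collapse into a single character; the intervening blank is what lets CTC emit the same character twice in a row.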
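The CNN → dual BiLSTM → self-attention → softmax pipeline described in *How It Works* can be sketched as a Keras 3 model. This is an illustrative reconstruction, not the released architecture: the filter counts, LSTM widths, and attention head counts below are assumptions; only the input shape (128 × 32 × 1), the 32 output timesteps, and the 39 output classes come from the model card.

```python
import keras
from keras import layers


def build_crnn_sketch(vocab_size=38, width=128, height=32):
    """Hypothetical CRNN sketch matching Qalam-Net's stated I/O shapes."""
    # Input: (width, height, 1) -- width becomes the time axis after reshaping
    inputs = keras.Input(shape=(width, height, 1))

    # CNN backbone: two conv/pool stages (illustrative filter counts)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)   # -> (64, 16, 32)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)   # -> (32, 8, 64)

    # Collapse the spatial height into the feature dimension
    x = layers.Reshape((width // 4, (height // 4) * 64))(x)  # -> (32, 512)

    # Dual BiLSTM for sequence modelling
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)

    # Self-attention over the 32 timesteps
    x = layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)

    # Softmax over the 38 characters + 1 CTC blank
    outputs = layers.Dense(vocab_size + 1, activation="softmax")(x)
    return keras.Model(inputs, outputs)


model = build_crnn_sketch()
print(model.output_shape)  # (None, 32, 39)
```

The two pooling stages are what turn the 128-pixel width into the 32 output timesteps (128 / 2 / 2 = 32) reported in the inference code's `(1, 32, 39)` prediction shape.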
---

## 📊 Performance & Metrics

Training was conducted on the **mssqpi/Arabic-OCR-Dataset** for 50 epochs.

| Metric | Value |
| :--- | :--- |
| **Input Shape** | 128 × 32 × 1 (grayscale) |
| **Output Classes** | 39 (38 characters + 1 CTC blank) |
| **Final Training Loss** | ~13.13 |
| **Validation Loss** | ~89.79 |
| **Framework** | Keras 3.x (native) |

## 📁 Dataset

This model was trained on the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by **Muhammad AL-Qurishi (mssqpi)**.

- **Total Samples**: ~2.16 million images
- **Content**: Arabic text lines in a wide variety of fonts and styles
- **Usage**: Training the CRNN architecture to recognize sequential Arabic script

---

## 🤝 Acknowledgments

Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.

---

> [!TIP]
> **Pro Tip**: Use the **JAX** backend for the fastest inference on modern CPUs and GPUs!
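The CTC loss values above are hard to interpret in isolation; a standard complementary OCR metric (not reported here) is the character error rate (CER): the Levenshtein edit distance between prediction and ground truth, divided by the ground-truth length. A minimal pure-Python sketch for evaluating predictions against references:

```python
def cer(prediction: str, reference: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(prediction), len(reference)
    # Single-row dynamic-programming table for the edit distance
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cost = 0 if prediction[i - 1] == reference[j - 1] else 1
            prev, dp[j] = dp[j], min(dp[j] + 1,      # deletion
                                     dp[j - 1] + 1,  # insertion
                                     prev + cost)    # substitution
    return dp[n] / max(n, 1)


# Classic example: "kitten" -> "sitting" needs 3 edits over 7 characters
print(round(cer("kitten", "sitting"), 3))  # -> 0.429
```

Averaging `cer` over a held-out set gives a backend-independent quality number that is easier to compare across OCR models than raw CTC loss.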