---
language: ar
license: apache-2.0
tags:
- ocr
- arabic
- keras
- jax
- tensorflow
- pytorch
datasets:
- mssqpi/Arabic-OCR-Dataset
---

<div align="center">
<img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">

# 🖋️ Qalam-Net (قلم-نت)
### *High-Performance, Cross-Backend Arabic OCR*

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Keras](https://keras.io/)
[Keras 3](https://keras.io/keras_3/)
</div>

---

## ✨ Highlights
- **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
- **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
- **🎯 Precision-Driven**: A CNN + BiLSTM + self-attention pipeline optimized for Arabic script.
- **🔄 Unified Loading**: No custom layers or complex setup required for inference.

---

## 🔍 How it Works
The model processes Arabic text-line images through a multi-stage pipeline: a CNN backbone extracts spatial features, two stacked BiLSTM layers model the character sequence in both directions, a self-attention layer reweights the sequence features, and the per-timestep softmax output is collapsed into text by a NumPy CTC greedy decoder:

```mermaid
graph LR
A[Input Image 128x32] --> B[CNN Backbone]
B --> C[Spatial Features]
C --> D[Dual BiLSTM]
D --> E[Self-Attention]
E --> F[Softmax Output]
F --> G[NumPy CTC Decoder]
G --> H[Arabic Text]
```
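The last two stages, per-timestep softmax scores collapsed by the CTC greedy decoder, can be illustrated with a NumPy-only sketch. The 3-character vocabulary and the hand-picked argmax path below are toy stand-ins for the model's real 39-class, 32-timestep output:

```python
import numpy as np

vocab = ["a", "b", "c"]   # toy stand-in for the 38 Arabic characters
blank = len(vocab)        # CTC blank is the last class (index 38 in the real model)

# Pretend argmax over the softmax output gave this per-timestep path:
path = np.array([0, 0, blank, 1, 1, blank, blank, 2])

# 1) Collapse consecutive duplicates
collapsed = [path[i] for i in range(len(path))
             if i == 0 or path[i] != path[i - 1]]
# 2) Drop blank tokens
indices = [int(i) for i in collapsed if i != blank]
text = "".join(vocab[i] for i in indices)
print(text)  # -> "abc"
```

Note the order of operations: duplicates collapse *before* blanks are removed, so a genuine double letter can only survive decoding if the model emits a blank timestep between the two occurrences.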

---

## 🚀 Quick Start (Robust Usage)

Use the following implementation to run inference on any platform. It uses a custom **NumPy-based decoder**, so decoding behaves identically across the JAX, TensorFlow, and PyTorch backends.

<details>
<summary><b>View Python Implementation</b></summary>

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"

import keras
import numpy as np
import cv2
from huggingface_hub import hf_hub_download

class QalamNet:
    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and load the model
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)

        # 2. Define the exact 38-character Arabic vocabulary
        # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN, SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA, WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
        self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز',
                      'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك',
                      'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ',
                      'ؤ', 'ئ', 'ء', ' ', 'ـ']

    def preprocess(self, image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0  # -> (32, 128), scaled to [0, 1]
        img = img.T  # Transpose to (128, 32) for the CRNN architecture
        img = np.expand_dims(img, axis=(-1, 0))  # Add batch and channel dims
        return img.astype(np.float32)

    def predict(self, image_path):
        batch_img = self.preprocess(image_path)
        preds = self.model.predict(batch_img)  # Output shape: (1, 32, 39)

        # 3. NumPy-based CTC greedy decoding (cross-backend)
        argmax_preds = np.argmax(preds, axis=-1)[0]

        # Remove consecutive duplicates
        unique_indices = [argmax_preds[i] for i in range(len(argmax_preds))
                          if i == 0 or argmax_preds[i] != argmax_preds[i - 1]]

        # Remove the blank token (last index, 38)
        blank_index = preds.shape[-1] - 1
        final_indices = [idx for idx in unique_indices if idx != blank_index]

        # Map indices to vocabulary characters
        return "".join([self.vocab[idx] for idx in final_indices if idx < len(self.vocab)])

# Usage
ocr = QalamNet()
print(f"Predicted Arabic Text: {ocr.predict('/content/images.png')}")
```
</details>
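The `preprocess` step above has a fixed shape contract, which can be checked without OpenCV by applying the same NumPy operations to a synthetic grayscale array (a sketch; the real code obtains the 32 x 128 array via `cv2.resize`, whose `(128, 32)` argument is width x height):

```python
import numpy as np

# Synthetic grayscale image at the target size: 32 rows x 128 columns.
img = np.random.randint(0, 256, size=(32, 128)).astype(np.float32)

img = img / 255.0                        # normalize pixel values to [0, 1]
img = img.T                              # (32, 128) -> (128, 32): width becomes the sequence axis
img = np.expand_dims(img, axis=(-1, 0))  # add batch and channel dims
img = img.astype(np.float32)

print(img.shape)  # -> (1, 128, 32, 1), matching the model's expected input
```

The transpose is what lets the CNN+BiLSTM stack read the image left to right: after it, the 128-pixel width axis is the one the recurrent layers step along.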

---

## 📊 Performance & Metrics
Training was conducted on the **mssqpi/Arabic-OCR-Dataset** for 50 epochs.

| Metric | Value |
| :--- | :--- |
| **Input Shape** | 128 x 32 x 1 (grayscale) |
| **Output Classes** | 39 (38 characters + 1 CTC blank) |
| **Final Training Loss** | ~13.13 |
| **Validation Loss** | ~89.79 |
| **Framework** | Keras 3.x (native) |

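The class-count bookkeeping in the table can be verified directly against the 38-character vocabulary defined in the Quick Start snippet:

```python
# The model's character set, as defined in the Quick Start snippet above.
vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش',
         'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه',
         'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']

num_chars = len(vocab)       # 38 characters
num_classes = num_chars + 1  # plus one CTC blank -> 39 output classes
print(num_chars, num_classes)  # -> 38 39
```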

## 📚 Dataset
This model was trained on the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by **Muhammad AL-Qurishi (mssqpi)**.
- **Total Samples**: ~2.16 million images.
- **Content**: A large collection of Arabic text lines in a variety of fonts and styles.
- **Usage**: Used to train the CRNN architecture to recognize sequential Arabic script.

---

## 🤝 Acknowledgments
Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.

---

> [!TIP]
> **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!