---
language: ar
license: apache-2.0
tags:
- ocr
- arabic
- keras
- jax
- tensorflow
- pytorch
datasets:
- mssqpi/Arabic-OCR-Dataset
---

<div align="center">
  <img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">
  
  # ๐Ÿ–‹๏ธ Qalam-Net (ู‚ู„ู…-ู†ุช)
  ### *High-Performance, Cross-Backend Arabic OCR*
  
  [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
  [![Framework](https://img.shields.io/badge/Framework-Keras%203-F14B5C.svg)](https://keras.io/)
  [![Backend](https://img.shields.io/badge/Backend-JAX%20|%20TF%20|%20Torch-blueviolet.svg)](https://keras.io/keras_3/)
</div>

---

## 🌟 Highlights
- **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
- **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
- **🎯 Precision Driven**: CNN + BiLSTM + Self-Attention pipeline optimized for Arabic script.
- **🔓 Unified Loading**: No custom layers or complex setup required for inference.

---

## 📖 How it Works
The model processes Arabic text images through a multi-stage pipeline:

```mermaid
graph LR
    A[Input Image 128x32] --> B[CNN Backbone]
    B --> C[Spatial Features]
    C --> D[Dual BiLSTM]
    D --> E[Self-Attention]
    E --> F[Softmax Output]
    F --> G[NumPy CTC Decoder]
    G --> H[Arabic Text]
```
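
The final stage in the diagram, CTC greedy decoding, is simple enough to show on toy data. A minimal sketch (the 3-letter Latin vocabulary and the one-hot "probabilities" are illustrative stand-ins, not the model's real output):

```python
import numpy as np

def ctc_greedy_decode(probs, vocab):
    """Collapse per-timestep class probabilities into a string.

    probs: (timesteps, num_classes) array; the last class is the CTC blank.
    """
    blank = probs.shape[-1] - 1
    best = np.argmax(probs, axis=-1)
    # 1) Merge consecutive repeats, 2) drop blanks.
    collapsed = [best[i] for i in range(len(best)) if i == 0 or best[i] != best[i - 1]]
    return "".join(vocab[i] for i in collapsed if i != blank)

# Toy run: 3 characters + blank, 6 timesteps (classes: a, a, blank, b, b, c).
vocab = ["a", "b", "c"]
probs = np.eye(4)[[0, 0, 3, 1, 1, 2]]  # one-hot "probabilities"
print(ctc_greedy_decode(probs, vocab))  # -> abc
```

The blank between the two `a` timesteps is what lets CTC distinguish a repeated letter from one letter held across frames.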

---

## 🚀 Quick Start (Robust Usage)

Use the following implementation to run inference on any backend. It relies on a custom **NumPy-based decoder**, so decoding behaves identically across JAX, TensorFlow, and PyTorch.

<details>
<summary><b>View Python Implementation</b></summary>

```python
import os
os.environ["KERAS_BACKEND"] = "jax" # Options: "jax", "tensorflow", "torch"

import keras
import numpy as np
import cv2
from huggingface_hub import hf_hub_download

class QalamNet:
    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and Load Model
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)
        
        # 2. Define the exact 38-character Arabic Vocabulary
        # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN, SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA, WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
        self.vocab = ['ุง', 'ุจ', 'ุช', 'ุซ', 'ุฌ', 'ุญ', 'ุฎ', 'ุฏ', 'ุฐ', 'ุฑ', 'ุฒ', 'ุณ', 'ุด', 'ุต', 'ุถ', 'ุท', 'ุธ', 'ุน', 'ุบ', 'ู', 'ู‚', 'ูƒ', 'ู„', 'ู…', 'ู†', 'ู‡', 'ูˆ', 'ูŠ', 'ุฉ', 'ู‰', 'ุฃ', 'ุฅ', 'ุข', 'ุค', 'ุฆ', 'ุก', ' ', 'ู€']
        
    def preprocess(self, image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0
        img = img.T  # Transpose so width becomes the time axis (CRNN convention)
        img = np.expand_dims(img, axis=(-1, 0))  # Add batch and channel dims -> (1, 128, 32, 1)
        return img.astype(np.float32)

    def predict(self, image_path):
        batch_img = self.preprocess(image_path)
        preds = self.model.predict(batch_img) # Output shape: (1, 32, 39)
        
        # 3. NumPy-based CTC Greedy Decoding (Cross-Backend)
        argmax_preds = np.argmax(preds, axis=-1)[0]
        
        # Remove consecutive duplicates
        unique_indices = [argmax_preds[i] for i in range(len(argmax_preds)) 
                          if i == 0 or argmax_preds[i] != argmax_preds[i-1]]
        
        # Remove blank index (index 38)
        blank_index = preds.shape[-1] - 1
        final_indices = [idx for idx in unique_indices if idx != blank_index]
        
        # Map to vocabulary
        return "".join([self.vocab[idx] for idx in final_indices if idx < len(self.vocab)])

# Usage
ocr = QalamNet()
print(f"Predicted Arabic Text: {ocr.predict('/content/images.png')}")
```
</details>
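
As a quick sanity check of the shape contract in `preprocess`, the same transpose-and-expand steps can be traced on a synthetic array (random data stands in for a real decoded image):

```python
import numpy as np

# Synthetic grayscale image the way cv2.resize would leave it: (H=32, W=128).
img = np.random.rand(32, 128).astype(np.float32)

x = img.T                            # (128, 32): width becomes the time axis
x = np.expand_dims(x, axis=(-1, 0))  # insert batch and channel dims
print(x.shape)  # -> (1, 128, 32, 1)
```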

---

## 📊 Performance & Metrics
Training was conducted on the **mssqpi/Arabic-OCR-Dataset** over 50 epochs.

| Metric | Value |
| :--- | :--- |
| **Input Shape** | 128 × 32 × 1 (grayscale) |
| **Output Classes** | 39 (38 characters + 1 CTC blank) |
| **Final Training Loss** | ~13.13 |
| **Validation Loss** | ~89.79 |
| **Framework** | Keras 3.x (native) |
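
Loss values alone are hard to interpret; for OCR, character error rate (CER) on held-out samples is the more telling metric. A minimal sketch using a generic Levenshtein helper (`edit_distance` and `cer` are illustrative helpers, not part of this repo):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(predictions, references):
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return edits / max(chars, 1)

print(cer(["kitab"], ["kitap"]))  # -> 0.2 (1 substitution over 5 characters)
```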

## ๐Ÿ“ Dataset
This model was trained on the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by **Muhammad AL-Qurishi (mssqpi)**.
- **Total Samples**: ~2.16 Million images.
- **Content**: A massive collection of Arabic text lines in various fonts and styles.
- **Usage**: Used for training the CRNN architecture to recognize sequential Arabic script.

---

## ๐Ÿค Acknowledgments
Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.

---

> [!TIP]
> **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!