---
language: ar
license: apache-2.0
tags:
- ocr
- arabic
- keras
- jax
- tensorflow
- pytorch
datasets:
- mssqpi/Arabic-OCR-Dataset
---
<div align="center">
<img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">
# Qalam-Net (قلم-نت)
### *High-Performance, Cross-Backend Arabic OCR*
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Keras](https://keras.io/)
[Keras 3](https://keras.io/keras_3/)
</div>
---
## Highlights
- **Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
- **Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
- **Precision Driven**: CNN + BiLSTM + Self-Attention pipeline optimized for Arabic script.
- **Unified Loading**: No custom layers or complex setup required for inference.
---
## How it Works
The model processes Arabic text images through a sophisticated multi-stage pipeline:
```mermaid
graph LR
A[Input Image 128x32] --> B[CNN Backbone]
B --> C[Spatial Features]
C --> D[Dual BiLSTM]
D --> E[Self-Attention]
E --> F[Softmax Output]
F --> G[NumPy CTC Decoder]
G --> H[Arabic Text]
```
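As a rough illustration of the diagram, the sketch below traces the tensor shapes stated elsewhere in this card: the `(128, 32)` resize target, the transpose, and the `(1, 32, 39)` model output. The function name `trace_shapes` and its parameters are hypothetical, and the sketch only records shapes; it does not build or run the model.

```python
def trace_shapes(width=128, height=32, num_chars=38, time_steps=32):
    """Return the array shape at each stage, using the numbers quoted in this card."""
    return {
        # cv2.resize takes dsize as (width, height) but returns rows x cols.
        "resized": (height, width),
        # The transpose puts the width axis first so it acts as the sequence axis.
        "transposed": (width, height),
        # Batch and channel axes are added around the transposed image.
        "model_input": (1, width, height, 1),
        # One probability vector per time step: 38 characters plus the CTC blank.
        "model_output": (1, time_steps, num_chars + 1),
    }


for stage, shape in trace_shapes().items():
    print(f"{stage:<13} {shape}")
```

The 32 output time steps for a 128-pixel-wide input are taken from the output-shape comment in the Quick Start code; how the backbone arrives at that length is not documented here.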
---
## Quick Start (Robust Usage)
The implementation below runs inference on any of the three Keras 3 backends (JAX, TensorFlow, or PyTorch). CTC decoding is done with a custom **NumPy-based decoder**, so the decoding step behaves the same regardless of the backend.
<details>
<summary><b>View Python Implementation</b></summary>
```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch" (set before importing keras)

import cv2
import keras
import numpy as np
from huggingface_hub import hf_hub_download


class QalamNet:
    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and load the model
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)
        # 2. Define the exact 38-character Arabic vocabulary
        # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN, SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA, WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
        self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']

    def preprocess(self, image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:  # cv2.imread returns None instead of raising on an unreadable path
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0  # dsize is (width, height); result has shape (32, 128)
        img = img.T  # Transpose to (128, 32) for the CRNN architecture
        img = np.expand_dims(img, axis=(-1, 0))  # Add batch and channel axes: (1, 128, 32, 1)
        return img.astype(np.float32)

    def predict(self, image_path):
        batch_img = self.preprocess(image_path)
        preds = self.model.predict(batch_img)  # Output shape: (1, 32, 39)
        # 3. NumPy-based CTC greedy decoding (cross-backend)
        argmax_preds = np.argmax(preds, axis=-1)[0]
        # Remove consecutive duplicates
        unique_indices = [argmax_preds[i] for i in range(len(argmax_preds))
                          if i == 0 or argmax_preds[i] != argmax_preds[i - 1]]
        # Remove the blank index (the last class, index 38)
        blank_index = preds.shape[-1] - 1
        final_indices = [idx for idx in unique_indices if idx != blank_index]
        # Map to vocabulary
        return "".join([self.vocab[idx] for idx in final_indices if idx < len(self.vocab)])


# Usage
ocr = QalamNet()
print(f"Predicted Arabic Text: {ocr.predict('/content/images.png')}")
```
</details>
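To see the greedy CTC decoding rule in isolation, here is a minimal pure-Python sketch that works on a list of per-time-step class indices rather than on model output. The function `ctc_greedy_decode` and the three-letter toy vocabulary are hypothetical stand-ins for illustration; the blank is placed at the last index, as in the implementation above.

```python
def ctc_greedy_decode(indices, vocab, blank_index):
    """Collapse consecutive repeats, drop blanks, then map indices to characters."""
    collapsed = []
    previous = None
    for idx in indices:
        if idx != previous:  # keep only the first index of each run
            collapsed.append(idx)
        previous = idx
    return "".join(
        vocab[idx] for idx in collapsed
        if idx != blank_index and idx < len(vocab)
    )


# Toy vocabulary (assumption for illustration; the model uses 38 Arabic characters).
toy_vocab = ["a", "b", "c"]
blank = len(toy_vocab)  # blank is the last class

# The blank between the two runs of index 0 keeps both "a" characters.
print(ctc_greedy_decode([0, 0, blank, 0, 1, 1, blank], toy_vocab, blank))
```

The order of the two steps matters: collapsing repeats before removing blanks is what lets a blank separate genuinely doubled letters, which would otherwise merge into one.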
---
## Performance & Metrics
Training was conducted on the **mssqpi/Arabic-OCR-Dataset** over 50 epochs.
| Metric | Value |
| :--- | :--- |
| **Input Shape** | 128 x 32 x 1 (Grayscale) |
| **Output Classes** | 39 (38 Chars + 1 Blank) |
| **Final Loss** | ~13.13 |
| **Val Loss** | ~89.79 |
| **Framework** | Keras 3.x (Native) |
## Dataset
This model was trained on the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by **Muhammad AL-Qurishi (mssqpi)**.
- **Total Samples**: ~2.16 Million images.
- **Content**: A massive collection of Arabic text lines in various fonts and styles.
- **Usage**: Used for training the CRNN architecture to recognize sequential Arabic script.
---
## Acknowledgments
Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.
---
> [!TIP]
> **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!