---
language: ar
license: apache-2.0
tags:
- ocr
- arabic
- keras
- jax
- tensorflow
- pytorch
datasets:
- mssqpi/Arabic-OCR-Dataset
---
<div align="center">
<img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">
# 🖋️ Qalam-Net (قلم-نت)
### *High-Performance, Cross-Backend Arabic OCR*
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Framework](https://img.shields.io/badge/Framework-Keras%203-F14B5C.svg)](https://keras.io/)
[![Backend](https://img.shields.io/badge/Backend-JAX%20|%20TF%20|%20Torch-blueviolet.svg)](https://keras.io/keras_3/)
</div>
---
## 🌟 Highlights
- **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
- **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
- **🎯 Precision Driven**: CNN + BiLSTM + Self-Attention pipeline optimized for Arabic script.
- **🔓 Unified Loading**: No custom layers or complex setup required for inference.
---
## 📖 How it Works
The model processes Arabic text images through a sophisticated multi-stage pipeline:
```mermaid
graph LR
A[Input Image 128x32] --> B[CNN Backbone]
B --> C[Spatial Features]
C --> D[Dual BiLSTM]
D --> E[Self-Attention]
E --> F[Softmax Output]
F --> G[NumPy CTC Decoder]
G --> H[Arabic Text]
```
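
The input and output shapes in this pipeline can be traced with a quick NumPy sketch. Only the 128x32 input and the 39-class per-timestep output come from this model card; the intermediate CNN/BiLSTM shapes are not shown here because they depend on internals the card does not document:

```python
import numpy as np

# A 32x128 grayscale line image, as OpenCV reads it (height x width),
# run through the same transpose-and-expand preprocessing as the Quick Start.
img = np.zeros((32, 128), dtype=np.float32)
img = img.T                                # -> (128, 32): width becomes the time axis
batch = np.expand_dims(img, axis=(0, -1))  # -> (1, 128, 32, 1): add batch and channel dims
print(batch.shape)

# The network emits one softmax distribution per time step over
# 39 classes (38 Arabic characters + 1 CTC blank):
logits = np.zeros((1, 32, 39), dtype=np.float32)  # (batch, time, classes)
print(logits.shape)
```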
---
## 🚀 Quick Start (Robust Usage)
Use the following implementation to run inference on any platform. This uses a custom **NumPy-based decoder** for 100% cross-backend compatibility.
<details>
<summary><b>View Python Implementation</b></summary>
```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"

import keras
import numpy as np
import cv2
from huggingface_hub import hf_hub_download


class QalamNet:
    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and load the model
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)

        # 2. The exact 38-character Arabic vocabulary:
        # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN,
        #  SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA,
        #  WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW,
        #  ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
        self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز',
                      'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك',
                      'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ',
                      'ؤ', 'ئ', 'ء', ' ', 'ـ']

    def preprocess(self, image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0
        img = img.T  # Transpose so width becomes the time axis (CRNN convention)
        img = np.expand_dims(img, axis=(-1, 0))  # -> (1, 128, 32, 1)
        return img.astype(np.float32)

    def predict(self, image_path):
        batch_img = self.preprocess(image_path)
        preds = self.model.predict(batch_img)  # Output shape: (1, 32, 39)

        # 3. NumPy-based CTC greedy decoding (cross-backend)
        argmax_preds = np.argmax(preds, axis=-1)[0]
        # Collapse consecutive duplicate predictions
        unique_indices = [argmax_preds[i] for i in range(len(argmax_preds))
                          if i == 0 or argmax_preds[i] != argmax_preds[i - 1]]
        # Drop the CTC blank (the last class, index 38)
        blank_index = preds.shape[-1] - 1
        final_indices = [idx for idx in unique_indices if idx != blank_index]
        # Map indices back to characters
        return "".join(self.vocab[idx] for idx in final_indices
                       if idx < len(self.vocab))


# Usage
ocr = QalamNet()
print(f"Predicted Arabic Text: {ocr.predict('/content/images.png')}")
```
</details>
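
To see the decoding rules in isolation, here is the same collapse-then-drop-blanks logic run on a toy argmax path. The five-class vocabulary and the path values are invented for illustration; the real model uses 38 characters plus a blank at index 38:

```python
import numpy as np

# Toy vocabulary: 4 characters + 1 trailing CTC blank (index 4).
vocab = ["a", "b", "c", "d"]
blank = 4

# Hypothetical per-timestep argmax path emitted by a model:
path = np.array([0, 0, blank, 1, 1, 1, blank, blank, 0, 3])

# 1. Collapse consecutive duplicates: a a _ b b b _ _ a d -> a _ b _ a d
collapsed = [path[i] for i in range(len(path)) if i == 0 or path[i] != path[i - 1]]
# 2. Drop blanks: a _ b _ a d -> a b a d
decoded = "".join(vocab[i] for i in collapsed if i != blank)
print(decoded)  # -> "abad"
```

Note that the blank between the two runs of `a` at the end is what lets CTC emit repeated characters: without it, consecutive identical symbols would collapse into one.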
---
## 📊 Performance & Metrics
Training was conducted on the **mssqpi/Arabic-OCR-Dataset** over 50 epochs.
| Metric | Value |
| :--- | :--- |
| **Input Shape** | 128 x 32 x 1 (Grayscale) |
| **Output Classes** | 39 (38 Chars + 1 Blank) |
| **Final Training Loss** | ~13.13 |
| **Final Validation Loss** | ~89.79 |
| **Framework** | Keras 3.x (Native) |
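
The card reports only CTC losses; for OCR, character error rate (CER) is the more interpretable evaluation metric. A minimal Levenshtein-based CER helper (not part of the released model, just a common way to score its predictions against ground truth) could look like:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    m, n = len(reference), len(hypothesis)
    # Classic dynamic-programming Levenshtein distance, row by row.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution
        prev = cur
    return prev[n] / max(m, 1)

print(cer("كتاب", "كتب"))  # one deleted character out of four -> 0.25
```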
## ๐Ÿ“ Dataset
This model was trained on the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by **Muhammad AL-Qurishi (mssqpi)**.
- **Total Samples**: ~2.16 Million images.
- **Content**: A massive collection of Arabic text lines in various fonts and styles.
- **Usage**: Used for training the CRNN architecture to recognize sequential Arabic script.
---
## 🤝 Acknowledgments
Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.
---
> [!TIP]
> **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!