---
language: ar
license: apache-2.0
tags:
- ocr
- arabic
- keras
- jax
- tensorflow
- pytorch
datasets:
- mssqpi/Arabic-OCR-Dataset
---

<div align="center">
  <img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">
  
  # ๐Ÿ–‹๏ธ Qalam-Net (ู‚ู„ู…-ู†ุช)
  ### *High-Performance, Cross-Backend Arabic OCR*
  
  [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
  [![Framework](https://img.shields.io/badge/Framework-Keras%203-F14B5C.svg)](https://keras.io/)
  [![Backend](https://img.shields.io/badge/Backend-JAX%20|%20TF%20|%20Torch-blueviolet.svg)](https://keras.io/keras_3/)
</div>

---

## 🌟 Highlights
- **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
- **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
- **🎯 Precision Driven**: CNN + BiLSTM + Self-Attention pipeline optimized for Arabic script.
- **🔓 Unified Loading**: No custom layers or complex setup required for inference.

---

## 📖 How it Works
The model processes Arabic text images through a multi-stage pipeline:

```mermaid
graph LR
    A[Input Image 128x32] --> B[CNN Backbone]
    B --> C[Spatial Features]
    C --> D[Dual BiLSTM]
    D --> E[Self-Attention]
    E --> F[Softmax Output]
    F --> G[NumPy CTC Decoder]
    G --> H[Arabic Text]
```
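
The final stage in the diagram, CTC greedy decoding, is simple enough to show on toy data. A minimal sketch (the 3-letter Latin vocabulary and the one-hot "probabilities" are illustrative stand-ins, not the model's real output):

```python
import numpy as np

def ctc_greedy_decode(probs, vocab):
    """Collapse per-timestep class probabilities into a string.

    probs: (timesteps, num_classes) array; the last class is the CTC blank.
    """
    blank = probs.shape[-1] - 1
    best = np.argmax(probs, axis=-1)
    # 1) Merge consecutive repeats, 2) drop blanks.
    collapsed = [best[i] for i in range(len(best)) if i == 0 or best[i] != best[i - 1]]
    return "".join(vocab[i] for i in collapsed if i != blank)

# Toy run: 3 characters + blank, 6 timesteps (classes: a, a, blank, b, b, c).
vocab = ["a", "b", "c"]
probs = np.eye(4)[[0, 0, 3, 1, 1, 2]]  # one-hot "probabilities"
print(ctc_greedy_decode(probs, vocab))  # -> abc
```

The blank between the two `a` timesteps is what lets CTC distinguish a repeated letter from one letter held across frames.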

---

## 🚀 Quick Start (Robust Usage)

Use the following implementation to run inference on any backend. It relies on a custom **NumPy-based decoder**, so decoding behaves identically across JAX, TensorFlow, and PyTorch.

<details>
<summary><b>View Python Implementation</b></summary>

```python
import os
os.environ["KERAS_BACKEND"] = "jax" # Options: "jax", "tensorflow", "torch"

import keras
import numpy as np
import cv2
from huggingface_hub import hf_hub_download

class QalamNet:
    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and Load Model
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)
        
        # 2. Define the exact 38-character Arabic Vocabulary
        # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN, SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA, WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
        self.vocab = ['ุง', 'ุจ', 'ุช', 'ุซ', 'ุฌ', 'ุญ', 'ุฎ', 'ุฏ', 'ุฐ', 'ุฑ', 'ุฒ', 'ุณ', 'ุด', 'ุต', 'ุถ', 'ุท', 'ุธ', 'ุน', 'ุบ', 'ู', 'ู‚', 'ูƒ', 'ู„', 'ู…', 'ู†', 'ู‡', 'ูˆ', 'ูŠ', 'ุฉ', 'ู‰', 'ุฃ', 'ุฅ', 'ุข', 'ุค', 'ุฆ', 'ุก', ' ', 'ู€']
        
    def preprocess(self, image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0
        img = img.T  # Transpose so width becomes the time axis (CRNN convention)
        img = np.expand_dims(img, axis=(-1, 0))  # Add batch and channel dims -> (1, 128, 32, 1)
        return img.astype(np.float32)

    def predict(self, image_path):
        batch_img = self.preprocess(image_path)
        preds = self.model.predict(batch_img) # Output shape: (1, 32, 39)
        
        # 3. NumPy-based CTC Greedy Decoding (Cross-Backend)
        argmax_preds = np.argmax(preds, axis=-1)[0]
        
        # Remove consecutive duplicates
        unique_indices = [argmax_preds[i] for i in range(len(argmax_preds)) 
                          if i == 0 or argmax_preds[i] != argmax_preds[i-1]]
        
        # Remove blank index (index 38)
        blank_index = preds.shape[-1] - 1
        final_indices = [idx for idx in unique_indices if idx != blank_index]
        
        # Map to vocabulary
        return "".join([self.vocab[idx] for idx in final_indices if idx < len(self.vocab)])

# Usage
ocr = QalamNet()
print(f"Predicted Arabic Text: {ocr.predict('/content/images.png')}")
```
</details>
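
As a quick sanity check of the shape contract in `preprocess`, the same transpose-and-expand steps can be traced on a synthetic array (random data stands in for a real decoded image):

```python
import numpy as np

# Synthetic grayscale image the way cv2.resize would leave it: (H=32, W=128).
img = np.random.rand(32, 128).astype(np.float32)

x = img.T                            # (128, 32): width becomes the time axis
x = np.expand_dims(x, axis=(-1, 0))  # insert batch and channel dims
print(x.shape)  # -> (1, 128, 32, 1)
```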

---

## 📊 Performance & Metrics
Training was conducted on the **mssqpi/Arabic-OCR-Dataset** over 50 epochs.

| Metric | Value |
| :--- | :--- |
| **Input Shape** | 128 × 32 × 1 (grayscale) |
| **Output Classes** | 39 (38 characters + 1 CTC blank) |
| **Final Training Loss** | ~13.13 |
| **Validation Loss** | ~89.79 |
| **Framework** | Keras 3.x (native) |
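
Loss values alone are hard to interpret; for OCR, character error rate (CER) on held-out samples is the more telling metric. A minimal sketch using a generic Levenshtein helper (`edit_distance` and `cer` are illustrative helpers, not part of this repo):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(predictions, references):
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return edits / max(chars, 1)

print(cer(["kitab"], ["kitap"]))  # -> 0.2 (1 substitution over 5 characters)
```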

## ๐Ÿ“ Dataset
This model was trained on the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by **Muhammad AL-Qurishi (mssqpi)**.
- **Total Samples**: ~2.16 Million images.
- **Content**: A massive collection of Arabic text lines in various fonts and styles.
- **Usage**: Used for training the CRNN architecture to recognize sequential Arabic script.

---

## ๐Ÿค Acknowledgments
Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.

---

> [!TIP]
> **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!