Upload README.md with huggingface_hub
README.md (CHANGED)
Before:

````diff
@@ -4,93 +4,84 @@ license: apache-2.0
 tags:
 - ocr
 - arabic
-- tensorflow
 - keras
-
-- attention
-datasets:
-- mssqpi/Arabic-OCR-Dataset
-metrics:
-- accuracy
-- character-error-rate
-model-index:
-- name: Qalam-Net
-  results:
-  - task:
-      type: optical-character-recognition
-      name: Optical Character Recognition
-    dataset:
-      name: Arabic-OCR-Dataset
-      type: mssqpi/Arabic-OCR-Dataset
-    metrics:
-    - type: accuracy
-      value: 0.92
-      name: Character Accuracy
 ---
 
 # Qalam-Net (قلم-نت): Advanced Arabic OCR
 
-Qalam-Net is a high-performance Optical Character Recognition (OCR) model
 
-##
-- **Architecture**: CNN-BiLSTM-Attention (CRNN) with CTC Loss.
-- **Language Support**: Modern Standard Arabic (MSA) and common cursive variations.
-- **Performance**: Optimized for 128x32 grayscale images.
-- **Robustness**: Handles various fonts, ligatures, and diacritics common in Arabic.
 
-
-The model consists of:
-1. **CNN Backbone**: Multiple `Conv2D` layers with `BatchNormalization` and `MaxPooling` to extract spatial features from the 128x32 input.
-2. **Sequence Modeling**: Two layers of `Bidirectional LSTM` (128 units) to capture temporal dependencies in the Arabic script sequence.
-3. **Attention Layer**: A self-attention mechanism that weighs the importance of different spatial features before final character prediction.
-4. **CTC Output**: Connectionist Temporal Classification (CTC) layer for mapping variable-length input sequences to character labels without explicit alignment.
 
-##
-
-### Dependencies
 ```bash
-pip install
 ```
 
-###
 ```python
-import
 import numpy as np
 import cv2
 
-
-
-
-self.
 
-def
-#
-
 
-#
-
 
-
-    img = (img / 255.0).astype(np.float32)
-    img = img.T
-    img = np.expand_dims(img, axis=-1)
-    return np.expand_dims(img, axis=0)
-
-# Prediction Logic (CTC Decode)
-def decode_prediction(pred):
-    # Use tf.keras.backend.ctc_decode to extract text from softmax output
-    # requires character mapping (StringLookup)
-    pass
 ```
 
-##
-
-
 
 ---
 **Developed by [Ali Khalid](https://github.com/Ali0044)**
````
After:

````diff
 tags:
 - ocr
 - arabic
 - keras
+- jax
 ---
````
````diff
 
 # Qalam-Net (قلم-نت): Advanced Arabic OCR
 
+Qalam-Net is a high-performance, cross-backend Optical Character Recognition (OCR) model for Arabic. Built on **Keras 3**, it supports **JAX**, **PyTorch**, and **TensorFlow** backends.
 
+## 🚀 Quick Start (Advanced Usage)
 
+The following example demonstrates how to use **JAX** for ultra-fast XLA-accelerated inference.
 
+### 1. Installation
 ```bash
+pip install -U "keras>=3.0" jax jaxlib huggingface_hub opencv-python
 ```
````
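One detail the card's installation step leaves implicit: Keras 3 selects its backend from the `KERAS_BACKEND` environment variable once, at import time, so the variable must be set before the first `import keras` anywhere in the process (or exported in the shell beforehand). A minimal sketch:

```python
import os

# Must run before `import keras` anywhere in the process;
# once Keras has been imported, the backend choice is fixed.
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"

print(os.environ["KERAS_BACKEND"])  # jax
```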
````diff
 
+### 2. Implementation
 ```python
+import os
+os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"
+
+import keras
 import numpy as np
 import cv2
 
+class QalamNet:
+    def __init__(self, repo_id="Ali0044/Qalam-Net"):
+        # Download and load the model via the hf:// shorthand; this fetches
+        # the model.keras file from the root of the repo.
+        self.model = keras.saving.load_model(f"hf://{repo_id}")
+
+        # Standard Arabic vocabulary (matches the training set)
+        self.vocab = [' ', '!', '"', '#', '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '[', ']', 'ء', 'آ', 'أ', 'ؤ', 'إ', 'ئ', 'ا', 'ب', 'ة', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ـ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ى', 'ي', 'ً', 'ٌ', 'ٍ', 'َ', 'ُ', 'ِ', 'ّ', 'ْ', '٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩']
+
+    def preprocess(self, image_path):
+        # 1. Load as grayscale
+        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+        # 2. Resize to 128 (width) x 32 (height); cv2 returns shape (H, W)
+        img = cv2.resize(img, (128, 32))
+        # 3. Normalize to [0, 1]
+        img = (img / 255.0).astype(np.float32)
+        # 4. Transpose (H, W) -> (W, H) so width becomes the sequence axis
+        img = img.T
+        # 5. Add channel and batch dimensions
+        img = np.expand_dims(img, axis=-1)
+        return np.expand_dims(img, axis=0)
 
+    def predict(self, image_path):
+        # Run inference
+        batch_img = self.preprocess(image_path)
+        predictions = self.model.predict(batch_img)
+
+        # CTC Decode (Greedy)
+        input_len = np.ones(predictions.shape[0]) * predictions.shape[1]
+        results = keras.backend.ctc_decode(predictions, input_length=input_len, greedy=True)[0][0]
+
+        # Map indices to characters (-1 marks CTC padding)
+        text = ""
+        for res in results[0]:
+            if res != -1:
+                text += self.vocab[int(res)]
+        return text
 
+# Initialize
+ocr = QalamNet()
 
+# Predict
+# text = ocr.predict("sample_arabic_text.png")
+# print(f"Predicted Text: {text}")
 ```
````
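One caveat worth flagging in the added `predict` method: `keras.backend.ctc_decode` is the legacy Keras 2 / `tf.keras` API; Keras 3 exposes `keras.ops.ctc_decode` instead, so the snippet as committed will not run unchanged on the JAX backend. Greedy CTC decoding is also simple to do backend-agnostically by hand. The following is a hedged sketch, not the repo's code, and it assumes the blank token is the last class (the convention of Keras's classic `ctc_batch_cost`), which should be verified against the actual model:

```python
import numpy as np

def greedy_ctc_decode(probs, blank_index):
    """Standard CTC greedy decoding: take the argmax per time step,
    collapse consecutive repeats, then drop blank tokens."""
    best = np.argmax(probs, axis=-1)
    collapsed = [int(c) for i, c in enumerate(best) if i == 0 or c != best[i - 1]]
    return [c for c in collapsed if c != blank_index]

# Toy check with two real classes (0, 1) and the blank at index 2:
toy = np.array([
    [0.9, 0.05, 0.05],  # -> 0
    [0.9, 0.05, 0.05],  # -> 0 (repeat, collapsed)
    [0.05, 0.05, 0.9],  # -> blank (dropped)
    [0.05, 0.9, 0.05],  # -> 1
    [0.9, 0.05, 0.05],  # -> 0
])
print(greedy_ctc_decode(toy, blank_index=2))  # [0, 1, 0]
```

Applied to the model above, `probs` would be `predictions[0]` and the decoded indices would be mapped through `self.vocab`, assuming the blank convention matches the training setup.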
````diff
 
+## 🧠 Model Architecture
+Qalam-Net employs a specialized **CNN-BiLSTM-Attention** pipeline:
+- **Spatial Features**: 3-block CNN with BatchNormalization.
+- **Sequence Context**: Stacked Bidirectional LSTMs.
+- **Focus Mechanism**: Self-attention layer to resolve overlapping Arabic characters.
+- **Loss**: Trained using Connectionist Temporal Classification (CTC).
 
 ---
 **Developed by [Ali Khalid](https://github.com/Ali0044)**
````
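For orientation, the shapes implied by the card's architecture notes can be sketched numerically. The numbers below are illustrative assumptions, not the repo's verified configuration: the model card does not state the pooling schedule, so two 2x2 pooling stages are assumed, and the vocabulary size of 83 is taken from the character list in the added code.

```python
import numpy as np

# Hypothetical shape walk-through of the CNN-BiLSTM-CTC pipeline.
batch = np.zeros((1, 128, 32, 1), dtype=np.float32)  # (N, W, H, C) after preprocess()

pool_stages = 2                   # assumed number of 2x2 max-pool stages (a guess)
time_steps = 128 >> pool_stages   # width is the sequence axis: 32 steps for the BiLSTM

vocab_size = 83                   # characters in the model card's vocab list
num_classes = vocab_size + 1      # plus one CTC blank token

# The CTC head would therefore emit a (batch, time, classes) tensor:
logits_shape = (batch.shape[0], time_steps, num_classes)
print(logits_shape)  # (1, 32, 84)
```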