Ali0044 committed on
Commit
73dec90
·
verified ·
1 Parent(s): 13d6c69

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +58 -67
README.md CHANGED
@@ -4,93 +4,84 @@ license: apache-2.0
  tags:
  - ocr
  - arabic
- - tensorflow
  - keras
- - crnn
- - attention
- datasets:
- - mssqpi/Arabic-OCR-Dataset
- metrics:
- - accuracy
- - character-error-rate
- model-index:
- - name: Qalam-Net
-   results:
-   - task:
-       type: optical-character-recognition
-       name: Optical Character Recognition
-     dataset:
-       name: Arabic-OCR-Dataset
-       type: mssqpi/Arabic-OCR-Dataset
-     metrics:
-     - type: accuracy
-       value: 0.92
-       name: Character Accuracy
  ---

  # Qalam-Net (قلم-نت): Advanced Arabic OCR

- Qalam-Net is a high-performance Optical Character Recognition (OCR) model specifically designed for the complexities of the Arabic script. It utilizes a hybrid architecture combining Convolutional Neural Networks (CNN) for feature extraction, Bidirectional Long Short-Term Memory (BiLSTM) for sequence modeling, and an attention mechanism to improve character localization in cursive text.

- ## Model Highlights
- - **Architecture**: CNN-BiLSTM-Attention (CRNN) with CTC loss.
- - **Language Support**: Modern Standard Arabic (MSA) and common cursive variations.
- - **Performance**: Optimized for 128x32 grayscale images.
- - **Robustness**: Handles the varied fonts, ligatures, and diacritics common in Arabic.

- ## Architecture Overview
- The model consists of:
- 1. **CNN Backbone**: Multiple `Conv2D` layers with `BatchNormalization` and `MaxPooling` to extract spatial features from the 128x32 input.
- 2. **Sequence Modeling**: Two layers of `Bidirectional LSTM` (128 units) to capture temporal dependencies in the Arabic script sequence.
- 3. **Attention Layer**: A self-attention mechanism that weighs the importance of different spatial features before final character prediction.
- 4. **CTC Output**: A Connectionist Temporal Classification (CTC) layer that maps variable-length input sequences to character labels without explicit alignment.

- ## Usage Instructions
- 
- ### Dependencies
  ```bash
- pip install tensorflow opencv-python numpy
  ```

- ### Loading and Prediction
  ```python
- import tensorflow as tf
  import numpy as np
  import cv2

- # Custom CTCLayer required for loading the training model
- class CTCLayer(tf.keras.layers.Layer):
-     def __init__(self, name=None, **kwargs):
-         super().__init__(name=name, **kwargs)
-         self.loss_fn = tf.keras.backend.ctc_batch_cost
- 
-     def call(self, y_true, y_pred):
-         # Implementation of CTC loss calculation
-         return y_pred
- 
- # Load the model
- model = tf.keras.models.load_model('Qalam-Net.keras', custom_objects={'CTCLayer': CTCLayer})
- 
- def preprocess(image_path):
-     img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
-     img = cv2.resize(img, (128, 32))
-     img = (img / 255.0).astype(np.float32)
-     img = img.T
-     img = np.expand_dims(img, axis=-1)
-     return np.expand_dims(img, axis=0)
- 
- # Prediction Logic (CTC Decode)
- def decode_prediction(pred):
-     # Use tf.keras.backend.ctc_decode to extract text from softmax output
-     # requires character mapping (StringLookup)
-     pass
  ```

- ## Dataset Information
- The model was trained on a subset of the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by mssqpi. The training set includes diverse samples of printed and handwritten-style Arabic text.
- 
- ## Citation
- If you use Qalam-Net in your research, please cite the associated comparative study on Arabic OCR architectures.

  ---
  **Developed by [Ali Khalid](https://github.com/Ali0044)**
 
  tags:
  - ocr
  - arabic
  - keras
+ - jax
  ---

  # Qalam-Net (قلم-نت): Advanced Arabic OCR

+ Qalam-Net is a high-performance, cross-backend Optical Character Recognition (OCR) model for Arabic. Built on **Keras 3**, it supports the **JAX**, **PyTorch**, and **TensorFlow** backends.

+ ## 🚀 Quick Start

+ The following example demonstrates how to use the **JAX** backend for XLA-accelerated inference.

+ ### 1. Installation

  ```bash
+ pip install -U "keras>=3.0" jax jaxlib huggingface_hub opencv-python
  ```

+ ### 2. Implementation
  ```python
+ import os
+ os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"
+ 
+ import keras
  import numpy as np
  import cv2

+ class QalamNet:
+     def __init__(self, repo_id="Ali0044/Qalam-Net"):
+         # Load the model via the hf:// shorthand, which automatically
+         # downloads the .keras file from the root of the Hub repo
+         self.model = keras.saving.load_model(f"hf://{repo_id}")
+ 
+         # Standard Arabic vocabulary (matches the training set)
+         self.vocab = [' ', '!', '"', '#', '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '[', ']', 'ء', 'آ', 'أ', 'ؤ', 'إ', 'ئ', 'ا', 'ب', 'ة', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ـ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ى', 'ي', 'ً', 'ٌ', 'ٍ', 'َ', 'ُ', 'ِ', 'ّ', 'ْ', '٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩']
+ 
+     def preprocess(self, image_path):
+         # 1. Load as grayscale
+         img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
+         # 2. Resize to 128 (width) x 32 (height); the array shape becomes (32, 128)
+         img = cv2.resize(img, (128, 32))
+         # 3. Normalize to [0, 1]
+         img = (img / 255.0).astype(np.float32)
+         # 4. Transpose (H, W) -> (W, H) so width becomes the time axis for the CRNN
+         img = img.T
+         # 5. Expand dimensions for channel and batch -> (1, 128, 32, 1)
+         img = np.expand_dims(img, axis=-1)
+         return np.expand_dims(img, axis=0)

+     def predict(self, image_path):
+         # Run inference
+         batch_img = self.preprocess(image_path)
+         predictions = self.model.predict(batch_img)
+ 
+         # CTC decode (greedy). keras.backend.ctc_decode no longer exists in
+         # Keras 3; keras.ops.nn.ctc_decode is the backend-agnostic replacement.
+         # (mask_index must match the blank class used during training.)
+         input_len = np.full(predictions.shape[0], predictions.shape[1], dtype="int32")
+         decoded, _ = keras.ops.nn.ctc_decode(
+             predictions, sequence_lengths=input_len, strategy="greedy"
+         )
+ 
+         # Map indices to characters; -1 marks padding in the decoded output
+         text = ""
+         for res in np.asarray(decoded[0][0]):
+             if res != -1:
+                 text += self.vocab[int(res)]
+         return text

+ # Initialize
+ ocr = QalamNet()

+ # Predict
+ # text = ocr.predict("sample_arabic_text.png")
+ # print(f"Predicted Text: {text}")
  ```
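As a sanity check on the `preprocess` step above, the same shape pipeline can be reproduced with plain NumPy, using a random array in place of a decoded image (no cv2 needed; `cv2.resize(img, (128, 32))` returns an array of shape `(32, 128)`, height first):

```python
import numpy as np

# Stand-in for cv2.imread + cv2.resize(img, (128, 32)): shape (height=32, width=128)
img = np.random.randint(0, 256, size=(32, 128)).astype(np.uint8)

img = (img / 255.0).astype(np.float32)  # normalize to [0, 1]
img = img.T                             # (32, 128) -> (128, 32): width-first for the CRNN
img = np.expand_dims(img, axis=-1)      # add channel axis -> (128, 32, 1)
batch = np.expand_dims(img, axis=0)     # add batch axis   -> (1, 128, 32, 1)

print(batch.shape)  # (1, 128, 32, 1)
```

The resulting `(1, 128, 32, 1)` batch is exactly what the model expects: one sample, 128 time steps (width), 32 features (height), one grayscale channel.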

+ ## 🧠 Model Architecture
+ Qalam-Net employs a specialized **CNN-BiLSTM-Attention** pipeline:
+ - **Spatial Features**: 3-block CNN with BatchNormalization.
+ - **Sequence Context**: Stacked Bidirectional LSTMs.
+ - **Focus Mechanism**: Self-attention layer to resolve overlapping Arabic characters.
+ - **Loss**: Trained using Connectionist Temporal Classification (CTC).
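The CTC stage above can be illustrated without any framework: greedy CTC decoding takes the argmax label at each time step, collapses consecutive repeats, then drops the blank token. A minimal NumPy sketch (the 4-class toy vocabulary and blank index here are illustrative, not the model's real ones):

```python
import numpy as np

def ctc_greedy_decode(logits, blank):
    """Greedy CTC: argmax per time step, collapse repeats, drop blanks."""
    best = np.argmax(logits, axis=-1)               # (T,) best label per step
    collapsed = [int(l) for i, l in enumerate(best)
                 if i == 0 or l != best[i - 1]]     # collapse repeated labels
    return [l for l in collapsed if l != blank]     # remove blank tokens

# Toy example: 3 real classes (0, 1, 2) plus blank class 3, over 6 time steps.
# Per-step argmax [0, 0, 3, 1, 1, 2] -> collapse -> [0, 3, 1, 2] -> drop blank -> [0, 1, 2]
logits = np.zeros((6, 4))
for t, label in enumerate([0, 0, 3, 1, 1, 2]):
    logits[t, label] = 1.0

print(ctc_greedy_decode(logits, blank=3))  # [0, 1, 2]
```

This is why CTC needs no character-level alignment at training time: repeats and blanks let the network emit the same text from many frame-level labelings.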

  ---
  **Developed by [Ali Khalid](https://github.com/Ali0044)**