Ali0044 committed
Commit 060cd9d · verified · 1 Parent(s): 73b55fd

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +71 -41
README.md CHANGED
@@ -10,23 +10,53 @@ tags:
   - pytorch
   ---
 
- # Qalam-Net (قلم-نت): Advanced Arabic OCR (v2 Portable)
 
- Qalam-Net is a high-performance, cross-backend Optical Character Recognition (OCR) model for Arabic. Patched for **Keras 3**, it supports **JAX**, **PyTorch**, and **TensorFlow**.
 
- ## 🚀 Quick Start (Robust Usage)
 
- This guide uses a custom **NumPy-based decoder** to ensure compatibility across all Keras 3 backends without needing `tf.keras.backend.ctc_decode`.
 
- ### 1. Installation
- ```bash
- pip install -U "keras>=3.0" jax jaxlib huggingface_hub opencv-python
  ```
 
- ### 2. Implementation
  ```python
  import os
- os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"
 
  import keras
  import numpy as np
@@ -35,50 +65,50 @@ from huggingface_hub import hf_hub_download
 
  class QalamNet:
      def __init__(self, repo_id="Ali0044/Qalam-Net"):
-         # 1. Download and Load Model
          print(f"Loading Qalam-Net from {repo_id}...")
          model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
          self.model = keras.saving.load_model(model_path)
-
-         # 2. Define the exact 38-character Arabic Vocabulary
-         # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN, SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA, WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
          self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']
 
-     def preprocess(self, image_path):
          img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
          img = cv2.resize(img, (128, 32)) / 255.0
-         img = img.T  # Transpose for CRNN architecture
-         img = np.expand_dims(img, axis=(-1, 0))
-         return img.astype(np.float32)
-
-     def predict(self, image_path):
-         batch_img = self.preprocess(image_path)
-         preds = self.model.predict(batch_img)  # Output shape: (1, 32, 39)
-
-         # 3. NumPy-based CTC Greedy Decoding (Cross-Backend)
-         argmax_preds = np.argmax(preds, axis=-1)[0]
 
-         # Remove consecutive duplicates
-         unique_indices = [argmax_preds[i] for i in range(len(argmax_preds))
-                           if i == 0 or argmax_preds[i] != argmax_preds[i-1]]
 
-         # Remove blank index (index 38)
-         blank_index = preds.shape[-1] - 1
-         final_indices = [idx for idx in unique_indices if idx != blank_index]
-
-         # Map to vocabulary
-         return "".join([self.vocab[idx] for idx in final_indices if idx < len(self.vocab)])
 
- # Usage
  # ocr = QalamNet()
- # print(f"Predicted Arabic Text: {ocr.predict('sample.png')}")
  ```
 
- ## 🧠 Model Architecture
- Qalam-Net employs a specialized **CNN-BiLSTM-Attention** pipeline:
- - **CNN Backbone**: Extracts high-level spatial features from Arabic script.
- - **BiLSTM Layers**: Captures the sequential nature of right-to-left writing.
- - **Attention Mechanism**: Resolves difficult character boundaries.
 
  ---
- **Maintained by [Ali Khalid](https://github.com/Ali0044)**
   - pytorch
   ---
 
+ <div align="center">
+ <img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">
+
+ # 🖋️ Qalam-Net (قلم-نت)
+ ### *High-Performance, Cross-Backend Arabic OCR*
+
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+ [![Framework](https://img.shields.io/badge/Framework-Keras%203-F14B5C.svg)](https://keras.io/)
+ [![Backend](https://img.shields.io/badge/Backend-JAX%20|%20TF%20|%20Torch-blueviolet.svg)](https://keras.io/keras_3/)
+ </div>
 
+ ---
 
+ ## 🌟 Highlights
+ - **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
+ - **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
+ - **🎯 Precision Driven**: CNN + BiLSTM + Self-Attention pipeline optimized for Arabic script.
+ - **🔓 Unified Loading**: No custom layers or complex setup required for inference.
+
+ ---
 
+ ## 📖 How it Works
+ The model processes Arabic text images through a multi-stage pipeline:
 
+ ```mermaid
+ graph LR
+     A[Input Image 128x32] --> B[CNN Backbone]
+     B --> C[Spatial Features]
+     C --> D[Dual BiLSTM]
+     D --> E[Self-Attention]
+     E --> F[Softmax Output]
+     F --> G[NumPy CTC Decoder]
+     G --> H[Arabic Text]
  ```
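As a sanity check on the diagram's input stage: the preprocessing used in the card's code (resize to 128×32, transpose so width becomes the time axis, then add batch and channel dims) can be traced with plain NumPy. The random array below is just a stand-in for a real `cv2.imread` result:

```python
import numpy as np

# cv2.resize(img, (128, 32)) yields a (height=32, width=128) array;
# a random array stands in for the decoded grayscale image here.
img = np.random.rand(32, 128).astype(np.float32)

# Transpose so the 128-pixel width axis becomes the sequence axis,
# then add channel (last) and batch (first) dims in one call:
# (32, 128) -> (128, 32) -> (1, 128, 32, 1)
batch = np.expand_dims(img.T, axis=(-1, 0))
print(batch.shape)  # (1, 128, 32, 1)
```

The CNN backbone then downsamples the 128-step width axis to the 32 timesteps seen at the softmax output.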
 
+ ---
+
+ ## 🚀 Quick Start (Robust Usage)
+
+ Use the following implementation to run inference on any platform. This uses a custom **NumPy-based decoder** for 100% cross-backend compatibility.
+
+ <details>
+ <summary><b>View Python Implementation</b></summary>
+
  ```python
  import os
+ os.environ["KERAS_BACKEND"] = "jax"
 
  import keras
  import numpy as np
 
  class QalamNet:
      def __init__(self, repo_id="Ali0044/Qalam-Net"):
          print(f"Loading Qalam-Net from {repo_id}...")
          model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
          self.model = keras.saving.load_model(model_path)
          self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']
 
+     def predict(self, image_path):
+         # Preprocessing
          img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
          img = cv2.resize(img, (128, 32)) / 255.0
+         img = np.expand_dims(img.T, axis=(-1, 0)).astype(np.float32)
 
+         # Inference
+         probs = self.model.predict(img)
 
+         # CTC Greedy Decoding
+         argmax_preds = np.argmax(probs, axis=-1)[0]
+         unique = [argmax_preds[i] for i in range(len(argmax_preds))
+                   if i == 0 or argmax_preds[i] != argmax_preds[i - 1]]
+         final = [idx for idx in unique if idx != 38]
+         return "".join([self.vocab[idx] for idx in final if idx < 38])
 
  # ocr = QalamNet()
+ # print(ocr.predict('text.png'))
  ```
+ </details>
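The CTC greedy decoding step inside `predict` can be exercised in isolation. Below is a minimal, self-contained sketch of the same collapse-repeats-then-drop-blank logic, using a hypothetical 3-character toy vocabulary (the real model uses 38 characters plus a blank at index 38):

```python
import numpy as np

def ctc_greedy_decode(probs, vocab):
    """Greedy CTC decode: argmax per timestep, collapse consecutive
    duplicates, then drop the blank class (last index)."""
    blank = probs.shape[-1] - 1
    best = np.argmax(probs, axis=-1)
    collapsed = [best[i] for i in range(len(best))
                 if i == 0 or best[i] != best[i - 1]]
    return "".join(vocab[i] for i in collapsed if i != blank)

# Toy setup: 3 characters + blank (class 3); one-hot rows stand in for
# the model's per-timestep softmax output of shape (T, num_classes).
vocab = ["a", "b", "c"]
steps = [0, 0, 3, 1, 1, 3, 3, 2]  # per-timestep argmax classes
probs = np.eye(4)[steps]
print(ctc_greedy_decode(probs, vocab))  # abc
```

The repeated `0`s and `1`s collapse to single characters, and the blank (`3`) separators are dropped, which is exactly why CTC can emit genuinely doubled letters only when a blank sits between them.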
+
+ ---
 
+ ## 📊 Performance & Metrics
+ Training was conducted on the **mssqpi/Arabic-OCR-Dataset** over 50 epochs.
+
+ | Metric | Value |
+ | :--- | :--- |
+ | **Input Shape** | 128 x 32 x 1 (Grayscale) |
+ | **Output Classes** | 39 (38 Chars + 1 Blank) |
+ | **Final Loss** | ~13.13 |
+ | **Val Loss** | ~89.79 |
+ | **Framework** | Keras 3.x (Native) |
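The "39 (38 Chars + 1 Blank)" row can be cross-checked against the vocabulary listed in the Quick Start code:

```python
# The 38-character vocabulary from the model card; the CTC blank
# occupies the extra class at index 38, giving 39 output classes.
vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش',
         'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه',
         'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']
print(len(vocab))      # 38 characters
print(len(vocab) + 1)  # 39 output classes, matching the table
```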
 
  ---
+
+ ## 🤝 Acknowledgments
+ Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.
+
+ ---
+
+ > [!TIP]
+ > **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!