---
language: ar
license: apache-2.0
tags:
- ocr
- arabic
- keras
- jax
- tensorflow
- pytorch
datasets:
- mssqpi/Arabic-OCR-Dataset
---

<div align="center">
<img src="https://huggingface.co/Ali0044/Qalam-Net/resolve/main/banner.png" width="100%" alt="Qalam-Net Banner">

# 🖋️ Qalam-Net (قلم-نت)
### *High-Performance, Cross-Backend Arabic OCR*

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Keras](https://keras.io/)
[Keras 3](https://keras.io/keras_3/)
</div>

---

## ✨ Highlights
- **🚀 Ultra-Fast Inference**: Native JAX/XLA support for accelerated processing.
- **🧩 Portable Architecture**: Patched (v2) to resolve serialization issues across Keras versions.
- **🎯 Precision-Driven**: A CNN + BiLSTM + self-attention pipeline optimized for Arabic script.
- **🔄 Unified Loading**: No custom layers or complex setup required for inference.

---

## 🔍 How it Works
The model processes Arabic text-line images through a multi-stage pipeline: a CNN backbone extracts spatial features, two stacked BiLSTM layers model the character sequence in both directions, a self-attention layer reweights the sequence features, and the per-timestep softmax output is collapsed into text by a NumPy CTC greedy decoder:

```mermaid
graph LR
A[Input Image 128x32] --> B[CNN Backbone]
B --> C[Spatial Features]
C --> D[Dual BiLSTM]
D --> E[Self-Attention]
E --> F[Softmax Output]
F --> G[NumPy CTC Decoder]
G --> H[Arabic Text]
```
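The last two stages, per-timestep softmax scores collapsed by the CTC greedy decoder, can be illustrated with a NumPy-only sketch. The 3-character vocabulary and the hand-picked argmax path below are toy stand-ins for the model's real 39-class, 32-timestep output:

```python
import numpy as np

vocab = ["a", "b", "c"]   # toy stand-in for the 38 Arabic characters
blank = len(vocab)        # CTC blank is the last class (index 38 in the real model)

# Pretend argmax over the softmax output gave this per-timestep path:
path = np.array([0, 0, blank, 1, 1, blank, blank, 2])

# 1) Collapse consecutive duplicates
collapsed = [path[i] for i in range(len(path))
             if i == 0 or path[i] != path[i - 1]]
# 2) Drop blank tokens
indices = [int(i) for i in collapsed if i != blank]
text = "".join(vocab[i] for i in indices)
print(text)  # -> "abc"
```

Note the order of operations: duplicates collapse *before* blanks are removed, so a genuine double letter can only survive decoding if the model emits a blank timestep between the two occurrences.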

---

## 🚀 Quick Start (Robust Usage)

Use the following implementation to run inference on any platform. It uses a custom **NumPy-based decoder**, so decoding behaves identically across the JAX, TensorFlow, and PyTorch backends.

<details>
<summary><b>View Python Implementation</b></summary>

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # Options: "jax", "tensorflow", "torch"

import keras
import numpy as np
import cv2
from huggingface_hub import hf_hub_download

class QalamNet:
    def __init__(self, repo_id="Ali0044/Qalam-Net"):
        # 1. Download and load the model
        print(f"Loading Qalam-Net from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename="model.keras")
        self.model = keras.saving.load_model(model_path)

        # 2. Define the exact 38-character Arabic vocabulary
        # [ALIF, BA, TA, THA, JEEM, HAA, KHAA, DAL, THAL, RA, ZAY, SEEN, SHEEN, SAD, DAD, TAA, ZAA, AIN, GHAIN, FA, QAF, KAF, LAM, MEEM, NOON, HA, WAW, YA, TEH_MARBUTA, ALEF_MAKSURA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, WAW_HAMZA, YEH_HAMZA, HAMZA, SPACE, TATWEEL]
        self.vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز',
                      'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك',
                      'ل', 'م', 'ن', 'ه', 'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ',
                      'ؤ', 'ئ', 'ء', ' ', 'ـ']

    def preprocess(self, image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        img = cv2.resize(img, (128, 32)) / 255.0  # -> (32, 128), scaled to [0, 1]
        img = img.T  # Transpose to (128, 32) for the CRNN architecture
        img = np.expand_dims(img, axis=(-1, 0))  # Add batch and channel dims
        return img.astype(np.float32)

    def predict(self, image_path):
        batch_img = self.preprocess(image_path)
        preds = self.model.predict(batch_img)  # Output shape: (1, 32, 39)

        # 3. NumPy-based CTC greedy decoding (cross-backend)
        argmax_preds = np.argmax(preds, axis=-1)[0]

        # Remove consecutive duplicates
        unique_indices = [argmax_preds[i] for i in range(len(argmax_preds))
                          if i == 0 or argmax_preds[i] != argmax_preds[i - 1]]

        # Remove the blank token (last index, 38)
        blank_index = preds.shape[-1] - 1
        final_indices = [idx for idx in unique_indices if idx != blank_index]

        # Map indices to vocabulary characters
        return "".join([self.vocab[idx] for idx in final_indices if idx < len(self.vocab)])

# Usage
ocr = QalamNet()
print(f"Predicted Arabic Text: {ocr.predict('/content/images.png')}")
```
</details>
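The `preprocess` step above has a fixed shape contract, which can be checked without OpenCV by applying the same NumPy operations to a synthetic grayscale array (a sketch; the real code obtains the 32 x 128 array via `cv2.resize`, whose `(128, 32)` argument is width x height):

```python
import numpy as np

# Synthetic grayscale image at the target size: 32 rows x 128 columns.
img = np.random.randint(0, 256, size=(32, 128)).astype(np.float32)

img = img / 255.0                        # normalize pixel values to [0, 1]
img = img.T                              # (32, 128) -> (128, 32): width becomes the sequence axis
img = np.expand_dims(img, axis=(-1, 0))  # add batch and channel dims
img = img.astype(np.float32)

print(img.shape)  # -> (1, 128, 32, 1), matching the model's expected input
```

The transpose is what lets the CNN+BiLSTM stack read the image left to right: after it, the 128-pixel width axis is the one the recurrent layers step along.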

---

## 📊 Performance & Metrics
Training was conducted on the **mssqpi/Arabic-OCR-Dataset** for 50 epochs.

| Metric | Value |
| :--- | :--- |
| **Input Shape** | 128 x 32 x 1 (grayscale) |
| **Output Classes** | 39 (38 characters + 1 CTC blank) |
| **Final Training Loss** | ~13.13 |
| **Validation Loss** | ~89.79 |
| **Framework** | Keras 3.x (native) |

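The class-count bookkeeping in the table can be verified directly against the 38-character vocabulary defined in the Quick Start snippet:

```python
# The model's character set, as defined in the Quick Start snippet above.
vocab = ['ا', 'ب', 'ت', 'ث', 'ج', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'س', 'ش',
         'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه',
         'و', 'ي', 'ة', 'ى', 'أ', 'إ', 'آ', 'ؤ', 'ئ', 'ء', ' ', 'ـ']

num_chars = len(vocab)       # 38 characters
num_classes = num_chars + 1  # plus one CTC blank -> 39 output classes
print(num_chars, num_classes)  # -> 38 39
```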

## 📚 Dataset
This model was trained on the **[Arabic-OCR-Dataset](https://huggingface.co/datasets/mssqpi/Arabic-OCR-Dataset)** provided by **Muhammad AL-Qurishi (mssqpi)**.
- **Total Samples**: ~2.16 million images.
- **Content**: A large collection of Arabic text lines in a variety of fonts and styles.
- **Usage**: Used to train the CRNN architecture to recognize sequential Arabic script.

---

## 🤝 Acknowledgments
Developed and maintained by **[Ali Khalid](https://github.com/Ali0044)**. This model is part of a comparative research study on Arabic OCR architectures.

---

> [!TIP]
> **Pro Tip**: Use the **JAX** backend for the fastest inference times on modern CPUs and GPUs!