💜 Github | 🤗 Hugging Face | 📚 Cookbooks | 🖥️ Demo
## 🚀 Ready-to-Use Arabic OCR Engine

Arabic-GLM-OCR-v1 is a production-optimized model for Arabic OCR, fine-tuned from GLM-OCR for high-accuracy document understanding.

Designed for real-world Arabic documents, it delivers strong performance in extracting printed and handwritten Arabic text from structured and semi-structured documents, and aims to set a new standard for Arabic handwriting recognition.
## 💎 Key Strengths
✅ Highly accurate Arabic text reconstruction
✅ Preserves punctuation well
✅ Clear spacing and consistent formatting
✅ Fine-tuned decoding strategy
✅ Safe generation settings for production environments
## 🧠 Technical Architecture

- **Base model:** GLM-OCR (vision-language model)
- **Fine-tuning:**
  - Precision: FP16
  - Loss strategy: supervised training on answer tokens only
  - Prompt masking: enabled
  - Curriculum: easy-to-difficult progression
### Engineering Outcomes

- Stable convergence
- Minimal overfitting
- Robust generalization
- Consistent token-masking behavior
## 📊 Training Stability Analysis

| Metric | Final Value |
|-----------------|------------|
| Training loss | ~0.35 |
| Evaluation loss | ~0.34 |
| Relative gap | <10% |
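As a quick sanity check (using the rounded loss values from the table above), the relative train/eval gap works out well under the stated 10% bound:

```python
# Rounded final losses reported in the table above
train_loss = 0.35
eval_loss = 0.34

# Relative train/eval gap
gap = abs(train_loss - eval_loss) / train_loss
print(f"relative gap ≈ {gap:.1%}")  # ≈ 2.9%
```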
### ✅ Observations

- Stable loss curves
- No overfitting observed
- Balanced behavior between training and evaluation
- Gradual improvement in evaluation loss during training
## 🔎 Overfitting Assessment

| Indicator | Assessment |
|------------------------|------------|
| Training stability | ★★★★★ |
| Generalization | ★★★★★ |
| Overfitting risk | Very low |
| Methodology efficiency | Excellent |
| Masking accuracy | Verified |
## 📈 Performance Characteristics
| Scenario | Performance |
|----------|------------|
| Clear Printed Text | ★★★★★ |
| Medium Quality Scan | ★★★★☆ |
| Significant Distortion | ★★☆☆☆ |
| Arabic Handwriting | Excellent |
### Strengths

- Handles multi-line Arabic text
- Preserves punctuation well
- Reconstructs spacing stably
- Works best on scanned documents with clear text

The model prioritizes accuracy, consistency, stability, and ease of deployment.
⚠️ The model performs with high efficiency but is still in a testing phase, with ongoing work to improve output formatting.

## ⚠️ Known Limitations

- Accuracy degrades on significantly distorted scans (see the performance table above)
- Chat-template and generation-length caveats, detailed in the sections below
## ⚙️ Implementation Methodology

The official inference pipeline and the modified pipeline used here differ significantly in processing strategy. The modified implementation provides:

- Better control of generation length
- Improved repetition handling
- Cleaner post-processing
- More stable decoding behavior

The official pipeline requires adjustments to fully suit structured Arabic OCR tasks. For this reason, development continues on the optimized modified pipeline, with ongoing stability and formatting improvements.
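The modified pipeline itself is not published in this card, but the kind of post-processing it describes (repetition handling, cleaner output) can be sketched as follows. `clean_ocr_output` is a hypothetical helper of my own, not the card's actual implementation:

```python
import re

def clean_ocr_output(text: str) -> str:
    """Illustrative OCR post-processing: collapse runs of spaces/tabs,
    drop empty lines, and remove consecutive duplicate lines (a common
    repetition artifact). A sketch, not this card's actual pipeline."""
    cleaned = []
    for raw in text.splitlines():
        line = re.sub(r"[ \t]+", " ", raw).strip()
        if line and (not cleaned or line != cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned)

print(clean_ocr_output("السطر الأول  \nالسطر الأول\n\nالسطر الثاني"))
```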
## ⚠️ Chat Template Compatibility

The model sherif1313/Arabic-GLM-OCR-v1 may not be fully aligned with the default `apply_chat_template` behavior. Improper usage may lead to:

- Incorrect image-token encoding
- Minor text-token misalignment
- Reduced OCR extraction accuracy

It is recommended to verify prompt formatting and to ensure correct image-text separation during preprocessing.
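One lightweight way to catch formatting mistakes before they reach the model is to sanity-check the messages structure. `validate_ocr_messages` is a hypothetical helper that mirrors the message layout used by this card's example scripts; to eyeball the rendered prompt itself, `processor.apply_chat_template(messages, tokenize=False)` returns it as a string:

```python
def validate_ocr_messages(messages):
    """Check that each user turn keeps image and text as separate,
    correctly ordered content parts (illustrative helper matching the
    message layout used in this card's example scripts)."""
    for turn in messages:
        content = turn.get("content")
        if not isinstance(content, list):
            raise ValueError("content must be a list of typed parts, not a bare string")
        types = [part.get("type") for part in content]
        if "image" not in types or "text" not in types:
            raise ValueError("each turn needs both an image part and a text part")
        if types.index("image") > types.index("text"):
            raise ValueError("place the image part before the text prompt")
    return True

msgs = [{"role": "user",
         "content": [{"type": "image", "image": "page.png"},
                     {"type": "text", "text": "Text Recognition:"}]}]
assert validate_ocr_messages(msgs)
```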
## 🔁 Generation Control Notice

Excessively large generation limits (e.g., a very high `max_new_tokens`) may cause:

- Repetitive outputs
- Failure to stop at `eos_token`
- Duplicate or unstructured text

Recommended settings match the example scripts below: `max_new_tokens=2048` with `do_sample=False` (greedy decoding).
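A conservative configuration, matching the values used by this card's example scripts, can be kept in one place and reused at every `generate` call:

```python
# Conservative generation settings used by this card's example scripts;
# greedy decoding plus a bounded length avoids runaway repetition.
GEN_KWARGS = {
    "max_new_tokens": 2048,  # enough for a dense page, bounded to force a stop
    "do_sample": False,      # deterministic greedy decoding
}

# Usage, once `model` and `inputs` are prepared as in the scripts below:
# generated_ids = model.generate(**inputs, **GEN_KWARGS)
```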
## Install

```shell
git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .
```
## CLI

```python
import argparse
import os
import sys

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v1"

# Select the device (GPU if available)
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"[info] device: {device}", file=sys.stderr)

# Load the model and processor
try:
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        torch_dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto",
    )
    model.eval()
except Exception as e:
    print(f"[error] failed to load the model: {e}", file=sys.stderr)
    sys.exit(1)


def ocr_image(image_path):
    try:
        image = Image.open(image_path).convert("RGB")
    except Exception as e:
        return f"[error] cannot open the image: {e}"

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"},
            ],
        }
    ]

    try:
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt",
        ).to(model.device)

        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=2048,
                do_sample=False,
            )

        # Keep only the newly generated tokens (drop the prompt)
        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)
        return teks_final
    except Exception as e:
        return f"[error] recognition failed: {e}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Recognize text from multiple images with Arabic-GLM-OCR-v1"
    )
    parser.add_argument(
        "paths",
        nargs="*",
        help="File or directory paths (a directory is scanned for all images inside it)",
    )
    parser.add_argument(
        "--ext",
        default=".jpg,.jpeg,.png,.bmp,.tiff",
        help="Comma-separated image extensions",
    )
    args = parser.parse_args()

    # If the user gave no path, fall back to the default directory
    if not args.paths:
        default_dir = "/home/sheriff/Desktop/5"
        if os.path.isdir(default_dir):
            print(f"[info] no path given, using the default directory: {default_dir}", file=sys.stderr)
            args.paths = [default_dir]
        else:
            print("[error] no path given and the default directory does not exist!", file=sys.stderr)
            sys.exit(1)

    # Collect the list of image files to process
    image_files = []
    extensions = [ext.strip().lower() for ext in args.ext.split(",")]
    for path in args.paths:
        if os.path.isfile(path):
            # Add the file if it has a supported image extension
            ext = os.path.splitext(path)[1].lower()
            if ext in extensions:
                image_files.append(path)
            else:
                print(f"[warning] file {path} is not a supported image (extensions: {extensions})", file=sys.stderr)
        elif os.path.isdir(path):
            # For a directory, collect every image inside it (non-recursive)
            for file in os.listdir(path):
                filepath = os.path.join(path, file)
                if os.path.isfile(filepath):
                    ext = os.path.splitext(file)[1].lower()
                    if ext in extensions:
                        image_files.append(filepath)
        else:
            print(f"[error] path does not exist: {path}", file=sys.stderr)

    if not image_files:
        print("[error] no valid images to process!", file=sys.stderr)
        sys.exit(1)

    # Process each image
    for img_path in image_files:
        print(f"\n--- File: {img_path} ---")
        result = ocr_image(img_path)
        print(result)
```
## Web
```python
import gradio as gr
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

# --- MODEL CONFIGURATION ---
MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v1"

# Detect the device automatically
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"🚀 OCR engine starting: Device={device} | Dtype={dtype}")

# --- MODEL INITIALIZATION (with error checking) ---
try:
    print("⏳ Loading processor...")
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
    print("⏳ Loading model (this may take a few minutes)...")
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        torch_dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto",
    )
    model.eval()
    print("✅ Model ready!")
except Exception as e:
    print(f"❌ Failed to load the model: {e}")
    raise  # Stop execution if the model fails to load

# --- EXAMPLE IMAGES (make sure these files sit in the same folder as the script) ---
EXAMPLE_IMAGES = [
    "train_22062.jpg",
    "train_22057.jpg",
    "BULAC_MS_ARA_417_0006_0011.jpg",
    "00025.png",
    "AHTD3A0074_Para4_1.jpg",
    "00060.png",
]

# --- OCR FUNCTION ---
def proses_intelijen(image):
    if image is None:
        return "⚠️ Please upload an image first."

    # Format the message according to the model's chat template
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"},
            ],
        }
    ]

    try:
        # Apply the chat template and tokenize
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt",
        ).to(model.device)

        # Generate the text
        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=2048,
                do_sample=False,
            )

        # Keep only the generated part (drop the prompt)
        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)
        return teks_final
    except Exception as e:
        return f"🚨 An error occurred: {str(e)}"

# --- GRADIO INTERFACE ---
css_custom = """
.container { max-width: 1200px; margin: auto; padding-top: 20px; }
h1 { text-align: center; color: #3b82f6; }
"""

with gr.Blocks(css=css_custom, title="Arabic GLM-OCR") as app:
    with gr.Column(elem_classes="container"):
        gr.Markdown("# Arabic GLM-OCR")
        gr.Markdown("Arabic OCR powered by GLM-OCR.")
        with gr.Row():
            with gr.Column(scale=1):
                input_img = gr.Image(type="pil", label="Upload an image", height=450)
                scan_btn = gr.Button("🚀 START SCAN", variant="primary", size="lg")
            with gr.Column(scale=1):
                output_txt = gr.Textbox(label="Recognized text", lines=24)

        # Clickable example images
        gr.Examples(
            examples=EXAMPLE_IMAGES,
            inputs=input_img,
            outputs=output_txt,
            fn=proses_intelijen,
            cache_examples=False,  # Set to True to speed up demos (needs disk space)
            label="Example images (click to load)",
        )

    # Wire the button to the OCR function
    scan_btn.click(fn=proses_intelijen, inputs=input_img, outputs=output_txt)

if __name__ == "__main__":
    app.launch()
```
Careful verification of message formatting is recommended when using custom paths.
## 📜 License

Apache 2.0
Base model: zai-org/GLM-OCR