---
license: apache-2.0
language:
  - ar
base_model:
  - zai-org/GLM-OCR
pipeline_tag: image-text-to-text
new_version: sherif1313/Arabic-GLM-OCR-v2
---

💜 Github   |   🤗 Hugging Face   |   📚 Cookbooks
🖥️ Demo

# ๐Ÿ† sherif1313/Arabic-GLM-OCR-v1 ### High-Quality AI Model for Arabic Documents

🚀 Ready-to-Use Arabic OCR Engine

Arabic-GLM-OCR-v1 is a production-optimized model for Arabic OCR, developed from GLM-OCR for high-accuracy document understanding.

Designed specifically for real-world Arabic documents, it delivers strong performance in extracting printed and handwritten Arabic text from structured and semi-structured documents, with a particular focus on Arabic handwriting recognition.


💎 Key Strengths

✅ Highly accurate Arabic text reconstruction

✅ Preserves punctuation well

✅ Clear spacing and consistent formatting

✅ Fine-tuned decoding strategy

✅ Safe generation settings for production environments


🧠 Technical Architecture

  • Base Model: GLM-OCR (vision-language model)
  • Fine-tuning precision: FP16
  • Loss strategy: supervised training on answer tokens only
  • Prompt masking: enabled (instruction tokens excluded from the loss)
  • Learning method: curriculum training, progressing from easy to difficult samples
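The answer-only loss with prompt masking can be sketched as follows (a minimal illustration, not the actual training code; the token ids and `build_labels` helper are assumptions):

```python
# Minimal sketch of answer-only supervision: prompt (image + instruction)
# tokens are masked out of the loss with IGNORE_INDEX, so only the target
# transcription tokens are supervised. Token ids are illustrative.
IGNORE_INDEX = -100  # the label value PyTorch's cross_entropy ignores

def build_labels(prompt_ids, answer_ids):
    """Labels for one sample: mask the prompt, supervise the answer."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)

prompt = [151644, 8948, 1024]   # hypothetical instruction tokens
answer = [3456, 789, 151645]    # hypothetical transcription tokens
labels = build_labels(prompt, answer)
assert labels[:len(prompt)] == [IGNORE_INDEX] * len(prompt)
```

Only the final `len(answer)` positions contribute to the loss, which is what keeps the model from learning to parrot the instruction.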

Engineering Outcomes

  • Stable convergence
  • Minimal overfitting
  • Robust generalization
  • Correct prompt-masking behavior

📊 Training Stability Analysis

| Metric | Final Value |
|--------|-------------|
| Training Loss | ~0.35 |
| Evaluation Loss | ~0.34 |
| Gap | <10% |
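The gap row is the relative difference between the two losses:

```python
# Relative train/eval gap from the values reported above (~0.35 vs ~0.34).
train_loss, eval_loss = 0.35, 0.34
gap = abs(train_loss - eval_loss) / train_loss
print(f"{gap:.1%}")  # about 2.9%, comfortably under the 10% threshold
```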

✅ Observations

  • Stable loss curves
  • No overfitting observed
  • Balanced behavior between training and evaluation
  • Gradual improvement on the evaluation set during training

🔎 Overfitting Evaluation

| Indicator | Assessment |
|-----------|------------|
| Training Stability | ★★★★★ |
| Generalization | ★★★★★ |
| Overfitting Risk | Very Low |
| Methodology Efficiency | Excellent |
| Masking Accuracy | Verified |


📈 Performance Characteristics

| Scenario | Performance |
|----------|-------------|
| Clear Printed Text | ★★★★★ |
| Medium Quality Scan | ★★★★☆ |
| Significant Distortion | ★★☆☆☆ |
| Arabic Handwriting | Excellent |

Strengths

  • Handles multi-line Arabic text
  • Preserves punctuation well
  • Reconstructs spacing stably
  • Works best on scanned documents with clear text


Prioritizes:

  • Accuracy
  • Consistency
  • Stability
  • Ease of deployment

⚠️ The model performs with high efficiency but is still in a testing phase, with ongoing work to improve output formatting.

โš ๏ธ Known Limitations

โš™๏ธ Implementation Methodology

The official inference pipeline and the modified pipeline differ significantly in processing strategy.

The modified implementation provides:

  • Better generation length control
  • Improved repetition handling
  • Cleaner post-processing
  • More stable decoding behavior

The official pipeline requires adjustments to fully suit structured Arabic OCR tasks.

For this reason, development continues using the optimized modified pipeline, with ongoing stability and formatting improvements.
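The repetition handling and post-processing in the modified pipeline are not published; as one hedged example of the kind of clean-up involved, consecutive duplicate lines (a common OCR repetition artifact) can be collapsed like this (`collapse_repeated_lines` is an illustrative helper, not part of the actual pipeline):

```python
def collapse_repeated_lines(text: str) -> str:
    """Drop consecutive duplicate lines, a typical OCR repetition artifact."""
    cleaned = []
    for line in text.splitlines():
        # Keep a line only if it differs from the previous kept line.
        if not cleaned or line.strip() != cleaned[-1].strip():
            cleaned.append(line)
    return "\n".join(cleaned)

print(collapse_repeated_lines("سطر أول\nسطر أول\nسطر ثان"))
```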


โš ๏ธ Chat Template Compatibility

The model sherif1313/Arabic-GLM-OCR-v1 may not be fully aligned with the default `apply_chat_template` behavior.

Improper usage may lead to:

  • Incorrect image token encoding
  • Minor text token misalignment
  • Reduced OCR extraction accuracy

It is recommended to verify prompt formatting and ensure correct image-text separation during preprocessing.
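One way to verify the formatting is to build the message in the same structure the scripts in this README use, then decode the tokenized prompt back to text to check where the image placeholder lands (the decode step is shown as comments because it requires the downloaded processor; `"page.png"` is a placeholder path):

```python
# Build the message in the structure used throughout this README.
# "page.png" is a placeholder; a PIL.Image object also works here.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "page.png"},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }
]

# With the processor loaded, round-trip the prompt to inspect the layout:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt")
# print(processor.decode(inputs["input_ids"][0]))  # check image/text order
```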

๐Ÿ” Generation Control Notice

Using excessively large generation limits such as:

may cause:

  • Repetitive outputs
  • Failure to stop at eos_token
  • Duplicate or unstructured text
  • Recommended settings:
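As a hedged sketch of conservative settings (the `repetition_penalty` value is an illustrative assumption, not an official recommendation from the model author):

```python
# Conservative generation settings for structured Arabic OCR.
# max_new_tokens matches the scripts in this README; repetition_penalty
# is an illustrative value to discourage duplicated spans.
GEN_KWARGS = {
    "max_new_tokens": 2048,    # avoid much larger caps
    "do_sample": False,        # deterministic decoding suits OCR
    "repetition_penalty": 1.1,
}
# Usage: generated_ids = model.generate(**inputs, **GEN_KWARGS)
```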

## Install

```shell
git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .
```

## CLI

```python
import argparse
import sys
import os
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch
from PIL import Image

MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v1"

# Select the device (GPU if available)
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"[INFO] Device: {device}", file=sys.stderr)

# Load the model and the processor
try:
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto"
    )
    model.eval()
except Exception as e:
    print(f"[ERROR] Failed to load the model: {e}", file=sys.stderr)
    sys.exit(1)

def ocr_image(image_path):
    try:
        image = Image.open(image_path).convert("RGB")
    except Exception as e:
        return f"[ERROR] Cannot open the image: {e}"

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"}
            ],
        }
    ]

    try:
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt"
        ).to(model.device)

        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=2048,
                do_sample=False
            )

        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)
        return teks_final
    except Exception as e:
        return f"[ERROR] Recognition failed: {e}"

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Recognize text in multiple images with Arabic-GLM-OCR-v1")
    parser.add_argument("paths", nargs="*", help="File or directory paths (a directory is scanned for all images inside it)")
    parser.add_argument("--ext", default=".jpg,.jpeg,.png,.bmp,.tiff", help="Comma-separated image extensions")
    args = parser.parse_args()

    # If the user gave no path, fall back to the default directory
    if not args.paths:
        default_dir = "/home/sheriff/Desktop/5"
        if os.path.isdir(default_dir):
            print(f"[INFO] No path given; using the default directory: {default_dir}", file=sys.stderr)
            args.paths = [default_dir]
        else:
            print("[ERROR] No path specified and the default directory does not exist!", file=sys.stderr)
            sys.exit(1)

    # Collect the list of image files to process
    image_files = []
    extensions = [ext.strip().lower() for ext in args.ext.split(",")]

    for path in args.paths:
        if os.path.isfile(path):
            # Add the file if it has a supported image extension
            ext = os.path.splitext(path)[1].lower()
            if ext in extensions:
                image_files.append(path)
            else:
                print(f"[WARNING] {path} is not a supported image (extensions: {extensions})", file=sys.stderr)
        elif os.path.isdir(path):
            # For a directory, collect all images inside it (non-recursive)
            for file in os.listdir(path):
                filepath = os.path.join(path, file)
                if os.path.isfile(filepath):
                    ext = os.path.splitext(file)[1].lower()
                    if ext in extensions:
                        image_files.append(filepath)
        else:
            print(f"[ERROR] Path does not exist: {path}", file=sys.stderr)

    if not image_files:
        print("[ERROR] No valid images to process!", file=sys.stderr)
        sys.exit(1)

    # Process each image
    for img_path in image_files:
        print(f"\n--- File: {img_path} ---")
        result = ocr_image(img_path)
        print(result)
```



## Web


```python

import gradio as gr
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch
from PIL import Image

# --- MODEL CONFIGURATION ---
MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v1"

# Auto-detect the device
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"🚀 OCR engine starting: Device={device} | Dtype={dtype}")

# --- MODEL INITIALIZATION (with error checking) ---
try:
    print("⏳ Loading processor...")
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)

    print("⏳ Loading model (this may take a few minutes)...")
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        torch_dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto"
    )
    model.eval()
    print("✅ Model ready!")
except Exception as e:
    print(f"❌ Failed to load the model: {e}")
    raise  # Stop execution if the model fails to load

# --- EXAMPLE IMAGES (make sure these files are in the same folder as the script) ---
EXAMPLE_IMAGES = [
    "train_22062.jpg",
    "train_22057.jpg",
    "BULAC_MS_ARA_417_0006_0011.jpg",
    "00025.png",
    "AHTD3A0074_Para4_1.jpg",
    "00060.png",
]

# --- OCR FUNCTION ---
def proses_intelijen(image):
    if image is None:
        return "⚠️ Please upload an image first."

    # Format the message according to the model's chat template
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"}
            ],
        }
    ]

    try:
        # Apply the chat template and tokenize
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt"
        ).to(model.device)

        # Generate the text
        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=2048,
                do_sample=False
            )

        # Keep only the generated part (without the prompt)
        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)
        return teks_final

    except Exception as e:
        return f"🚨 An error occurred: {str(e)}"

# --- GRADIO INTERFACE ---
css_custom = """
.container { max-width: 1200px; margin: auto; padding-top: 20px; }
h1 { text-align: center; color: #3b82f6; }
"""

with gr.Blocks(css=css_custom, title="Arabic GLM-OCR") as app:
    with gr.Column(elem_classes="container"):
        gr.Markdown("# Arabic GLM-OCR")
        gr.Markdown("Arabic OCR powered by GLM-OCR.")

        with gr.Row():
            with gr.Column(scale=1):
                input_img = gr.Image(type="pil", label="Upload Image", height=450)
                scan_btn = gr.Button("🚀 START SCAN", variant="primary", size="lg")

            with gr.Column(scale=1):
                output_txt = gr.Textbox(label="Recognized Text", lines=24)

        # Add clickable example images
        gr.Examples(
            examples=EXAMPLE_IMAGES,
            inputs=input_img,
            outputs=output_txt,
            fn=proses_intelijen,
            cache_examples=False,  # Set to True to speed this up (requires disk space)
            label="Example images (click to load)"
        )

    # Wire the button to the function
    scan_btn.click(fn=proses_intelijen, inputs=input_img, outputs=output_txt)

if __name__ == "__main__":
    app.launch()
```

Careful verification of message formatting is recommended when using custom paths.

📜 License

Apache 2.0