GLM-OCR Fine-tuned for Arabic & French Documents

GLM-OCR model fine-tuned with LoRA on ~5,466 annotated documents (manual Supervisely annotations plus pseudo-labels).

Training data

  • 4,704 Arabic documents (manual Supervisely annotations)
  • 113 Latin-script documents (French/English)
  • 800 scanned English documents (pseudo-labels)
  • Types: administrative forms, invoices, receipts, newspapers, official documents, handwritten text...
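For fine-tuning, each annotated document has to be serialized into the model's chat format. A minimal sketch of one training record, assuming each document reduces to an (image path, full transcription) pair formatted as a single-turn conversation matching the inference prompt below; `make_record` and the example path are illustrative, not part of the released pipeline:

```python
def make_record(image_path: str, transcription: str) -> dict:
    # Hypothetical training record: the user turn carries the image and
    # the same "Document Parsing:" prompt used at inference; the
    # assistant turn carries the ground-truth transcription.
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": "Document Parsing:"},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": transcription},
            ]},
        ]
    }

record = make_record("scans/invoice_0001.jpg", "Facture - Total: 120,00 EUR")
```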

Training

  • Base model: zai-org/GLM-OCR
  • Method: continual LoRA fine-tuning (rank=16, alpha=32)
  • Epochs: 3; learning rate: 1e-4
  • GPU: NVIDIA RTX 4050 Laptop (6 GB VRAM)
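The rank=16, alpha=32 setting is what makes training feasible on 6 GB of VRAM: only the low-rank adapter weights are trained. A back-of-the-envelope sketch of the savings for one projection layer (the 4096 width is an assumed, illustrative dimension, not GLM-OCR's actual one):

```python
# LoRA replaces a full d_in x d_out weight update with two low-rank
# factors, A (d_in x r) and B (r x d_out), scaled by alpha / r.
d_in, d_out, r, alpha = 4096, 4096, 16, 32

full_params = d_in * d_out        # full fine-tune: every weight in the layer
lora_params = r * (d_in + d_out)  # LoRA adapter: A and B only
scale = alpha / r                 # scaling applied to the adapter output

print(lora_params, round(100 * lora_params / full_params, 2), scale)
# → 131072 0.78 2.0  (adapter trains <1% of the layer's parameters)
```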

Usage

```python
from transformers import AutoProcessor, GlmOcrForConditionalGeneration
import torch
from PIL import Image

model_id = "maloukafer/GLM-OCR-finetuned-documents"
processor = AutoProcessor.from_pretrained(model_id)
model = GlmOcrForConditionalGeneration.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)
model.eval()

image = Image.open("document.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Document Parsing:"}
]}]

# Build the prompt from the chat template, then tokenize text + image,
# moving tensors to wherever device_map placed the model.
text_input = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=text_input, images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
n = inputs["input_ids"].shape[1]
text = processor.decode(out[0][n:], skip_special_tokens=True).strip()
print(text)
```
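The decoded text often carries OCR noise, especially for Arabic scans. A hypothetical post-processing helper (not part of the model card's pipeline) that normalizes Unicode, strips the Arabic tatweel filler, and collapses stray whitespace:

```python
import re
import unicodedata

def clean_ocr_text(text: str) -> str:
    # Normalize to NFC so composed/decomposed forms compare equal.
    text = unicodedata.normalize("NFC", text)
    # Drop the Arabic tatweel (U+0640), a purely typographic filler.
    text = text.replace("\u0640", "")
    # Collapse runs of spaces/tabs and excessive blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

For example, `clean_ocr_text("مـــرحبا   بكم")` removes the elongation marks and the doubled spaces before the text is stored or indexed.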