Kabyle OCR Model – PaddleOCR Checkpoint

This is a text recognition model for the Kabyle language (written in Latin script), trained using PaddleOCR (PP‑OCRv3 architecture). The model was trained on synthetic text generated from Kabyle news corpora.

Model Details

Property                  Value
Architecture              PP‑OCRv3 (CRNN)
Character set size        109 Kabyle characters + 1 blank token
Image shape               3×48×480 (height=48, width=480)
Max text length           25 characters
Training data             18,000 synthetic images (mini‑test)
Evaluation accuracy       57% (on held‑out validation set)
Normalised edit distance  0.96
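
To make the two evaluation figures concrete, here is a minimal sketch of how exact-match accuracy and a normalised edit-distance score can be computed. It assumes the 0.96 figure follows the convention of PaddleOCR's recognition metric, which reports 1 minus the mean normalised edit distance (so higher is better); the model card does not state this explicitly. The function names and sample strings are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(predictions, targets):
    """Return (exact-match accuracy, 1 - mean normalised edit distance)."""
    assert len(predictions) == len(targets) and targets
    exact = 0
    total_norm_dist = 0.0
    for pred, target in zip(predictions, targets):
        exact += pred == target
        longest = max(len(pred), len(target))
        # Two empty strings are identical, so their distance is 0.
        total_norm_dist += levenshtein(pred, target) / longest if longest else 0.0
    n = len(targets)
    return exact / n, 1.0 - total_norm_dist / n


# Illustrative strings only, not taken from the model's validation set.
preds = ["azul", "tamurt", "aɣrum"]
golds = ["azul", "tamurt", "aɣṛum"]
acc, score = evaluate(preds, golds)
print(f"accuracy={acc:.2f}  edit-distance score={score:.2f}")
```

Under this convention a single wrong character makes the whole line count as an error for accuracy while barely lowering the character-level score, which is one way a 57% accuracy and a 0.96 score can coexist.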

The character set includes both basic Latin letters and Kabyle‑specific characters:
č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ (and their uppercase variants).
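
Several of these letters can be encoded in Unicode either as one precomposed code point or as a base letter plus a combining mark, and the two forms do not compare equal. The sketch below uses only Python's standard `unicodedata` module to normalise text to NFC before checking it against a character set. Whether the dictionary in this repository stores precomposed (NFC) forms is an assumption, and `to_nfc`, `unsupported_chars`, and the small `charset` are hypothetical names introduced for illustration.

```python
import unicodedata

# The Kabyle-specific lowercase letters listed above, written as escapes so
# the precomposed (NFC) code points are unambiguous.
KABYLE_SPECIAL = "\u010d\u1e0d\u025b\u01e7\u0263\u1e25\u1e5b\u1e63\u1e6d\u1e93"


def to_nfc(text: str) -> str:
    """Compose base letters and combining marks into single code points."""
    return unicodedata.normalize("NFC", text)


def unsupported_chars(text: str, charset: set) -> set:
    """Return the characters of NFC-normalised `text` that are not in `charset`."""
    return {ch for ch in to_nfc(text) if ch not in charset}


# Hypothetical character set: ASCII lowercase, space, and the special letters
# in both cases. The real set would come from the dictionary file.
charset = (set("abcdefghijklmnopqrstuvwxyz ")
           | set(KABYLE_SPECIAL)
           | set(KABYLE_SPECIAL.upper()))

decomposed = "d\u0323"         # 'd' followed by COMBINING DOT BELOW
composed = to_nfc(decomposed)  # single code point U+1E0D
print(len(decomposed), len(composed))
print(unsupported_chars("a" + decomposed + "ar 9", charset))
```

Without the normalisation step, the combining mark in the decomposed form would be reported as an unknown character even though the letter it forms is supported.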

Files in this repository

  • best_accuracy.pdparams – Trained model weights (PaddlePaddle format)
  • kab_dict.txt – Character dictionary (one character per line)
  • config.yml – Full training configuration (including image shape, transforms, etc.)
  • inference.yml – Inference settings (optional, used by some scripts)
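
As an illustration of how the dictionary file relates to the model's output classes, the sketch below builds an index-to-character table and applies greedy CTC decoding: take the best class per time step, collapse consecutive repeats, then drop blanks. It assumes the blank token sits at index 0, ahead of the dictionary entries, as PaddleOCR's CTC label decoder arranges it; check config.yml before relying on that. The function names and the tiny in-memory dictionary are hypothetical stand-ins for kab_dict.txt.

```python
def load_charset(lines):
    """Build the index -> character table from dictionary lines.

    Index 0 is reserved for the CTC blank (an assumption about this
    checkpoint); dictionary characters follow in file order.
    """
    chars = [line.rstrip("\r\n") for line in lines]
    chars = [c for c in chars if c != ""]  # skip empty lines, keep a space entry
    return ["<blank>"] + chars


def ctc_greedy_decode(step_indices, table, blank=0):
    """Collapse consecutive repeats, then drop blanks."""
    out = []
    previous = None
    for idx in step_indices:
        if idx != previous and idx != blank:
            out.append(table[idx])
        previous = idx
    return "".join(out)


# Tiny stand-in for kab_dict.txt. With the real file, one could use:
#   with open("kab_dict.txt", encoding="utf-8") as f:
#       table = load_charset(f)
table = load_charset(["a\n", "z\n", "u\n", "l\n"])

# Per-time-step argmax indices, as a recogniser head might produce them.
steps = [1, 1, 0, 2, 0, 3, 3, 4, 0, 0]
text = ctc_greedy_decode(steps, table)
print(text)
```

With a 109-line dictionary, the table built this way has 110 entries, which matches the "109 characters + 1 blank token" figure in Model Details.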

How to Use the Model

This checkpoint is a test. Do not use it in a production environment.
