Kabyle OCR Model – PaddleOCR Checkpoint
This is a text recognition model for the Kabyle language (written in Latin script), trained using PaddleOCR (PP‑OCRv3 architecture). The model was trained on synthetic text generated from Kabyle news corpora.
Model Details
| Property | Value |
|---|---|
| Architecture | PP‑OCRv3 (CRNN) |
| Character set size | 109 Kabyle characters + 1 blank token |
| Image shape | 3×48×480 (height=48, width=480) |
| Max text length | 25 characters |
| Training data | 18,000 synthetic images (small-scale test run) |
| Evaluation accuracy | 57% (on held‑out validation set) |
| Normalised edit-distance score (1 − mean NED, higher is better) | 0.96 |
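The two scores above measure different things. Assuming they come from PaddleOCR's built-in recognition metric (`RecMetric`), accuracy counts only predictions that match the label exactly. The edit-distance figure is reported as 1 minus the mean normalised edit distance, so 0.96 means predictions are on average very close to the labels even when they are not perfect. The following simplified sketch shows both computations. The function names `levenshtein` and `rec_metrics` are hypothetical and are not PaddleOCR's API.

```python
# Simplified sketch of the two recognition metrics, modelled on the
# definitions used by PaddleOCR's RecMetric (assumption: the card's
# numbers were computed the same way). Helper names are hypothetical.


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def rec_metrics(preds, targets):
    """Return (exact-match accuracy, 1 - mean normalised edit distance)."""
    assert len(preds) == len(targets) and targets
    correct = 0
    ned_sum = 0.0
    for p, t in zip(preds, targets):
        if p == t:
            correct += 1
        # Normalise by the longer string so each term lies in [0, 1].
        ned_sum += levenshtein(p, t) / max(len(p), len(t), 1)
    n = len(targets)
    return correct / n, 1.0 - ned_sum / n
```

For example, `rec_metrics(["azul", "azal"], ["azul", "azul"])` gives an accuracy of 0.5 but an edit-distance score of 0.875, because the second prediction is wrong by only one character out of four.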
The character set includes both basic Latin letters and the Kabyle‑specific characters č, ḍ, ɛ, ǧ, ɣ, ḥ, ṛ, ṣ, ṭ, ẓ (and their uppercase variants).
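The extra "blank token" in the character-set size is the CTC blank. In PaddleOCR's `CTCLabelDecode`, the blank is placed at index 0 and the dictionary characters follow in file order. Decoding the model's per-timestep best path therefore means collapsing repeated indices and then dropping blanks. The sketch below assumes that convention holds for this checkpoint. `ctc_greedy_decode` is a hypothetical helper, not a PaddleOCR function.

```python
# Minimal CTC greedy (best-path) decoder. Assumption: index 0 is the
# CTC blank and indices 1..N map to the dictionary characters in order,
# as in PaddleOCR's CTCLabelDecode. The function name is hypothetical.

BLANK = 0


def ctc_greedy_decode(best_path, charset):
    """Collapse consecutive repeats, drop blanks, map indices to characters.

    best_path: per-timestep argmax class indices from the model output.
    charset:   dictionary characters in file order (blank not included).
    """
    chars = []
    prev = None
    for idx in best_path:
        # A new non-blank symbol is emitted only when the index changes,
        # so a blank between two equal indices yields a doubled letter.
        if idx != prev and idx != BLANK:
            chars.append(charset[idx - 1])  # shift by one for the blank
        prev = idx
    return "".join(chars)
```

With `charset = ["a", "z", "u", "l", "ɣ"]`, the path `[1, 1, 0, 2, 2, 3, 0, 4, 4]` decodes to `"azul"`, while `[4, 0, 4]` decodes to `"ll"` because the blank separates the two `l` symbols.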
Files in this repository
- `best_accuracy.pdparams` – Trained model weights (PaddlePaddle format)
- `kab_dict.txt` – Character dictionary (one character per line)
- `config.yml` – Full training configuration (including image shape, transforms, etc.)
- `inference.yml` – Inference settings (optional, used by some scripts)
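Before using `kab_dict.txt` with the weights, check that it has the expected number of entries. The number of entries must match the model's output layer (109 characters plus the blank, per the table). Accented Kabyle letters can also silently end up in decomposed form, for example `d` followed by a combining dot below instead of a single `ḍ`. The helpers below are a hypothetical sketch for that check and are not part of PaddleOCR.

```python
# Hypothetical helpers (not part of PaddleOCR) for sanity-checking
# kab_dict.txt against the character-set size stated on this card.
from pathlib import Path

EXPECTED_SIZE = 109  # "Character set size" row, excluding the CTC blank


def parse_char_dict(lines):
    """One character per line; strip line endings and skip empty lines."""
    chars = []
    for line in lines:
        ch = line.rstrip("\r\n")
        if ch:  # a line containing only " " is kept as a space entry
            chars.append(ch)
    return chars


def load_char_dict(path):
    """Read a UTF-8 dictionary file such as kab_dict.txt."""
    text = Path(path).read_text(encoding="utf-8")
    return parse_char_dict(text.splitlines())


def check_char_dict(chars, expected=EXPECTED_SIZE):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if len(chars) != expected:
        problems.append(f"expected {expected} entries, found {len(chars)}")
    dupes = sorted({c for c in chars if chars.count(c) > 1})
    if dupes:
        problems.append(f"duplicate entries: {dupes}")
    # Decomposed letters (base + combining mark) show up as 2+ code points.
    multi = [c for c in chars if len(c) != 1]
    if multi:
        problems.append(f"entries longer than one code point: {multi}")
    return problems
```

Typical use would be `check_char_dict(load_char_dict("kab_dict.txt"))`, which returns an empty list when the file matches the card.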
How to Use the Model
This is a test model. Do not use it in a production environment.
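Input crops must match the 3×48×480 image shape from the table. PaddleOCR's standard recognition preprocessing (`resize_norm_img`) scales a crop to the target height while keeping its aspect ratio, caps the width at 480, and maps pixels to [-1, 1]. It then zero-pads the right side up to the full width. The arithmetic is sketched below in plain Python. This assumes `config.yml` uses that standard transform, and the helper names are hypothetical.

```python
# Sketch of the resize/normalise arithmetic used by PaddleOCR's standard
# recognition preprocessing (assumption: config.yml uses this transform).
# Helper names are hypothetical.
import math

IMG_C, IMG_H, IMG_W = 3, 48, 480  # "Image shape" row of the table


def resized_width(h, w, img_h=IMG_H, img_w=IMG_W):
    """Width after scaling a crop to img_h pixels high, capped at img_w."""
    ratio = w / float(h)
    return min(img_w, int(math.ceil(img_h * ratio)))


def padding_width(h, w, img_h=IMG_H, img_w=IMG_W):
    """Columns of zero padding added on the right to reach img_w."""
    return img_w - resized_width(h, w, img_h=img_h, img_w=img_w)


def normalize_pixel(value):
    """Map an 8-bit pixel value to [-1, 1]: (v / 255 - 0.5) / 0.5."""
    return (value / 255.0 - 0.5) / 0.5
```

A 32×160 crop is resized to 48×240 and padded with 240 empty columns. A 30×400 crop would need 640 columns, so it is squeezed to the 480-pixel maximum. Lines much longer than the 25-character limit will be compressed this way and are likely to be misread. Within a PaddleOCR checkout, a quick qualitative check would typically look like `python3 tools/infer_rec.py -c config.yml -o Global.pretrained_model=./best_accuracy Global.infer_img=path/to/word.png`, assuming `Global.character_dict_path` in `config.yml` points at `kab_dict.txt`.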