DiacNetYorDB

GitHub Repository

DiacNetYorDB is a lightweight diacritics restorer for Yoruba (yo) text. It restores only the dot-below marks (ọ, ẹ, ṣ), not tone marks, using a character-level k-NN classifier with context backoff.
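To make the idea of context backoff concrete, here is a minimal sketch of a character-level restorer. It is not the DiacNetYorDB implementation: the class name BackoffRestorer, the window size, and the exact-match voting scheme are all assumptions made for illustration, and the format of the shipped JSON model is not described here. For each o, e, or s, the sketch looks up the widest surrounding context it saw in training and falls back to narrower contexts until it finds a match, then takes a majority vote.

```python
import unicodedata
from collections import Counter, defaultdict

DOT_BELOW = "\u0323"   # combining dot below
AMBIGUOUS = set("oes")  # base letters that may carry a dot below


def to_units(text):
    """Split text into (char, has_dot_below) pairs, dropping the dot itself."""
    units = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == DOT_BELOW and units:
            units[-1] = (units[-1][0], True)
        else:
            units.append((ch, False))
    return units


class BackoffRestorer:
    """Hypothetical toy model: exact-match context lookup with backoff."""

    def __init__(self, max_window=3):
        self.max_window = max_window
        # One table per window width; each maps a context key to vote counts.
        self.tables = [defaultdict(Counter) for _ in range(max_window + 1)]

    @staticmethod
    def _key(plain, i, w):
        left = "".join(plain[max(0, i - w):i])
        right = "".join(plain[i + 1:i + 1 + w])
        return (left, plain[i], right)

    def train(self, diacritized_text):
        units = to_units(diacritized_text)
        plain = [ch.lower() for ch, _ in units]
        for i, (_, dotted) in enumerate(units):
            if plain[i] in AMBIGUOUS:
                for w in range(self.max_window + 1):
                    self.tables[w][self._key(plain, i, w)][dotted] += 1

    def restore(self, text):
        units = to_units(text)  # any dots already in the input are re-predicted
        plain = [ch.lower() for ch, _ in units]
        out = []
        for i, (ch, _) in enumerate(units):
            out.append(ch)
            if plain[i] not in AMBIGUOUS:
                continue
            # Back off from the widest context to the bare character.
            for w in range(self.max_window, -1, -1):
                votes = self.tables[w].get(self._key(plain, i, w))
                if votes:
                    if votes[True] > votes[False]:  # ties stay undotted
                        out.append(DOT_BELOW)
                    break
        return unicodedata.normalize("NFC", "".join(out))


model = BackoffRestorer(max_window=3)
model.train("Ọjọ lo si ọja lana")  # toy training data, one sentence
print(model.restore("Ojo lo si oja lana"))
```

A real k-NN model would typically score near matches rather than require exact context matches; the exact-match tables here only keep the sketch short.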

Model Details

  • Model Type: Syllable/Character-level k-NN with Context Backoff
  • File Size: 245 KB (yoruba_diacritizer_dot_below.json)
  • Supported Languages: Yoruba (yo)
  • Metrics:
    • Word Accuracy: 87.38%
  • Dependencies: None (pure Python / zero dependencies)
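The evaluation protocol and test set behind the word accuracy figure are not described here. As an illustration only, one common way to compute word accuracy is to compare predicted and reference text word by word after Unicode normalization; the function name word_accuracy and the whitespace tokenization below are assumptions.

```python
import unicodedata


def word_accuracy(predicted, reference):
    """Fraction of whitespace-separated words that match the reference exactly."""
    pred_words = unicodedata.normalize("NFC", predicted).split()
    ref_words = unicodedata.normalize("NFC", reference).split()
    if len(pred_words) != len(ref_words):
        raise ValueError("predicted and reference must have the same word count")
    if not ref_words:
        return 0.0
    correct = sum(p == r for p, r in zip(pred_words, ref_words))
    return correct / len(ref_words)


# One of five words is missing its dot below.
score = word_accuracy("Ọjọ lo si oja lana", "Ọjọ lo si ọja lana")
print(f"{score:.2%}")
```

Normalizing both sides to NFC avoids counting a word as wrong merely because one string uses precomposed characters and the other uses combining marks.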

Usage

Load and use the model through the unified olaverse SDK wrapper:

from olaverse.nlp.diacritizer import Diacritizer

diacritizer = Diacritizer(model="diacnet-yor-db")
text = "Ojo lo si oja lana"
print(diacritizer.restore(text))
# Output: "Ọjọ lo si ọja lana"