DiacNetYor is a character-level Bidirectional LSTM (BiLSTM) model designed for high-accuracy full tonal and dot-below diacritization of Yoruba (yo) text.
Loaded and used via the unified olaverse SDK wrapper (automatically downloads the weights and loads the PyTorch models in the background):
from olaverse.nlp.diacritizer import Diacritizer
diacritizer = Diacritizer(model="diacnet-yor")
text = "Ojo lo si oja lana"
print(diacritizer.restore(text))
# Output: "Ọjọ́ ló sí ọjà lànà"
During evaluation, the model integrates a candidate-constrained vocabulary post-processing step to map predicted character sequences to valid dictionary-based diacritization candidates, which significantly boosts word-level accuracy.
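The candidate-constrained step described above can be illustrated with a minimal sketch. This is not DiacNet's actual implementation: the function names (`strip_diacritics`, `edit_distance`, `constrain_word`), the shape of the candidate table (undiacritized word mapped to a list of diacritized forms), the example entries, and the choice of edit distance as the matching criterion are all assumptions made for illustration. Case handling is omitted.

```python
import unicodedata

# Hypothetical candidate table: undiacritized key -> valid diacritized forms.
# The entries are illustrative only, not taken from the real vocabulary file.
CANDIDATES = {"oja": ["ọjà", "ọjá"]}


def strip_diacritics(word: str) -> str:
    """Drop combining marks (tone marks, dot-below) to get the lookup key."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def constrain_word(predicted: str, candidates: dict[str, list[str]]) -> str:
    """Snap a predicted word to the closest dictionary candidate, if any."""
    options = candidates.get(strip_diacritics(predicted).lower())
    if not options:
        return predicted  # no dictionary entry: keep the raw model output
    # Compare in NFD so each diacritic counts as one separate edit.
    pred_nfd = unicodedata.normalize("NFD", predicted)
    best = min(
        options,
        key=lambda c: edit_distance(pred_nfd, unicodedata.normalize("NFD", c)),
    )
    return unicodedata.normalize("NFC", best)


# The raw prediction has the right tone but is missing the dot-below.
print(constrain_word("ojà", CANDIDATES))
```

Comparing in decomposed (NFD) form means a prediction that gets one diacritic wrong is still closer to the intended candidate than to one that differs in several marks. Words with no dictionary entry pass through unchanged.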
Loaded and used via the unified olaverse SDK wrapper (automatically downloads the weights and loads the Transformer model in the background):
from olaverse.nlp.diacritizer import Diacritizer
diacritizer = Diacritizer(model="diacnet-yor-x")
text = "Ojo lo si oja lana"
print(diacritizer.restore(text))
# Output: "Ọjọ́ ló sí ọjà lànà"
diacnet_yor.pt: The PyTorch model state dict and configurations.
diacnet_yor_vocab.json: The character vocabulary maps and word candidate lists used for constrained decoding.