LID-Neural-5

LID-Neural-5 is a transformer sequence classifier fine-tuned for language identification (LID) across five major languages spoken in Nigeria: Yoruba (yor), Igbo (ibo), Hausa (hau), Nigerian Pidgin (pcm), and English (eng). It reports 98.96% macro accuracy (see Model Details).

It is fine-tuned from castorini/afriberta_large, a multilingual XLM-RoBERTa transformer pre-trained specifically on African languages, so its subword vocabulary and contextual representations are already adapted to African-language text before fine-tuning.

Model Details

| Property | Value |
|---|---|
| Base Model | castorini/afriberta_large (XLM-RoBERTa, 125M parameters) |
| Model Type | Transformer Sequence Classification |
| Model Size | 484.03 MB |
| Supported Languages | Yoruba, Hausa, Igbo, Nigerian Pidgin, English |
| Testing Accuracy | 98.96% (Macro validation) |
| Average Latency | 13.30 ms per sentence (CPU/GPU) |
| Dependencies | PyTorch, Transformers |
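
The card does not describe how the per-sentence latency was measured. A minimal sketch of one way to take a comparable measurement on your own hardware, assuming a hypothetical `predict` callable (for example `detector.predict`) and a few untimed warm-up calls so that one-time loading costs do not skew the mean:

```python
import time
from typing import Callable, Sequence


def mean_latency_ms(predict: Callable[[str], object],
                    sentences: Sequence[str],
                    warmup: int = 3) -> float:
    """Return the mean wall-clock time per sentence, in milliseconds."""
    if not sentences:
        raise ValueError("need at least one sentence to time")
    # Warm-up calls are not timed (lazy model loading, caches, etc.).
    for text in sentences[:warmup]:
        predict(text)
    start = time.perf_counter()
    for text in sentences:
        predict(text)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(sentences)


# Stand-in classifier so the sketch runs without downloading the model;
# swap in the real predict function to benchmark it.
def fake_predict(text: str) -> str:
    return "eng"


latency = mean_latency_ms(fake_predict, ["Hello there"] * 100)
print(f"{latency:.4f} ms per sentence")
```

Results depend heavily on hardware, batch size, and sentence length, so numbers obtained this way are not directly comparable to the figure in the table.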

Accuracy Details

| Language | Precision | Recall | F1-Score |
|---|---|---|---|
| Yoruba (yor) | 99.60% | 99.60% | 99.60% |
| Hausa (hau) | 99.60% | 99.20% | 99.40% |
| Igbo (ibo) | 98.79% | 98.20% | 98.50% |
| Nigerian Pidgin (pcm) | 99.20% | 98.80% | 99.00% |
| English (eng) | 97.63% | 99.00% | 98.31% |
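
The headline 98.96% in Model Details is consistent with an unweighted (macro) mean of the per-language scores above. The short check below uses only the table values; whether the author computed the headline figure this way is an assumption.

```python
# Per-language scores copied from the table above, in percent.
scores = {
    "yor": {"precision": 99.60, "recall": 99.60, "f1": 99.60},
    "hau": {"precision": 99.60, "recall": 99.20, "f1": 99.40},
    "ibo": {"precision": 98.79, "recall": 98.20, "f1": 98.50},
    "pcm": {"precision": 99.20, "recall": 98.80, "f1": 99.00},
    "eng": {"precision": 97.63, "recall": 99.00, "f1": 98.31},
}


def macro_average(metric: str) -> float:
    """Unweighted mean of one metric across all languages."""
    return sum(row[metric] for row in scores.values()) / len(scores)


macro = {m: round(macro_average(m), 2) for m in ("precision", "recall", "f1")}
print(macro)
# → {'precision': 98.96, 'recall': 98.96, 'f1': 98.96}
```

Macro recall equals overall accuracy only when every language has the same number of evaluation sentences, which the card does not state.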

Usage

The model is integrated directly into the olaverse Python library.

Installation

pip install "olaverse[deeplearning]"
# installs: torch, transformers
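
To confirm that the optional dependencies were installed, a small check like the following can help. It is a sketch that uses only the standard library; `missing_packages` is a hypothetical helper, not part of olaverse.

```python
from importlib.util import find_spec


def missing_packages(names=("torch", "transformers")):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("torch and transformers are importable")
```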

Via the olaverse library (recommended)

from olaverse import LIDNeural5

# Automatically downloads and loads the model on demand
detector = LIDNeural5()
detector.load()

# 1. Predict dominant language
lang = detector.predict("Kedu ka ị mere today?")
print(f"Predicted language: {lang}")  # → 'ibo'

# 2. Get probability distributions
probs = detector.predict_proba("How far, wetin dey happen?")
print(probs)
# → {'eng': 0.002, 'hau': 0.001, 'ibo': 0.003, 'pcm': 0.991, 'yor': 0.003}
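
For short or code-mixed input, the probability mass may be spread across several languages. The sketch below works on a dictionary shaped like the `predict_proba` output above and falls back to the ISO 639 code `und` (undetermined) when no language is confident enough. The helper name `pick_language` and the 0.8 threshold are illustrative assumptions, not part of olaverse.

```python
from typing import Mapping


def pick_language(probs: Mapping[str, float],
                  threshold: float = 0.8,
                  fallback: str = "und") -> str:
    """Return the most probable label, or `fallback` if it is below `threshold`."""
    label, score = max(probs.items(), key=lambda item: item[1])
    return label if score >= threshold else fallback


# Distribution taken from the example above.
probs = {"eng": 0.002, "hau": 0.001, "ibo": 0.003, "pcm": 0.991, "yor": 0.003}
print(pick_language(probs))  # → pcm

# Made-up ambiguous distribution: no label clears the threshold.
print(pick_language({"eng": 0.48, "pcm": 0.47, "yor": 0.05}))  # → und
```

A suitable threshold depends on the application and is best chosen on held-out data.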

Via Hugging Face pipeline (direct)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("olaverse/lid-neural-5")
model = AutoModelForSequenceClassification.from_pretrained("olaverse/lid-neural-5")

lid = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = lid("Bawo ni, se daadaa ni?")
print(result)  # → [{'label': 'yor', 'score': 0.9987}]
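
For a single-label classifier with more than one class, the `score` returned by the text-classification pipeline is by default the softmax probability of the top class. The sketch below reproduces that step in plain Python. The logits are made up, and the label order in `ID2LABEL` is an assumption; the real mapping is stored in the model's config as `id2label`.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label order; read model.config.id2label for the real one.
ID2LABEL: Dict[int, str] = {0: "eng", 1: "hau", 2: "ibo", 3: "pcm", 4: "yor"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits: Sequence[float],
                   id2label: Dict[int, str] = ID2LABEL) -> Dict[str, object]:
    """Return the highest-probability label in the pipeline's output shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Made-up logits for illustration only.
example_logits = [-1.2, -2.0, -1.5, 0.3, 6.1]
print(top_prediction(example_logits))
```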

Links
