---
language:
- ar
tags:
- text-classification
- eou
- end-of-utterance
- turn-detection
- arabic
- saudi-dialect
- marbert
base_model: UBC-NLP/MARBERT
license: apache-2.0
metrics:
- accuracy
- f1
- precision
- recall
---

# MARBERT Arabic End-of-Utterance Detection

Fine-tuned MARBERT model for Arabic end-of-utterance (EOU) detection in real-time voice agents: given a transcribed utterance, it predicts whether the speaker's utterance is complete or incomplete.

## Model Description

- **Base Model:** UBC-NLP/MARBERT (163M parameters)
- **Task:** Binary sequence classification (complete vs. incomplete utterance)
- **Language:** Arabic (emphasis on Saudi/Gulf dialect)
- **Training Data:** 125K samples from the SADA22 dataset
- **Inference Speed:** ~30 ms average latency on CPU
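
The card does not state the CPU model, thread count, or input lengths behind the ~30 ms figure, so latency is worth re-measuring on your own hardware. Below is a minimal, hypothetical timing harness; the `measure_latency` helper and its warm-up count are illustrative assumptions, not part of this repository. It averages wall-clock time per call over a list of inputs.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency(fn: Callable[[str], object], texts: Sequence[str], warmup: int = 3) -> dict:
    """Hypothetical helper: time `fn` on each text and report latency in milliseconds."""
    if not texts:
        raise ValueError("texts must be non-empty")
    # Warm-up calls are excluded from timing (first calls often pay one-off setup costs)
    for text in texts[:warmup]:
        fn(text)
    timings_ms = []
    for text in texts:
        start = time.perf_counter()
        fn(text)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }
```

With the `predict_eou` function from the Usage section, `measure_latency(predict_eou, sample_texts)` would report per-call latency on your own sample of transcripts (`sample_texts` is a placeholder for your data).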

## Performance

| Metric | Score |
|--------|-------|
| **F1 Score** | 0.8174 |
| **Accuracy** | 0.7995 |
| **Precision** | 0.7506 |
| **Recall** | 0.8971 |
| **AUC-ROC** | 0.8249 |

**Test Set:** 31,289 samples (approximately balanced: ~50% complete, ~50% incomplete)
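
As a sanity check on the table: F1 is the harmonic mean of precision and recall, and on a balanced test set accuracy is the mean of recall and specificity. The sketch below recomputes F1 and accuracy from the reported precision and recall, assuming exactly equal class sizes (the test set is only approximately balanced, and the reported figures are rounded).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_on_balanced_set(precision: float, recall: float) -> float:
    """Accuracy implied by precision and recall when both classes have equal size.

    With P positives (and P negatives): TP = recall * P and
    FP = TP * (1 / precision - 1), so the false-positive rate is FP / P.
    Specificity = 1 - FP / P, and accuracy = (recall + specificity) / 2.
    """
    false_positive_rate = recall * (1 / precision - 1)
    specificity = 1 - false_positive_rate
    return (recall + specificity) / 2


precision, recall = 0.7506, 0.8971
print(f"F1:       {f1_score(precision, recall):.4f}")
print(f"Accuracy: {accuracy_on_balanced_set(precision, recall):.4f}")
```

Both recomputed values agree with the table up to rounding, so the reported metrics are internally consistent.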

## Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "azeddinShr/marbert-arabic-eou"

# from_pretrained() returns the model in evaluation mode (dropout disabled)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)


def predict_eou(text: str) -> float:
    """Return the probability that `text` is a complete utterance (label 1)."""
    # max_length matches the 128-token limit used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    return probs[0, 1].item()


# Example: "Thank you very much for the help"
text = "شكرا جزيلا على المساعدة"
prob = predict_eou(text)
is_complete = prob > 0.5
print(f"EOU Probability: {prob:.3f} - {'Complete' if is_complete else 'Incomplete'}")
```
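
The example above uses a fixed 0.5 cutoff. Since the reported recall (0.8971) is well above precision (0.7506), some applications may prefer a different operating point; raising the threshold generally trades recall for precision. Below is a minimal, hypothetical sketch for picking the threshold that maximizes F1 on your own labeled validation data. The `best_threshold` helper and its candidate grid are illustrative assumptions, not part of this model's release.

```python
from typing import Sequence, Tuple


def best_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    candidates: Sequence[float] = tuple(i / 100 for i in range(5, 100, 5)),
) -> Tuple[float, float]:
    """Hypothetical helper: return (threshold, f1) maximizing F1 for label 1 ("complete").

    Falls back to (0.5, 0.0) if no candidate yields a true positive.
    """
    best = (0.5, 0.0)
    for t in candidates:
        preds = [p > t for p in probs]
        tp = sum(1 for pred, y in zip(preds, labels) if pred and y == 1)
        fp = sum(1 for pred, y in zip(preds, labels) if pred and y == 0)
        fn = sum(1 for pred, y in zip(preds, labels) if not pred and y == 1)
        if tp == 0:
            continue  # F1 is 0 without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (t, f1)
    return best
```

In practice `probs` would come from calling `predict_eou` on a held-out validation split; tuning on the test set itself would give optimistic estimates.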

## Training Details

- **Epochs:** 6
- **Batch Size:** 16 (train), 32 (eval)
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Max Length:** 128 tokens
- **Training Time:** ~2 minutes (GPU)

## Use Cases

- Real-time Arabic voice agents
- Turn-taking detection in conversations
- Streaming speech-to-text applications
- Voice assistant interrupt handling
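
For streaming use, one possible pattern is to re-score the growing partial transcript after each ASR update and end the turn once the EOU probability stays above a threshold. The sketch below is a hypothetical illustration of that loop, not part of this model's release: `score_fn` stands in for `predict_eou` from the Usage section, and the `threshold` and `patience` (number of consecutive confident updates required) values are assumptions to tune for your application.

```python
from typing import Callable, Iterable, Optional


def detect_end_of_turn(
    partial_transcripts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
    patience: int = 1,
) -> Optional[str]:
    """Hypothetical helper: return the transcript at which the turn is judged complete.

    `partial_transcripts` yields successive ASR hypotheses for the current turn.
    The turn ends once `score_fn` exceeds `threshold` on `patience` consecutive
    updates; returns None if the stream ends without a confident EOU.
    """
    if patience < 1:
        raise ValueError("patience must be >= 1")
    streak = 0
    for text in partial_transcripts:
        if score_fn(text) > threshold:
            streak += 1
            if streak >= patience:
                return text
        else:
            streak = 0  # the speaker seems to be continuing; reset the streak
    return None
```

Requiring `patience` greater than 1 delays the decision slightly but can reduce false end-of-turn triggers caused by a single noisy ASR hypothesis.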

## Limitations

- Performs best on Saudi/Gulf Arabic dialects
- Requires Arabic text input (e.g., ASR transcripts), not raw audio

## Citation

```bibtex
@misc{marbert-arabic-eou,
  author    = {azeddinShr},
  title     = {MARBERT Arabic End-of-Utterance Detection},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/azeddinShr/marbert-arabic-eou}
}
```

## Dataset

Training dataset: [azeddinShr/arabic-eou-sada22](https://huggingface.co/datasets/azeddinShr/arabic-eou-sada22)