---
language:
- ar
license: apache-2.0
library_name: transformers
tags:
- arabic
- end-of-utterance
- eou
- turn-detection
- voice-agent
- saudi-dialect
- conversational-ai
- livekit
datasets:
- Amr-h/EOU_Arabic_Saudi
base_model: UBC-NLP/MARBERT
pipeline_tag: text-classification
metrics:
- accuracy
- f1
model-index:
- name: arabic-eou-marbert
  results:
  - task:
      type: text-classification
      name: End of Utterance Detection
    metrics:
    - type: accuracy
      value: 0.918
      name: OOD Accuracy
    - type: f1
      value: 0.9176
      name: Weighted F1
---

# Arabic End-of-Utterance (EOU) Detection Model

This model detects whether a speaker has finished their turn in Arabic conversations, with an emphasis on the **Saudi dialect**. It is designed for real-time voice agents, such as those built on LiveKit.

## Model Description

- **Base Model:** [UBC-NLP/MARBERT](https://huggingface.co/UBC-NLP/MARBERT)
- **Language:** Arabic (with focus on Saudi/Gulf dialect)
- **Task:** Binary classification (Complete vs Incomplete utterance)
- **Use Case:** Real-time turn detection for voice agents

## Labels

| Label | ID | Description |
|-------|-----|-------------|
| INCOMPLETE | 0 | Speaker has not finished their turn |
| COMPLETE | 1 | Speaker has finished their turn |

## Performance

### Out-of-Distribution Test Results (200 samples)

| Metric | Complete (1) | Incomplete (0) |
|--------|--------------|----------------|
| Precision | 100.00% | 85.94% |
| Recall | 83.64% | 100.00% |
| F1-Score | 91.09% | 92.44% |

**Overall Weighted F1: 91.76%**

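To make the relationship between the table entries explicit, the sketch below recomputes precision, recall, F1, accuracy, and weighted F1 from a confusion matrix. The four counts are hypothetical: they are not published with this model and were chosen only because they reproduce the reported percentages, so they should not be read as the actual test-set composition.

```python
# Hypothetical confusion-matrix counts (an assumption, NOT published with the
# model); picked only because they reproduce the percentages in the table.
TP = 92   # Complete utterances predicted Complete
FN = 18   # Complete utterances predicted Incomplete (the agent waits too long)
FP = 0    # Incomplete utterances predicted Complete (a false interruption)
TN = 110  # Incomplete utterances predicted Incomplete


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


complete = prf(TP, FP, FN)
# For the Incomplete class the roles swap: TN are its hits,
# FN are its false alarms, and FP are its misses.
incomplete = prf(TN, FN, FP)

support_complete = TP + FN
support_incomplete = TN + FP
total = support_complete + support_incomplete

accuracy = (TP + TN) / total
# Weighted F1 averages the per-class F1 scores, weighted by class support.
weighted_f1 = (
    complete[2] * support_complete + incomplete[2] * support_incomplete
) / total

print(f"Complete   P/R/F1: {complete[0]:.2%} / {complete[1]:.2%} / {complete[2]:.2%}")
print(f"Incomplete P/R/F1: {incomplete[0]:.2%} / {incomplete[1]:.2%} / {incomplete[2]:.2%}")
print(f"Accuracy: {accuracy:.2%}, weighted F1: {weighted_f1:.2%}")
```

With `FP = 0`, precision on Complete is 100% by construction, which is the "no false interruptions" property; every error in this sketch is a finished turn that the model held as Incomplete.
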
### Key Characteristics

- ✅ **Zero false interruptions on the OOD test set** - No incomplete utterance was predicted as "Complete" (100% precision on the Complete class)
- ✅ **Conservative behavior** - Errors fall on the side of waiting: 83.64% recall on Complete means some finished turns are held as Incomplete, which suits voice agents (better to wait than interrupt)
- ✅ **Fast inference** - Suitable for real-time applications

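The conservative behavior can be reinforced on the application side by acting only on confident predictions. The sketch below assumes the label names from the Labels table; the function name `should_end_turn` and the `0.8` threshold are illustrative assumptions, not values shipped with or tuned for this model.

```python
def should_end_turn(label: str, score: float, threshold: float = 0.8) -> bool:
    """Treat the turn as finished only for a confident COMPLETE prediction.

    `label` and `score` mirror the fields of a text-classification result.
    `threshold` is a hypothetical default; tune it on your own data.
    """
    return label == "COMPLETE" and score >= threshold


# Example with hand-written results (no model is loaded here).
print(should_end_turn("COMPLETE", 0.97))    # True
print(should_end_turn("COMPLETE", 0.55))    # False: not confident enough
print(should_end_turn("INCOMPLETE", 0.99))  # False: speaker is still talking
```

Raising the threshold trades responsiveness for fewer interruptions, so it moves further in the direction the model already leans.
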
## Usage

### Quick Start

```python
from transformers import pipeline

# Load the model
eou_detector = pipeline(
    "text-classification",
    model="Amr-h/arabic-eou-marbert",
    device=0  # Use GPU, or -1 for CPU
)

# Detect end of utterance
text = "هل بلغوك انهم بيحتاجون ساعات اضافيه؟"
result = eou_detector(text)[0]

print(f"Label: {result['label']}")  # COMPLETE or INCOMPLETE
print(f"Confidence: {result['score']:.2%}")
```

### With Model and Tokenizer

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Amr-h/arabic-eou-marbert")
tokenizer = AutoTokenizer.from_pretrained("Amr-h/arabic-eou-marbert")

# Inference
text = "انتظر خلني اشوف وين حطيت ال"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)

with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

label = "COMPLETE" if prediction == 1 else "INCOMPLETE"
print(f"Prediction: {label}")
```

### For LiveKit Integration

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

class ArabicEOUDetector:
    def __init__(self, model_name="Amr-h/arabic-eou-marbert"):
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model.eval()

    def predict(self, text: str) -> tuple[bool, float]:
        """
        Returns (is_complete, confidence)
        """
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            truncation=True,
            max_length=64
        )

        with torch.no_grad():
            outputs = self.model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            prediction = torch.argmax(probs, dim=-1).item()
            confidence = probs[0][prediction].item()

        is_complete = prediction == 1
        return is_complete, confidence
```

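A detector like the one above is typically combined with a silence timer: reply quickly after a turn the model considers complete, and wait longer otherwise. The following is a minimal sketch under stated assumptions, not LiveKit's API. The name `should_respond`, the wait times, and the stub predictors are hypothetical; the only thing taken from this card is the `(is_complete, confidence)` return shape of `ArabicEOUDetector.predict`.

```python
from typing import Callable

# Same call shape as ArabicEOUDetector.predict: text -> (is_complete, confidence)
Predictor = Callable[[str], tuple[bool, float]]


def should_respond(
    transcript: str,
    silence_s: float,
    predict: Predictor,
    short_wait_s: float = 0.3,  # hypothetical wait after a complete turn
    long_wait_s: float = 2.0,   # hypothetical fallback wait for incomplete turns
) -> bool:
    """Decide whether the agent may start speaking.

    The model shortens or lengthens the required silence; it never
    triggers a reply on its own without any pause from the speaker.
    """
    if not transcript.strip():
        return False  # nothing has been said yet
    is_complete, _confidence = predict(transcript)
    required_silence = short_wait_s if is_complete else long_wait_s
    return silence_s >= required_silence


# Stub predictors stand in for the real model so the sketch runs offline.
def always_complete(text: str) -> tuple[bool, float]:
    return True, 0.95


def always_incomplete(text: str) -> tuple[bool, float]:
    return False, 0.90


print(should_respond("وين المفتاح؟", 0.4, always_complete))    # True
print(should_respond("انتظر خلني", 0.4, always_incomplete))    # False
print(should_respond("انتظر خلني", 2.5, always_incomplete))    # True
```

In practice, `predict` would be the bound method of an `ArabicEOUDetector` instance. The long fallback wait keeps the agent from stalling forever when a finished turn is misclassified as incomplete.
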
## Training Details

- **Training Data:** ~12,000 Saudi dialect Arabic utterances
- **Validation Data:** ~1,500 samples
- **Test Data:** ~1,500 samples
- **Epochs:** 3
- **Learning Rate:** 2e-5
- **Batch Size:** 32
- **Max Length:** 64 tokens

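For anyone reproducing or extending the fine-tuning run, these settings imply a fairly short schedule. The arithmetic below is a rough sketch that assumes a single device, no gradient accumulation, and exactly 12,000 training examples (the card gives only an approximate count), none of which is confirmed here.

```python
import math

train_examples = 12_000  # "~12,000" in the list above, so approximate
batch_size = 32
epochs = 3

# Optimizer steps, assuming one update per batch and a kept final partial batch.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 375
print(total_steps)      # 1125
```

A total of roughly 1,100 updates is useful to know when sizing a learning-rate warmup or choosing an evaluation interval.
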
## Intended Use

- Real-time voice agents and conversational AI
- Turn-taking detection in Arabic dialogue systems
- LiveKit agent integration
- Customer service voice bots

## Limitations

- Optimized for Saudi/Gulf Arabic dialect
- May require fine-tuning for other Arabic dialects
- Designed for spoken/conversational text, not formal written Arabic

## Citation

If you use this model, please cite:

```bibtex
@misc{arabic-eou-marbert,
  author = {YOUR_NAME},
  title = {Arabic End-of-Utterance Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Amr-h/arabic-eou-marbert}
}
```

## License

Apache 2.0