---
language:
- ar
license: apache-2.0
library_name: transformers
tags:
- arabic
- end-of-utterance
- eou
- turn-detection
- voice-agent
- saudi-dialect
- conversational-ai
- livekit
datasets:
- Amr-h/EOU_Arabic_Saudi
base_model: UBC-NLP/MARBERT
pipeline_tag: text-classification
metrics:
- accuracy
- f1
model-index:
- name: arabic-eou-marbert
  results:
  - task:
      type: text-classification
      name: End of Utterance Detection
    metrics:
    - type: accuracy
      value: 0.918
      name: OOD Accuracy
    - type: f1
      value: 0.9176
      name: Weighted F1
---
# Arabic End-of-Utterance (EOU) Detection Model
This model detects whether a speaker has finished their turn in Arabic conversations, with an emphasis on the **Saudi dialect**. It is designed for real-time voice agents, such as those built with LiveKit.
## Model Description
- **Base Model:** [UBC-NLP/MARBERT](https://huggingface.co/UBC-NLP/MARBERT)
- **Language:** Arabic (with focus on Saudi/Gulf dialect)
- **Task:** Binary classification (Complete vs Incomplete utterance)
- **Use Case:** Real-time turn detection for voice agents
## Labels
| Label | ID | Description |
|-------|-----|-------------|
| INCOMPLETE | 0 | Speaker has not finished their turn |
| COMPLETE | 1 | Speaker has finished their turn |
## Performance
### Out-of-Distribution Test Results (200 samples)
| Metric | Complete (1) | Incomplete (0) |
|--------|--------------|----------------|
| Precision | 100.00% | 85.94% |
| Recall | 83.64% | 100.00% |
| F1-Score | 91.09% | 92.44% |
**Overall Weighted F1: 91.76%**
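The per-class F1 scores follow from the precision and recall columns through the harmonic mean. The sketch below recomputes them from the table. The weighted average assumes both classes have equal support in the test set, which the table does not state; under that assumption the results agree with the reported values to two decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
f1_complete = f1_score(1.0000, 0.8364)
f1_incomplete = f1_score(0.8594, 1.0000)

# Assumption: equal class support, so the weighted F1 is a plain mean.
weighted_f1 = (f1_complete + f1_incomplete) / 2

print(f"F1 (Complete):   {f1_complete:.2%}")
print(f"F1 (Incomplete): {f1_incomplete:.2%}")
print(f"Weighted F1:     {weighted_f1:.2%}")
```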
### Key Characteristics
- **Zero false interruptions on the OOD test set** - With 100% precision on "Complete", the model never predicted "Complete" for an incomplete utterance
- **Conservative behavior** - Errors fall on the side of waiting, which suits voice agents (better to wait than interrupt)
- **Fast inference** - Suitable for real-time applications
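Because a missed end of turn only delays the reply while a false "Complete" interrupts the speaker, one way to make the decision more conservative still is to require a minimum probability for the COMPLETE class instead of taking the argmax. A minimal sketch, assuming a hypothetical `complete_threshold` parameter that is not part of the released model and would need tuning on your own data:

```python
def is_end_of_turn(prob_complete: float, complete_threshold: float = 0.7) -> bool:
    """Return True only when P(COMPLETE) clears the threshold.

    `complete_threshold` is an illustrative value, not one published with the model.
    """
    if not 0.0 <= prob_complete <= 1.0:
        raise ValueError("prob_complete must be a probability in [0, 1]")
    return prob_complete >= complete_threshold


print(is_end_of_turn(0.95))  # confident COMPLETE
print(is_end_of_turn(0.55))  # borderline, keep waiting
```

Raising the threshold trades response latency for fewer interruptions; lowering it does the opposite.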
## Usage
### Quick Start
```python
import torch
from transformers import pipeline

# Load the model on GPU 0 if one is available, otherwise on CPU (-1)
eou_detector = pipeline(
    "text-classification",
    model="Amr-h/arabic-eou-marbert",
    device=0 if torch.cuda.is_available() else -1,
)

# Detect end of utterance
text = "هل بلغوك انهم بيحتاجون ساعات اضافيه؟"
result = eou_detector(text)[0]
print(f"Label: {result['label']}")  # COMPLETE or INCOMPLETE
print(f"Confidence: {result['score']:.2%}")
```
### With Model and Tokenizer
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Amr-h/arabic-eou-marbert")
tokenizer = AutoTokenizer.from_pretrained("Amr-h/arabic-eou-marbert")

# Inference
text = "انتظر خلني اشوف وين حطيت ال"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
with torch.no_grad():
    outputs = model(**inputs)

# Label ID 1 is COMPLETE, 0 is INCOMPLETE
prediction = torch.argmax(outputs.logits, dim=-1).item()
label = "COMPLETE" if prediction == 1 else "INCOMPLETE"
print(f"Prediction: {label}")
```
### For LiveKit Integration
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch


class ArabicEOUDetector:
    def __init__(self, model_name="Amr-h/arabic-eou-marbert"):
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model.eval()

    def predict(self, text: str) -> tuple[bool, float]:
        """
        Returns (is_complete, confidence)
        """
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            truncation=True,
            max_length=64,
        )
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Convert logits to probabilities and pick the more likely class
        probs = torch.softmax(outputs.logits, dim=-1)
        prediction = torch.argmax(probs, dim=-1).item()
        confidence = probs[0][prediction].item()
        is_complete = prediction == 1
        return is_complete, confidence
```
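In a voice agent, the classifier's output is typically combined with silence detection: when the transcript looks complete the agent can reply after a short pause, otherwise it waits longer. The sketch below illustrates that idea with the predictor passed in as a callable returning `(is_complete, confidence)`, the same shape as `ArabicEOUDetector.predict`. The function name `endpointing_delay`, the delay values, and the confidence threshold are all hypothetical and do not correspond to LiveKit's API.

```python
from typing import Callable, Tuple

Predictor = Callable[[str], Tuple[bool, float]]


def endpointing_delay(
    transcript: str,
    predict: Predictor,
    min_delay: float = 0.3,       # seconds of silence if the turn looks finished (assumed)
    max_delay: float = 2.0,       # seconds of silence if it does not (assumed)
    min_confidence: float = 0.8,  # confidence needed to use the short delay (assumed)
) -> float:
    """Return how long to wait in silence before the agent replies."""
    is_complete, confidence = predict(transcript)
    if is_complete and confidence >= min_confidence:
        return min_delay
    return max_delay


# Stand-in predictor so the sketch runs without downloading the model:
# it treats any text ending in an Arabic question mark as complete.
def fake_predict(text: str) -> Tuple[bool, float]:
    return (text.endswith("؟"), 0.9)


print(endpointing_delay("هل بلغوك انهم بيحتاجون ساعات اضافيه؟", fake_predict))
print(endpointing_delay("انتظر خلني اشوف وين حطيت ال", fake_predict))
```

With the real model, pass `ArabicEOUDetector().predict` as the `predict` argument.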
## Training Details
- **Training Data:** ~12,000 Saudi dialect Arabic utterances
- **Validation Data:** ~1,500 samples
- **Test Data:** ~1,500 samples
- **Epochs:** 3
- **Learning Rate:** 2e-5
- **Batch Size:** 32
- **Max Length:** 64 tokens
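These settings imply a fairly short training run. As a rough worked example, assuming exactly 12,000 training samples, a single device, and no gradient accumulation (none of which this card confirms):

```python
import math

train_samples = 12_000  # approximate figure from the list above
batch_size = 32
epochs = 3

# One optimizer step per batch under the stated assumptions.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total optimizer steps: {total_steps}")
```

Under those assumptions this works out to 375 steps per epoch and 1,125 steps overall.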
## Intended Use
- Real-time voice agents and conversational AI
- Turn-taking detection in Arabic dialogue systems
- LiveKit agent integration
- Customer service voice bots
## Limitations
- Optimized for Saudi/Gulf Arabic dialect
- May require fine-tuning for other Arabic dialects
- Designed for spoken/conversational text, not formal written Arabic
## Citation
If you use this model, please cite:
```bibtex
@misc{arabic-eou-marbert,
author = {YOUR_NAME},
title = {Arabic End-of-Utterance Detection Model},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/Amr-h/arabic-eou-marbert}
}
```
## License
Apache 2.0