---
language: ar
license: apache-2.0
tags:
- arabic
- saudi-arabic
- eou
- end-of-utterance
- conversational-ai
- livekit
- turn-detection
datasets:
- HossamEL-Dein/arabic-eou-dataset
base_model: aubmindlab/bert-base-arabertv02
---
# Arabic End-of-Utterance Detection Model
## Model Description
This model detects End-of-Utterance (EOU) in Arabic conversations, specifically optimized for Saudi dialects. It predicts the probability that a speaker has finished their conversational turn based on text transcription.
**Use Case**: Real-time conversational AI agents (voice assistants, chatbots, customer service)
## Performance
| Metric | Score |
|--------|-------|
| **Test Accuracy** | 99.6% |
| **Precision** | 100% |
| **Recall** | 99.45% |
| **F1 Score** | 99.73% |
| **AUC-ROC** | 99.96% |
| **Inference Time** | ~15-20ms |
## Training Data
- **Total samples**: 5,000
- **SADA22 (Real Saudi audio)**: 104 samples (2.1%)
- **Synthetic (Saudi patterns)**: 4,896 samples (97.9%)
- **Splits**: 80% train / 10% validation / 10% test
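With these proportions, the 5,000 samples break down into 4,000 train / 500 validation / 500 test examples. A minimal sketch of such a seeded split in plain Python (the index-based approach is an illustration, not necessarily the project's actual pipeline):

```python
import random

def split_dataset(samples, train=0.8, val=0.1, seed=42):
    # Shuffle a copy so the original order is preserved
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_dataset(list(range(5000)))
print(len(train_set), len(val_set), len(test_set))  # 4000 500 500
```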
## Quick Start
### Installation
```bash
pip install transformers torch
```
### Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
model.eval()

# Predict EOU
text = "ู…ุฑุญุจุง ูƒูŠู ุญุงู„ูƒ ุงู„ูŠูˆู…"  # "Hello, how are you today"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
eou_probability = probs[0][1].item()  # index 1 = EOU class

print(f"EOU Probability: {eou_probability:.2%}")
# Output: EOU Probability: 98.56%
```
### Integration with LiveKit
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

class EOUDetector:
    def __init__(self, threshold=0.7):
        self.model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.model.eval()
        self.threshold = threshold

    def check_eou(self, transcript_text):
        # Truncate to the model's 128-token input limit
        inputs = self.tokenizer(transcript_text, return_tensors="pt",
                                truncation=True, max_length=128)
        with torch.no_grad():
            outputs = self.model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        eou_prob = probs[0][1].item()
        return {
            'probability': eou_prob,
            'is_eou': eou_prob > self.threshold
        }

# Use in LiveKit agent
detector = EOUDetector()
result = detector.check_eou("ู…ุฑุญุจุง ูƒูŠู ุญุงู„ูƒ")  # "Hello, how are you"
if result['is_eou']:
    print("User finished speaking!")
```
## Model Architecture
- **Base Model**: aubmindlab/bert-base-arabertv02
- **Task**: Binary sequence classification
- **Input**: Arabic text (up to 128 tokens)
- **Output**: 2-class probability distribution [Non-EOU, EOU]
- **Parameters**: 136M
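The two output logits are mapped to the [Non-EOU, EOU] distribution with a softmax, and the EOU probability is read from index 1 (the `probs[0][1]` seen in the usage examples). A small standalone sketch with illustrative logit values (not actual model outputs):

```python
import math

def softmax(logits):
    # Numerically stable softmax over the class logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits only; real values come from the model's forward pass
logits = [-1.2, 2.3]        # [Non-EOU, EOU]
probs = softmax(logits)
eou_probability = probs[1]  # index 1 = EOU
print(f"{eou_probability:.3f}")
```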
## Training Details
- **Framework**: PyTorch + Transformers
- **Epochs**: 3
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **Training Time**: ~3 hours on T4 GPU
## Intended Use
### Primary Use Cases
- ✅ Real-time voice assistants
- ✅ Arabic conversational AI
- ✅ Turn-taking detection in dialogues
- ✅ LiveKit agent integration
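In streaming use, a single positive prediction on a partial transcript can be noisy. One common pattern (a hedged sketch, not part of this model) is to commit the end of turn only after several consecutive positive checks:

```python
class EOUDebouncer:
    """Commit an end-of-turn only after `patience` consecutive positive checks."""

    def __init__(self, threshold=0.7, patience=2):
        self.threshold = threshold
        self.patience = patience
        self.streak = 0

    def update(self, eou_probability):
        # Feed one EOU probability per transcript update; returns True on commit
        if eou_probability > self.threshold:
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.patience

# A lone spike (0.9) is ignored; two high values in a row commit the turn
debouncer = EOUDebouncer()
for p in [0.9, 0.4, 0.85, 0.95]:
    committed = debouncer.update(p)
print(committed)  # True
```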
### Limitations
- Trained primarily on Saudi dialect patterns
- Requires text input (not raw audio)
- Performs best on short conversational turns (roughly 5-10 seconds of speech)
- May need threshold tuning for specific use cases
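Threshold tuning can be done by sweeping candidate values on a labeled validation set and keeping the one that maximizes F1. A self-contained sketch with toy data (the helper names and the toy probabilities are illustrative, not from the model):

```python
def f1_at_threshold(probs, labels, threshold):
    # labels: 1 = EOU, 0 = non-EOU
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(probs, labels, candidates=None):
    candidates = candidates or [i / 20 for i in range(1, 20)]  # 0.05 .. 0.95
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))

# Toy validation data for illustration only
probs = [0.95, 0.80, 0.30, 0.10, 0.60, 0.20]
labels = [1, 1, 0, 0, 1, 0]
print(best_threshold(probs, labels))  # 0.3
```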
## Dataset
Training dataset available at: [HossamEL-Dein/arabic-eou-dataset](https://huggingface.co/datasets/HossamEL-Dein/arabic-eou-dataset)
## Citation
```bibtex
@misc{arabic-eou-2024,
  author    = {HossamEL-Dein},
  title     = {Arabic End-of-Utterance Detection Model},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/HossamEL-Dein/arabic-eou-model}
}
```
## License
Apache 2.0
## Contact
For questions or issues, please open an issue on the model repository.