---
language:
- ar
license: apache-2.0
library_name: transformers
tags:
- arabic
- end-of-utterance
- eou
- turn-detection
- voice-agent
- saudi-dialect
- conversational-ai
- livekit
datasets:
- Amr-h/EOU_Arabic_Saudi
base_model: UBC-NLP/MARBERT
pipeline_tag: text-classification
metrics:
- accuracy
- f1
model-index:
- name: arabic-eou-marbert
  results:
  - task:
      type: text-classification
      name: End of Utterance Detection
    metrics:
    - type: accuracy
      value: 0.918
      name: OOD Accuracy
    - type: f1
      value: 0.9176
      name: Weighted F1
---

# Arabic End-of-Utterance (EOU) Detection Model

This model detects whether a speaker has finished their turn in an Arabic conversation, with an emphasis on **Saudi dialect**. It is designed for real-time voice agent applications built on frameworks such as LiveKit.

## Model Description

- **Base Model:** [UBC-NLP/MARBERT](https://huggingface.co/UBC-NLP/MARBERT)
- **Language:** Arabic (with a focus on Saudi/Gulf dialect)
- **Task:** Binary classification (Complete vs. Incomplete utterance)
- **Use Case:** Real-time turn detection for voice agents

## Labels

| Label | ID | Description |
|-------|-----|-------------|
| INCOMPLETE | 0 | Speaker has not finished their turn |
| COMPLETE | 1 | Speaker has finished their turn |

## Performance

### Out-of-Distribution Test Results (200 samples)

| Metric | Complete (1) | Incomplete (0) |
|--------|--------------|----------------|
| Precision | 100.00% | 85.94% |
| Recall | 83.64% | 100.00% |
| F1-Score | 91.09% | 92.44% |

**Overall Weighted F1: 91.76%**

### Key Characteristics

- ✅ **Zero false interruptions on the OOD test set** - The model never predicted "Complete" for an incomplete utterance (100% precision on the Complete class)
- ✅ **Conservative behavior** - When it errs, it predicts "Incomplete" for a finished turn (83.64% recall on Complete), which suits voice agents where waiting is better than interrupting
- ✅ **Fast inference** - Suitable for real-time applications

## Usage

### Quick Start

```python
import torch
from transformers import pipeline

# Load the model; use the first GPU if one is available, otherwise the CPU
eou_detector = pipeline(
    "text-classification",
    model="Amr-h/arabic-eou-marbert",
    device=0 if torch.cuda.is_available() else -1,
)

# Detect end of utterance
# English gloss: "Did they tell you they will need extra hours?"
text = "هل بلغوك انهم بيحتاجون ساعات اضافيه؟"
result = eou_detector(text)[0]

print(f"Label: {result['label']}")  # COMPLETE or INCOMPLETE
print(f"Confidence: {result['score']:.2%}")
```

### With Model and Tokenizer

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Amr-h/arabic-eou-marbert")
tokenizer = AutoTokenizer.from_pretrained("Amr-h/arabic-eou-marbert")
model.eval()  # inference mode (disables dropout)

# Inference
# English gloss: "Wait, let me see where I put the ..." (cut off mid-phrase)
text = "انتظر خلني اشوف وين حطيت ال"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)

with torch.no_grad():
    outputs = model(**inputs)

# Label IDs: 0 = INCOMPLETE, 1 = COMPLETE
prediction = torch.argmax(outputs.logits, dim=-1).item()
label = "COMPLETE" if prediction == 1 else "INCOMPLETE"
print(f"Prediction: {label}")
```

### For LiveKit Integration

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch


class ArabicEOUDetector:
    """Thin wrapper that a voice agent can call on each partial transcript."""

    def __init__(self, model_name="Amr-h/arabic-eou-marbert"):
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model.eval()

    def predict(self, text: str) -> tuple[bool, float]:
        """
        Returns (is_complete, confidence)
        """
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            truncation=True,
            max_length=64,
        )

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Convert logits to probabilities and take the most likely class
        probs = torch.softmax(outputs.logits, dim=-1)
        prediction = torch.argmax(probs, dim=-1).item()
        confidence = probs[0][prediction].item()

        is_complete = prediction == 1  # 1 = COMPLETE
        return is_complete, confidence
```

## Training Details

- **Training Data:** ~12,000 Saudi dialect Arabic utterances
- **Validation Data:** ~1,500 samples
- **Test Data:** ~1,500 samples
- **Epochs:** 3
- **Learning Rate:** 2e-5
- **Batch Size:** 32
- **Max Length:** 64 tokens

## Intended Use

- Real-time voice agents and conversational AI
- Turn-taking detection in Arabic dialogue systems
- LiveKit agent integration
- Customer service voice bots

## Limitations

- Optimized for Saudi/Gulf Arabic dialect
- May require fine-tuning for other Arabic dialects
- Designed for spoken/conversational text, not formal written Arabic

## Citation

If you use this model, please cite:

```bibtex
@misc{arabic-eou-marbert,
  author = {YOUR_NAME},
  title = {Arabic End-of-Utterance Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Amr-h/arabic-eou-marbert}
}
```

## License

Apache 2.0