---
language: ar
license: apache-2.0
tags:
- arabic
- saudi-arabic
- eou
- end-of-utterance
- conversational-ai
- livekit
- turn-detection
datasets:
- HossamEL-Dein/arabic-eou-dataset
base_model: aubmindlab/bert-base-arabertv02
---

# Arabic End-of-Utterance Detection Model

## Model Description

This model detects End-of-Utterance (EOU) in Arabic conversations and is optimized for Saudi dialects. Given the transcribed text of a turn, it predicts the probability that the speaker has finished speaking.

**Use Case**: Real-time conversational AI agents (voice assistants, chatbots, customer service)

## Performance

| Metric | Score |
|--------|-------|
| **Test Accuracy** | 99.6% |
| **Precision** | 100% |
| **Recall** | 99.45% |
| **F1 Score** | 99.73% |
| **AUC-ROC** | 99.96% |
| **Inference Time** | ~15-20ms |

## Training Data

- **Total samples**: 5,000
- **SADA22 (Real Saudi audio)**: 104 samples (2.1%)
- **Synthetic (Saudi patterns)**: 4,896 samples (97.9%)
- **Splits**: 80% train / 10% validation / 10% test

## Quick Start

### Installation

```bash
pip install transformers torch
```

### Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
model.eval()

# Predict EOU (inputs are truncated to the model's 128-token limit)
text = "مرحبا كيف حالك اليوم"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    eou_probability = probs[0][1].item()  # index 1 is the EOU class

print(f"EOU Probability: {eou_probability:.2%}")
# Output: EOU Probability: 98.56%
```

### Integration with LiveKit

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

class EOUDetector:
    def __init__(self, threshold=0.7):
        self.model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.model.eval()
        self.threshold = threshold

    def check_eou(self, transcript_text):
        # Truncate to the model's 128-token input limit
        inputs = self.tokenizer(
            transcript_text, return_tensors="pt", truncation=True, max_length=128
        )

        with torch.no_grad():
            outputs = self.model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            eou_prob = probs[0][1].item()  # index 1 is the EOU class

        return {
            'probability': eou_prob,
            'is_eou': eou_prob > self.threshold
        }

# Use in LiveKit agent
detector = EOUDetector()
result = detector.check_eou("مرحبا كيف حالك")
if result['is_eou']:
    print("User finished speaking!")
```

## Model Architecture

- **Base Model**: aubmindlab/bert-base-arabertv02
- **Task**: Binary sequence classification
- **Input**: Arabic text (up to 128 tokens)
- **Output**: 2-class probability distribution [Non-EOU, EOU]
- **Parameters**: 136M

## Training Details

- **Framework**: PyTorch + Transformers
- **Epochs**: 3
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **Training Time**: ~3 hours on T4 GPU

## Intended Use

### Primary Use Cases

- ✅ Real-time voice assistants
- ✅ Arabic conversational AI
- ✅ Turn-taking detection in dialogues
- ✅ LiveKit agent integration

### Limitations

- Trained primarily on Saudi dialect patterns
- Requires text input (not raw audio)
- Best for conversational context (5-10 seconds)
- May need threshold tuning for specific use cases

## Dataset

Training dataset available at: [HossamEL-Dein/arabic-eou-dataset](https://huggingface.co/datasets/HossamEL-Dein/arabic-eou-dataset)

## Citation

```bibtex
@misc{arabic-eou-2024,
  author = {HossamEL-Dein},
  title = {Arabic End-of-Utterance Detection Model},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/HossamEL-Dein/arabic-eou-model}
}
```

## License

Apache 2.0

## Contact

For questions or issues, please open an issue on the model repository.
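
## Threshold Tuning Example

The limitations above note that the decision threshold may need tuning for a specific use case. One possible approach, sketched here under stated assumptions, is to sweep candidate thresholds over a small labeled validation set and keep the one with the best F1 for the EOU class. The helper names (`f1_at_threshold`, `pick_threshold`) and the sample probabilities are hypothetical and are not part of this repository; in practice the probabilities would come from the model's EOU class output.

```python
from typing import List, Optional, Tuple


def f1_at_threshold(probs: List[float], labels: List[int], threshold: float) -> float:
    """F1 for the EOU class when predicting EOU if prob > threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(
    probs: List[float],
    labels: List[int],
    candidates: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, f1) with the highest F1; ties keep the lowest threshold."""
    if candidates is None:
        # Assumed sweep: 0.05, 0.10, ..., 0.95
        candidates = [i / 100 for i in range(5, 100, 5)]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_at_threshold(probs, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical validation data: EOU probabilities and gold labels (1 = EOU).
val_probs = [0.95, 0.88, 0.62, 0.55, 0.40, 0.30, 0.10]
val_labels = [1, 1, 1, 0, 0, 1, 0]

threshold, f1 = pick_threshold(val_probs, val_labels)
print(f"threshold={threshold:.2f}, F1={f1:.3f}")
# Output: threshold=0.55, F1=0.857
```

The selected value can then be passed as `EOUDetector(threshold=...)`. F1 is only one possible criterion: if interrupting the user is costlier than a slightly delayed response, a higher threshold that favors precision may be the better choice.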