---
language:
- ar
license: apache-2.0
tags:
- arabic
- end-of-utterance
- eou
- turn-detection
- conversational-ai
- livekit
- bert
- arabert
datasets:
- arabic-eou-detection-10k
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Arabic End-of-Utterance Detector
  results:
  - task:
      type: text-classification
      name: End-of-Utterance Detection
    dataset:
      name: Arabic EOU Detection
      type: arabic-eou-detection-10k
    metrics:
    - type: accuracy
      value: 0.90
      name: Accuracy
    - type: f1
      value: 0.92
      name: F1 Score (EOU)
    - type: precision
      value: 0.90
      name: Precision (EOU)
    - type: recall
      value: 0.93
      name: Recall (EOU)
---

# Arabic End-of-Utterance (EOU) Detector

**Detect when a speaker has finished their utterance in Arabic conversations.**

This model is fine-tuned from [AraBERT v2](https://huggingface.co/aubmindlab/bert-base-arabertv2) for binary classification of Arabic text, determining whether an utterance is complete (EOU) or incomplete (No EOU).

## Model Description

- **Model Type**: BERT-based binary classifier
- **Base Model**: [aubmindlab/bert-base-arabertv2](https://huggingface.co/aubmindlab/bert-base-arabertv2)
- **Language**: Arabic (ar)
- **Task**: End-of-Utterance Detection
- **License**: Apache 2.0

## Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 90% |
| **Precision (EOU)** | 0.90 |
| **Recall (EOU)** | 0.93 |
| **F1-Score (EOU)** | 0.92 |
| **Test Samples** | 1,001 |

### Confusion Matrix

```
                  Predicted
               No EOU    EOU
Actual No EOU     333     62   (84.3% correct)
       EOU         42    564   (93.1% correct)
```

## Available Formats

This repository includes three model formats:

1. **PyTorch** (`pytorch_model.bin` or `model.safetensors`) - For training and fine-tuning
2. **ONNX** (`model.onnx`) - For optimized CPU/GPU inference (~2-3x faster)
3. **Quantized ONNX** (`model_quantized.onnx`) - For production (75% smaller, 2-3x faster)

## Quick Start

### Installation

```bash
pip install transformers torch onnxruntime
```

### PyTorch Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Inference
def predict_eou(text: str):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    is_eou = torch.argmax(probs, dim=-1).item() == 1
    confidence = probs[0, 1].item()
    return is_eou, confidence

# Test
text = "مرحبا كيف حالك"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")
```

### ONNX Inference (Recommended for Production)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Load tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load ONNX model (use model_quantized.onnx for best performance)
session = ort.InferenceSession(
    "model_quantized.onnx",  # or "model.onnx"
    providers=["CPUExecutionProvider"]
)

# Inference
def predict_eou(text: str):
    inputs = tokenizer(
        text,
        padding="max_length",
        max_length=512,
        truncation=True,
        return_tensors="np"
    )
    outputs = session.run(
        None,
        {
            "input_ids": inputs["input_ids"].astype(np.int64),
            "attention_mask": inputs["attention_mask"].astype(np.int64)
        }
    )
    logits = outputs[0]
    # Numerically stable softmax: subtract the row max before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=-1, keepdims=True)
    is_eou = np.argmax(probs, axis=-1)[0] == 1
    confidence = float(probs[0, 1])
    return is_eou, confidence

# Test
text = "مرحبا كيف حالك"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")
```
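Both `predict_eou` variants return a boolean plus a confidence, and a caller will typically gate the turn-ending decision on that confidence rather than the argmax alone. The sketch below shows that pattern; `predict_eou_stub` is a hypothetical stand-in for the real model call (so the snippet runs without downloading anything), and the 0.7 threshold is illustrative, mirroring the `unlikely_threshold` used in the LiveKit integration further down.

```python
# Sketch: confidence-gated turn ending. predict_eou_stub is a toy
# heuristic standing in for the model's predict_eou() shown above.

def predict_eou_stub(text: str):
    """Hypothetical stand-in: treats text ending in a terminal word as
    complete. Swap in the real PyTorch/ONNX predict_eou() in practice."""
    complete_endings = ("حالك", "شكرا", "نعم")
    is_eou = text.strip().endswith(complete_endings)
    return is_eou, 0.95 if is_eou else 0.20

def should_end_turn(text: str, predictor=predict_eou_stub,
                    threshold: float = 0.7) -> bool:
    """End the user's turn only when the model both predicts EOU and is
    confident; otherwise keep listening for more speech."""
    is_eou, confidence = predictor(text)
    return is_eou and confidence >= threshold

print(should_end_turn("مرحبا كيف حالك"))  # complete greeting -> True
print(should_end_turn("أريد أن"))          # trailing "I want to..." -> False
```

Gating on confidence instead of the raw argmax lets an agent trade responsiveness for fewer premature interruptions by tuning a single threshold.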
## Use Cases

- **Voice Assistants**: Detect when a user has finished speaking
- **Conversational AI**: Improve turn-taking in Arabic chatbots
- **LiveKit Agents**: Custom turn detection for Arabic conversations
- **Speech Recognition**: Post-processing for better utterance segmentation

## Integration with LiveKit

```python
from huggingface_hub import hf_hub_download
from livekit.agents import AgentSession
from livekit.plugins.arabic_turn_detector import ArabicTurnDetector

# Download model from HuggingFace
model_path = hf_hub_download(
    repo_id="your-username/arabic-eou-detector",
    filename="model_quantized.onnx"
)

# Create turn detector
turn_detector = ArabicTurnDetector(
    model_path=model_path,
    unlikely_threshold=0.7
)

# Use in agent
session = AgentSession(
    turn_detector=turn_detector,
    # ... other config
)
```

## Training Details

### Training Data

- **Dataset**: Arabic EOU Detection (10,072 samples)
- **Train/Val/Test Split**: 80/10/10
- **Classes**:
  - `0`: Incomplete utterance (No EOU)
  - `1`: Complete utterance (EOU)

### Training Hyperparameters

- **Base Model**: aubmindlab/bert-base-arabertv2
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Epochs**: 10
- **Optimizer**: AdamW
- **Weight Decay**: 0.01
- **Max Sequence Length**: 512

### Preprocessing

- AraBERT normalization (diacritics removal, character normalization)
- Tokenization with the AraBERT tokenizer
- Padding to max length (512 tokens)

## Limitations

- **Language**: Optimized for Modern Standard Arabic (MSA)
- **Domain**: Trained on conversational Arabic text
- **Sequence Length**: Maximum 512 tokens
- **Dialects**: May have reduced accuracy on dialectal Arabic

## Citation

If you use this model, please cite:

```bibtex
@misc{arabic-eou-detector,
  author = {Your Name},
  title = {Arabic End-of-Utterance Detector},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/your-username/arabic-eou-detector}}
}
```

## License

Apache 2.0

## Acknowledgments

- **AraBERT**: [aubmindlab/bert-base-arabertv2](https://huggingface.co/aubmindlab/bert-base-arabertv2)
- **HuggingFace Transformers**: Model training and inference
- **ONNX Runtime**: Model optimization and deployment

## Contact

For issues or questions, please open an issue on the [GitHub repository](https://github.com/Ahmed-Ezzat20/hams_task).