metadata
language: ar
license: apache-2.0
tags:
  - arabic
  - saudi-arabic
  - eou
  - end-of-utterance
  - conversational-ai
  - livekit
  - turn-detection
datasets:
  - HossamEL-Dein/arabic-eou-dataset
base_model: aubmindlab/bert-base-arabertv02

Arabic End-of-Utterance Detection Model

Model Description

This model detects End-of-Utterance (EOU) in Arabic conversations and is optimized for Saudi dialects. Given the text transcript of what a speaker has said so far, it predicts the probability that the speaker has finished their conversational turn.

Use Case: Real-time conversational AI agents (voice assistants, chatbots, customer service)

Performance

| Metric | Score |
|---|---|
| Test Accuracy | 99.6% |
| Precision | 100% |
| Recall | 99.45% |
| F1 Score | 99.73% |
| AUC-ROC | 99.96% |
| Inference Time | ~15-20 ms |
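As a quick sanity check on how these metrics relate, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values reported above; because those inputs are already rounded, the result should land within rounding distance of the reported F1 rather than match it digit for digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

precision = 1.0    # 100% (reported above)
recall = 0.9945    # 99.45% (reported above, already rounded)

f1 = f1_score(precision, recall)
print(f"F1 recomputed from precision and recall: {f1:.2%}")
```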

Training Data

  • Total samples: 5,000
  • SADA22 (Real Saudi audio): 104 samples (2.1%)
  • Synthetic (Saudi patterns): 4,896 samples (97.9%)
  • Splits: 80% train / 10% validation / 10% test
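The figures above imply the following absolute counts. This is plain arithmetic on the stated totals; how the 104 real samples are distributed across the splits (for example, whether the split is stratified by source) is not stated here, so the sketch makes no assumption about it.

```python
total = 5000
sources = {"SADA22 (real)": 104, "Synthetic": 4896}

# The two sources should account for every sample.
assert sum(sources.values()) == total

# Share of each source, rounded to one decimal place as in the list above.
shares = {name: round(100 * n / total, 1) for name, n in sources.items()}

# Split sizes from the 80/10/10 percentages (integer arithmetic).
split_percent = {"train": 80, "validation": 10, "test": 10}
split_sizes = {name: total * pct // 100 for name, pct in split_percent.items()}

print(shares)
print(split_sizes)
```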

Quick Start

Installation

pip install transformers torch

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
model.eval()

# Predict EOU
text = "مرحبا كيف حالك اليوم"
# Truncate to the model's 128-token input limit
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    eou_probability = probs[0][1].item()

print(f"EOU Probability: {eou_probability:.2%}")
# Output: EOU Probability: 98.56%

Integration with LiveKit

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

class EOUDetector:
    def __init__(self, threshold=0.7):
        self.model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.model.eval()
        self.threshold = threshold
    
    def check_eou(self, transcript_text):
        # Truncate to the model's 128-token input limit
        inputs = self.tokenizer(transcript_text, return_tensors="pt", truncation=True, max_length=128)
        with torch.no_grad():
            outputs = self.model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            eou_prob = probs[0][1].item()
        
        return {
            'probability': eou_prob,
            'is_eou': eou_prob > self.threshold
        }

# Example: call check_eou on each transcript your LiveKit agent receives
detector = EOUDetector()
result = detector.check_eou("مرحبا كيف حالك")
if result['is_eou']:
    print("User finished speaking!")
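In a live agent the transcript usually arrives as a series of growing partial results, and the detector is queried on each one until it reports the end of a turn. The following is a minimal sketch of that loop, not a LiveKit API example: `first_complete_turn` and `fake_score` are hypothetical names, and `fake_score` is a placeholder so the snippet runs without downloading the model. In practice the scoring function could be something like `lambda t: detector.check_eou(t)["probability"]`.

```python
from typing import Callable, Iterable, Optional

def first_complete_turn(
    partials: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return the first partial transcript whose EOU score exceeds the threshold."""
    for text in partials:
        if score_fn(text) > threshold:
            return text
    return None  # the speaker never appeared to finish

def fake_score(text: str) -> float:
    # Placeholder scorer (assumption): treats a trailing Arabic question mark
    # as a finished turn. Replace with the real model's EOU probability.
    return 0.9 if text.endswith("؟") else 0.2

partials = ["مرحبا", "مرحبا كيف", "مرحبا كيف حالك؟"]
turn = first_complete_turn(partials, fake_score)
print(turn)
```

Passing the scorer in as a function keeps the loop independent of the model, which makes it easy to test with a stub and to swap in a different threshold per deployment.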

Model Architecture

  • Base Model: aubmindlab/bert-base-arabertv02
  • Task: Binary sequence classification
  • Input: Arabic text (up to 128 tokens)
  • Output: 2-class probability distribution [Non-EOU, EOU]
  • Parameters: 136M
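To illustrate the output format, the sketch below turns a pair of raw logits into the 2-class distribution and reads off the EOU probability at index 1. It uses plain Python instead of `torch.softmax` so it runs without any dependencies, and the logit values are made up for illustration.

```python
import math

LABELS = ["Non-EOU", "EOU"]  # index order taken from the Output bullet above

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.5, 2.0]  # made-up example logits, not real model output
probs = softmax(logits)
eou_probability = probs[LABELS.index("EOU")]

print(dict(zip(LABELS, probs)))
```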

Training Details

  • Framework: PyTorch + Transformers
  • Epochs: 3
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Optimizer: AdamW
  • Training Time: ~3 hours on T4 GPU
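For a rough sense of scale, these hyperparameters imply the number of optimizer steps computed below. The sketch assumes the 80% training split of the 5,000 samples, a single device, and no gradient accumulation; none of those details are stated here, so treat the result as an estimate.

```python
import math

train_samples = 5000 * 80 // 100   # 80% training split of 5,000 samples
batch_size = 16
epochs = 3

# Assumption: one optimizer step per batch (no gradient accumulation).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```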

Intended Use

Primary Use Cases

  • ✅ Real-time voice assistants
  • ✅ Arabic conversational AI
  • ✅ Turn-taking detection in dialogues
  • ✅ LiveKit agent integration

Limitations

  • Trained primarily on Saudi dialect patterns
  • Requires text input (not raw audio), so it must run after a speech-to-text step
  • Works best on short conversational utterances (roughly 5-10 seconds of speech)
  • May need threshold tuning for specific use cases
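One way to approach threshold tuning is to score a labeled validation set and compare precision and recall at a few candidate thresholds. The sketch below does this in plain Python; the `validation` pairs are made-up values for illustration, and `precision_recall_at` is a hypothetical helper, not part of this model's code. It uses the same strict `>` comparison as the `EOUDetector` example.

```python
def precision_recall_at(threshold, scored):
    """scored: list of (eou_probability, is_true_eou) pairs."""
    tp = sum(1 for p, y in scored if p > threshold and y)
    fp = sum(1 for p, y in scored if p > threshold and not y)
    fn = sum(1 for p, y in scored if p <= threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up (probability, true label) pairs standing in for real model scores.
validation = [
    (0.95, True), (0.80, True), (0.65, True),
    (0.60, False), (0.30, False), (0.10, False),
]

for threshold in (0.5, 0.7, 0.9):
    p, r = precision_recall_at(threshold, validation)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

A higher threshold makes the agent interrupt less often (fewer false end-of-turn signals) at the cost of waiting longer on some finished turns; a lower threshold does the opposite.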

Dataset

Training dataset available at: HossamEL-Dein/arabic-eou-dataset

Citation

@misc{arabic-eou-2024,
  author = {HossamEL-Dein},
  title = {Arabic End-of-Utterance Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/HossamEL-Dein/arabic-eou-model}
}

License

Apache 2.0

Contact

For questions or issues, please open an issue on the model repository.