---
language:
- ar
license: apache-2.0
tags:
- arabic
- end-of-utterance
- eou
- turn-detection
- conversational-ai
- livekit
- bert
- arabert
datasets:
- arabic-eou-detection-10k
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Arabic End-of-Utterance Detector
  results:
  - task:
      type: text-classification
      name: End-of-Utterance Detection
    dataset:
      name: Arabic EOU Detection
      type: arabic-eou-detection-10k
    metrics:
    - type: accuracy
      value: 0.90
      name: Accuracy
    - type: f1
      value: 0.92
      name: F1 Score (EOU)
    - type: precision
      value: 0.90
      name: Precision (EOU)
    - type: recall
      value: 0.93
      name: Recall (EOU)
---
# Arabic End-of-Utterance (EOU) Detector
**Detect when a speaker has finished their utterance in Arabic conversations.**
This model is fine-tuned from [AraBERT v2](https://huggingface.co/aubmindlab/bert-base-arabertv2) for binary classification of Arabic text to determine if an utterance is complete (EOU) or incomplete (No EOU).
## Model Description
- **Model Type**: BERT-based binary classifier
- **Base Model**: [aubmindlab/bert-base-arabertv2](https://huggingface.co/aubmindlab/bert-base-arabertv2)
- **Language**: Arabic (ar)
- **Task**: End-of-Utterance Detection
- **License**: Apache 2.0
## Performance
| Metric | Value |
|--------|-------|
| **Accuracy** | 90% |
| **Precision (EOU)** | 0.90 |
| **Recall (EOU)** | 0.93 |
| **F1-Score (EOU)** | 0.92 |
| **Test Samples** | 1,001 |
### Confusion Matrix
```
                    Predicted
                 No EOU    EOU
Actual  No EOU      333     62   (84.3% correct)
        EOU          42    564   (93.1% correct)
```
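The headline metrics can be recomputed directly from these counts (a pure-Python sketch; the variable names `tn`/`fp`/`fn`/`tp` are ours):

```python
# Counts read off the confusion matrix above (positive class = EOU)
tn, fp = 333, 62   # actual No-EOU: correctly rejected / misclassified as EOU
fn, tp = 42, 564   # actual EOU: missed / correctly detected

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 897 / 1001
precision = tp / (tp + fp)                          # 564 / 626
recall = tp / (tp + fn)                             # 564 / 606
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.2f} p={precision:.2f} r={recall:.2f} f1={f1:.2f}")
```

These reproduce the reported 90% accuracy, 0.90 precision, 0.93 recall, and 0.92 F1.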
## Available Formats
This repository includes three model formats:
1. **PyTorch** (`pytorch_model.bin` or `model.safetensors`) - For training and fine-tuning
2. **ONNX** (`model.onnx`) - For optimized CPU/GPU inference (~2-3x faster)
3. **Quantized ONNX** (`model_quantized.onnx`) - For production (75% smaller, 2-3x faster)
## Quick Start
### Installation
```bash
pip install transformers torch onnxruntime
```
### PyTorch Inference
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Inference
def predict_eou(text: str):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)
    is_eou = torch.argmax(probs, dim=-1).item() == 1
    confidence = probs[0, 1].item()  # probability of the EOU class
    return is_eou, confidence

# Test
text = "مرحبا كيف حالك"  # "Hello, how are you?"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")
```
### ONNX Inference (Recommended for Production)
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load ONNX model (use model_quantized.onnx for best performance)
session = ort.InferenceSession(
    "model_quantized.onnx",  # or "model.onnx"
    providers=["CPUExecutionProvider"]
)

# Inference
def predict_eou(text: str):
    inputs = tokenizer(
        text,
        padding="max_length",
        max_length=512,
        truncation=True,
        return_tensors="np"
    )
    outputs = session.run(
        None,
        {
            "input_ids": inputs["input_ids"].astype(np.int64),
            "attention_mask": inputs["attention_mask"].astype(np.int64)
        }
    )
    logits = outputs[0]
    # Numerically stable softmax: shift by the max logit before exponentiating
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=-1, keepdims=True)
    is_eou = np.argmax(probs, axis=-1)[0] == 1
    confidence = float(probs[0, 1])  # probability of the EOU class
    return is_eou, confidence

# Test
text = "مرحبا كيف حالك"  # "Hello, how are you?"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")
```
## Use Cases
- **Voice Assistants**: Detect when a user has finished speaking
- **Conversational AI**: Improve turn-taking in Arabic chatbots
- **LiveKit Agents**: Custom turn detection for Arabic conversations
- **Speech Recognition**: Post-processing for better utterance segmentation
## Integration with LiveKit
```python
from livekit.agents import AgentSession
from livekit.plugins.arabic_turn_detector import ArabicTurnDetector
from huggingface_hub import hf_hub_download

# Download model from HuggingFace
model_path = hf_hub_download(
    repo_id="your-username/arabic-eou-detector",
    filename="model_quantized.onnx"
)

# Create turn detector
turn_detector = ArabicTurnDetector(
    model_path=model_path,
    unlikely_threshold=0.7
)

# Use in agent
session = AgentSession(
    turn_detector=turn_detector,
    # ... other config
)
```
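The `unlikely_threshold` parameter suggests a probability cutoff for committing a turn. Assuming it is compared against the model's EOU probability (our interpretation, not confirmed by the plugin's source), the decision can be sketched as:

```python
def should_end_turn(eou_prob: float, threshold: float = 0.7) -> bool:
    """Commit the turn only when the EOU probability clears the threshold.

    Note: this is an assumed interpretation of `unlikely_threshold`,
    not the plugin's actual implementation.
    """
    return eou_prob >= threshold

# A confident EOU prediction ends the turn; a borderline one keeps listening
print(should_end_turn(0.93), should_end_turn(0.55))  # True False
```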
## Training Details
### Training Data
- **Dataset**: Arabic EOU Detection (10,072 samples)
- **Train/Val/Test Split**: 80/10/10
- **Classes**:
- `0`: Incomplete utterance (No EOU)
- `1`: Complete utterance (EOU)
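For downstream code, this class scheme can be expressed as an explicit id-to-label mapping (the label strings here are our choice, not taken from the model config):

```python
# Hypothetical label mapping mirroring the class scheme above
ID2LABEL = {0: "NO_EOU", 1: "EOU"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

print(ID2LABEL[1], LABEL2ID["NO_EOU"])  # EOU 0
```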
### Training Hyperparameters
- **Base Model**: aubmindlab/bert-base-arabertv2
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Epochs**: 10
- **Optimizer**: AdamW
- **Weight Decay**: 0.01
- **Max Sequence Length**: 512
### Preprocessing
- AraBERT normalization (diacritics removal, character normalization)
- Tokenization with AraBERT tokenizer
- Padding to max length (512 tokens)
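As an illustration of what the normalization step does, here is a simplified sketch of diacritics removal and character normalization (this is our own minimal version, not AraBERT's actual preprocessor; the regex ranges and function name are ours):

```python
import re

# Arabic diacritics: tanween, harakat, shadda, sukun, plus dagger alif
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def normalize_arabic(text: str) -> str:
    # Strip diacritics
    text = ARABIC_DIACRITICS.sub("", text)
    # Collapse hamza-carrying alif variants (آ أ إ) to bare alif (ا)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)
    return text

print(normalize_arabic("مَرْحَبًا"))  # مرحبا
```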
## Limitations
- **Language**: Optimized for Modern Standard Arabic (MSA)
- **Domain**: Trained on conversational Arabic text
- **Sequence Length**: Maximum 512 tokens
- **Dialects**: May have reduced accuracy on dialectal Arabic
## Citation
If you use this model, please cite:
```bibtex
@misc{arabic-eou-detector,
  author       = {Your Name},
  title        = {Arabic End-of-Utterance Detector},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/your-username/arabic-eou-detector}}
}
```
## License
Apache 2.0
## Acknowledgments
- **AraBERT**: [aubmindlab/bert-base-arabertv2](https://huggingface.co/aubmindlab/bert-base-arabertv2)
- **HuggingFace Transformers**: Model training and inference
- **ONNX Runtime**: Model optimization and deployment
## Contact
For issues or questions, please open an issue on the [GitHub repository](https://github.com/Ahmed-Ezzat20/hams_task).