---
language:
- ar
license: apache-2.0
library_name: transformers
tags:
- arabic
- end-of-utterance
- eou
- turn-detection
- voice-agent
- saudi-dialect
- conversational-ai
- livekit
datasets:
- Amr-h/EOU_Arabic_Saudi
base_model: UBC-NLP/MARBERT
pipeline_tag: text-classification
metrics:
- accuracy
- f1
model-index:
- name: arabic-eou-marbert
  results:
  - task:
      type: text-classification
      name: End of Utterance Detection
    metrics:
    - type: accuracy
      value: 0.918
      name: OOD Accuracy
    - type: f1
      value: 0.9176
      name: Weighted F1
---

# Arabic End-of-Utterance (EOU) Detection Model

This model detects whether a speaker has finished their turn in Arabic conversations, with an emphasis on the **Saudi dialect**. It is designed for real-time voice agents, such as those built with LiveKit.

## Model Description

- **Base Model:** [UBC-NLP/MARBERT](https://huggingface.co/UBC-NLP/MARBERT)
- **Language:** Arabic (with focus on Saudi/Gulf dialect)
- **Task:** Binary classification (Complete vs Incomplete utterance)
- **Use Case:** Real-time turn detection for voice agents

## Labels

| Label | ID | Description |
|-------|-----|-------------|
| INCOMPLETE | 0 | Speaker has not finished their turn |
| COMPLETE | 1 | Speaker has finished their turn |
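
Application code usually needs a boolean rather than a label string. The helper below is a minimal sketch of that mapping. It assumes predictions arrive either under the names in the table above or under the generic `LABEL_0`/`LABEL_1` names that `transformers` falls back to when a checkpoint's config defines no `id2label`. The helper name `is_complete_label` is hypothetical and not part of this model's code.

```python
# Label mapping taken from the table above.
ID2LABEL = {0: "INCOMPLETE", 1: "COMPLETE"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def is_complete_label(label: str) -> bool:
    """Map a predicted label string to True (turn finished) or False.

    Accepts the names from the table and, as an assumption, the generic
    "LABEL_<id>" names used when a config has no id2label mapping.
    """
    normalized = label.strip().upper()
    if normalized.startswith("LABEL_"):
        # Generic fallback name: compare the numeric suffix with the COMPLETE id.
        return int(normalized.split("_", 1)[1]) == LABEL2ID["COMPLETE"]
    if normalized not in LABEL2ID:
        raise ValueError(f"Unknown label: {label!r}")
    return LABEL2ID[normalized] == LABEL2ID["COMPLETE"]
```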

## Performance

### Out-of-Distribution Test Results (200 samples)

| Metric | Complete (1) | Incomplete (0) |
|--------|--------------|----------------|
| Precision | 100.00% | 85.94% |
| Recall | 83.64% | 100.00% |
| F1-Score | 91.09% | 92.44% |

**Overall Accuracy: 91.8%, Weighted F1: 91.76%**
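
For readers who want to see how these figures relate to each other, the sketch below computes per-class precision, recall, and F1, plus the support-weighted F1, from a confusion matrix. The counts are hypothetical: the source does not publish the confusion matrix, so they were chosen only because their ratios reproduce the percentages in the table. They are not the actual test-set counts.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts (assumption, not from the source).
complete_as_complete = 46      # correct COMPLETE predictions
complete_as_incomplete = 9     # missed turn ends (agent waits too long)
incomplete_as_complete = 0     # false interruptions
incomplete_as_incomplete = 55  # correct INCOMPLETE predictions

p_c, r_c, f1_c = precision_recall_f1(
    tp=complete_as_complete, fp=incomplete_as_complete, fn=complete_as_incomplete
)
p_i, r_i, f1_i = precision_recall_f1(
    tp=incomplete_as_incomplete, fp=complete_as_incomplete, fn=incomplete_as_complete
)

# Weighted F1 averages the per-class F1 scores by class support.
support_c = complete_as_complete + complete_as_incomplete
support_i = incomplete_as_complete + incomplete_as_incomplete
total = support_c + support_i
weighted_f1 = (f1_c * support_c + f1_i * support_i) / total
accuracy = (complete_as_complete + incomplete_as_incomplete) / total

print(f"Complete:    P={p_c:.2%}  R={r_c:.2%}  F1={f1_c:.2%}")
print(f"Incomplete:  P={p_i:.2%}  R={r_i:.2%}  F1={f1_i:.2%}")
print(f"Weighted F1: {weighted_f1:.2%}  Accuracy: {accuracy:.2%}")
```

With zero false "Complete" predictions, precision on Complete is 100% and every error shows up as lower recall on Complete and lower precision on Incomplete.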

### Key Characteristics
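- ✅ **Zero false interruptions on the OOD test set** - The model never predicted "Complete" for an incomplete utterance (100% precision on Complete)
- ✅ **Conservative behavior** - All errors were complete utterances predicted as incomplete (83.64% recall on Complete), which suits voice agents where waiting is better than interrupting
- ✅ **Fast inference** - Inputs are capped at 64 tokens, which suits real-time applications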

## Usage

### Quick Start

```python
import torch
from transformers import pipeline

# Load the model
eou_detector = pipeline(
    "text-classification",
    model="Amr-h/arabic-eou-marbert",
    device=0 if torch.cuda.is_available() else -1  # First GPU if available, otherwise CPU
)

# Detect end of utterance
# English gloss: "Did they tell you that they will need extra hours?"
text = "هل بلغوك انهم بيحتاجون ساعات اضافيه؟"
result = eou_detector(text)[0]

print(f"Label: {result['label']}")  # COMPLETE or INCOMPLETE
print(f"Confidence: {result['score']:.2%}")
```

### With Model and Tokenizer

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Amr-h/arabic-eou-marbert")
tokenizer = AutoTokenizer.from_pretrained("Amr-h/arabic-eou-marbert")

# Inference
# English gloss: "Wait, let me see where I put the" (cut off mid-phrase)
text = "انتظر خلني اشوف وين حطيت ال"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)

with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

label = "COMPLETE" if prediction == 1 else "INCOMPLETE"
print(f"Prediction: {label}")
```

### For LiveKit Integration

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

class ArabicEOUDetector:
    def __init__(self, model_name="Amr-h/arabic-eou-marbert"):
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model.eval()
        
    def predict(self, text: str) -> tuple[bool, float]:
        """
        Returns (is_complete, confidence), where confidence is the softmax
        probability of the predicted class.
        """
        inputs = self.tokenizer(
            text, 
            return_tensors="pt", 
            truncation=True, 
            max_length=64
        )
        
        with torch.no_grad():
            outputs = self.model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            prediction = torch.argmax(probs, dim=-1).item()
            confidence = probs[0][prediction].item()
        
        is_complete = prediction == 1
        return is_complete, confidence
```
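
The `predict` method above takes the argmax class. Since a false "Complete" prediction makes the agent interrupt the speaker, one option is to end the turn only when the probability of COMPLETE clears a tunable threshold. The sketch below shows that decision rule on raw logits, so it runs without loading the model. The function name `should_end_turn` and the default threshold of 0.7 are assumptions for illustration, not values from this model's training or evaluation.

```python
import math
from typing import Sequence, Tuple

COMPLETE_ID = 1  # label id for COMPLETE, per the Labels table


def should_end_turn(logits: Sequence[float], threshold: float = 0.7) -> Tuple[bool, float]:
    """Return (end_turn, p_complete) for one utterance's class logits."""
    # Numerically stable softmax over the two class logits.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    p_complete = exps[COMPLETE_ID] / sum(exps)
    # End the turn only when COMPLETE is sufficiently likely.
    return p_complete >= threshold, p_complete


# Made-up logits in [INCOMPLETE, COMPLETE] order: argmax says COMPLETE,
# but the probability is below the threshold, so the agent keeps waiting.
end_turn, p_complete = should_end_turn([0.2, 0.9])
print(end_turn, round(p_complete, 3))
```

With the real model, the logits for a single utterance could be passed as `outputs.logits[0].tolist()`. Raising the threshold trades more waiting for fewer interruptions, so it is worth tuning on held-out conversations.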

## Training Details

- **Training Data:** ~12,000 Saudi dialect Arabic utterances
- **Validation Data:** ~1,500 samples
- **Test Data:** ~1,500 samples
- **Epochs:** 3
- **Learning Rate:** 2e-5
- **Batch Size:** 32
- **Max Length:** 64 tokens
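
As a rough sanity check on the schedule, the hyperparameters above imply the number of optimizer steps computed below. This is a sketch under assumptions the source does not state: exactly 12,000 training samples (the list says "~12,000"), a single device, and no gradient accumulation.

```python
import math

train_samples = 12_000  # assumption: the source gives only an approximate count
batch_size = 32
epochs = 3

# One optimizer step per batch; the last partial batch (if any) still counts.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 375 1125
```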

## Intended Use

- Real-time voice agents and conversational AI
- Turn-taking detection in Arabic dialogue systems
- LiveKit agent integration
- Customer service voice bots

## Limitations

- Optimized for Saudi/Gulf Arabic dialect
- May require fine-tuning for other Arabic dialects
- Designed for spoken/conversational text, not formal written Arabic

## Citation

If you use this model, please cite:

```bibtex
@misc{arabic-eou-marbert,
  author = {YOUR_NAME},
  title = {Arabic End-of-Utterance Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Amr-h/arabic-eou-marbert}
}
```

## License

Apache 2.0