---
language:
- ar
tags:
- text-classification
- eou
- end-of-utterance
- turn-detection
- arabic
- saudi-dialect
- marbert
base_model: UBC-NLP/MARBERT
license: apache-2.0
metrics:
- accuracy
- f1
- precision
- recall
---

# MARBERT Arabic End-of-Utterance Detection

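A fine-tuned MARBERT model for Arabic End-of-Utterance (EOU) detection: given the text of a transcribed utterance, it predicts whether the speaker has finished. It is intended for turn detection in real-time voice agents.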

## Model Description

- **Base Model:** UBC-NLP/MARBERT (163M parameters)
- **Task:** Binary sequence classification (complete vs incomplete utterance)
- **Language:** Arabic (emphasis on Saudi/Gulf dialect)
- **Training Data:** 125K samples from the SADA22 dataset
- **Inference Speed:** ~30ms average latency on CPU

## Performance

| Metric | Score |
|--------|-------|
| **F1 Score** | 0.8174 |
| **Accuracy** | 0.7995 |
| **Precision** | 0.7506 |
| **Recall** | 0.8971 |
| **AUC-ROC** | 0.8249 |

**Test Set:** 31,289 samples (balanced: approximately 50% complete, 50% incomplete)
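
As a quick consistency check, F1 can be recomputed from the reported precision and recall, and accuracy follows if the test set is assumed to be exactly balanced. The sketch below is illustrative only: the helper names are made up for this example and are not part of the model's code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_from_pr(precision: float, recall: float, pos_rate: float = 0.5) -> float:
    """Accuracy implied by precision and recall at a given positive-class rate.

    All quantities are fractions of the test set. pos_rate=0.5 is an
    assumption taken from the stated 50/50 class balance.
    """
    tp = recall * pos_rate          # true positives
    fp = tp / precision - tp        # false positives, from precision = tp / (tp + fp)
    tn = (1.0 - pos_rate) - fp      # true negatives
    return tp + tn


precision, recall = 0.7506, 0.8971  # values from the table above
print(f"F1       = {f1_from_pr(precision, recall):.4f}")
print(f"Accuracy = {accuracy_from_pr(precision, recall):.4f}")
```

Both values should agree with the table to about three decimal places. Small differences in the last digit are expected because the inputs are already rounded.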

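## Usage

Load the model and tokenizer from the Hub, then score a transcript. `predict_eou` returns the probability that the text is a complete utterance.
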
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("azeddinShr/marbert-arabic-eou")
tokenizer = AutoTokenizer.from_pretrained("azeddinShr/marbert-arabic-eou")

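def predict_eou(text):
    """Return the probability that `text` is a complete utterance."""
    # Tokenize a single string; inputs longer than 128 tokens are truncated.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    # Index 1 is read as the "complete" (EOU) class.
    return probs[0][1].item()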

# Example
text = "شكرا جزيلا على المساعدة"
prob = predict_eou(text)
is_complete = prob > 0.5
print(f"EOU Probability: {prob:.3f} - {'Complete' if is_complete else 'Incomplete'}")
```

## Training Details

- **Epochs:** 6
- **Batch Size:** 16 (train), 32 (eval)
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Max Length:** 128 tokens
- **Training Time:** ~2 minutes (GPU)
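
For anyone reproducing the fine-tuning with the Hugging Face `Trainer`, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the original training script: `output_dir` is a hypothetical path, single-device training is assumed (so the per-device batch sizes equal the listed ones), and AdamW is left implicit because it is the `Trainer` default.

```python
# Keyword arguments mirroring the hyperparameters listed above.
# Key names follow transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "./marbert-eou",        # hypothetical path, not from the model card
    "num_train_epochs": 6,
    "per_device_train_batch_size": 16,    # assumes a single device
    "per_device_eval_batch_size": 32,
    "learning_rate": 2e-5,
}

# Applied at tokenization time rather than in TrainingArguments.
MAX_LENGTH = 128
```

These can then be passed as `TrainingArguments(**training_kwargs)`, with `max_length=MAX_LENGTH` and `truncation=True` given to the tokenizer.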

## Use Cases

- Real-time Arabic voice agents
- Turn-taking detection in conversations
- Streaming speech-to-text applications
- Voice assistant interrupt handling
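
In a streaming setting, one way to use the model is to score each partial transcript and declare the turn finished only after several consecutive confident predictions. The sketch below is one possible design, not something this model card prescribes: `detect_turn_end`, `threshold`, and `patience` are hypothetical names, and `score` stands in for any function that maps text to an EOU probability, such as `predict_eou` from the Usage section.

```python
from typing import Callable, Iterable, Optional


def detect_turn_end(
    partials: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
    patience: int = 2,
) -> Optional[str]:
    """Return the first partial transcript judged complete, else None.

    A transcript counts as complete once `score` exceeds `threshold`
    for `patience` consecutive partials. Any lower score resets the count.
    """
    streak = 0
    for text in partials:
        if score(text) > threshold:
            streak += 1
            if streak >= patience:
                return text
        else:
            streak = 0
    return None


# Demo with made-up scores instead of the real model (illustrative values only).
fake_scores = {"شكرا": 0.40, "شكرا جزيلا": 0.70, "شكرا جزيلا على المساعدة": 0.95}
result = detect_turn_end(fake_scores.keys(), fake_scores.__getitem__)
print(result)
```

Raising `threshold` or `patience` trades responsiveness for fewer premature interruptions, which may be worth tuning given that the reported precision is lower than the recall.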

## Limitations

- Best performance on Saudi/Gulf Arabic dialects
- Requires Arabic text input (not audio)
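
Because the model expects Arabic text, a caller may want to guard against empty or non-Arabic input, for example when the upstream speech-to-text system returns Latin-script output. The check below is a rough heuristic, not part of the model: `looks_arabic` and `min_ratio` are hypothetical names, and only the basic Arabic Unicode block (U+0600 to U+06FF) is considered.

```python
import re

# Basic Arabic Unicode block (U+0600 to U+06FF).
_ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")


def looks_arabic(text: str, min_ratio: float = 0.5) -> bool:
    """Heuristic: True if at least `min_ratio` of the letters are Arabic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False  # empty, whitespace-only, or digits/punctuation only
    arabic = sum(1 for ch in letters if _ARABIC_CHAR.match(ch))
    return arabic / len(letters) >= min_ratio


print(looks_arabic("شكرا جزيلا على المساعدة"))
print(looks_arabic("thank you"))
```

Inputs that fail the check can be skipped or routed to a fallback rather than scored.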

## Citation
```bibtex
@misc{marbert-arabic-eou,
  author = {azeddinShr},
  title = {MARBERT Arabic End-of-Utterance Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/azeddinShr/marbert-arabic-eou}
}
```

## Dataset

Training dataset: [azeddinShr/arabic-eou-sada22](https://huggingface.co/datasets/azeddinShr/arabic-eou-sada22)