# Arabic End-of-Turn (EOU) Detection Model: AraBERT Fine-Tuned
This model fine-tunes **AraBERT** to detect **end-of-turn** boundaries (also called end of utterance, **EOU**) in Arabic dialogue.
It predicts whether a given user message represents a **continuation** or an **end of turn**.
- **Repository:** `nihad-ask/Arabert-EOU-detection-model`
- **Task:** Binary End-of-Utterance Classification
- **Language:** Arabic (MSA + Dialects)
- **Base Model:** `aubmindlab/bert-base-arabertv2`
---
## Task Definition
This is a **binary classification** task:
| Label | Meaning |
|-------|----------|
| **0** | Speaker will continue (NOT end of turn) |
| **1** | End of turn (EOU detected) |
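
To make the label convention concrete, the sketch below turns a pair of raw logits into a label and an end-of-turn probability. It assumes the classifier head emits logits in label order (index 0 = continue, index 1 = end of turn), which matches the table above but is not confirmed elsewhere in this card. `ID2LABEL`, `decode_logits`, and the 0.5 threshold are illustrative names and values, not part of the model's API.

```python
import math

# Hypothetical mapping that mirrors the label table above.
ID2LABEL = {0: "continue", 1: "end_of_turn"}


def decode_logits(logits, threshold=0.5):
    """Return (label, p_eou) for a pair of logits [continue, end_of_turn]."""
    # Numerically stable two-class softmax.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    p_eou = exps[1] / sum(exps)
    # Assumption: predict end of turn when its probability reaches the threshold.
    label_id = 1 if p_eou >= threshold else 0
    return ID2LABEL[label_id], p_eou


label, p_eou = decode_logits([-1.2, 2.3])
print(label, round(p_eou, 3))
```

With the default threshold this matches a plain argmax over the two logits. Raising the threshold makes the end-of-turn decision more conservative, which may suit applications where interrupting the speaker is costly.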
---
## Use Cases
- Conversational AI / Chatbots
- Dialogue Systems
- Turn-taking prediction
- Speech-to-text segmentation
- Customer support automation
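
As a rough illustration of the turn-taking and segmentation use cases, the sketch below buffers incoming message fragments and emits a complete turn whenever a classifier reports end of turn. `group_into_turns` and `toy_predicate` are hypothetical helpers. In practice the predicate would wrap this model's prediction (label 1). Classifying the accumulated buffer rather than only the latest fragment is an assumption, not something this card specifies.

```python
from typing import Callable, Iterable, List


def group_into_turns(
    fragments: Iterable[str],
    is_end_of_turn: Callable[[str], bool],
) -> List[str]:
    """Merge consecutive fragments until the classifier signals end of turn."""
    turns: List[str] = []
    buffer: List[str] = []
    for fragment in fragments:
        buffer.append(fragment)
        # Assumption: classify the accumulated text, not just the newest fragment.
        if is_end_of_turn(" ".join(buffer)):
            turns.append(" ".join(buffer))
            buffer = []
    if buffer:
        # Flush whatever is left if the stream ends mid-turn.
        turns.append(" ".join(buffer))
    return turns


def toy_predicate(text: str) -> bool:
    """Stand-in for the model: treats a trailing '?' or '.' as end of turn."""
    return text.endswith(("?", "."))


fragments = ["I wanted to ask", "about my order.", "Is it shipped?"]
print(group_into_turns(fragments, toy_predicate))
```

Passing the classifier in as a callable keeps the buffering logic independent of how the model is loaded or batched.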
---
## Evaluation
### **Balanced Validation Set**
**Accuracy:** `0.9539`
| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 - Continue** | 0.9494 | 0.9589 | 0.9541 | 1702 |
| **1 - End of Turn** | 0.9585 | 0.9489 | 0.9536 | 1702 |
**Overall:**
| Metric | Score |
|--------|--------|
| Accuracy | 0.9539 |
| Macro Avg F1 | 0.9539 |
| Weighted Avg F1 | 0.9539 |
| Total Samples | 3404 |
---
### **Test Set**
**Accuracy:** `0.8919`
| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 - Continue** | 0.7671 | 0.9445 | 0.8466 | 3097 |
| **1 - End of Turn** | 0.9713 | 0.8676 | 0.9165 | 6705 |
**Overall:**
| Metric | Score |
|--------|--------|
| Accuracy | 0.8919 |
| Macro Avg F1 | 0.8815 |
| Weighted Avg F1 | 0.8944 |
| Total Samples | 9802 |
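
The overall figures can be cross-checked against the per-class rows. The sketch below recomputes per-class F1, macro and weighted F1, and accuracy from the precision, recall, and support values in the test-set table. `summarize` is an illustrative helper, and because the inputs are already rounded to four decimals, the results may differ from the reported scores in the last digit.

```python
def summarize(rows):
    """rows: list of (precision, recall, support) tuples, one per class."""
    total = sum(support for _, _, support in rows)
    # F1 is the harmonic mean of precision and recall.
    f1s = [2 * p * r / (p + r) for p, r, _ in rows]
    macro_f1 = sum(f1s) / len(f1s)
    weighted_f1 = sum(f1 * s for f1, (_, _, s) in zip(f1s, rows)) / total
    # Accuracy equals the support-weighted mean of per-class recall.
    accuracy = sum(r * s for _, r, s in rows) / total
    return {
        "per_class_f1": f1s,
        "macro_f1": macro_f1,
        "weighted_f1": weighted_f1,
        "accuracy": accuracy,
    }


# (precision, recall, support) copied from the test-set table above.
test_rows = [(0.7671, 0.9445, 3097), (0.9713, 0.8676, 6705)]
summary = summarize(test_rows)
for name in ("macro_f1", "weighted_f1", "accuracy"):
    print(name, round(summary[name], 4))
```

The test set is imbalanced (3097 vs. 6705 samples), which is why the macro and weighted F1 scores differ there but coincide on the balanced validation set.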
---
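```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nihad-ask/Arabert-EOU-detection-model"

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

text = "تمام و بعدين؟"  # roughly: "OK, and then?"
inputs = tokenizer(text, return_tensors="pt")

# Gradients are not needed for inference.
with torch.no_grad():
    outputs = model(**inputs)

# Label 1 = end of turn, label 0 = speaker will continue.
prediction = torch.argmax(outputs.logits, dim=1).item()
if prediction == 1:
    print("End of turn")
else:
    print("Speaker will continue")
```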