
# Arabic End-of-Turn (EOU) Detection Model — AraBERT Fine-Tuned

This model fine-tunes **AraBERT** to detect **end-of-utterance (EOU)** boundaries, that is, the end of a speaker's turn, in Arabic dialogue.  
It predicts whether a given user message represents a **continuation** or an **end of turn**.

- **Repository:** `nihad-ask/Arabert-EOU-detection-model`  
- **Task:** Binary End-of-Utterance Classification  
- **Language:** Arabic (MSA + Dialects)  
- **Base Model:** `aubmindlab/bert-base-arabertv2`  

---

## 🚦 Task Definition

This is a **binary classification** task:

| Label | Meaning |
|-------|----------|
| **0** | Speaker will continue (NOT end of turn) |
| **1** | End of turn (EOU detected) |
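
The label scheme above can be turned into a readable prediction with a small post-processing step. The following is a minimal sketch, not part of the released model: `ID2LABEL`, `softmax`, `decode`, and the `eou_threshold` parameter are hypothetical names introduced here for illustration, and the logits are assumed to be ordered as `[label 0, label 1]`.

```python
import math

# Label ids as defined in the table above (the names are illustrative)
ID2LABEL = {0: "continue", 1: "end_of_turn"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, eou_threshold=0.5):
    """Map a pair of logits to (label name, probability of end of turn).

    Hypothetical helper: eou_threshold is an assumption, not a setting
    shipped with the model. With the default 0.5 the result matches argmax.
    """
    p_eou = softmax(logits)[1]
    label_id = 1 if p_eou >= eou_threshold else 0
    return ID2LABEL[label_id], p_eou


label, p_eou = decode([0.2, 1.4])
print(label, round(p_eou, 3))
```

Raising `eou_threshold` above 0.5 makes the classifier more reluctant to declare an end of turn, which may suit applications where interrupting the speaker is costly.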

---

## 📌 Use Cases

- Conversational AI / Chatbots  
- Dialogue Systems  
- Turn-taking prediction  
- Speech-to-text segmentation  
- Customer support automation  


---

## 📊 Evaluation

### **Balanced Validation Set**

**Accuracy:** `0.9539`

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 – Continue** | 0.9494 | 0.9589 | 0.9541 | 1702 |
| **1 – End of Turn** | 0.9585 | 0.9489 | 0.9536 | 1702 |

**Overall:**

| Metric | Score |
|--------|--------|
| Accuracy | 0.9539 |
| Macro Avg F1 | 0.9539 |
| Weighted Avg F1 | 0.9539 |
| Total Samples | 3404 |

---

### **Test Set**

**Accuracy:** `0.8919`

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 – Continue** | 0.7671 | 0.9445 | 0.8466 | 3097 |
| **1 – End of Turn** | 0.9713 | 0.8676 | 0.9165 | 6705 |

**Overall:**

| Metric | Score |
|--------|--------|
| Accuracy | 0.8919 |
| Macro Avg F1 | 0.8815 |
| Weighted Avg F1 | 0.8944 |
| Total Samples | 9802 |
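
As a sanity check, the summary scores for the test set can be recomputed from the per-class precision, recall, and support reported above. This is a sketch with hypothetical helper names (`f1`, `summarize`); because the inputs are already rounded to four decimals, the results should agree with the table only to about three decimal places.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class test-set figures copied from the table above
test_report = {
    0: {"precision": 0.7671, "recall": 0.9445, "support": 3097},
    1: {"precision": 0.9713, "recall": 0.8676, "support": 6705},
}


def summarize(report):
    """Recompute accuracy, macro F1, and weighted F1 from per-class rows."""
    total = sum(row["support"] for row in report.values())
    f1s = {k: f1(row["precision"], row["recall"]) for k, row in report.items()}
    # Macro F1: unweighted mean over classes
    macro = sum(f1s.values()) / len(f1s)
    # Weighted F1: mean over classes, weighted by class support
    weighted = sum(f1s[k] * report[k]["support"] for k in report) / total
    # Accuracy: correctly classified samples (recall * support) over all samples
    accuracy = sum(row["recall"] * row["support"] for row in report.values()) / total
    return {"accuracy": accuracy, "macro_f1": macro, "weighted_f1": weighted, "total": total}


summary = summarize(test_report)
for name, value in summary.items():
    print(name, round(value, 4) if isinstance(value, float) else value)
```

Macro F1 is lower than weighted F1 here because the smaller class (label 0) has the weaker F1, and the macro average gives it the same weight as the larger class.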

---

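## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nihad-ask/Arabert-EOU-detection-model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "تمام و بعدين؟"

# Tokenize a single message; truncate anything beyond the model's maximum length
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Label 1 = end of turn, label 0 = speaker will continue
prediction = torch.argmax(outputs.logits, dim=1).item()

if prediction == 1:
    print("End of turn")
else:
    print("Speaker will continue")
```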