# Arabic End-of-Turn (EOU) Detection Model: AraBERT Fine-Tuned

This model is **AraBERT** fine-tuned to detect **end-of-turn (EOU)** boundaries in Arabic dialogue.  
Given a user message, it predicts whether the speaker will **continue** or has reached the **end of turn**.

- **Repository:** `nihad-ask/Arabert-EOU-detection-model`  
- **Task:** Binary End-of-Utterance Classification  
- **Language:** Arabic (MSA + Dialects)  
- **Base Model:** `aubmindlab/bert-base-arabertv2`  

---

## 🚦 Task Definition

This is a **binary classification** task:

| Label | Meaning |
|-------|----------|
| **0** | Speaker will continue (NOT end of turn) |
| **1** | End of turn (EOU detected) |
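
As a minimal sketch of how the two label ids in the table can be turned into a readable decision, the snippet below applies a softmax to a pair of logits and maps the winning index to a name. The label strings, the `decode_logits` helper, and the example logits are illustrative assumptions, not part of the released model.

```python
import math

# Label ids follow the table above; the string names are illustrative only.
ID2LABEL = {0: "continue", 1: "end_of_turn"}

def decode_logits(logits):
    """Map a pair of raw logits to (label name, softmax probability)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Made-up logits standing in for the classifier output of one message
label, prob = decode_logits([-1.2, 2.3])
print(label, round(prob, 3))
```

Exposing the probability alongside the label makes it possible to apply a stricter threshold than a plain argmax when false end-of-turn detections are costly.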

---

## 📌 Use Cases

- Conversational AI / Chatbots  
- Dialogue Systems  
- Turn-taking prediction  
- Speech-to-text segmentation  
- Customer support automation  


---

## 📊 Evaluation

### **Balanced Validation Set**

**Accuracy:** `0.9539`

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 – Continue** | 0.9494 | 0.9589 | 0.9541 | 1702 |
| **1 – End of Turn** | 0.9585 | 0.9489 | 0.9536 | 1702 |

**Overall:**

| Metric | Score |
|--------|--------|
| Accuracy | 0.9539 |
| Macro Avg F1 | 0.9539 |
| Weighted Avg F1 | 0.9539 |
| Total Samples | 3404 |

---

### **Test Set**

**Accuracy:** `0.8919`

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 – Continue** | 0.7671 | 0.9445 | 0.8466 | 3097 |
| **1 – End of Turn** | 0.9713 | 0.8676 | 0.9165 | 6705 |

**Overall:**

| Metric | Score |
|--------|--------|
| Accuracy | 0.8919 |
| Macro Avg F1 | 0.8815 |
| Weighted Avg F1 | 0.8944 |
| Total Samples | 9802 |
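
To show how the overall test-set figures relate to the per-class rows, the sketch below recomputes accuracy, macro F1, and weighted F1 from the precision, recall, and support values in the table. Because the inputs are already rounded to four decimals, the results should agree with the reported scores only to about three decimal places.

```python
# (precision, recall, support) per class, copied from the test-set table
rows = {
    0: (0.7671, 0.9445, 3097),  # Continue
    1: (0.9713, 0.8676, 6705),  # End of Turn
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_scores = {k: f1(p, r) for k, (p, r, _) in rows.items()}
total = sum(n for _, _, n in rows.values())

# Macro average: every class counts equally
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: every class counts in proportion to its support
weighted_f1 = sum(f1_scores[k] * rows[k][2] for k in rows) / total

# Accuracy equals the support-weighted average of per-class recall
accuracy = sum(r * n for _, r, n in rows.values()) / total

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

The gap between macro and weighted F1 reflects the class imbalance in the test set: the larger "End of Turn" class has the higher F1, which pulls the weighted average up.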

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nihad-ask/Arabert-EOU-detection-model"

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "تمام و بعدين؟"  # "Okay, and then?"

inputs = tokenizer(text, return_tensors="pt")

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Label 1 = end of turn, label 0 = speaker will continue
prediction = torch.argmax(outputs.logits, dim=1).item()

if prediction == 1:
    print("End of turn")
else:
    print("Speaker will continue")
```