# Arabic End-of-Turn (EOU) Detection Model: AraBERT Fine-Tuned
This model fine-tunes **AraBERT** for detecting **end-of-turn (EOU)** boundaries in Arabic dialogue.
It predicts whether a given user message represents a **continuation** or an **end of turn**.
- **Repository:** `nihad-ask/Arabert-EOU-detection-model`
- **Task:** Binary End-of-Utterance Classification
- **Language:** Arabic (MSA + Dialects)
- **Base Model:** `aubmindlab/bert-base-arabertv2`
---
## 🚦 Task Definition
This is a **binary classification** task:
| Label | Meaning |
|-------|----------|
| **0** | Speaker will continue (NOT end of turn) |
| **1** | End of turn (EOU detected) |
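
The label table can be turned into a small decoding helper. The sketch below assumes the model emits two raw logits ordered as `[label 0, label 1]`, which is the usual layout for a two-class sequence classifier. The names `ID2LABEL`, `eou_probability`, and `decide`, and the `threshold` parameter, are illustrative and not part of the released model. Raising the threshold above 0.5 trades recall on label 1 for fewer premature end-of-turn decisions.

```python
import math

# Label ids as listed in the table above; the string names are illustrative.
ID2LABEL = {0: "continue", 1: "end_of_turn"}


def eou_probability(logits):
    """Softmax probability of label 1 given raw logits [logit_0, logit_1]."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    return exps[1] / sum(exps)


def decide(logits, threshold=0.5):
    """Map logits to a label name. `threshold` is an assumed tuning knob."""
    if eou_probability(logits) >= threshold:
        return ID2LABEL[1]
    return ID2LABEL[0]


print(decide([0.2, 2.3]))                  # label 1 clearly ahead
print(decide([1.5, -0.4]))                 # label 0 clearly ahead
print(decide([0.2, 1.0], threshold=0.9))   # stricter threshold keeps waiting
```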
---
## 📌 Use Cases
- Conversational AI / Chatbots
- Dialogue Systems
- Turn-taking prediction
- Speech-to-text segmentation
- Customer support automation
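
For turn-taking and segmentation use cases, one possible integration pattern is to buffer incoming fragments and flush the buffer whenever the classifier reports an end of turn. The sketch below is only an illustration of that pattern: `group_into_turns` is a hypothetical helper, and `toy_classifier` is a punctuation-based stand-in used so the example runs without downloading the model. In practice the callable would wrap a model prediction that returns `True` for label 1.

```python
from typing import Callable, Iterable, Iterator, List


def group_into_turns(
    fragments: Iterable[str],
    is_end_of_turn: Callable[[str], bool],
) -> Iterator[str]:
    """Join consecutive fragments and yield one string per completed turn."""
    buffer: List[str] = []
    for fragment in fragments:
        buffer.append(fragment)
        candidate = " ".join(buffer)
        if is_end_of_turn(candidate):
            yield candidate
            buffer.clear()
    if buffer:
        # The stream ended mid-turn: flush whatever is left.
        yield " ".join(buffer)


def toy_classifier(text: str) -> bool:
    """Stand-in for the real model: trailing '.' or '?' counts as end of turn."""
    return text.rstrip().endswith((".", "?"))


stream = ["I wanted to ask", "about my order.", "Is it shipped?", "Also"]
turns = list(group_into_turns(stream, toy_classifier))
print(turns)
```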
---
## 📊 Evaluation
### **Balanced Validation Set**
**Accuracy:** `0.9539`
| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 – Continue** | 0.9494 | 0.9589 | 0.9541 | 1702 |
| **1 – End of Turn** | 0.9585 | 0.9489 | 0.9536 | 1702 |
**Overall:**
| Metric | Score |
|--------|--------|
| Accuracy | 0.9539 |
| Macro Avg F1 | 0.9539 |
| Weighted Avg F1 | 0.9539 |
| Total Samples | 3404 |
---
### **Test Set**
**Accuracy:** `0.8919`
| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| **0 – Continue** | 0.7671 | 0.9445 | 0.8466 | 3097 |
| **1 – End of Turn** | 0.9713 | 0.8676 | 0.9165 | 6705 |
**Overall:**
| Metric | Score |
|--------|--------|
| Accuracy | 0.8919 |
| Macro Avg F1 | 0.8815 |
| Weighted Avg F1 | 0.8944 |
| Total Samples | 9802 |
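
The aggregate test-set scores can be re-derived from the per-class rows as a consistency check. The sketch below copies precision, recall, and support from the test-set table and applies the standard definitions: F1 is the harmonic mean of precision and recall, macro F1 is the unweighted mean of the class F1 scores, weighted F1 weights each class by its support, and accuracy equals the support-weighted recall (recall times support gives the number of correct predictions per class). Because the table values are rounded to four decimals, the results should match the reported figures only to roughly three decimal places. The variable names are illustrative.

```python
# Per-class precision, recall, and support copied from the test-set table.
test_report = {
    0: {"precision": 0.7671, "recall": 0.9445, "support": 3097},
    1: {"precision": 0.9713, "recall": 0.8676, "support": 6705},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(row["support"] for row in test_report.values())
f1_scores = {
    label: f1(row["precision"], row["recall"])
    for label, row in test_report.items()
}

# Macro average: every class counts equally.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(
    f1_scores[label] * test_report[label]["support"] for label in test_report
) / total

# Accuracy: correct predictions (recall * support) over all samples.
accuracy = sum(
    row["recall"] * row["support"] for row in test_report.values()
) / total

print(f"macro F1    = {macro_f1:.4f}")
print(f"weighted F1 = {weighted_f1:.4f}")
print(f"accuracy    = {accuracy:.4f}")
```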
---
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nihad-ask/Arabert-EOU-detection-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "تمام و بعدين؟"
inputs = tokenizer(text, return_tensors="pt")

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Label 1 = end of turn, label 0 = speaker will continue
prediction = torch.argmax(outputs.logits, dim=1).item()
if prediction == 1:
    print("End of turn")
else:
    print("Speaker will continue")
```