nihad-ask committed on
Commit ca9ff46 · verified · 1 Parent(s): 0beddc7

Update README.md


@misc {arabert_eou_2025,
author = {Nihad Askri},
title = {ARABERT Arabic End-of-Utterance Detection},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/nihad-ask/arabert-arabic-EOU-detection-model}}
}

Files changed (1)
  1. README.md +31 -26
README.md CHANGED
@@ -1,12 +1,12 @@
- # Arabic End-of-Turn (EOU) Detection Model - MARBERT Fine-Tuned

- This model fine-tunes **MARBERT** for detecting **end-of-turn (EOU)** boundaries in Arabic dialogue.
  It predicts whether a given user message represents a **continuation** or an **end of turn**.

- - **Repository:** `nihad-ask/Arabert-EOU-detection-model`
- - **Task:** Binary End-of-Utterance Classification
- - **Language:** Arabic (MSA + saudi dilect)
- - **Base Model:** `UBC-NLP/MARBERT`

  ---

@@ -19,8 +19,6 @@ This is a **binary classification** task:
  | **0** | Speaker will continue (NOT end of turn) |
  | **1** | End of turn (EOU detected) |

- This helps conversational agents determine if the user has finished typing or is likely to continue.
-
  ---

  ## 📌 Use Cases
@@ -31,34 +29,50 @@ This helps conversational agents determine if the user has finished typing or is
  - Speech-to-text segmentation
  - Customer support automation

- ---


  ## 📊 Evaluation

  ### **Balanced Validation Set**

- **Accuracy:** `0.9098`

  | Class | Precision | Recall | F1-score | Support |
  |-------|-----------|--------|----------|---------|
- | **0 – Continue** | 0.9058 | 0.9148 | 0.9103 | 1702 |
- | **1 – End of Turn** | 0.9139 | 0.9048 | 0.9094 | 1702 |

  **Overall:**

  | Metric | Score |
  |--------|--------|
- | Accuracy | 0.9098 |
- | Macro Avg F1 | 0.9098 |
- | Weighted Avg F1 | 0.9098 |
  | Total Samples | 3404 |



- ## 🧪 How to Use

- ### **Python (PyTorch)**

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
@@ -79,12 +93,3 @@ if prediction == 1:
      print("End of turn")
  else:
      print("Speaker will continue")
-
-
- @misc{marbert_eou_2025,
- author = {Nihad Askri},
- title = {MARBERT Arabic End-of-Utterance Detection},
- year = {2025},
- publisher = {Hugging Face},
- howpublished = {\url{https://huggingface.co/nihad-ask/marbert-arabic-EOU-detection-model}}
- }
 
+ # Arabic End-of-Turn (EOU) Detection Model - AraBERT Fine-Tuned

+ This model fine-tunes **AraBERT** for detecting **end-of-turn (EOU)** boundaries in Arabic dialogue.
  It predicts whether a given user message represents a **continuation** or an **end of turn**.

+ - **Repository:** `nihad-ask/Arabert-EOU-detection-model`
+ - **Task:** Binary End-of-Utterance Classification
+ - **Language:** Arabic (MSA + Dialects)
+ - **Base Model:** `aubmindlab/bert-base-arabertv2`

  ---

 
  | **0** | Speaker will continue (NOT end of turn) |
  | **1** | End of turn (EOU detected) |

  ---

  ## 📌 Use Cases
 
  - Speech-to-text segmentation
  - Customer support automation


+ ---

  ## 📊 Evaluation

  ### **Balanced Validation Set**

+ **Accuracy:** `0.9539`

  | Class | Precision | Recall | F1-score | Support |
  |-------|-----------|--------|----------|---------|
+ | **0 – Continue** | 0.9494 | 0.9589 | 0.9541 | 1702 |
+ | **1 – End of Turn** | 0.9585 | 0.9489 | 0.9536 | 1702 |

  **Overall:**

  | Metric | Score |
  |--------|--------|
+ | Accuracy | 0.9539 |
+ | Macro Avg F1 | 0.9539 |
+ | Weighted Avg F1 | 0.9539 |
  | Total Samples | 3404 |

+ ---
+
+ ### **Test Set**

+ **Accuracy:** `0.8919`

+ | Class | Precision | Recall | F1-score | Support |
+ |-------|-----------|--------|----------|---------|
+ | **0 – Continue** | 0.7671 | 0.9445 | 0.8466 | 3097 |
+ | **1 – End of Turn** | 0.9713 | 0.8676 | 0.9165 | 6705 |

+ **Overall:**
+
+ | Metric | Score |
+ |--------|--------|
+ | Accuracy | 0.8919 |
+ | Macro Avg F1 | 0.8815 |
+ | Weighted Avg F1 | 0.8944 |
+ | Total Samples | 9802 |
+
+ ---

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
      print("End of turn")
  else:
      print("Speaker will continue")
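
Review note: the test-set aggregates added in this commit can be cross-checked against the per-class rows. The sketch below is not part of the README. It copies the four-decimal values from the diff and assumes the usual definitions: macro F1 is the unweighted mean of the per-class F1 scores, weighted F1 is the support-weighted mean, and accuracy equals support-weighted recall for single-label classification. The names `rows`, `reported`, and `aggregate` are hypothetical. Because the inputs are already rounded, agreement is only expected to within about one unit in the fourth decimal place.

```python
# Per-class test-set rows copied from the diff: (precision, recall, f1, support).
rows = {
    "0 - Continue": (0.7671, 0.9445, 0.8466, 3097),
    "1 - End of Turn": (0.9713, 0.8676, 0.9165, 6705),
}

# Aggregates as reported in the new README's "Overall" table for the test set.
reported = {
    "accuracy": 0.8919,
    "macro_f1": 0.8815,
    "weighted_f1": 0.8944,
    "total": 9802,
}


def aggregate(per_class):
    """Recompute the overall metrics from per-class recall, F1, and support."""
    total = sum(support for _, _, _, support in per_class.values())
    # Macro F1: plain mean of the per-class F1 scores.
    macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)
    # Weighted F1: per-class F1 weighted by class support.
    weighted_f1 = sum(f1 * s for _, _, f1, s in per_class.values()) / total
    # Accuracy: correctly classified samples (recall * support) over all samples.
    accuracy = sum(rec * s for _, rec, _, s in per_class.values()) / total
    return {
        "accuracy": accuracy,
        "macro_f1": macro_f1,
        "weighted_f1": weighted_f1,
        "total": total,
    }


computed = aggregate(rows)
for name, value in computed.items():
    # Show the recomputed value, the reported value, and their difference.
    print(f"{name}: computed={value:.5f} reported={reported[name]} "
          f"diff={abs(value - reported[name]):.5f}")
```

The same check applied to the balanced validation rows is less informative, since both classes have equal support (1702 each) and the macro and weighted averages coincide there.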