nihad-ask committed
Commit 43c70a7 · verified · 1 Parent(s): ca9ff46

Update readme.md

Files changed (1)
  1. readme.md +15 -71
readme.md CHANGED
@@ -1,12 +1,12 @@
- # Arabic End-of-Turn (EOU) Detection Model - MARBERT Fine-Tuned

- This model fine-tunes **MARBERT** for detecting **end-of-turn (EOU)** boundaries in Arabic dialogue.
  It predicts whether a given user message represents a **continuation** or an **end of turn**.

  - **Repository:** `nihad-ask/Arabert-EOU-detection-model`
  - **Task:** Binary End-of-Utterance Classification
  - **Language:** Arabic (MSA + Dialects)
- - **Base Model:** `UBC-NLP/MARBERT`

  ---

@@ -19,8 +19,6 @@ This is a **binary classification** task:
  | **0** | Speaker will continue (NOT end of turn) |
  | **1** | End of turn (EOU detected) |

- This helps conversational agents determine if the user has finished typing or is likely to continue.
-
  ---

  ## 📌 Use Cases
@@ -31,18 +29,6 @@ This helps conversational agents determine if the user has finished typing or is
  - Speech-to-text segmentation
  - Customer support automation

- ---
-
- ## 🧠 Model Details
-
- - Base architecture: MARBERT (Arabic-focused RoBERTa variant)
- - Added: Classification head (2 classes)
- - Framework: Hugging Face Transformers
- - Max sequence length: 128
-
- ---
-
-

  ---

@@ -50,86 +36,44 @@ This helps conversational agents determine if the user has finished typing or is

  ### **Balanced Validation Set**

- **Accuracy:** `0.9098`

  | Class | Precision | Recall | F1-score | Support |
  |-------|-----------|--------|----------|---------|
- | **0 – Continue** | 0.9058 | 0.9148 | 0.9103 | 1702 |
- | **1 – End of Turn** | 0.9139 | 0.9048 | 0.9094 | 1702 |

  **Overall:**

  | Metric | Score |
  |--------|--------|
- | Accuracy | 0.9098 |
- | Macro Avg F1 | 0.9098 |
- | Weighted Avg F1 | 0.9098 |
  | Total Samples | 3404 |

  ---

  ### **Test Set**

- **Accuracy:** `0.8764`

  | Class | Precision | Recall | F1-score | Support |
  |-------|-----------|--------|----------|---------|
- | **0 – Continue** | 0.7650 | 0.8786 | 0.8179 | 3097 |
- | **1 – End of Turn** | 0.9398 | 0.8753 | 0.9064 | 6705 |

  **Overall:**

  | Metric | Score |
  |--------|--------|
- | Accuracy | 0.8764 |
- | Macro Avg F1 | 0.8621 |
- | Weighted Avg F1 | 0.8784 |
  | Total Samples | 9802 |

  ---

- <details>
- <summary><strong>Full Classification Reports</strong></summary>
-
- **Balanced Validation Set**
- Accuracy: 0.9098119858989424
- precision recall f1-score support
-
- 0 0.9058 0.9148 0.9103 1702
- 1 0.9139 0.9048 0.9094 1702
-
- accuracy 0.9098 3404
-
-
- macro avg 0.9099 0.9098 0.9098 3404
- weighted avg 0.9099 0.9098 0.9098 3404
-
-
- **Test Set**
-
-
-
- Accuracy: 0.8763517649459294
- precision recall f1-score support
-
- 0 0.7650 0.8786 0.8179 3097
- 1 0.9398 0.8753 0.9064 6705
-
- accuracy 0.8764 9802
-
-
- macro avg 0.8524 0.8770 0.8621 9802
- weighted avg 0.8846 0.8764 0.8784 9802
-
-
- </details>
-
- ---
-
- ## 🧪 How to Use
-
- ### **Python (PyTorch)**
-
  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  import torch
 
+ # Arabic End-of-Turn (EOU) Detection Model - AraBERT Fine-Tuned

+ This model fine-tunes **AraBERT** for detecting **end-of-turn (EOU)** boundaries in Arabic dialogue.
  It predicts whether a given user message represents a **continuation** or an **end of turn**.

  - **Repository:** `nihad-ask/Arabert-EOU-detection-model`
  - **Task:** Binary End-of-Utterance Classification
  - **Language:** Arabic (MSA + Dialects)
+ - **Base Model:** `aubmindlab/bert-base-arabertv2`

  ---

 
  | **0** | Speaker will continue (NOT end of turn) |
  | **1** | End of turn (EOU detected) |
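
The label ids in the table above can be applied to raw classifier outputs roughly as follows. This is a minimal sketch, assuming the model emits two logits ordered as `[continue, end of turn]` to match ids 0 and 1; the helper name `decode_eou`, the `threshold` parameter, and the example logits are hypothetical and do not come from the model card.

```python
import math

# Label ids as listed in the table above.
ID2LABEL = {0: "continue", 1: "end_of_turn"}


def decode_eou(logits, threshold=0.5):
    """Map two raw logits to (label, P(end of turn)). Hypothetical helper."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    # Numerically stable softmax over the two classes.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    p_eou = exps[1] / sum(exps)
    # Predict end of turn only when its probability reaches the threshold.
    label_id = 1 if p_eou >= threshold else 0
    return ID2LABEL[label_id], p_eou


# Illustrative logits only, not real model output.
label, p_eou = decode_eou([-1.2, 2.3])
print(label, round(p_eou, 4))
```

Raising `threshold` above 0.5 would make an agent wait longer before treating a message as complete, trading response latency for fewer premature replies.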

  ---

  ## 📌 Use Cases
 
  - Speech-to-text segmentation
  - Customer support automation


  ---

 

  ### **Balanced Validation Set**

+ **Accuracy:** `0.9539`

  | Class | Precision | Recall | F1-score | Support |
  |-------|-----------|--------|----------|---------|
+ | **0 – Continue** | 0.9494 | 0.9589 | 0.9541 | 1702 |
+ | **1 – End of Turn** | 0.9585 | 0.9489 | 0.9536 | 1702 |

  **Overall:**

  | Metric | Score |
  |--------|--------|
+ | Accuracy | 0.9539 |
+ | Macro Avg F1 | 0.9539 |
+ | Weighted Avg F1 | 0.9539 |
  | Total Samples | 3404 |

  ---

  ### **Test Set**

+ **Accuracy:** `0.8919`

  | Class | Precision | Recall | F1-score | Support |
  |-------|-----------|--------|----------|---------|
+ | **0 – Continue** | 0.7671 | 0.9445 | 0.8466 | 3097 |
+ | **1 – End of Turn** | 0.9713 | 0.8676 | 0.9165 | 6705 |

  **Overall:**

  | Metric | Score |
  |--------|--------|
+ | Accuracy | 0.8919 |
+ | Macro Avg F1 | 0.8815 |
+ | Weighted Avg F1 | 0.8944 |
  | Total Samples | 9802 |

  ---
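
As a worked check on the updated test-set figures above, the overall scores can be recomputed from the per-class rows: accuracy is the support-weighted mean of per-class recall, macro F1 is the unweighted mean of per-class F1, and weighted F1 is the support-weighted mean. The sketch below uses only the rounded table values, so agreement is expected to roughly four decimal places; the variable names are illustrative.

```python
# Per-class test-set values copied from the table above: (recall, f1, support).
rows = {
    "continue": (0.9445, 0.8466, 3097),
    "end_of_turn": (0.8676, 0.9165, 6705),
}

total = sum(support for _, _, support in rows.values())

# Accuracy equals the support-weighted mean of per-class recall.
accuracy = sum(recall * support for recall, _, support in rows.values()) / total
# Macro F1 ignores class sizes; weighted F1 scales each class by its support.
macro_f1 = sum(f1 for _, f1, _ in rows.values()) / len(rows)
weighted_f1 = sum(f1 * support for _, f1, support in rows.values()) / total

print(
    f"samples={total} accuracy={accuracy:.4f} "
    f"macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}"
)
```

The gap between macro and weighted F1 reflects the class imbalance in the test set (3097 vs. 6705 samples): the larger end-of-turn class has the higher F1, so it pulls the weighted average up.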

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  import torch