# Arabic End-of-Turn (EOU) Detection Model: AraBERT Fine-Tuned

This model is a fine-tuned AraBERT classifier that detects end-of-turn (EOU) boundaries in Arabic dialogue.
Given a user message, it predicts whether the speaker will continue or has finished their turn.

- Repository: `nihad-ask/Arabert-EOU-detection-model`
- Task: Binary End-of-Utterance Classification
- Language: Arabic (MSA + Dialects)
- Base Model: `aubmindlab/bert-base-arabertv2`

---

## 🚦 Task Definition

This is a binary classification task:

| Label | Meaning |
|-------|---------|
| 0 | Speaker will continue (NOT end of turn) |
| 1 | End of turn (EOU detected) |
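
To make the label convention concrete, here is a minimal sketch that turns a pair of raw classifier logits into a readable label and a confidence score. The names `ID2LABEL` and `decode_prediction` are illustrative and not part of the model repository. The sketch assumes the logits are ordered `[continue, end_of_turn]`, matching label ids 0 and 1 in the table.

```python
import math

# Label ids as documented in the table; the string names are illustrative.
ID2LABEL = {0: "continue", 1: "end_of_turn"}


def decode_prediction(logits):
    """Map two raw logits [continue, end_of_turn] to (label, probability)."""
    # Numerically stable softmax over the two logits.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the class with the highest probability (ties go to label 0).
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, confidence = decode_prediction([-1.0, 2.0])
print(label, round(confidence, 3))
```

Returning the probability alongside the label lets a caller apply its own confidence threshold instead of relying on a plain argmax.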

---

## 📌 Use Cases

- Conversational AI / Chatbots
- Dialogue Systems
- Turn-taking prediction
- Speech-to-text segmentation
- Customer support automation
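
As a rough illustration of the turn-taking use case, the sketch below buffers incoming message fragments and releases a complete turn once a classifier reports an end of turn. `TurnBuffer` and the `is_end_of_turn` callable are hypothetical names. In practice the callable would wrap this model, and classifying the accumulated text rather than only the latest fragment is an assumption made here, not documented behavior.

```python
from typing import Callable, List, Optional


class TurnBuffer:
    """Accumulate fragments until the classifier reports an end of turn."""

    def __init__(self, is_end_of_turn: Callable[[str], bool]):
        # Hypothetical hook: returns True when the text is a finished turn.
        self._is_end_of_turn = is_end_of_turn
        self._fragments: List[str] = []

    def push(self, fragment: str) -> Optional[str]:
        """Add a fragment; return the full turn if complete, else None."""
        self._fragments.append(fragment.strip())
        candidate = " ".join(f for f in self._fragments if f)
        if candidate and self._is_end_of_turn(candidate):
            self._fragments.clear()
            return candidate
        return None


# Stand-in classifier for demonstration only: treats closing punctuation
# as an end of turn. Replace it with a call to the real model.
def punctuation_stub(text: str) -> bool:
    return text.endswith(("؟", ".", "!"))


buffer = TurnBuffer(punctuation_stub)
print(buffer.push("تمام"))      # still waiting for more input
print(buffer.push("و بعدين؟"))  # the joined turn is released
```

A dialogue system could call `push` for every partial message and respond only when it returns a non-empty turn.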


---

## 📊 Evaluation

### Balanced Validation Set

Accuracy: `0.9539`

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 – Continue | 0.9494 | 0.9589 | 0.9541 | 1702 |
| 1 – End of Turn | 0.9585 | 0.9489 | 0.9536 | 1702 |

Overall:

| Metric | Score |
|--------|-------|
| Accuracy | 0.9539 |
| Macro Avg F1 | 0.9539 |
| Weighted Avg F1 | 0.9539 |
| Total Samples | 3404 |

---

### Test Set

Accuracy: `0.8919`

Unlike the validation set, the test set is imbalanced (3,097 Continue vs. 6,705 End of Turn samples), which is why the macro and weighted F1 scores differ.

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 – Continue | 0.7671 | 0.9445 | 0.8466 | 3097 |
| 1 – End of Turn | 0.9713 | 0.8676 | 0.9165 | 6705 |

Overall:

| Metric | Score |
|--------|-------|
| Accuracy | 0.8919 |
| Macro Avg F1 | 0.8815 |
| Weighted Avg F1 | 0.8944 |
| Total Samples | 9802 |
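
The overall scores follow from the per-class rows, and the short sketch below recomputes them as a sanity check. It uses only the rounded precision, recall, and support values from the test-set table, so the results should agree with the reported figures to about three decimal places rather than exactly. The helper names are illustrative.

```python
# Per-class values copied from the test-set table (rounded to 4 decimals).
test_rows = {
    "continue": {"precision": 0.7671, "recall": 0.9445, "support": 3097},
    "end_of_turn": {"precision": 0.9713, "recall": 0.8676, "support": 6705},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def summarize(rows):
    """Recompute accuracy, macro F1, and weighted F1 from per-class rows."""
    total = sum(r["support"] for r in rows.values())
    f1s = {name: f1_score(r["precision"], r["recall"]) for name, r in rows.items()}
    return {
        # Accuracy is the support-weighted average of per-class recall.
        "accuracy": sum(r["recall"] * r["support"] for r in rows.values()) / total,
        # Macro F1 weights both classes equally.
        "macro_f1": sum(f1s.values()) / len(f1s),
        # Weighted F1 weights each class by its share of the samples.
        "weighted_f1": sum(f1s[n] * rows[n]["support"] for n in rows) / total,
        "total": total,
    }


summary = summarize(test_rows)
for name, value in summary.items():
    print(name, round(value, 4) if isinstance(value, float) else value)
```

Because End of Turn makes up roughly two thirds of the test samples, the weighted F1 sits closer to that class's F1 than the macro average does.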

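---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nihad-ask/Arabert-EOU-detection-model"

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "تمام و بعدين؟"

# Tokenize the message and run a forward pass without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1).item()

# Label 1 = end of turn, label 0 = speaker will continue
if prediction == 1:
    print("End of turn")
else:
    print("Speaker will continue")
```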
