	---
	language:
	- ar
	license: apache-2.0
	library_name: transformers
	tags:
	- arabic
	- end-of-utterance
	- eou
	- turn-detection
	- voice-agent
	- saudi-dialect
	- conversational-ai
	- livekit
	datasets:
	- Amr-h/EOU_Arabic_Saudi
	base_model: UBC-NLP/MARBERT
	pipeline_tag: text-classification
	metrics:
	- accuracy
	- f1
	model-index:
	- name: arabic-eou-marbert
	  results:
	  - task:
	      type: text-classification
	      name: End of Utterance Detection
	    metrics:
	    - type: accuracy
	      value: 0.918
	      name: OOD Accuracy
	    - type: f1
	      value: 0.9176
	      name: Weighted F1
	---

	# Arabic End-of-Utterance (EOU) Detection Model

	This model detects whether a speaker has finished their turn in an Arabic conversation, with an emphasis on the Saudi dialect. It is designed for real-time voice agents, such as those built with LiveKit.

	## Model Description

	- Base Model: [UBC-NLP/MARBERT](https://huggingface.co/UBC-NLP/MARBERT)
	- Language: Arabic (with focus on Saudi/Gulf dialect)
	- Task: Binary classification (Complete vs Incomplete utterance)
	- Use Case: Real-time turn detection for voice agents

	## Labels

	| Label | ID | Description |
	|-------|-----|-------------|
	| INCOMPLETE | 0 | Speaker has not finished their turn |
	| COMPLETE | 1 | Speaker has finished their turn |

	## Performance

	### Out-of-Distribution Test Results (200 samples)

	| Metric | Complete (1) | Incomplete (0) |
	|--------|--------------|----------------|
	| Precision | 100.00% | 85.94% |
	| Recall | 83.64% | 100.00% |
	| F1-Score | 91.09% | 92.44% |

	Overall Weighted F1: 91.76%
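
The per-class figures and the weighted F1 above follow the standard definitions. As a minimal sketch of how such numbers are computed, the helper below derives precision, recall, F1, accuracy, and support-weighted F1 from two label lists. The function name `eou_metrics` and the eight toy labels are illustrative assumptions, not the actual evaluation script or test data.

```python
from collections import Counter

LABELS = {0: "INCOMPLETE", 1: "COMPLETE"}


def eou_metrics(y_true, y_pred):
    """Per-class precision/recall/F1 plus accuracy and support-weighted F1."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    report = {}
    weighted_f1 = 0.0
    for cls, name in LABELS.items():
        tp = pairs[(cls, cls)]
        fp = sum(n for (t, p), n in pairs.items() if p == cls and t != cls)
        fn = sum(n for (t, p), n in pairs.items() if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn
        weighted_f1 += f1 * support / total
        report[name] = {"precision": precision, "recall": recall, "f1": f1, "support": support}
    accuracy = sum(pairs[(c, c)] for c in LABELS) / total
    return report, accuracy, weighted_f1


# Toy data (hypothetical): one complete utterance is misread as incomplete,
# and no incomplete utterance is misread as complete.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
report, accuracy, weighted_f1 = eou_metrics(y_true, y_pred)
```

The toy data has the same error pattern as the table: perfect precision on Complete and perfect recall on Incomplete.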

	### Key Characteristics
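	- ✅ Zero false interruptions on the OOD test set - The model never predicted "Complete" for an incomplete utterance (100% precision on the Complete class)
	- ✅ Conservative behavior - All errors fall on the side of waiting, which suits voice agents (better to wait than interrupt)
	- ✅ Fast inference - Inputs are capped at 64 tokens, which keeps each forward pass short for real-time applications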
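
One way a caller might lean even further toward waiting is to require a minimum probability for COMPLETE before ending the turn, instead of taking the argmax. The sketch below is an assumption about how that could look: `should_end_turn` and the 0.8 threshold are hypothetical and are not part of the model or of LiveKit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_end_turn(logits, threshold=0.8):
    """Return (end_turn, p_complete) for logits ordered [INCOMPLETE, COMPLETE]."""
    p_complete = softmax(logits)[1]
    return p_complete >= threshold, p_complete


# Hypothetical logits: argmax says COMPLETE in both cases,
# but only the second clears the 0.8 threshold.
print(should_end_turn([0.0, 0.5]))
print(should_end_turn([-1.0, 2.0]))
```

Raising the threshold trades additional latency for fewer interruptions; a suitable value would need to be tuned on real conversations.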

	## Usage

	### Quick Start

	```python
	from transformers import pipeline

	# Load the model
	eou_detector = pipeline(
	    "text-classification",
	    model="Amr-h/arabic-eou-marbert",
	    device=0,  # Use GPU, or -1 for CPU
	)

	# Detect end of utterance
	text = "هل بلغوك انهم بيحتاجون ساعات اضافيه؟"
	result = eou_detector(text)[0]

	print(f"Label: {result['label']}") # COMPLETE or INCOMPLETE
	print(f"Confidence: {result['score']:.2%}")
	```

	### With Model and Tokenizer

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch

	# Load model and tokenizer
	model = AutoModelForSequenceClassification.from_pretrained("Amr-h/arabic-eou-marbert")
	tokenizer = AutoTokenizer.from_pretrained("Amr-h/arabic-eou-marbert")

	# Inference
	text = "انتظر خلني اشوف وين حطيت ال"
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)

	with torch.no_grad():
	    outputs = model(**inputs)
	prediction = torch.argmax(outputs.logits, dim=-1).item()

	label = "COMPLETE" if prediction == 1 else "INCOMPLETE"
	print(f"Prediction: {label}")
	```

	### For LiveKit Integration

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch

	class ArabicEOUDetector:
	    def __init__(self, model_name="Amr-h/arabic-eou-marbert"):
	        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
	        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
	        self.model.eval()

	    def predict(self, text: str) -> tuple[bool, float]:
	        """
	        Returns (is_complete, confidence)
	        """
	        inputs = self.tokenizer(
	            text,
	            return_tensors="pt",
	            truncation=True,
	            max_length=64,
	        )

	        with torch.no_grad():
	            outputs = self.model(**inputs)

	        # Convert logits to probabilities and pick the most likely class
	        probs = torch.softmax(outputs.logits, dim=-1)
	        prediction = torch.argmax(probs, dim=-1).item()
	        confidence = probs[0][prediction].item()

	        is_complete = prediction == 1
	        return is_complete, confidence
	```
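
A voice agent would typically call the detector on each partial transcript from the speech recognizer and end the turn once the text is judged complete. The sketch below shows that loop against the `predict` interface above. `first_complete_index`, the `min_confidence` value, and `FakeDetector` (a stand-in so the example runs without downloading the model) are all hypothetical, and none of this is LiveKit API.

```python
from typing import Iterable, Optional, Protocol


class EOUPredictor(Protocol):
    """Anything exposing predict(text) -> (is_complete, confidence)."""

    def predict(self, text: str) -> tuple[bool, float]: ...


def first_complete_index(
    detector: EOUPredictor,
    partial_transcripts: Iterable[str],
    min_confidence: float = 0.8,
) -> Optional[int]:
    """Index of the first partial transcript judged complete, or None."""
    for i, text in enumerate(partial_transcripts):
        is_complete, confidence = detector.predict(text)
        if is_complete and confidence >= min_confidence:
            return i
    return None


class FakeDetector:
    """Stand-in for ArabicEOUDetector: treats a trailing '؟' as complete."""

    def predict(self, text: str) -> tuple[bool, float]:
        return text.endswith("؟"), 0.9


partials = [
    "هل بلغوك",
    "هل بلغوك انهم بيحتاجون",
    "هل بلغوك انهم بيحتاجون ساعات اضافيه؟",
]
turn_end = first_complete_index(FakeDetector(), partials)
```

Swapping `FakeDetector()` for `ArabicEOUDetector()` should work unchanged, since both expose the same `predict` signature.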

	## Training Details

	- Training Data: ~12,000 Saudi dialect Arabic utterances
	- Validation Data: ~1,500 samples
	- Test Data: ~1,500 samples
	- Epochs: 3
	- Learning Rate: 2e-5
	- Batch Size: 32
	- Max Length: 64 tokens
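
For a sense of scale, these settings imply roughly 375 optimizer steps per epoch and about 1,125 in total. The arithmetic below assumes exactly 12,000 training samples, a single device, and no gradient accumulation, none of which the list above states.

```python
import math

train_samples = 12_000  # approximate figure from the list above
batch_size = 32
epochs = 3

# One optimizer step per batch; the last partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```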

	## Intended Use

	- Real-time voice agents and conversational AI
	- Turn-taking detection in Arabic dialogue systems
	- LiveKit agent integration
	- Customer service voice bots

	## Limitations

	- Optimized for Saudi/Gulf Arabic dialect
	- May require fine-tuning for other Arabic dialects
	- Designed for spoken/conversational text, not formal written Arabic

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{arabic-eou-marbert,
	author = {YOUR_NAME},
	title = {Arabic End-of-Utterance Detection Model},
	year = {2024},
	publisher = {Hugging Face},
	url = {https://huggingface.co/Amr-h/arabic-eou-marbert}
	}
	```

	## License

	Apache 2.0