	---
	language: tr
	license: other
	license_name: siriusai-premium-v1
	license_link: LICENSE
	tags:
	- turkish
	- text-classification
	- bert
	- nlp
	- transformers
	- turn-detection
	- voice-assistant
	- latency-optimization
	- siriusai
	- production-ready
	- enterprise
	base_model: dbmdz/bert-base-turkish-uncased
	datasets:
	- custom
	metrics:
	- f1
	- precision
	- recall
	- accuracy
	- mcc
	library_name: transformers
	pipeline_tag: text-classification
	model-index:
	- name: turn-detector-v2
	  results:
	  - task:
	      type: text-classification
	      name: Text Classification
	    metrics:
	    - type: f1
	      value: 0.9769
	      name: Macro F1
	    - type: mcc
	      value: 0.9544
	      name: MCC
	    - type: accuracy
	      value: 97.94
	      name: Accuracy
	---

	# turn-detector-v2 - Turkish Turn Detection Model

	<p align="center">
	<a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-turn--detector--v2-yellow" alt="Hugging Face"></a>
	<a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/Model-Production%20Ready-brightgreen" alt="Production Ready"></a>
	<img src="https://img.shields.io/badge/Language-Turkish-blue" alt="Turkish">
	<img src="https://img.shields.io/badge/Task-Turn%20Detection-orange" alt="Turn Detection">
	<img src="https://img.shields.io/badge/F1-97.69%25-success" alt="F1 Score">
	</p>

	This model detects turn-taking patterns in Turkish conversations. It reduces voice assistant latency by identifying which user utterances require LLM processing and which are simple acknowledgments.

	Developed by SiriusAI Tech Brain Team

	---

	## Mission

	> To optimize voice assistant response latency by detecting when user utterances require LLM processing vs. simple acknowledgments.

	The `turn-detector-v2` model analyzes conversational turn pairs (bot utterance + user response) and classifies each user response as either `agent_response` (requires LLM processing) or `backchannel` (an acknowledgment that can be handled without an LLM call).

	### Key Benefits

	| Benefit | Description |
	|---------|-------------|
	| Latency Reduction | Skip LLM calls for backchannels, saving 500-2000ms per interaction |
	| Cost Optimization | Reduce LLM API costs by filtering unnecessary calls |
	| Natural Conversation | Return immediate filler responses ("hmm", "tamam") for acknowledgments |
	| High Accuracy | 97.94% accuracy on the 13,023-sample test split |

	---

	## Model Overview

	| Property | Value |
	|----------|-------|
	| Architecture | BertForSequenceClassification |
	| Base Model | `dbmdz/bert-base-turkish-uncased` |
	| Task | Binary Text Classification |
	| Language | Turkish (tr) |
	| Labels | 2 (agent_response, backchannel) |
	| Model Size | ~110M parameters |
	| Inference Time | ~10-15ms (GPU) / ~40-50ms (CPU) |

	---

	## Performance Metrics

	### Final Evaluation Results

	| Metric | Score |
	|--------|-------|
	| Macro F1 | 0.9769 |
	| Micro F1 | 0.9794 |
	| MCC | 0.9544 |
	| Accuracy | 97.94% |

	### Per-Class Performance

	| Class | Accuracy (per-class recall) | Test Samples |
	|-------|-----------------------------|--------------|
	| agent_response | 99.57% | 8,553 |
	| backchannel | 94.83% | 4,470 |
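
The per-class figures above are consistent with the overall accuracy reported under Final Evaluation Results. A small check, assuming the sample counts refer to the test split (they sum to 13,023), computes the sample-weighted average of the two per-class accuracies:

```python
# Per-class accuracy and sample counts, copied from the table above.
per_class = {
    "agent_response": (0.9957, 8553),
    "backchannel": (0.9483, 4470),
}

# Overall accuracy is the sample-weighted mean of the per-class accuracies.
total = sum(n for _, n in per_class.values())
correct = sum(acc * n for acc, n in per_class.values())
overall_accuracy = correct / total

print(total)                       # 13023
print(round(overall_accuracy, 4))  # 0.9794
```

The result also matches the Micro F1 score, which equals accuracy for single-label classification.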

	---

	## Semantic Classification Rules

	### When to Classify as `backchannel` (Skip LLM)

	| Condition | Examples |
	|-----------|----------|
	| Bot gives info + User short acknowledgment | "tamam", "anladim", "ok", "peki" |
	| Bot gives info + User rhetorical question | "oyle mi?", "harbi mi?", "cidden mi?" |
	| Bot gives info + User minimal response | "hmm", "hi hi", "evet" |

	### When to Classify as `agent_response` (Send to LLM)

	| Condition | Examples |
	|-----------|----------|
	| Bot asks question + User gives any answer | "[bot] adi nedir [sep] [user] ahmet" |
	| Bot gives info + User asks real question | "[bot] faturaniz kesildi [sep] [user] ne zaman?" |
	| Bot gives info + User makes request | "[bot] kargonuz yolda [sep] [user] adresi degistirmek istiyorum" |
	| User provides detailed information | "[bot] bilgi verir misiniz [sep] [user] sunu sunu istiyorum cunku..." |

	### Golden Rule

	```
	If bot asked a question → Always agent_response
	If bot gave info + User short acknowledgment → backchannel
	```
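
As a rough illustration only, the Golden Rule can be approximated with a keyword heuristic. This is a sketch, not the labeling logic used to build the dataset: the `QUESTION_CUES` pattern, the `ACKNOWLEDGMENTS` set, and the `golden_rule` function are hypothetical names, and the cue words are an assumed, incomplete list for ASCII-folded Turkish.

```python
import re

# Hypothetical cue list for ASCII-folded Turkish; not taken from the model or its dataset.
QUESTION_CUES = re.compile(
    r"\b(mi|mu|misiniz|musunuz|nedir|nasil|kac|hangi|nerede|kim)\b|\?"
)
# Short acknowledgments taken from the examples in the tables above.
ACKNOWLEDGMENTS = {"tamam", "anladim", "ok", "peki", "hmm", "evet"}


def golden_rule(bot_text: str, user_text: str):
    """Return a label when the Golden Rule applies, else None."""
    # Rule 1: the bot asked a question, so any user reply needs the LLM.
    if QUESTION_CUES.search(bot_text.lower()):
        return "agent_response"
    # Rule 2: the bot gave info and the user only acknowledged it.
    if user_text.lower().strip() in ACKNOWLEDGMENTS:
        return "backchannel"
    # Neither rule fires; the trained classifier has to decide.
    return None


print(golden_rule("ahmet bey ile mi gorusuyorum", "evet benim"))  # agent_response
print(golden_rule("faturaniz 150 tl gorunuyor", "tamam"))         # backchannel
print(golden_rule("kargonuz yolda", "ne zaman gelir"))            # None
```

A heuristic like this may be useful as a sanity check on labeled data, but it cannot separate rhetorical questions from real ones, which is the case the model is trained to handle.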

	---

	## Dataset

	### Dataset Statistics

	| Split | Samples |
	|-------|---------|
	| Train | 52,287 |
	| Test | 13,023 |
	| Total | 65,310 |

	### Label Distribution

	| Label | Count (train split) | Percentage |
	|-------|---------------------|------------|
	| agent_response | 35,223 | 67.4% |
	| backchannel | 17,064 | 32.6% |

	### Domain Coverage

	- E-commerce (kargo, iade, teslimat)
	- Banking (hesap, bakiye, kredi)
	- Telecom (numara tasima, data, hat)
	- Insurance (prim, police, teminat, kasko)
	- General Support (sikayet, yonetici, eskalasyon)
	- Identity Verification (TC, gorusuyorum, soyadi)

	---

	## Label Definitions

	| Label | ID | Description |
	|-------|-----|-------------|
	| agent_response | 0 | User response requires LLM processing - questions, requests, confirmations to questions, corrections |
	| backchannel | 1 | Simple acknowledgment - LLM skipped, filler returned (tamam, anladim, ok) |

	### Input Format

	```
	[bot] <bot utterance> [sep] [user] <user response>
	```
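
A small helper can build this string from the two utterances. The sketch below assumes nothing beyond the format shown above; `format_turn` is a hypothetical name, and the whitespace cleanup is an added convenience rather than a documented preprocessing step.

```python
def format_turn(bot_text: str, user_text: str) -> str:
    """Build the single-string input in the "[bot] ... [sep] [user] ..." format."""
    # Collapse runs of whitespace so stray spaces (e.g. from ASR output)
    # do not leak into the model input.
    bot = " ".join(bot_text.split())
    user = " ".join(user_text.split())
    return f"[bot] {bot} [sep] [user] {user}"


print(format_turn("kargonuz yolda", "  ne zaman   gelir "))
# [bot] kargonuz yolda [sep] [user] ne zaman gelir
```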

	### Example Classifications

	agent_response (Send to LLM):
	```
	[bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum
	[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim
	[bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum
	[bot] kargonuz yolda [sep] [user] ne zaman gelir
	[bot] policeniz aktif [sep] [user] teminat limitini ogrenebilir miyim
	```

	backchannel (Skip LLM, return filler):
	```
	[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam
	[bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim
	[bot] kaydinizi kontrol ediyorum [sep] [user] peki
	[bot] policeniz yenilendi [sep] [user] tesekkurler
	[bot] sifreni sms ile gonderdik [sep] [user] ok aldim
	```

	---

	## Training

	### Hyperparameters

	| Parameter | Value |
	|-----------|-------|
	| Base Model | `dbmdz/bert-base-turkish-uncased` |
	| Max Sequence Length | 128 tokens |
	| Batch Size | 16 |
	| Learning Rate | 3e-5 |
	| Epochs | 4 |
	| Optimizer | AdamW |
	| Weight Decay | 0.01 |
	| Loss Function | CrossEntropyLoss |
	| Hardware | Apple Silicon (MPS) |
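
For a sense of scale, the number of optimizer steps implied by these settings can be estimated from the train split size. This is a back-of-the-envelope sketch that assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these are stated in this card.

```python
import math

train_samples = 52_287  # train split size from the Dataset section
batch_size = 16
epochs = 4

# One optimizer step per batch; the final, smaller batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 3268
print(total_steps)      # 13072
```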

	---

	## Usage

	### Installation

	```bash
	pip install transformers torch
	```

	### Quick Start

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_name = "hayatiali/turn-detector-v2"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	model.eval()

	LABELS = ["agent_response", "backchannel"]

	def predict(text):
	    """Classify one "[bot] ... [sep] [user] ..." string."""
	    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
	    with torch.no_grad():  # inference only, no gradients needed
	        outputs = model(**inputs)
	    probs = torch.softmax(outputs.logits, dim=-1)[0]

	    # Map each label to its probability and return the most likely one.
	    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
	    return {"label": max(scores, key=scores.get), "confidence": max(scores.values())}

	# Bot asks question → agent_response
	print(predict("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim"))
	# Output: {'label': 'agent_response', 'confidence': 0.99}

	# Bot gives info + User acknowledges → backchannel
	print(predict("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam"))
	# Output: {'label': 'backchannel', 'confidence': 0.98}
	```

	### Production Integration

	```python
	import random

	import torch
	from transformers import AutoModelForSequenceClassification, AutoTokenizer


	class TurnDetector:
	    """Production-ready turn detection for voice assistants."""

	    LABELS = ["agent_response", "backchannel"]
	    FILLER_RESPONSES = ["hmm", "evet", "tamam", "anlıyorum"]

	    def __init__(self, model_path="hayatiali/turn-detector-v2"):
	        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
	        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
	        self.device = "cuda" if torch.cuda.is_available() else "cpu"
	        self.model.to(self.device).eval()

	    def should_call_llm(self, bot_text: str, user_text: str) -> dict:
	        """
	        Determines if user response should go to LLM.

	        Returns:
	            dict with 'call_llm' (bool), 'label', 'confidence', 'filler' (if backchannel)
	        """
	        text = f"[bot] {bot_text} [sep] [user] {user_text}"
	        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
	        # Move input tensors to the same device as the model.
	        inputs = {k: v.to(self.device) for k, v in inputs.items()}

	        with torch.no_grad():
	            probs = torch.softmax(self.model(**inputs).logits, dim=-1)[0].cpu()

	        label_idx = probs.argmax().item()
	        label = self.LABELS[label_idx]
	        confidence = probs[label_idx].item()

	        result = {
	            "call_llm": label == "agent_response",
	            "label": label,
	            "confidence": confidence,
	        }

	        # For backchannels, attach a random filler to play back instead of calling the LLM.
	        if label == "backchannel":
	            result["filler"] = random.choice(self.FILLER_RESPONSES)

	        return result

	# Usage
	detector = TurnDetector()

	# Case 1: Bot asks, user confirms → Send to LLM
	result = detector.should_call_llm("siparis iptal etmek ister misiniz", "evet iptal et")
	# {'call_llm': True, 'label': 'agent_response', 'confidence': 0.99}

	# Case 2: Bot informs, user acknowledges → Return filler
	result = detector.should_call_llm("siparisiniz yola cikti", "tamam")
	# {'call_llm': False, 'label': 'backchannel', 'confidence': 0.97, 'filler': 'hmm'}
	```

	---

	## Limitations

	| Limitation | Details |
	|------------|---------|
	| Language | Turkish only, may struggle with heavy dialects |
	| Context | Single-turn analysis, no multi-turn memory |
	| Domain | Trained on customer service, may need fine-tuning for other domains |
	| Edge Cases | Ambiguous short responses may have lower confidence |
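
One way to handle the low-confidence edge cases is to skip the LLM only when the `backchannel` prediction is confident, and to fall back to the LLM otherwise. The sketch below is an assumption layered on top of the model output, not part of the released code; `route` and the `0.90` threshold are hypothetical and would need tuning on real traffic.

```python
def route(label: str, confidence: float, min_backchannel_conf: float = 0.90) -> bool:
    """Return True when the turn should be sent to the LLM."""
    if label == "agent_response":
        return True
    # Low-confidence backchannel: fall back to the LLM rather than
    # risk answering a real request with a filler.
    return confidence < min_backchannel_conf


print(route("agent_response", 0.55))  # True
print(route("backchannel", 0.97))     # False
print(route("backchannel", 0.62))     # True
```

The asymmetry is deliberate: a wrongly skipped LLM call leaves a real request unanswered, while a wrongly made call only costs latency.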

	---

	## Citation

	```bibtex
	@misc{turn-detector-v2-2025,
	title={turn-detector-v2: Turkish Turn Detection for Voice Assistants},
	author={SiriusAI Tech Brain Team},
	year={2025},
	publisher={Hugging Face},
	howpublished={\url{https://huggingface.co/hayatiali/turn-detector-v2}},
	note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
	}
	```

	---

	## Contact

	- Developer: SiriusAI Tech Brain Team
	- Email: info@siriusaitech.com
	- Repository: [GitHub](https://github.com/sirius-tedarik)

	---

	## Changelog

	### v2.0 (Current)

	Semantic Rule Improvements:
	- If bot asks a question → always `agent_response` (731 rows corrected)
	- Rhetorical questions ("really?", "is that so?") → remain as `backchannel`
	- If user asks a real question ("when?", "how?") → `agent_response`

	Dataset Expansion (+9,082 samples):

	| Category | Added Patterns |
	|----------|----------------|
	| Insurance | premium, policy, coverage, comprehensive, interest, late fees |
	| Telecom | number porting, data exhausted, line transfer, GB remaining |
	| E-commerce | shipping cost, free shipping, returns, delivery |
	| Price/Budget | expensive, budget, too much, will think about it, not suitable |
	| Identity Verification | national ID, "am I speaking with...", surname, date of birth |
	| Objection/Complaint | unacceptable, not satisfied, complaint, impossible |
	| Escalation | manager, director, supervisor |
	| Hold Requests | one moment, busy right now, not now, later |

	Metrics: Macro F1: 0.9769, Accuracy: 97.94%

	> Note: Metrics are slightly lower than in v1.0, but v2.0 is the more reliable model.
	> The v1.0 data contained mislabeled rows (bot asked a question + user said "yes" labeled as backchannel),
	> which the model memorized. v2.0 enforces semantic consistency in the labels.

	### v1.0
	- Initial release
	- Dataset: 56,228 samples
	- Macro F1: 0.9924, Accuracy: 99.3%

	---

	License: SiriusAI Tech Premium License v1.0

	Commercial Use: Requires Premium License. Contact: info@siriusaitech.com