---
language: tr
license: other
license_name: siriusai-premium-v1
license_link: LICENSE
tags:
- turkish
- text-classification
- bert
- nlp
- transformers
- turn-detection
- voice-assistant
- latency-optimization
- siriusai
- production-ready
- enterprise
base_model: dbmdz/bert-base-turkish-uncased
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
- mcc
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: turn-detector-v2
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - type: f1
      value: 0.9769
      name: Macro F1
    - type: mcc
      value: 0.9544
      name: MCC
    - type: accuracy
      value: 97.94
      name: Accuracy
---
# turn-detector-v2 - Turkish Turn Detection Model
<p align="center">
<a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-turn--detector--v2-yellow" alt="Hugging Face"></a>
<a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/Model-Production%20Ready-brightgreen" alt="Production Ready"></a>
<img src="https://img.shields.io/badge/Language-Turkish-blue" alt="Turkish">
<img src="https://img.shields.io/badge/Task-Turn%20Detection-orange" alt="Turn Detection">
<img src="https://img.shields.io/badge/F1-97.69%25-success" alt="F1 Score">
</p>
This model classifies user turns in Turkish voice-assistant conversations so that simple acknowledgments can bypass the LLM, which reduces response latency.
*Developed by SiriusAI Tech Brain Team*
---
## Mission
> **To optimize voice assistant response latency by detecting when user utterances require LLM processing vs. simple acknowledgments.**
The `turn-detector-v2` model analyzes **conversational turn pairs** (bot utterance + user response) and classifies whether the user's response requires LLM processing (**agent_response**) or is a backchannel acknowledgment that can be handled without an LLM call (**backchannel**).
### Key Benefits
| Benefit | Description |
|---------|-------------|
| **Latency Reduction** | Skip LLM calls for backchannels, saving 500-2000ms per interaction |
| **Cost Optimization** | Reduce LLM API costs by filtering unnecessary calls |
| **Natural Conversation** | Return immediate filler responses ("hmm", "tamam") for acknowledgments |
| **High Accuracy** | 97.94% accuracy on the 13,023-sample test set |
---
## Model Overview
| Property | Value |
|----------|-------|
| **Architecture** | BertForSequenceClassification |
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Task** | Binary Text Classification |
| **Language** | Turkish (tr) |
| **Labels** | 2 (agent_response, backchannel) |
| **Model Size** | ~110M parameters |
| **Inference Time** | ~10-15ms (GPU) / ~40-50ms (CPU) |
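
As a rough illustration of the latency benefit, the sketch below estimates the average time saved per user turn. It assumes the detector runs before every LLM call and that backchannels make up about 32.6% of turns (the share reported in the Dataset section, which may not match production traffic). The function name and the two scenarios are hypothetical; the latency figures are the ranges quoted in this card.

```python
def expected_saving_ms(backchannel_share, llm_ms, detector_ms):
    """Average milliseconds saved per turn when backchannels skip the LLM."""
    # Every turn pays the detector cost; only backchannels save the LLM call.
    return backchannel_share * llm_ms - detector_ms

# Pessimistic case: fast LLM (500 ms), slow CPU detector (50 ms).
low = expected_saving_ms(0.326, 500, 50)
# Optimistic case: slow LLM (2000 ms), fast GPU detector (10 ms).
high = expected_saving_ms(0.326, 2000, 10)

print(f"{low:.0f} ms to {high:.0f} ms saved per turn on average")
```

The estimate is an average over all turns; an individual backchannel turn saves the full LLM round trip minus the detector time.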
---
## Performance Metrics
### Final Evaluation Results
| Metric | Score |
|--------|-------|
| **Macro F1** | **0.9769** |
| **Micro F1** | **0.9794** |
| **MCC** | **0.9544** |
| **Accuracy** | **97.94%** |
### Per-Class Performance
| Category | Accuracy | Test Samples |
|----------|----------|---------|
| **agent_response** | 99.57% | 8,553 |
| **backchannel** | 94.83% | 4,470 |
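
The headline metrics can be cross-checked against the per-class table. The sketch below reconstructs an approximate confusion matrix from the per-class accuracies and sample counts, then recomputes accuracy, Macro F1, and MCC. It assumes that per-class accuracy means the fraction of each class's test samples predicted correctly, and it rounds to whole samples, so the reconstructed counts are an illustration rather than published figures.

```python
import math

# Per-class test figures from the table above.
n_agent, acc_agent = 8553, 0.9957
n_back, acc_back = 4470, 0.9483

# Reconstructed counts (assumption: rounded to whole samples).
agent_ok = round(n_agent * acc_agent)   # agent_response predicted correctly
back_ok = round(n_back * acc_back)      # backchannel predicted correctly
agent_as_back = n_agent - agent_ok      # agent_response wrongly skipped
back_as_agent = n_back - back_ok        # backchannel sent to the LLM anyway

accuracy = (agent_ok + back_ok) / (n_agent + n_back)

def f1(tp, fp, fn):
    """F1 score from true positives, false positives, false negatives."""
    return 2 * tp / (2 * tp + fp + fn)

macro_f1 = (
    f1(agent_ok, back_as_agent, agent_as_back)
    + f1(back_ok, agent_as_back, back_as_agent)
) / 2

# MCC with backchannel treated as the positive class.
tp, tn, fp, fn = back_ok, agent_ok, agent_as_back, back_as_agent
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} mcc={mcc:.4f}")
```

Under these assumptions the recomputed values agree with the reported 0.9794, 0.9769, and 0.9544 to four decimal places. The reconstruction also suggests that most errors are backchannels sent to the LLM, which costs latency but does not drop a user request.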
---
## Semantic Classification Rules
### When to Classify as `backchannel` (Skip LLM)
| Condition | Examples |
|-----------|----------|
| Bot gives info + User short acknowledgment | "tamam", "anladim", "ok", "peki" |
| Bot gives info + User rhetorical question | "oyle mi?", "harbi mi?", "cidden mi?" |
| Bot gives info + User minimal response | "hmm", "hi hi", "evet" |
### When to Classify as `agent_response` (Send to LLM)
| Condition | Examples |
|-----------|----------|
| Bot asks question + User gives any answer | "[bot] adi nedir [sep] [user] ahmet" |
| Bot gives info + User asks real question | "[bot] faturaniz kesildi [sep] [user] ne zaman?" |
| Bot gives info + User makes request | "[bot] kargonuz yolda [sep] [user] adresi degistirmek istiyorum" |
| User provides detailed information | "[bot] bilgi verir misiniz [sep] [user] sunu sunu istiyorum cunku..." |
### Golden Rule
```
If bot asked a question → Always agent_response
If bot gave info + User short acknowledgment → backchannel
```
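
The golden rule can be written as a small decision function. This is a minimal sketch, not the labeling code used for the dataset: the function name is hypothetical, the acknowledgment set is copied from the backchannel table above, and whether the bot asked a question is supplied by the caller because this card does not describe how that is detected.

```python
from typing import Optional

# Short acknowledgments taken from the backchannel table above.
SHORT_ACKS = {"tamam", "anladim", "ok", "peki", "hmm", "hi hi", "evet"}

def golden_rule(bot_asked_question: bool, user_text: str) -> Optional[str]:
    """Apply the two golden rules; return None when neither applies."""
    if bot_asked_question:
        # Rule 1: any answer to a bot question goes to the LLM.
        return "agent_response"
    if user_text.strip().lower() in SHORT_ACKS:
        # Rule 2: bot gave info and the user only acknowledged it.
        return "backchannel"
    return None  # the rule is silent; defer to the model

print(golden_rule(True, "evet"))       # agent_response
print(golden_rule(False, "tamam"))     # backchannel
print(golden_rule(False, "ne zaman"))  # None
```

Note that "evet" is a backchannel after an informational bot turn but an `agent_response` after a question, which is why the bot utterance is part of the model input.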
---
## Dataset
### Dataset Statistics
| Split | Samples |
|-------|---------|
| **Train** | 52,287 |
| **Test** | 13,023 |
| **Total** | 65,310 |
### Label Distribution (Train Split)
| Label | Count | Percentage |
|-------|-------|------------|
| **agent_response** | 35,223 | 67.4% |
| **backchannel** | 17,064 | 32.6% |
### Domain Coverage
- E-commerce (kargo, iade, teslimat)
- Banking (hesap, bakiye, kredi)
- Telecom (numara tasima, data, hat)
- Insurance (prim, police, teminat, kasko)
- General Support (sikayet, yonetici, eskalasyon)
- Identity Verification (TC, gorusuyorum, soyadi)
---
## Label Definitions
| Label | ID | Description |
|-------|-----|-------------|
| **agent_response** | 0 | User response requires LLM processing - questions, requests, confirmations to questions, corrections |
| **backchannel** | 1 | Simple acknowledgment - LLM skipped, filler returned (tamam, anladim, ok) |
### Input Format
```
[bot] <bot utterance> [sep] [user] <user response>
```
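
A small helper can build this string from the two utterances. The sketch below is an assumption about convenient preprocessing, not part of the released model: the function name is hypothetical, and it only collapses whitespace. The examples in this card are lowercase with Turkish characters mostly written in ASCII, and the helper does not attempt that normalization.

```python
def format_turn_pair(bot_text: str, user_text: str) -> str:
    """Build the '[bot] ... [sep] [user] ...' string shown above."""
    # Collapse runs of whitespace and trim both utterances.
    bot = " ".join(bot_text.split())
    user = " ".join(user_text.split())
    return f"[bot] {bot} [sep] [user] {user}"

print(format_turn_pair("kargonuz yolda", "  ne zaman   gelir "))
# [bot] kargonuz yolda [sep] [user] ne zaman gelir
```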
### Example Classifications
**agent_response** (Send to LLM):
```
[bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum
[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim
[bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum
[bot] kargonuz yolda [sep] [user] ne zaman gelir
[bot] policeniz aktif [sep] [user] teminat limitini ogrenebilir miyim
```
**backchannel** (Skip LLM, return filler):
```
[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam
[bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim
[bot] kaydinizi kontrol ediyorum [sep] [user] peki
[bot] policeniz yenilendi [sep] [user] tesekkurler
[bot] sifreni sms ile gonderdik [sep] [user] ok aldim
```
---
## Training
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Max Sequence Length** | 128 tokens |
| **Batch Size** | 16 |
| **Learning Rate** | 3e-5 |
| **Epochs** | 4 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Loss Function** | CrossEntropyLoss |
| **Hardware** | Apple Silicon (MPS) |
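
For a sense of training scale, the number of optimizer steps implied by these settings can be worked out from the train split size. The sketch assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these details are stated in this card.

```python
import math

train_samples = 52_287   # train split size from the Dataset section
batch_size = 16
epochs = 4

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 3268 13072
```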
---
## Usage
### Installation
```bash
pip install transformers torch
```
### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/turn-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["agent_response", "backchannel"]  # list index matches label ID

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    return {"label": max(scores, key=scores.get), "confidence": max(scores.values())}

# Bot asks question → agent_response
print(predict("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim"))
# Output: {'label': 'agent_response', 'confidence': 0.99}

# Bot gives info + User acknowledges → backchannel
print(predict("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam"))
# Output: {'label': 'backchannel', 'confidence': 0.98}
```
### Production Integration
```python
import random

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class TurnDetector:
    """Production-ready turn detection for voice assistants."""

    LABELS = ["agent_response", "backchannel"]
    FILLER_RESPONSES = ["hmm", "evet", "tamam", "anlıyorum"]

    def __init__(self, model_path="hayatiali/turn-detector-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def should_call_llm(self, bot_text: str, user_text: str) -> dict:
        """
        Determines if user response should go to LLM.

        Returns:
            dict with 'call_llm' (bool), 'label', 'confidence', 'filler' (if backchannel)
        """
        text = f"[bot] {bot_text} [sep] [user] {user_text}"
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        # Move input tensors to the same device as the model.
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        with torch.no_grad():
            probs = torch.softmax(self.model(**inputs).logits, dim=-1)[0].cpu()
        label_idx = probs.argmax().item()
        label = self.LABELS[label_idx]
        confidence = probs[label_idx].item()
        result = {
            "call_llm": label == "agent_response",
            "label": label,
            "confidence": confidence,
        }
        if label == "backchannel":
            # Backchannels skip the LLM; reply with a random filler instead.
            result["filler"] = random.choice(self.FILLER_RESPONSES)
        return result


# Usage
detector = TurnDetector()

# Case 1: Bot asks, user confirms → Send to LLM
result = detector.should_call_llm("siparis iptal etmek ister misiniz", "evet iptal et")
# {'call_llm': True, 'label': 'agent_response', 'confidence': 0.99}

# Case 2: Bot informs, user acknowledges → Return filler
result = detector.should_call_llm("siparisiniz yola cikti", "tamam")
# {'call_llm': False, 'label': 'backchannel', 'confidence': 0.97, 'filler': 'hmm'}
# (the filler is picked at random from FILLER_RESPONSES)
```
---
## Limitations
| Limitation | Details |
|------------|---------|
| **Language** | Turkish only, may struggle with heavy dialects |
| **Context** | Sees a single bot/user turn pair; no memory of earlier turns |
| **Domain** | Trained on customer service, may need fine-tuning for other domains |
| **Edge Cases** | Ambiguous short responses may have lower confidence |
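
One way to handle low-confidence edge cases is to skip the LLM only when the model is confident about a backchannel and to fall back to the LLM otherwise. The sketch below is a hypothetical routing helper, not part of the released model; the function name and the 0.90 threshold are assumptions that would need tuning on real traffic.

```python
def route(label: str, confidence: float, min_confidence: float = 0.90) -> str:
    """Return 'filler' only for confident backchannels, otherwise 'llm'."""
    if label == "backchannel" and confidence >= min_confidence:
        return "filler"
    # Uncertain backchannels and all agent_response turns go to the LLM.
    return "llm"

print(route("backchannel", 0.98))     # filler
print(route("backchannel", 0.62))     # llm
print(route("agent_response", 0.99))  # llm
```

Falling back to the LLM trades some latency for safety: a misrouted backchannel only costs an extra LLM call, while a misrouted request would go unanswered.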
---
## Citation
```bibtex
@misc{turn-detector-v2-2025,
title={turn-detector-v2: Turkish Turn Detection for Voice Assistants},
author={SiriusAI Tech Brain Team},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/hayatiali/turn-detector-v2}},
note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
```
---
## Contact
- **Developer**: SiriusAI Tech Brain Team
- **Email**: info@siriusaitech.com
- **Repository**: [GitHub](https://github.com/sirius-tedarik)
---
## Changelog
### v2.0 (Current)
**Semantic Rule Improvements:**
- If bot asks a question → always `agent_response` (731 rows corrected)
- Rhetorical questions ("really?", "is that so?") → remain as `backchannel`
- If user asks a real question ("when?", "how?") → `agent_response`
**Dataset Expansion (+9,082 samples):**
| Category | Added Patterns |
|----------|----------------|
| **Insurance** | premium, policy, coverage, comprehensive, interest, late fees |
| **Telecom** | number porting, data exhausted, line transfer, GB remaining |
| **E-commerce** | shipping cost, free shipping, returns, delivery |
| **Price/Budget** | expensive, budget, too much, will think about it, not suitable |
| **Identity Verification** | national ID, "am I speaking with...", surname, date of birth |
| **Objection/Complaint** | unacceptable, not satisfied, complaint, impossible |
| **Escalation** | manager, director, supervisor |
| **Hold Requests** | one moment, busy right now, not now, later |
**Metrics:** Macro F1: 0.9769, Accuracy: 97.94%
> Note: v2.0 scores slightly lower than v1.0 (Macro F1 0.9769 vs. 0.9924), but its labels are more reliable.
> The v1.0 data contained mislabeled rows (bot asked a question + "yes" = backchannel),
> which the model memorized. v2.0 applies the semantic rules consistently.
### v1.0
- Initial release
- Dataset: 56,228 samples
- Macro F1: 0.9924, Accuracy: 99.3%
---
**License**: SiriusAI Tech Premium License v1.0
**Commercial Use**: Requires Premium License. Contact: info@siriusaitech.com