---
language: tr
license: other
license_name: siriusai-premium-v1
license_link: LICENSE
tags:
- turkish
- text-classification
- bert
- nlp
- transformers
- turn-detection
- voice-assistant
- latency-optimization
- siriusai
- production-ready
- enterprise
base_model: dbmdz/bert-base-turkish-uncased
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
- mcc
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: turn-detector-v2
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - type: f1
      value: 0.9769
      name: Macro F1
    - type: mcc
      value: 0.9544
      name: MCC
    - type: accuracy
      value: 97.94
      name: Accuracy
---

# turn-detector-v2 - Turkish Turn Detection Model

<p align="center">
<a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-turn--detector--v2-yellow" alt="Hugging Face"></a>
<a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/Model-Production%20Ready-brightgreen" alt="Production Ready"></a>
<img src="https://img.shields.io/badge/Language-Turkish-blue" alt="Turkish">
<img src="https://img.shields.io/badge/Task-Turn%20Detection-orange" alt="Turn Detection">
<img src="https://img.shields.io/badge/F1-97.69%25-success" alt="F1 Score">
</p>

This model classifies user turns in Turkish conversations. It lets a voice assistant answer simple acknowledgments immediately and reserve LLM calls for utterances that need a real response, which reduces latency.

*Developed by SiriusAI Tech Brain Team*


---

## Mission

> **To optimize voice assistant response latency by detecting when user utterances require LLM processing vs. simple acknowledgments.**

The `turn-detector-v2` model analyzes **conversational turn pairs** (bot utterance + user response) and classifies the user's response as either requiring LLM processing (**agent_response**) or being a backchannel acknowledgment that can be handled without the LLM (**backchannel**).

### Key Benefits

| Benefit | Description |
|---------|-------------|
| **Latency Reduction** | Skip LLM calls for backchannels, saving roughly 500-2000 ms per skipped call |
| **Cost Optimization** | Reduce LLM API costs by filtering unnecessary calls |
| **Natural Conversation** | Return immediate filler responses ("hmm", "tamam") for acknowledgments |
| **High Accuracy** | 97.94% accuracy on the held-out test set |

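As a rough illustration of the latency benefit, the sketch below estimates the average per-turn latency when backchannels skip the LLM. The function name `expected_latency_ms` is hypothetical. The inputs are assumptions rather than measurements: the backchannel share is borrowed from the training label distribution and may not match production traffic, the LLM latency uses the 500-2000 ms range quoted above, and the classifier cost uses the upper end of the reported CPU inference time.

```python
def expected_latency_ms(llm_ms: float, classifier_ms: float, backchannel_share: float) -> float:
    """Average per-turn latency when backchannels skip the LLM.

    Every turn pays the classifier cost; only non-backchannel turns pay the LLM cost.
    """
    return classifier_ms + (1.0 - backchannel_share) * llm_ms


# Assumed inputs (illustrative, not measured):
BACKCHANNEL_SHARE = 0.326   # share of backchannel labels in the training data
CLASSIFIER_MS = 50.0        # upper end of the reported CPU inference time

for llm_ms in (500.0, 2000.0):
    with_detector = expected_latency_ms(llm_ms, CLASSIFIER_MS, BACKCHANNEL_SHARE)
    saved = llm_ms - with_detector
    print(f"LLM {llm_ms:.0f} ms: average {with_detector:.0f} ms with detector ({saved:.0f} ms saved)")
```

Under these assumptions the average drops from 500 ms to about 387 ms, and from 2000 ms to about 1398 ms. The gain grows with the share of backchannels and with the LLM's latency.
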

---

## Model Overview

| Property | Value |
|----------|-------|
| **Architecture** | BertForSequenceClassification |
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Task** | Binary Text Classification |
| **Language** | Turkish (tr) |
| **Labels** | 2 (agent_response, backchannel) |
| **Model Size** | ~110M parameters |
| **Inference Time** | ~10-15 ms (GPU) / ~40-50 ms (CPU) |


---

## Performance Metrics

### Final Evaluation Results

| Metric | Score |
|--------|-------|
| **Macro F1** | **0.9769** |
| **Micro F1** | **0.9794** |
| **MCC** | **0.9544** |
| **Accuracy** | **97.94%** |

### Per-Class Performance

| Category | Accuracy | Samples |
|----------|----------|---------|
| **agent_response** | 99.57% | 8,553 |
| **backchannel** | 94.83% | 4,470 |

Sample counts are from the test split (8,553 + 4,470 = 13,023).

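The reported numbers can be cross-checked against each other. The sketch below is a reconstruction, not the authors' evaluation script: it assumes the per-class "accuracy" is per-class recall, rebuilds an approximate confusion matrix from the sample counts (rounded to whole samples), and recomputes accuracy, macro F1, and MCC.

```python
from math import sqrt

# Reported per-class accuracy (treated here as recall) and test-set sample counts.
n_agent, acc_agent = 8553, 0.9957
n_back, acc_back = 4470, 0.9483

# Reconstructed confusion matrix, with backchannel as the positive class.
tn = round(n_agent * acc_agent)   # agent_response predicted correctly
fp = n_agent - tn                 # agent_response predicted as backchannel
tp = round(n_back * acc_back)     # backchannel predicted correctly
fn = n_back - tp                  # backchannel predicted as agent_response

accuracy = (tp + tn) / (n_agent + n_back)
f1_back = 2 * tp / (2 * tp + fp + fn)
f1_agent = 2 * tn / (2 * tn + fn + fp)
macro_f1 = (f1_back + f1_agent) / 2
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} mcc={mcc:.4f}")
# prints: accuracy=0.9794 macro_f1=0.9769 mcc=0.9544
```

Under this reconstruction, about 37 of the 8,553 `agent_response` turns would be answered with a filler instead of reaching the LLM, while about 231 backchannels would be sent to the LLM unnecessarily. The first kind of error is the more costly one for the user, and it is the rarer one.
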

---

## Semantic Classification Rules

### When to Classify as `backchannel` (Skip LLM)

| Condition | Examples |
|-----------|----------|
| Bot gives info + User short acknowledgment | "tamam", "anladim", "ok", "peki" |
| Bot gives info + User rhetorical question | "oyle mi?", "harbi mi?", "cidden mi?" |
| Bot gives info + User minimal response | "hmm", "hi hi", "evet" |

### When to Classify as `agent_response` (Send to LLM)

| Condition | Examples |
|-----------|----------|
| Bot asks question + User gives any answer | "[bot] adi nedir [sep] [user] ahmet" |
| Bot gives info + User asks real question | "[bot] faturaniz kesildi [sep] [user] ne zaman?" |
| Bot gives info + User makes request | "[bot] kargonuz yolda [sep] [user] adresi degistirmek istiyorum" |
| User provides detailed information | "[bot] bilgi verir misiniz [sep] [user] sunu sunu istiyorum cunku..." |

### Golden Rule

```
If bot asked a question → Always agent_response
If bot gave info + User short acknowledgment → backchannel
```

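The first half of the golden rule can be used to audit labels. The sketch below flags rows that are labeled `backchannel` even though the bot utterance looks like a question. The pattern `QUESTION_RE` and both function names are hypothetical: the question heuristic (a question mark, a Turkish question particle, or a few question words) is an illustrative assumption, not the rule set used to build the dataset, and it will produce false positives and misses on real data.

```python
import re

# Hypothetical heuristic for "bot asked a question": a question mark, a Turkish
# question particle (mi/mı/mu/mü, optionally with a personal suffix), or a
# common question word. Illustrative only, not the authors' rule set.
QUESTION_RE = re.compile(
    r"\?|\b(?:m[iıuü](?:y[iıuü]m|s[iıuü]n[iıuü]z|s[iıuü]n|y[iıuü]z)?"
    r"|nedir|nasil|nasıl|ne zaman|hangi|kac|kaç)\b"
)


def bot_asked_question(text: str) -> bool:
    """Check only the bot part of '[bot] ... [sep] [user] ...' for question cues."""
    bot_part = text.split("[sep]", 1)[0].replace("[bot]", "", 1)
    return QUESTION_RE.search(bot_part.lower()) is not None


def find_rule_violations(rows):
    """Return rows labeled backchannel even though the bot asked a question."""
    return [(text, label) for text, label in rows
            if label == "backchannel" and bot_asked_question(text)]


rows = [
    ("[bot] islemi onayliyor musunuz [sep] [user] evet", "backchannel"),    # violates the rule
    ("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam", "backchannel"),  # consistent
    ("[bot] adi nedir [sep] [user] ahmet", "agent_response"),                # consistent
]
for text, label in find_rule_violations(rows):
    print("relabel as agent_response:", text)
```

Only the bot part is inspected, so a user question such as "ne zaman gelir" does not trigger the rule on its own.
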

---

## Dataset

### Dataset Statistics

| Split | Samples |
|-------|---------|
| **Train** | 52,287 |
| **Test** | 13,023 |
| **Total** | 65,310 |

### Label Distribution

Counts below sum to 52,287 and therefore describe the train split.

| Label | Count | Percentage |
|-------|-------|------------|
| **agent_response** | 35,223 | 67.4% |
| **backchannel** | 17,064 | 32.6% |

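For context on the class imbalance, this short sketch recomputes the label shares and the test fraction from the counts in the tables, and derives a majority-class baseline (the accuracy of always predicting the most frequent label). The variable names are illustrative.

```python
train_counts = {"agent_response": 35_223, "backchannel": 17_064}
split_sizes = {"train": 52_287, "test": 13_023}

total_labeled = sum(train_counts.values())
# Sanity check: the label counts add up to the train split size.
assert total_labeled == split_sizes["train"]

label_share = {label: count / total_labeled for label, count in train_counts.items()}
rounded = {label: round(share, 3) for label, share in label_share.items()}
test_fraction = split_sizes["test"] / sum(split_sizes.values())
majority_baseline = max(label_share.values())

print("label shares:", rounded)
print(f"test fraction: {test_fraction:.3f}")
print(f"majority-class baseline: {majority_baseline:.3f}")
```

The split is close to 80/20, and a classifier that always answers `agent_response` would already be right about two thirds of the time on data with this distribution, which is why macro F1 and MCC are reported alongside accuracy.
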

### Domain Coverage

- E-commerce (kargo, iade, teslimat)
- Banking (hesap, bakiye, kredi)
- Telecom (numara tasima, data, hat)
- Insurance (prim, police, teminat, kasko)
- General Support (sikayet, yonetici, eskalasyon)
- Identity Verification (TC, gorusuyorum, soyadi)


---

## Label Definitions

| Label | ID | Description |
|-------|-----|-------------|
| **agent_response** | 0 | User response requires LLM processing: questions, requests, confirmations of questions, corrections |
| **backchannel** | 1 | Simple acknowledgment (tamam, anladim, ok): the LLM is skipped and a filler is returned |

### Input Format

```
[bot] <bot utterance> [sep] [user] <user response>
```

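A small helper keeps the input format consistent across call sites. This is a minimal sketch: `format_turn` is a hypothetical name, and collapsing extra whitespace is an assumption about reasonable preprocessing, not a documented step. Casing is left untouched so that the tokenizer can handle Turkish-specific lowercasing.

```python
def format_turn(bot_text: str, user_text: str) -> str:
    """Build the '[bot] ... [sep] [user] ...' string the model expects."""
    def clean(s: str) -> str:
        # Collapse runs of whitespace and trim the ends (assumed preprocessing).
        return " ".join(s.split())

    return f"[bot] {clean(bot_text)} [sep] [user] {clean(user_text)}"


print(format_turn("  kargonuz   yolda ", "ne zaman gelir"))
# prints: [bot] kargonuz yolda [sep] [user] ne zaman gelir
```
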

### Example Classifications

**agent_response** (Send to LLM):

```
[bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum
[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim
[bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum
[bot] kargonuz yolda [sep] [user] ne zaman gelir
[bot] poliçeniz aktif [sep] [user] teminat limitini ogrenebilir miyim
```

**backchannel** (Skip LLM, return filler):

```
[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam
[bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim
[bot] kaydinizi kontrol ediyorum [sep] [user] peki
[bot] policeniz yenilendi [sep] [user] tesekkurler
[bot] sifreni sms ile gonderdik [sep] [user] ok aldim
```


---

## Training

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Max Sequence Length** | 128 tokens |
| **Batch Size** | 16 |
| **Learning Rate** | 3e-5 |
| **Epochs** | 4 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Loss Function** | CrossEntropyLoss |
| **Hardware** | Apple Silicon (MPS) |

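To give a sense of the training budget, the sketch below collects the table values in a plain dictionary and derives the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the final partial batch is kept; none of these is stated in the card, and the dictionary keys are illustrative names.

```python
import math

# Values from the hyperparameter table and the train split size.
hyperparameters = {
    "base_model": "dbmdz/bert-base-turkish-uncased",
    "max_seq_length": 128,
    "batch_size": 16,
    "learning_rate": 3e-5,
    "epochs": 4,
    "weight_decay": 0.01,
}
train_samples = 52_287

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / hyperparameters["batch_size"])
total_steps = steps_per_epoch * hyperparameters["epochs"]

print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
# prints: 3268 steps per epoch, 13072 steps in total
```
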

---

## Usage

### Installation

```bash
pip install transformers torch
```


### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/turn-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Index order matches the label IDs: 0 = agent_response, 1 = backchannel
LABELS = ["agent_response", "backchannel"]

def predict(text):
    """Classify one '[bot] ... [sep] [user] ...' string."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)[0]

    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    return {"label": max(scores, key=scores.get), "confidence": max(scores.values())}

# Bot asks question → agent_response
print(predict("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim"))
# Example output (confidence rounded): {'label': 'agent_response', 'confidence': 0.99}

# Bot gives info + User acknowledges → backchannel
print(predict("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam"))
# Example output (confidence rounded): {'label': 'backchannel', 'confidence': 0.98}
```


### Production Integration

```python
import random

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class TurnDetector:
    """Production-ready turn detection for voice assistants."""

    # Index order matches the label IDs: 0 = agent_response, 1 = backchannel
    LABELS = ["agent_response", "backchannel"]
    FILLER_RESPONSES = ["hmm", "evet", "tamam", "anlıyorum"]

    def __init__(self, model_path="hayatiali/turn-detector-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def should_call_llm(self, bot_text: str, user_text: str) -> dict:
        """
        Determines if user response should go to LLM.

        Returns:
            dict with 'call_llm' (bool), 'label', 'confidence', 'filler' (if backchannel)
        """
        text = f"[bot] {bot_text} [sep] [user] {user_text}"
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(self.model(**inputs).logits, dim=-1)[0].cpu()

        label_idx = probs.argmax().item()
        label = self.LABELS[label_idx]
        confidence = probs[label_idx].item()

        result = {
            "call_llm": label == "agent_response",
            "label": label,
            "confidence": confidence
        }

        if label == "backchannel":
            # Pick a filler at random so repeated acknowledgments vary
            result["filler"] = random.choice(self.FILLER_RESPONSES)

        return result

# Usage
detector = TurnDetector()

# Case 1: Bot asks, user confirms → Send to LLM
result = detector.should_call_llm("siparis iptal etmek ister misiniz", "evet iptal et")
# Example output (confidence rounded):
# {'call_llm': True, 'label': 'agent_response', 'confidence': 0.99}

# Case 2: Bot informs, user acknowledges → Return filler
result = detector.should_call_llm("siparisiniz yola cikti", "tamam")
# Example output (confidence rounded, filler chosen at random):
# {'call_llm': False, 'label': 'backchannel', 'confidence': 0.97, 'filler': 'hmm'}
```


---

## Limitations

| Limitation | Details |
|------------|---------|
| **Language** | Turkish only; may struggle with heavy dialects |
| **Context** | Analyzes a single bot/user turn pair; no multi-turn memory |
| **Domain** | Trained on customer service conversations; other domains may need fine-tuning |
| **Edge Cases** | Ambiguous short responses may receive lower confidence |

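One way to handle low-confidence edge cases is to skip the LLM only when the model is confident that the turn is a backchannel. The sketch below is a hypothetical routing policy, not part of the released model: the function name `route` and the 0.9 threshold are assumptions, and the threshold should be tuned on real traffic.

```python
def route(label: str, confidence: float, min_backchannel_confidence: float = 0.9) -> str:
    """Return 'filler' only for confident backchannels; otherwise 'llm'.

    Sending an uncertain backchannel to the LLM costs latency, while answering
    a real request with a filler loses the user's intent, so ties go to the LLM.
    """
    if label == "backchannel" and confidence >= min_backchannel_confidence:
        return "filler"
    return "llm"


print(route("backchannel", 0.98))     # filler
print(route("backchannel", 0.62))     # llm (not confident enough to skip)
print(route("agent_response", 0.55))  # llm
```
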

---

## Citation

```bibtex
@misc{turn-detector-v2-2025,
  title={turn-detector-v2: Turkish Turn Detection for Voice Assistants},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/turn-detector-v2}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
```


---

## Contact

- **Developer**: SiriusAI Tech Brain Team
- **Email**: info@siriusaitech.com
- **Repository**: [GitHub](https://github.com/sirius-tedarik)


---

## Changelog

### v2.0 (Current)

**Semantic Rule Improvements:**
- If bot asks a question → always `agent_response` (731 rows corrected)
- Rhetorical questions ("really?", "is that so?") → remain as `backchannel`
- If user asks a real question ("when?", "how?") → `agent_response`

**Dataset Expansion (+9,082 samples):**

| Category | Added Patterns |
|----------|----------------|
| **Insurance** | premium, policy, coverage, comprehensive, interest, late fees |
| **Telecom** | number porting, data exhausted, line transfer, GB remaining |
| **E-commerce** | shipping cost, free shipping, returns, delivery |
| **Price/Budget** | expensive, budget, too much, will think about it, not suitable |
| **Identity Verification** | national ID, "am I speaking with...", surname, date of birth |
| **Objection/Complaint** | unacceptable, not satisfied, complaint, impossible |
| **Escalation** | manager, director, supervisor |
| **Hold Requests** | one moment, busy right now, not now, later |

**Metrics:** Macro F1: 0.9769, Accuracy: 97.94%

> Note: These metrics are slightly lower than v1.0, but v2.0 is the more reliable model.
> v1.0 was trained and evaluated on mislabeled data (bot asked question + "yes" = backchannel),
> which the model memorized. v2.0 enforces consistent semantic labeling.

### v1.0
- Initial release
- Dataset: 56,228 samples
- Macro F1: 0.9924, Accuracy: 99.3%

---

**License**: SiriusAI Tech Premium License v1.0

**Commercial Use**: Requires Premium License. Contact: info@siriusaitech.com