hayatiali committed · Commit 29ec639 · verified · 1 parent: df72c1f

Upload model via Fine-tune Assistant
README.md ADDED
@@ -0,0 +1,362 @@
---
language: tr
license: other
license_name: siriusai-premium-v1
license_link: LICENSE
tags:
- turkish
- text-classification
- bert
- nlp
- transformers
- siriusai
- production-ready
- enterprise
base_model: dbmdz/bert-base-turkish-uncased
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
- mcc
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: turn-detector
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - type: f1
      value: 0.9924276856095726
      name: Macro F1
    - type: mcc
      value: 0.9848560799888242
---

# turn-detector - Turkish Text Classification Model

<p align="center">
  <a href="https://huggingface.co/hayatiali/turn-detector"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-turn--detector-yellow" alt="Hugging Face"></a>
  <a href="https://huggingface.co/hayatiali/turn-detector"><img src="https://img.shields.io/badge/Model-Production%20Ready-brightgreen" alt="Production Ready"></a>
  <img src="https://img.shields.io/badge/Language-Turkish-blue" alt="Turkish">
  <img src="https://img.shields.io/badge/Task-Text%20Classification-orange" alt="Text Classification">
</p>

This model classifies Turkish text into turn-taking categories in a conversation.

*Developed by SiriusAI Tech Brain Team*

---

## Mission

> **To enhance conversational AI by accurately detecting turn-taking dynamics in Turkish dialogues, enabling more natural and engaging interactions.**

The `turn-detector` model classifies responses in Turkish conversations into two categories: **agent_response** and **backchannel**. This distinction is crucial for building voice assistants and dialogue systems that better understand human interaction. Built on the `BertForSequenceClassification` architecture, the model achieves high accuracy and reliability.

### Why This Model Matters

- **High Accuracy**: With over 99% accuracy, the model delivers reliable classifications in real-world applications.
- **Enterprise-Grade Performance**: Designed for production use, it meets the stringent requirements of enterprise clients.
- **NLP Expertise**: Built with state-of-the-art natural language processing techniques for understanding Turkish conversations.
- **Scalable Solution**: Integrates easily into existing systems for seamless deployment across applications.
- **Robust Training**: Trained on a substantial dataset, ensuring effectiveness across diverse conversational contexts.

---

## Model Overview

| Property | Value |
|----------|-------|
| **Architecture** | BertForSequenceClassification |
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Task** | Text Classification |
| **Language** | Turkish (tr) |
| **Categories** | 2 labels |
| **Model Size** | ~110M parameters |
| **Inference Time** | ~10-15ms (GPU) / ~40-50ms (CPU) |

---

## Performance Metrics

### Final Evaluation Results

| Metric | Score | Description |
|--------|-------|-------------|
| **Macro F1** | **0.9924** | Unweighted mean of the per-class F1 scores (each the harmonic mean of precision and recall) |
| **MCC** | **0.9849** | Matthews Correlation Coefficient |
| **Accuracy** | **99.3242%** | Ratio of correctly predicted instances to total instances |

### Per-Class Performance

| Category | Accuracy | Correct | Total |
|----------|----------|---------|-------|
| **agent_response** | 99.5% | 7,429 | 7,464 |
| **backchannel** | 98.9% | 3,741 | 3,782 |

---

## Dataset

### Dataset Statistics

| Split | Samples | Purpose |
|-------|---------|---------|
| **Train** | 44,982 | Model training |
| **Test** | 11,246 | Model evaluation |
| **Total** | 56,228 | Complete dataset |

### Category Distribution

| Category | Samples | Percentage | Description |
|----------|---------|------------|-------------|
| **turn_action** | 56,228 | 100.0% | Parent category covering all samples |

### Subcategory Breakdown

| Category | Subcategories |
|----------|---------------|
| **turn_action** | agent_response, backchannel |

---

## Label Definitions

| Label | ID | Description | Turkish Examples |
|-------|-----|-------------|------------------|
| **agent_response** | 0 | A direct, substantive response from the agent in a conversation | "Merhaba, size nasıl yardımcı olabilirim?" |
| **backchannel** | 1 | A brief acknowledgment or encouragement from the listener | "Evet", "Anladım" |

### Important: Category Boundaries

The distinction between **agent_response** and **backchannel** is critical. An **agent_response** is a substantive reply to a query, while a **backchannel** is a brief acknowledgment that does not add new information.

---

## Training Procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Max Sequence Length** | 128 tokens |
| **Batch Size** | 16 |
| **Learning Rate** | 2e-5 |
| **Epochs** | 3 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Loss Function** | CrossEntropyLoss / Focal Loss |
| **Problem Type** | Single-label / Multi-label Classification |

### Training Environment

| Resource | Specification |
|----------|---------------|
| **Hardware** | Apple Silicon (MPS) / CUDA GPU |
| **Framework** | PyTorch + Transformers |
| **Training Time** | Varies based on dataset size |

---

## Usage

### Installation

```bash
pip install transformers torch
```

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/turn-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["agent_response", "backchannel"]

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)[0]

    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    primary = max(scores, key=scores.get)
    return {"category": primary, "confidence": scores[primary], "all_scores": scores}

# Examples
print(predict("Merhaba, nasılsınız?"))
```

### Production Class

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class TurnDetectorClassifier:
    LABELS = ["agent_response", "backchannel"]

    def __init__(self, model_path="hayatiali/turn-detector"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def predict(self, text: str) -> dict:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            logits = self.model(**inputs).logits
            probs = torch.softmax(logits, dim=-1)[0].cpu().numpy()

        scores = dict(zip(self.LABELS, probs))
        return {"category": max(scores, key=scores.get), "confidence": float(max(scores.values())), "scores": scores}
```

### Batch Inference

```python
# Reuses `tokenizer`, `model`, and `LABELS` from the Quick Start snippet above
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_batch(texts: list, batch_size: int = 32) -> list:
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True, max_length=128, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1).cpu().numpy()

        for prob in probs:
            scores = dict(zip(LABELS, prob))
            results.append(scores)
    return results
```

---

## Limitations & Known Issues

### ⚠️ Model Limitations

| Limitation | Details | Impact |
|------------|---------|--------|
| **Dataset Bias** | Model performance may vary on conversational data outside the training set. | Could lead to inaccuracies in specific domains. |
| **Language Nuance** | Captures standard Turkish but may struggle with dialects or highly informal speech. | Reduced accuracy in non-standard language use. |
| **Context Understanding** | Limited ability to understand context beyond single-turn interactions. | May misclassify responses that rely on previous context. |

### ⚠️ Production Deployment Considerations

| Consideration | Details | Recommendation |
|---------------|---------|----------------|
| **Model Size** | Large model size may impact deployment on limited-resource environments. | Consider model distillation or quantization for constrained environments. |

### Not Suitable For

- Real-time critical applications without human oversight.
- Scenarios requiring high levels of contextual understanding across multiple turns.
- Use cases in non-Turkish languages without adaptation.

---

## Ethical Considerations

### Intended Use

- Conversational AI applications.
- Voice assistants and chatbots.
- Customer service automation.

### Risks

- **Bias in Training Data**: If the training data is biased, the model may perpetuate those biases in its predictions.
- **Misuse of Technology**: Potential for the model to be used in contexts that require ethical considerations, such as surveillance or deceptive practices.

### Recommendations

1. **Human Oversight**: Always implement human oversight in applications that utilize the model.
2. **Monitoring**: Continuously monitor model outputs for unexpected or biased behavior.
3. **Updates**: Regularly update the model with new data to improve accuracy and mitigate biases.

---

## Technical Specifications

### Model Architecture

```
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings
    (encoder): BertEncoder (12 layers)
    (pooler): BertPooler
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(in_features=768, out_features=2)
)

Total Parameters: ~110M
```

### Input/Output

- **Input**: Turkish text (max 128 tokens)
- **Output**: 2-dimensional probability vector
- **Tokenizer**: BERTurk WordPiece (32k vocab)

---

## Citation

```bibtex
@misc{turn-detector-2025,
  title={turn-detector - Turkish Text Classification Model},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/turn-detector}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
```

---

## Model Card Authors

**SiriusAI Tech Brain Team**

## Contact

- **Email**: info@siriusaitech.com
- **Repository**: [GitHub](https://github.com/sirius-tedarik)

---

## Changelog

### v1.0 (Current)
- Initial release
- 2-category text classification
- Macro F1: 0.9924, MCC: 0.9849

---

**License**: SiriusAI Tech Premium License v1.0

**Commercial Use**: Requires Premium License. Contact: info@siriusaitech.com

**Free Use Allowed For**:
- Academic research and education
- Non-profit organizations (with approval)
- Evaluation (30 days)

**Disclaimer**: This model is designed for text classification applications. Always implement with appropriate safeguards and human oversight. Model predictions should inform decisions, not replace human judgment.
benchmark/adversarial_samples.csv ADDED
@@ -0,0 +1,81 @@
text,expected_label,predicted_label,difficulty,confidence,is_correct
"Ömer, nasıl yardımcı olabilirim?",agent_response,agent_response,baseline,0.8812354207038879,True
"Merhaba, hangi konuda yardım edebilirim?",agent_response,agent_response,baseline,0.9911393523216248,True
"Tabii ki, size bununla ilgili bilgi verebilirim.",agent_response,agent_response,baseline,0.5156869292259216,True
"Elbette, bu konuda size destek olacağım.",agent_response,backchannel,baseline,0.5563545823097229,False
"Anladım, hemen kontrol ediyorum.",agent_response,backchannel,baseline,0.5459677577018738,False
"Lütfen bekleyin, birazdan yanıt vereceğim.",agent_response,agent_response,baseline,0.8637070655822754,True
Bu konuda yardımcı olmaktan memnuniyet duyarım.,agent_response,agent_response,baseline,0.6278860569000244,True
Hemen sizin için araştırıyorum.,agent_response,agent_response,baseline,0.7357267737388611,True
"Endişelenmeyin, bu konuyu halledeceğiz.",agent_response,agent_response,baseline,0.6491527557373047,True
"Herhangi başka bir sorunuz varsa, sormaktan çekinmeyin.",agent_response,agent_response,baseline,0.9041098952293396,True
totes agree lol,agent_response,backchannel,length_noise,0.9879427552223206,False
yup yup yup yup yup,agent_response,backchannel,length_noise,0.988431453704834,False
"OMG cant believe u did that, like seriously, i mean come on, its just too much, you know what i mean? cuz if you dont then idk what to say, like seriously",agent_response,agent_response,length_noise,0.909318745136261,True
nah bro,agent_response,backchannel,length_noise,0.9873980283737183,False
yasss that's wassup,agent_response,backchannel,length_noise,0.974721372127533,False
okay okay okay i get it already no need to repeat urself over and over again like i'm not deaf or whatever,agent_response,agent_response,length_noise,0.9450967907905579,True
omg thts crazee,agent_response,backchannel,length_noise,0.9885514974594116,False
u r kidding right?,agent_response,backchannel,length_noise,0.9817968606948853,False
"wow just wow, i mean, wow! i never thought that this would happen, like ever, not in a million years, and yet here we are, unbelievable, just totally unbelievable, you feel me?",agent_response,agent_response,length_noise,0.8823995590209961,True
hah lol whatevs,agent_response,backchannel,length_noise,0.9895368814468384,False
"Ah, anlıyorum. Devam edebilir misiniz?",agent_response,agent_response,semantic_overlap,0.8250168561935425,True
"Hmm, bunu biraz daha açabilir misiniz?",agent_response,agent_response,semantic_overlap,0.745111882686615,True
"Evet, bu gerçekten ilginç. Daha fazla bilgi verebilir misiniz?",agent_response,agent_response,semantic_overlap,0.9849535226821899,True
Bu konuda düşündüğünüz başka bir şey var mı?,agent_response,agent_response,semantic_overlap,0.9519035220146179,True
"Hımm, pekala. Başka bir açıdan bakacak olursak?",agent_response,backchannel,semantic_overlap,0.903683066368103,False
"Evet, kesinlikle. Peki başka hangi yönlerini ele alabiliriz?",agent_response,agent_response,semantic_overlap,0.9927364587783813,True
"Tamam, peki buna ek olarak ne söyleyebilirsiniz?",agent_response,agent_response,semantic_overlap,0.9534065127372742,True
"Anladım, devam etmek ister misiniz?",agent_response,agent_response,semantic_overlap,0.974102795124054,True
"Evet, peki başka bir detaya dikkat çekmek ister misiniz?",agent_response,agent_response,semantic_overlap,0.9879535436630249,True
"Hmm, çok iyi bir nokta. Bunu biraz daha açar mısınız?",agent_response,agent_response,semantic_overlap,0.9757851362228394,True
"Oh great, another software update that will surely make everything run faster, just like last time.",agent_response,agent_response,edge_cases,0.895721971988678,True
"I'm sure the server downtime at exactly 5 PM on a Friday was purely coincidental, and not at all inconvenient.",agent_response,agent_response,edge_cases,0.849263072013855,True
"Yeah, because deleting the database with a single command is exactly what everyone wants, right?",agent_response,agent_response,edge_cases,0.7744247317314148,True
"I just love it when my AI assistant corrects me even when I'm right, it's like having a personal grammar teacher.",agent_response,agent_response,edge_cases,0.5396984815597534,True
"No, I absolutely don't need any more disk space. Who needs to store files anyway?",agent_response,agent_response,edge_cases,0.9811112284660339,True
"Sure, let's implement the new feature without any testing. What could possibly go wrong?",agent_response,agent_response,edge_cases,0.9612233638763428,True
"Oh, another meeting about meetings? This is exactly why I got into tech.",agent_response,agent_response,edge_cases,0.9544288516044617,True
I'm really looking forward to debugging this code at 2 AM again. It's the highlight of my week.,agent_response,agent_response,edge_cases,0.8809834122657776,True
The best part of working with AI is when it confidently gives you the wrong answer.,agent_response,agent_response,edge_cases,0.8558328151702881,True
"Of course, let’s deploy the untested code on a Friday evening, I have nothing better to do.",agent_response,agent_response,edge_cases,0.7736720442771912,True
"Evet, seni anlıyorum.",backchannel,backchannel,baseline,0.8567759990692139,True
"Hmm, ilginç.",backchannel,backchannel,baseline,0.985055685043335,True
"Evet, devam et.",backchannel,backchannel,baseline,0.8956389427185059,True
Gerçekten mi?,backchannel,backchannel,baseline,0.9868144989013672,True
"Tamam, bu mantıklı.",backchannel,backchannel,baseline,0.7614496946334839,True
Anladım.,backchannel,backchannel,baseline,0.9884626269340515,True
"Evet, bu doğru.",backchannel,backchannel,baseline,0.8082573413848877,True
"Ah, şimdi anlıyorum.",backchannel,backchannel,baseline,0.9578026533126831,True
Bu ilginç bir nokta.,backchannel,backchannel,baseline,0.6748051643371582,True
"Evet, buna katılıyorum.",backchannel,backchannel,baseline,0.8088875412940979,True
yaaaa broooo,backchannel,backchannel,length_noise,0.9909811615943909,True
huh? r u srz??,backchannel,backchannel,length_noise,0.9855925440788269,True
OMG this is like the most amazing thing ever I mean I can't even begin to explain how incredible this whole situation is because it's just that awesome you know what I mean like seriously wow just wow ok???,backchannel,agent_response,length_noise,0.7402034997940063,False
idk wat u mean,backchannel,backchannel,length_noise,0.9897193908691406,True
sure sure sure sure sure,backchannel,backchannel,length_noise,0.9763302206993103,True
omg totally 100% agree with you on that one no doubt about it in fact I was just thinking the same thing the other day and it's crazy how we're like on the same wavelength all the time isn't it?,backchannel,agent_response,length_noise,0.9482101798057556,False
no wayyyy,backchannel,backchannel,length_noise,0.991447925567627,True
"heyyy, u ther?",backchannel,backchannel,length_noise,0.990699827671051,True
wow cant believe it happened like that i mean who would have thought that everything would turn out this way after all the planning we did it just goes to show that sometimes things have a way of working out on their own despite all the odds and challenges we faced right from the start,backchannel,agent_response,length_noise,0.971515953540802,False
kk thx bye,backchannel,backchannel,length_noise,0.99072265625,True
"Hmm, ilginç bir nokta.",backchannel,backchannel,semantic_overlap,0.9318225979804993,True
"Anladım, peki ya sonra?",backchannel,backchannel,semantic_overlap,0.9160572290420532,True
"Hmm, o konuda biraz daha bilgi verir misin?",backchannel,agent_response,semantic_overlap,0.7073332667350769,False
Gerçekten mi? Daha fazla duymak isterim.,backchannel,backchannel,semantic_overlap,0.7160800099372864,True
"Bu mantıklı, başka neler oldu?",backchannel,agent_response,semantic_overlap,0.812181293964386,False
"Hmm, bunu daha önce duymamıştım.",backchannel,backchannel,semantic_overlap,0.8978190422058105,True
"Bir dakika, bunu doğru mu anlıyorum?",backchannel,agent_response,semantic_overlap,0.7696111798286438,False
"Peki, sonra ne yaptılar?",backchannel,backchannel,semantic_overlap,0.6477120518684387,True
Gerçekten mi? Bu beni düşündürdü.,backchannel,backchannel,semantic_overlap,0.9161955714225769,True
"İlginç, devam et lütfen.",backchannel,backchannel,semantic_overlap,0.7439655661582947,True
"Evet evet, tabii ki de tebrik ederim, dünya harikası bir iş çıkardın (!)",backchannel,agent_response,edge_cases,0.877510666847229,False
"Çok güzel, bu kadar net bir çözüm bulduğunu(!) hiç düşünmemiştim doğrusu.",backchannel,agent_response,edge_cases,0.9826798439025879,False
"Ah, tabii ki! Çünkü herkes daima müşteri hizmetlerinin ne kadar hızlı olduğunu söyler (!)",backchannel,agent_response,edge_cases,0.5608082413673401,False
"Eğer bu kadar 'yaratıcı' bir fikir daha duyar mıyım diye düşünüyordum, teşekkürler!",backchannel,agent_response,edge_cases,0.9653686881065369,False
Bir işin en iyi nasıl yapılmaması gerektiğini görmek için harika (!) bir örnekti.,backchannel,agent_response,edge_cases,0.9630967378616333,False
"Evet, kesinlikle bugünkü toplantıda hiçbir şey anlaşılmadı diyemem.",backchannel,agent_response,edge_cases,0.8947334289550781,False
"Harika, seninki gibi bir çözüm sayesinde sorunlarımız iki katına çıkacak (!)",backchannel,agent_response,edge_cases,0.8286226987838745,False
"Tabii ki de, Türk çayı yurt dışında sudan bile ucuzdur (!).",backchannel,backchannel,edge_cases,0.8883209228515625,True
"Bu kadar ‘detaylı’ bir analiz için üç cümle yeterli oldu, harikasın!",backchannel,agent_response,edge_cases,0.610821545124054,False
"Elbette, herkesin sabırsızlıkla beklediği o 'harika' PowerPoint sunumunu bir daha görelim.",backchannel,agent_response,edge_cases,0.9590893387794495,False
benchmark/benchmark_results.json ADDED
@@ -0,0 +1,873 @@
{
  "model_name": "turn-detector",
  "generated_at": "2025-12-14T21:37:22.477273",
  "difficulty_results": {
    "baseline": {
      "total": 20,
      "correct": 18,
      "accuracy": 0.9
    },
    "length_noise": {
      "total": 20,
      "correct": 10,
      "accuracy": 0.5
    },
    "semantic_overlap": {
      "total": 20,
      "correct": 16,
      "accuracy": 0.8
    },
    "edge_cases": {
      "total": 20,
      "correct": 11,
      "accuracy": 0.55
    }
  },
  "overall_accuracy": 0.6875,
  "total_samples": 80,
  "correct_samples": 55,
  "samples": [
    {
      "text": "Ömer, nasıl yardımcı olabilirim?",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.8812354207038879,
      "is_correct": true
    },
    {
      "text": "Merhaba, hangi konuda yardım edebilirim?",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.9911393523216248,
      "is_correct": true
    },
    {
      "text": "Tabii ki, size bununla ilgili bilgi verebilirim.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.5156869292259216,
      "is_correct": true
    },
    {
      "text": "Elbette, bu konuda size destek olacağım.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "backchannel",
      "confidence": 0.5563545823097229,
      "is_correct": false
    },
    {
      "text": "Anladım, hemen kontrol ediyorum.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "backchannel",
      "confidence": 0.5459677577018738,
      "is_correct": false
    },
    {
      "text": "Lütfen bekleyin, birazdan yanıt vereceğim.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.8637070655822754,
      "is_correct": true
    },
    {
      "text": "Bu konuda yardımcı olmaktan memnuniyet duyarım.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.6278860569000244,
      "is_correct": true
    },
    {
      "text": "Hemen sizin için araştırıyorum.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.7357267737388611,
      "is_correct": true
    },
    {
      "text": "Endişelenmeyin, bu konuyu halledeceğiz.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.6491527557373047,
      "is_correct": true
    },
    {
      "text": "Herhangi başka bir sorunuz varsa, sormaktan çekinmeyin.",
      "expected_label": "agent_response",
      "difficulty": "baseline",
      "predicted_label": "agent_response",
      "confidence": 0.9041098952293396,
      "is_correct": true
    },
    {
      "text": "totes agree lol",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "backchannel",
      "confidence": 0.9879427552223206,
      "is_correct": false
    },
    {
      "text": "yup yup yup yup yup",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "backchannel",
      "confidence": 0.988431453704834,
      "is_correct": false
    },
    {
      "text": "OMG cant believe u did that, like seriously, i mean come on, its just too much, you know what i mean? cuz if you dont then idk what to say, like seriously",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "agent_response",
      "confidence": 0.909318745136261,
      "is_correct": true
    },
    {
      "text": "nah bro",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "backchannel",
      "confidence": 0.9873980283737183,
      "is_correct": false
    },
    {
      "text": "yasss that's wassup",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "backchannel",
      "confidence": 0.974721372127533,
      "is_correct": false
    },
    {
      "text": "okay okay okay i get it already no need to repeat urself over and over again like i'm not deaf or whatever",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "agent_response",
      "confidence": 0.9450967907905579,
      "is_correct": true
    },
    {
      "text": "omg thts crazee",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "backchannel",
      "confidence": 0.9885514974594116,
      "is_correct": false
    },
    {
      "text": "u r kidding right?",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "backchannel",
      "confidence": 0.9817968606948853,
      "is_correct": false
    },
    {
      "text": "wow just wow, i mean, wow! i never thought that this would happen, like ever, not in a million years, and yet here we are, unbelievable, just totally unbelievable, you feel me?",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "agent_response",
      "confidence": 0.8823995590209961,
      "is_correct": true
    },
    {
      "text": "hah lol whatevs",
      "expected_label": "agent_response",
      "difficulty": "length_noise",
      "predicted_label": "backchannel",
      "confidence": 0.9895368814468384,
      "is_correct": false
    },
    {
      "text": "Ah, anlıyorum. Devam edebilir misiniz?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "agent_response",
      "confidence": 0.8250168561935425,
      "is_correct": true
    },
    {
      "text": "Hmm, bunu biraz daha açabilir misiniz?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "agent_response",
      "confidence": 0.745111882686615,
      "is_correct": true
    },
    {
      "text": "Evet, bu gerçekten ilginç. Daha fazla bilgi verebilir misiniz?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "agent_response",
      "confidence": 0.9849535226821899,
      "is_correct": true
    },
    {
      "text": "Bu konuda düşündüğünüz başka bir şey var mı?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "agent_response",
      "confidence": 0.9519035220146179,
      "is_correct": true
    },
    {
      "text": "Hımm, pekala. Başka bir açıdan bakacak olursak?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "backchannel",
      "confidence": 0.903683066368103,
      "is_correct": false
    },
    {
      "text": "Evet, kesinlikle. Peki başka hangi yönlerini ele alabiliriz?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "agent_response",
      "confidence": 0.9927364587783813,
      "is_correct": true
    },
    {
      "text": "Tamam, peki buna ek olarak ne söyleyebilirsiniz?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "agent_response",
      "confidence": 0.9534065127372742,
      "is_correct": true
    },
    {
      "text": "Anladım, devam etmek ister misiniz?",
      "expected_label": "agent_response",
      "difficulty": "semantic_overlap",
      "predicted_label": "agent_response",
      "confidence": 0.974102795124054,
      "is_correct": true
    },
    {
      "text": "Evet, peki başka bir detaya dikkat çekmek ister misiniz?",
256
+ "expected_label": "agent_response",
257
+ "difficulty": "semantic_overlap",
258
+ "predicted_label": "agent_response",
259
+ "confidence": 0.9879535436630249,
260
+ "is_correct": true
261
+ },
262
+ {
263
+ "text": "Hmm, çok iyi bir nokta. Bunu biraz daha açar mısınız?",
264
+ "expected_label": "agent_response",
265
+ "difficulty": "semantic_overlap",
266
+ "predicted_label": "agent_response",
267
+ "confidence": 0.9757851362228394,
268
+ "is_correct": true
269
+ },
270
+ {
271
+ "text": "Oh great, another software update that will surely make everything run faster, just like last time.",
272
+ "expected_label": "agent_response",
273
+ "difficulty": "edge_cases",
274
+ "predicted_label": "agent_response",
275
+ "confidence": 0.895721971988678,
276
+ "is_correct": true
277
+ },
278
+ {
279
+ "text": "I'm sure the server downtime at exactly 5 PM on a Friday was purely coincidental, and not at all inconvenient.",
280
+ "expected_label": "agent_response",
281
+ "difficulty": "edge_cases",
282
+ "predicted_label": "agent_response",
283
+ "confidence": 0.849263072013855,
284
+ "is_correct": true
285
+ },
286
+ {
287
+ "text": "Yeah, because deleting the database with a single command is exactly what everyone wants, right?",
288
+ "expected_label": "agent_response",
289
+ "difficulty": "edge_cases",
290
+ "predicted_label": "agent_response",
291
+ "confidence": 0.7744247317314148,
292
+ "is_correct": true
293
+ },
294
+ {
295
+ "text": "I just love it when my AI assistant corrects me even when I'm right, it's like having a personal grammar teacher.",
296
+ "expected_label": "agent_response",
297
+ "difficulty": "edge_cases",
298
+ "predicted_label": "agent_response",
299
+ "confidence": 0.5396984815597534,
300
+ "is_correct": true
301
+ },
302
+ {
303
+ "text": "No, I absolutely don't need any more disk space. Who needs to store files anyway?",
304
+ "expected_label": "agent_response",
305
+ "difficulty": "edge_cases",
306
+ "predicted_label": "agent_response",
307
+ "confidence": 0.9811112284660339,
308
+ "is_correct": true
309
+ },
310
+ {
311
+ "text": "Sure, let's implement the new feature without any testing. What could possibly go wrong?",
312
+ "expected_label": "agent_response",
313
+ "difficulty": "edge_cases",
314
+ "predicted_label": "agent_response",
315
+ "confidence": 0.9612233638763428,
316
+ "is_correct": true
317
+ },
318
+ {
319
+ "text": "Oh, another meeting about meetings? This is exactly why I got into tech.",
320
+ "expected_label": "agent_response",
321
+ "difficulty": "edge_cases",
322
+ "predicted_label": "agent_response",
323
+ "confidence": 0.9544288516044617,
324
+ "is_correct": true
325
+ },
326
+ {
327
+ "text": "I'm really looking forward to debugging this code at 2 AM again. It's the highlight of my week.",
328
+ "expected_label": "agent_response",
329
+ "difficulty": "edge_cases",
330
+ "predicted_label": "agent_response",
331
+ "confidence": 0.8809834122657776,
332
+ "is_correct": true
333
+ },
334
+ {
335
+ "text": "The best part of working with AI is when it confidently gives you the wrong answer.",
336
+ "expected_label": "agent_response",
337
+ "difficulty": "edge_cases",
338
+ "predicted_label": "agent_response",
339
+ "confidence": 0.8558328151702881,
340
+ "is_correct": true
341
+ },
342
+ {
343
+ "text": "Of course, let’s deploy the untested code on a Friday evening, I have nothing better to do.",
344
+ "expected_label": "agent_response",
345
+ "difficulty": "edge_cases",
346
+ "predicted_label": "agent_response",
347
+ "confidence": 0.7736720442771912,
348
+ "is_correct": true
349
+ },
350
+ {
351
+ "text": "Evet, seni anlıyorum.",
352
+ "expected_label": "backchannel",
353
+ "difficulty": "baseline",
354
+ "predicted_label": "backchannel",
355
+ "confidence": 0.8567759990692139,
356
+ "is_correct": true
357
+ },
358
+ {
359
+ "text": "Hmm, ilginç.",
360
+ "expected_label": "backchannel",
361
+ "difficulty": "baseline",
362
+ "predicted_label": "backchannel",
363
+ "confidence": 0.985055685043335,
364
+ "is_correct": true
365
+ },
366
+ {
367
+ "text": "Evet, devam et.",
368
+ "expected_label": "backchannel",
369
+ "difficulty": "baseline",
370
+ "predicted_label": "backchannel",
371
+ "confidence": 0.8956389427185059,
372
+ "is_correct": true
373
+ },
374
+ {
375
+ "text": "Gerçekten mi?",
376
+ "expected_label": "backchannel",
377
+ "difficulty": "baseline",
378
+ "predicted_label": "backchannel",
379
+ "confidence": 0.9868144989013672,
380
+ "is_correct": true
381
+ },
382
+ {
383
+ "text": "Tamam, bu mantıklı.",
384
+ "expected_label": "backchannel",
385
+ "difficulty": "baseline",
386
+ "predicted_label": "backchannel",
387
+ "confidence": 0.7614496946334839,
388
+ "is_correct": true
389
+ },
390
+ {
391
+ "text": "Anladım.",
392
+ "expected_label": "backchannel",
393
+ "difficulty": "baseline",
394
+ "predicted_label": "backchannel",
395
+ "confidence": 0.9884626269340515,
396
+ "is_correct": true
397
+ },
398
+ {
399
+ "text": "Evet, bu doğru.",
400
+ "expected_label": "backchannel",
401
+ "difficulty": "baseline",
402
+ "predicted_label": "backchannel",
403
+ "confidence": 0.8082573413848877,
404
+ "is_correct": true
405
+ },
406
+ {
407
+ "text": "Ah, şimdi anlıyorum.",
408
+ "expected_label": "backchannel",
409
+ "difficulty": "baseline",
410
+ "predicted_label": "backchannel",
411
+ "confidence": 0.9578026533126831,
412
+ "is_correct": true
413
+ },
414
+ {
415
+ "text": "Bu ilginç bir nokta.",
416
+ "expected_label": "backchannel",
417
+ "difficulty": "baseline",
418
+ "predicted_label": "backchannel",
419
+ "confidence": 0.6748051643371582,
420
+ "is_correct": true
421
+ },
422
+ {
423
+ "text": "Evet, buna katılıyorum.",
424
+ "expected_label": "backchannel",
425
+ "difficulty": "baseline",
426
+ "predicted_label": "backchannel",
427
+ "confidence": 0.8088875412940979,
428
+ "is_correct": true
429
+ },
430
+ {
431
+ "text": "yaaaa broooo",
432
+ "expected_label": "backchannel",
433
+ "difficulty": "length_noise",
434
+ "predicted_label": "backchannel",
435
+ "confidence": 0.9909811615943909,
436
+ "is_correct": true
437
+ },
438
+ {
439
+ "text": "huh? r u srz??",
440
+ "expected_label": "backchannel",
441
+ "difficulty": "length_noise",
442
+ "predicted_label": "backchannel",
443
+ "confidence": 0.9855925440788269,
444
+ "is_correct": true
445
+ },
446
+ {
447
+ "text": "OMG this is like the most amazing thing ever I mean I can't even begin to explain how incredible this whole situation is because it's just that awesome you know what I mean like seriously wow just wow ok???",
448
+ "expected_label": "backchannel",
449
+ "difficulty": "length_noise",
450
+ "predicted_label": "agent_response",
451
+ "confidence": 0.7402034997940063,
452
+ "is_correct": false
453
+ },
454
+ {
455
+ "text": "idk wat u mean",
456
+ "expected_label": "backchannel",
457
+ "difficulty": "length_noise",
458
+ "predicted_label": "backchannel",
459
+ "confidence": 0.9897193908691406,
460
+ "is_correct": true
461
+ },
462
+ {
463
+ "text": "sure sure sure sure sure",
464
+ "expected_label": "backchannel",
465
+ "difficulty": "length_noise",
466
+ "predicted_label": "backchannel",
467
+ "confidence": 0.9763302206993103,
468
+ "is_correct": true
469
+ },
470
+ {
471
+ "text": "omg totally 100% agree with you on that one no doubt about it in fact I was just thinking the same thing the other day and it's crazy how we're like on the same wavelength all the time isn't it?",
472
+ "expected_label": "backchannel",
473
+ "difficulty": "length_noise",
474
+ "predicted_label": "agent_response",
475
+ "confidence": 0.9482101798057556,
476
+ "is_correct": false
477
+ },
478
+ {
479
+ "text": "no wayyyy",
480
+ "expected_label": "backchannel",
481
+ "difficulty": "length_noise",
482
+ "predicted_label": "backchannel",
483
+ "confidence": 0.991447925567627,
484
+ "is_correct": true
485
+ },
486
+ {
487
+ "text": "heyyy, u ther?",
488
+ "expected_label": "backchannel",
489
+ "difficulty": "length_noise",
490
+ "predicted_label": "backchannel",
491
+ "confidence": 0.990699827671051,
492
+ "is_correct": true
493
+ },
494
+ {
495
+ "text": "wow cant believe it happened like that i mean who would have thought that everything would turn out this way after all the planning we did it just goes to show that sometimes things have a way of working out on their own despite all the odds and challenges we faced right from the start",
496
+ "expected_label": "backchannel",
497
+ "difficulty": "length_noise",
498
+ "predicted_label": "agent_response",
499
+ "confidence": 0.971515953540802,
500
+ "is_correct": false
501
+ },
502
+ {
503
+ "text": "kk thx bye",
504
+ "expected_label": "backchannel",
505
+ "difficulty": "length_noise",
506
+ "predicted_label": "backchannel",
507
+ "confidence": 0.99072265625,
508
+ "is_correct": true
509
+ },
510
+ {
511
+ "text": "Hmm, ilginç bir nokta.",
512
+ "expected_label": "backchannel",
513
+ "difficulty": "semantic_overlap",
514
+ "predicted_label": "backchannel",
515
+ "confidence": 0.9318225979804993,
516
+ "is_correct": true
517
+ },
518
+ {
519
+ "text": "Anladım, peki ya sonra?",
520
+ "expected_label": "backchannel",
521
+ "difficulty": "semantic_overlap",
522
+ "predicted_label": "backchannel",
523
+ "confidence": 0.9160572290420532,
524
+ "is_correct": true
525
+ },
526
+ {
527
+ "text": "Hmm, o konuda biraz daha bilgi verir misin?",
528
+ "expected_label": "backchannel",
529
+ "difficulty": "semantic_overlap",
530
+ "predicted_label": "agent_response",
531
+ "confidence": 0.7073332667350769,
532
+ "is_correct": false
533
+ },
534
+ {
535
+ "text": "Gerçekten mi? Daha fazla duymak isterim.",
536
+ "expected_label": "backchannel",
537
+ "difficulty": "semantic_overlap",
538
+ "predicted_label": "backchannel",
539
+ "confidence": 0.7160800099372864,
540
+ "is_correct": true
541
+ },
542
+ {
543
+ "text": "Bu mantıklı, başka neler oldu?",
544
+ "expected_label": "backchannel",
545
+ "difficulty": "semantic_overlap",
546
+ "predicted_label": "agent_response",
547
+ "confidence": 0.812181293964386,
548
+ "is_correct": false
549
+ },
550
+ {
551
+ "text": "Hmm, bunu daha önce duymamıştım.",
552
+ "expected_label": "backchannel",
553
+ "difficulty": "semantic_overlap",
554
+ "predicted_label": "backchannel",
555
+ "confidence": 0.8978190422058105,
556
+ "is_correct": true
557
+ },
558
+ {
559
+ "text": "Bir dakika, bunu doğru mu anlıyorum?",
560
+ "expected_label": "backchannel",
561
+ "difficulty": "semantic_overlap",
562
+ "predicted_label": "agent_response",
563
+ "confidence": 0.7696111798286438,
564
+ "is_correct": false
565
+ },
566
+ {
567
+ "text": "Peki, sonra ne yaptılar?",
568
+ "expected_label": "backchannel",
569
+ "difficulty": "semantic_overlap",
570
+ "predicted_label": "backchannel",
571
+ "confidence": 0.6477120518684387,
572
+ "is_correct": true
573
+ },
574
+ {
575
+ "text": "Gerçekten mi? Bu beni düşündürdü.",
576
+ "expected_label": "backchannel",
577
+ "difficulty": "semantic_overlap",
578
+ "predicted_label": "backchannel",
579
+ "confidence": 0.9161955714225769,
580
+ "is_correct": true
581
+ },
582
+ {
583
+ "text": "İlginç, devam et lütfen.",
584
+ "expected_label": "backchannel",
585
+ "difficulty": "semantic_overlap",
586
+ "predicted_label": "backchannel",
587
+ "confidence": 0.7439655661582947,
588
+ "is_correct": true
589
+ },
590
+ {
591
+ "text": "Evet evet, tabii ki de tebrik ederim, dünya harikası bir iş çıkardın (!)",
592
+ "expected_label": "backchannel",
593
+ "difficulty": "edge_cases",
594
+ "predicted_label": "agent_response",
595
+ "confidence": 0.877510666847229,
596
+ "is_correct": false
597
+ },
598
+ {
599
+ "text": "Çok güzel, bu kadar net bir çözüm bulduğunu(!) hiç düşünmemiştim doğrusu.",
600
+ "expected_label": "backchannel",
601
+ "difficulty": "edge_cases",
602
+ "predicted_label": "agent_response",
603
+ "confidence": 0.9826798439025879,
604
+ "is_correct": false
605
+ },
606
+ {
607
+ "text": "Ah, tabii ki! Çünkü herkes daima müşteri hizmetlerinin ne kadar hızlı olduğunu söyler (!)",
608
+ "expected_label": "backchannel",
609
+ "difficulty": "edge_cases",
610
+ "predicted_label": "agent_response",
611
+ "confidence": 0.5608082413673401,
612
+ "is_correct": false
613
+ },
614
+ {
615
+ "text": "Eğer bu kadar 'yaratıcı' bir fikir daha duyar mıyım diye düşünüyordum, teşekkürler!",
616
+ "expected_label": "backchannel",
617
+ "difficulty": "edge_cases",
618
+ "predicted_label": "agent_response",
619
+ "confidence": 0.9653686881065369,
620
+ "is_correct": false
621
+ },
622
+ {
623
+ "text": "Bir işin en iyi nasıl yapılmaması gerektiğini görmek için harika (!) bir örnekti.",
624
+ "expected_label": "backchannel",
625
+ "difficulty": "edge_cases",
626
+ "predicted_label": "agent_response",
627
+ "confidence": 0.9630967378616333,
628
+ "is_correct": false
629
+ },
630
+ {
631
+ "text": "Evet, kesinlikle bugünkü toplantıda hiçbir şey anlaşılmadı diyemem.",
632
+ "expected_label": "backchannel",
633
+ "difficulty": "edge_cases",
634
+ "predicted_label": "agent_response",
635
+ "confidence": 0.8947334289550781,
636
+ "is_correct": false
637
+ },
638
+ {
639
+ "text": "Harika, seninki gibi bir çözüm sayesinde sorunlarımız iki katına çıkacak (!)",
640
+ "expected_label": "backchannel",
641
+ "difficulty": "edge_cases",
642
+ "predicted_label": "agent_response",
643
+ "confidence": 0.8286226987838745,
644
+ "is_correct": false
645
+ },
646
+ {
647
+ "text": "Tabii ki de, Türk çayı yurt dışında sudan bile ucuzdur (!).",
648
+ "expected_label": "backchannel",
649
+ "difficulty": "edge_cases",
650
+ "predicted_label": "backchannel",
651
+ "confidence": 0.8883209228515625,
652
+ "is_correct": true
653
+ },
654
+ {
655
+ "text": "Bu kadar ‘detaylı’ bir analiz için üç cümle yeterli oldu, harikasın!",
656
+ "expected_label": "backchannel",
657
+ "difficulty": "edge_cases",
658
+ "predicted_label": "agent_response",
659
+ "confidence": 0.610821545124054,
660
+ "is_correct": false
661
+ },
662
+ {
663
+ "text": "Elbette, herkesin sabırsızlıkla beklediği o 'harika' PowerPoint sunumunu bir daha görelim.",
664
+ "expected_label": "backchannel",
665
+ "difficulty": "edge_cases",
666
+ "predicted_label": "agent_response",
667
+ "confidence": 0.9590893387794495,
668
+ "is_correct": false
669
+ }
670
+ ],
671
+ "misclassifications": [
672
+ {
673
+ "text": "Elbette, bu konuda size destek olacağım.",
674
+ "expected_label": "agent_response",
675
+ "difficulty": "baseline",
676
+ "predicted_label": "backchannel",
677
+ "confidence": 0.5563545823097229,
678
+ "is_correct": false
679
+ },
680
+ {
681
+ "text": "Anladım, hemen kontrol ediyorum.",
682
+ "expected_label": "agent_response",
683
+ "difficulty": "baseline",
684
+ "predicted_label": "backchannel",
685
+ "confidence": 0.5459677577018738,
686
+ "is_correct": false
687
+ },
688
+ {
689
+ "text": "totes agree lol",
690
+ "expected_label": "agent_response",
691
+ "difficulty": "length_noise",
692
+ "predicted_label": "backchannel",
693
+ "confidence": 0.9879427552223206,
694
+ "is_correct": false
695
+ },
696
+ {
697
+ "text": "yup yup yup yup yup",
698
+ "expected_label": "agent_response",
699
+ "difficulty": "length_noise",
700
+ "predicted_label": "backchannel",
701
+ "confidence": 0.988431453704834,
702
+ "is_correct": false
703
+ },
704
+ {
705
+ "text": "nah bro",
706
+ "expected_label": "agent_response",
707
+ "difficulty": "length_noise",
708
+ "predicted_label": "backchannel",
709
+ "confidence": 0.9873980283737183,
710
+ "is_correct": false
711
+ },
712
+ {
713
+ "text": "yasss that's wassup",
714
+ "expected_label": "agent_response",
715
+ "difficulty": "length_noise",
716
+ "predicted_label": "backchannel",
717
+ "confidence": 0.974721372127533,
718
+ "is_correct": false
719
+ },
720
+ {
721
+ "text": "omg thts crazee",
722
+ "expected_label": "agent_response",
723
+ "difficulty": "length_noise",
724
+ "predicted_label": "backchannel",
725
+ "confidence": 0.9885514974594116,
726
+ "is_correct": false
727
+ },
728
+ {
729
+ "text": "u r kidding right?",
730
+ "expected_label": "agent_response",
731
+ "difficulty": "length_noise",
732
+ "predicted_label": "backchannel",
733
+ "confidence": 0.9817968606948853,
734
+ "is_correct": false
735
+ },
736
+ {
737
+ "text": "hah lol whatevs",
738
+ "expected_label": "agent_response",
739
+ "difficulty": "length_noise",
740
+ "predicted_label": "backchannel",
741
+ "confidence": 0.9895368814468384,
742
+ "is_correct": false
743
+ },
744
+ {
745
+ "text": "Hımm, pekala. Başka bir açıdan bakacak olursak?",
746
+ "expected_label": "agent_response",
747
+ "difficulty": "semantic_overlap",
748
+ "predicted_label": "backchannel",
749
+ "confidence": 0.903683066368103,
750
+ "is_correct": false
751
+ },
752
+ {
753
+ "text": "OMG this is like the most amazing thing ever I mean I can't even begin to explain how incredible this whole situation is because it's just that awesome you know what I mean like seriously wow just wow ok???",
754
+ "expected_label": "backchannel",
755
+ "difficulty": "length_noise",
756
+ "predicted_label": "agent_response",
757
+ "confidence": 0.7402034997940063,
758
+ "is_correct": false
759
+ },
760
+ {
761
+ "text": "omg totally 100% agree with you on that one no doubt about it in fact I was just thinking the same thing the other day and it's crazy how we're like on the same wavelength all the time isn't it?",
762
+ "expected_label": "backchannel",
763
+ "difficulty": "length_noise",
764
+ "predicted_label": "agent_response",
765
+ "confidence": 0.9482101798057556,
766
+ "is_correct": false
767
+ },
768
+ {
769
+ "text": "wow cant believe it happened like that i mean who would have thought that everything would turn out this way after all the planning we did it just goes to show that sometimes things have a way of working out on their own despite all the odds and challenges we faced right from the start",
770
+ "expected_label": "backchannel",
771
+ "difficulty": "length_noise",
772
+ "predicted_label": "agent_response",
773
+ "confidence": 0.971515953540802,
774
+ "is_correct": false
775
+ },
776
+ {
777
+ "text": "Hmm, o konuda biraz daha bilgi verir misin?",
778
+ "expected_label": "backchannel",
779
+ "difficulty": "semantic_overlap",
780
+ "predicted_label": "agent_response",
781
+ "confidence": 0.7073332667350769,
782
+ "is_correct": false
783
+ },
784
+ {
785
+ "text": "Bu mantıklı, başka neler oldu?",
786
+ "expected_label": "backchannel",
787
+ "difficulty": "semantic_overlap",
788
+ "predicted_label": "agent_response",
789
+ "confidence": 0.812181293964386,
790
+ "is_correct": false
791
+ },
792
+ {
793
+ "text": "Bir dakika, bunu doğru mu anlıyorum?",
794
+ "expected_label": "backchannel",
795
+ "difficulty": "semantic_overlap",
796
+ "predicted_label": "agent_response",
797
+ "confidence": 0.7696111798286438,
798
+ "is_correct": false
799
+ },
800
+ {
801
+ "text": "Evet evet, tabii ki de tebrik ederim, dünya harikası bir iş çıkardın (!)",
802
+ "expected_label": "backchannel",
803
+ "difficulty": "edge_cases",
804
+ "predicted_label": "agent_response",
805
+ "confidence": 0.877510666847229,
806
+ "is_correct": false
807
+ },
808
+ {
809
+ "text": "Çok güzel, bu kadar net bir çözüm bulduğunu(!) hiç düşünmemiştim doğrusu.",
810
+ "expected_label": "backchannel",
811
+ "difficulty": "edge_cases",
812
+ "predicted_label": "agent_response",
813
+ "confidence": 0.9826798439025879,
814
+ "is_correct": false
815
+ },
816
+ {
817
+ "text": "Ah, tabii ki! Çünkü herkes daima müşteri hizmetlerinin ne kadar hızlı olduğunu söyler (!)",
818
+ "expected_label": "backchannel",
819
+ "difficulty": "edge_cases",
820
+ "predicted_label": "agent_response",
821
+ "confidence": 0.5608082413673401,
822
+ "is_correct": false
823
+ },
824
+ {
825
+ "text": "Eğer bu kadar 'yaratıcı' bir fikir daha duyar mıyım diye düşünüyordum, teşekkürler!",
826
+ "expected_label": "backchannel",
827
+ "difficulty": "edge_cases",
828
+ "predicted_label": "agent_response",
829
+ "confidence": 0.9653686881065369,
830
+ "is_correct": false
831
+ },
832
+ {
833
+ "text": "Bir işin en iyi nasıl yapılmaması gerektiğini görmek için harika (!) bir örnekti.",
834
+ "expected_label": "backchannel",
835
+ "difficulty": "edge_cases",
836
+ "predicted_label": "agent_response",
837
+ "confidence": 0.9630967378616333,
838
+ "is_correct": false
839
+ },
840
+ {
841
+ "text": "Evet, kesinlikle bugünkü toplantıda hiçbir şey anlaşılmadı diyemem.",
842
+ "expected_label": "backchannel",
843
+ "difficulty": "edge_cases",
844
+ "predicted_label": "agent_response",
845
+ "confidence": 0.8947334289550781,
846
+ "is_correct": false
847
+ },
848
+ {
849
+ "text": "Harika, seninki gibi bir çözüm sayesinde sorunlarımız iki katına çıkacak (!)",
850
+ "expected_label": "backchannel",
851
+ "difficulty": "edge_cases",
852
+ "predicted_label": "agent_response",
853
+ "confidence": 0.8286226987838745,
854
+ "is_correct": false
855
+ },
856
+ {
857
+ "text": "Bu kadar ‘detaylı’ bir analiz için üç cümle yeterli oldu, harikasın!",
858
+ "expected_label": "backchannel",
859
+ "difficulty": "edge_cases",
860
+ "predicted_label": "agent_response",
861
+ "confidence": 0.610821545124054,
862
+ "is_correct": false
863
+ },
864
+ {
865
+ "text": "Elbette, herkesin sabırsızlıkla beklediği o 'harika' PowerPoint sunumunu bir daha görelim.",
866
+ "expected_label": "backchannel",
867
+ "difficulty": "edge_cases",
868
+ "predicted_label": "agent_response",
869
+ "confidence": 0.9590893387794495,
870
+ "is_correct": false
871
+ }
872
+ ]
873
+ }
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "architectures": [
+ "BertForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "agent_response",
+ "1": "backchannel"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "agent_response": 0,
+ "backchannel": 1
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "multi_label_classification",
+ "torch_dtype": "float32",
+ "transformers_version": "4.52.4",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 32000
+ }
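The `id2label` mapping in config.json is what turns the classifier's raw logits into the two label strings. A minimal decoding sketch, with no transformers dependency (the logit values below are hypothetical, chosen only to illustrate the argmax-plus-softmax step that produces the predicted label and confidence):

```python
import math

# id2label mapping from the config.json above
id2label = {0: "agent_response", 1: "backchannel"}

def softmax(logits):
    # numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    # pick the argmax class and return (label, confidence)
    probs = softmax(logits)
    idx = probs.index(max(probs))
    return id2label[idx], probs[idx]

# hypothetical logits for a short acknowledgement such as "Anladım."
label, conf = decode([-1.2, 3.4])
```

Here `decode` returns `"backchannel"` with confidence around 0.99, which is the same label/confidence shape reported in the evaluation details above.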
evaluation_results.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "overall": {
+ "macro_f1": 0.9924276856095726,
+ "micro_f1": 0.9932420416147963,
+ "mcc": 0.9848560799888242,
+ "accuracy": 99.32420416147963
+ },
+ "per_class": {
+ "agent_response": {
+ "accuracy": 99.53108252947482,
+ "correct": 7429,
+ "total": 7464
+ },
+ "backchannel": {
+ "accuracy": 98.91591750396616,
+ "correct": 3741,
+ "total": 3782
+ }
+ },
+ "labels": [
+ "agent_response",
+ "backchannel"
+ ],
+ "evaluated_at": "2025-12-14T21:35:53.969061"
+ }
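The overall figures in evaluation_results.json can be cross-checked from the per-class counts. A quick sketch (treating `agent_response` as the positive class, which is an assumption about orientation, though MCC is symmetric under swapping the classes in the binary case):

```python
import math

# confusion counts reconstructed from the "per_class" block above
tp = 7429          # agent_response classified correctly
fn = 7464 - 7429   # agent_response missed
tn = 3741          # backchannel classified correctly
fp = 3782 - 3741   # backchannel misread as agent_response

# overall accuracy = correct predictions / all predictions
accuracy = (tp + tn) / (tp + fn + tn + fp)

# Matthews correlation coefficient for the binary case
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
```

Both values land on the reported figures (accuracy ≈ 0.99324, MCC ≈ 0.98486), confirming the summary metrics are consistent with the per-class counts.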
label_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+ "labels": [
+ "agent_response",
+ "backchannel"
+ ],
+ "id2label": {
+ "0": "agent_response",
+ "1": "backchannel"
+ },
+ "label2id": {
+ "agent_response": 0,
+ "backchannel": 1
+ },
+ "num_labels": 2,
+ "base_model": "dbmdz/bert-base-turkish-uncased",
+ "trained_at": "2025-12-14T21:35:28.772117"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0045c21ba8bef6fde11ce20a30700a63b715c2d0c40cf82c1d7bcab17adab137
+ size 442499064
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_len": 512,
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1502a401bd9d1d7469710211efa36b287addea220a6762248357b0afb9e79f51
+ size 5841
vocab.txt ADDED
The diff for this file is too large to render. See raw diff