hayatiali committed on
Commit
ab5f062
·
verified ·
1 Parent(s): d0fd2bc

v2.0: Semantic rule improvements + dataset expansion (+9082 samples)

README.md ADDED
@@ -0,0 +1,392 @@
+ ---
+ language: tr
+ license: other
+ license_name: siriusai-premium-v1
+ license_link: LICENSE
+ tags:
+ - turkish
+ - text-classification
+ - bert
+ - nlp
+ - transformers
+ - turn-detection
+ - voice-assistant
+ - latency-optimization
+ - siriusai
+ - production-ready
+ - enterprise
+ base_model: dbmdz/bert-base-turkish-uncased
+ datasets:
+ - custom
+ metrics:
+ - f1
+ - precision
+ - recall
+ - accuracy
+ - mcc
+ library_name: transformers
+ pipeline_tag: text-classification
+ model-index:
+ - name: turn-detector-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     metrics:
+     - type: f1
+       value: 0.9769
+       name: Macro F1
+     - type: mcc
+       value: 0.9544
+       name: MCC
+     - type: accuracy
+       value: 97.94
+       name: Accuracy
+ ---
+
+ # turn-detector-v2 - Turkish Turn Detection Model
+
+ <p align="center">
+   <a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-turn--detector--v2-yellow" alt="Hugging Face"></a>
+   <a href="https://huggingface.co/hayatiali/turn-detector-v2"><img src="https://img.shields.io/badge/Model-Production%20Ready-brightgreen" alt="Production Ready"></a>
+   <img src="https://img.shields.io/badge/Language-Turkish-blue" alt="Turkish">
+   <img src="https://img.shields.io/badge/Task-Turn%20Detection-orange" alt="Turn Detection">
+   <img src="https://img.shields.io/badge/F1-97.69%25-success" alt="F1 Score">
+ </p>
+
+ This model detects turn-taking patterns in Turkish conversations. It reduces voice assistant latency by separating user utterances that require LLM processing from simple acknowledgments.
+
+ *Developed by SiriusAI Tech Brain Team*
+
+ ---
+
+ ## Mission
+
+ > **To optimize voice assistant response latency by detecting when user utterances require LLM processing vs. simple acknowledgments.**
+
+ The `turn-detector-v2` model analyzes **conversational turn pairs** (bot utterance + user response) and classifies whether the user's response requires LLM processing (**agent_response**) or is just a backchannel acknowledgment that can be handled without an LLM call (**backchannel**).
+
+ ### Key Benefits
+
+ | Benefit | Description |
+ |---------|-------------|
+ | **Latency Reduction** | Skip LLM calls for backchannels, saving 500-2000ms per interaction |
+ | **Cost Optimization** | Reduce LLM API costs by filtering unnecessary calls |
+ | **Natural Conversation** | Return immediate filler responses ("hmm", "tamam") for acknowledgments |
+ | **High Accuracy** | 97.94% accuracy on the 13,023-sample test set |
+
+ ---
+
+ ## Model Overview
+
+ | Property | Value |
+ |----------|-------|
+ | **Architecture** | BertForSequenceClassification |
+ | **Base Model** | `dbmdz/bert-base-turkish-uncased` |
+ | **Task** | Binary Text Classification |
+ | **Language** | Turkish (tr) |
+ | **Labels** | 2 (agent_response, backchannel) |
+ | **Model Size** | ~110M parameters |
+ | **Inference Time** | ~10-15ms (GPU) / ~40-50ms (CPU) |
+
+ ---
+
+ ## Performance Metrics
+
+ ### Final Evaluation Results
+
+ | Metric | Score |
+ |--------|-------|
+ | **Macro F1** | **0.9769** |
+ | **Micro F1** | **0.9794** |
+ | **MCC** | **0.9544** |
+ | **Accuracy** | **97.94%** |
+
+ ### Per-Class Performance
+
+ | Category | Accuracy (per-class recall) | Test Samples |
+ |----------|----------|---------|
+ | **agent_response** | 99.57% | 8,553 |
+ | **backchannel** | 94.83% | 4,470 |
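
The headline scores can be re-derived from the per-class counts recorded in `evaluation_results.json` in this commit (8,516 of 8,553 `agent_response` and 4,239 of 4,470 `backchannel` test samples classified correctly). The sketch below rebuilds the 2x2 confusion matrix from those counts and recomputes accuracy, macro F1, and MCC. The function and variable names are illustrative and not part of the repository; treating `backchannel` as the positive class is an arbitrary choice, since macro F1 and MCC are symmetric in the two classes.

```python
import math

# Per-class counts copied from evaluation_results.json in this commit.
AGENT_CORRECT, AGENT_TOTAL = 8516, 8553
BACK_CORRECT, BACK_TOTAL = 4239, 4470


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, macro F1 and MCC for a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Per-class F1 written in terms of counts: 2*TP / (2*TP + FP + FN).
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fn + fp)
    macro_f1 = (f1_pos + f1_neg) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "macro_f1": macro_f1, "mcc": mcc}


# Positive class = backchannel (an arbitrary choice for this sketch).
metrics = binary_metrics(
    tp=BACK_CORRECT,
    fn=BACK_TOTAL - BACK_CORRECT,    # backchannels that would still go to the LLM
    fp=AGENT_TOTAL - AGENT_CORRECT,  # real requests that would get a filler
    tn=AGENT_CORRECT,
)
print({name: round(value, 4) for name, value in metrics.items()})
```

Read this way, the two error types are not equally costly: the 231 missed backchannels only cost an unnecessary LLM call, while the 37 misclassified `agent_response` turns would receive a filler instead of an answer.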
+
+ ---
+
+ ## Semantic Classification Rules
+
+ ### When to Classify as `backchannel` (Skip LLM)
+
+ | Condition | Examples |
+ |-----------|----------|
+ | Bot gives info + User short acknowledgment | "tamam", "anladim", "ok", "peki" |
+ | Bot gives info + User rhetorical question | "oyle mi?", "harbi mi?", "cidden mi?" |
+ | Bot gives info + User minimal response | "hmm", "hi hi", "evet" |
+
+ ### When to Classify as `agent_response` (Send to LLM)
+
+ | Condition | Examples |
+ |-----------|----------|
+ | Bot asks question + User gives any answer | "[bot] adi nedir [sep] [user] ahmet" |
+ | Bot gives info + User asks real question | "[bot] faturaniz kesildi [sep] [user] ne zaman?" |
+ | Bot gives info + User makes request | "[bot] kargonuz yolda [sep] [user] adresi degistirmek istiyorum" |
+ | User provides detailed information | "[bot] bilgi verir misiniz [sep] [user] sunu sunu istiyorum cunku..." |
+
+ ### Golden Rule
+
+ ```
+ If bot asked a question → Always agent_response
+ If bot gave info + User short acknowledgment → backchannel
+ ```
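
The two golden rules can be sketched as a small labeling helper. This is only an illustration of the rule logic, not the procedure used to annotate the dataset: the `bot_asked_question` flag is a hypothetical input the caller must supply (the README does not say how questions were detected), and the word lists are limited to the examples in the tables above.

```python
from typing import Optional

# Short acknowledgments and rhetorical questions taken from the tables above.
ACKNOWLEDGMENTS = {"tamam", "anladim", "ok", "peki", "hmm", "hi hi", "evet"}
RHETORICAL = {"oyle mi", "harbi mi", "cidden mi"}


def golden_rule_label(bot_asked_question: bool, user_text: str) -> Optional[str]:
    """Apply the two golden rules; return None when neither rule decides."""
    if bot_asked_question:
        return "agent_response"
    normalized = user_text.lower().strip().rstrip("?").strip()
    if normalized in ACKNOWLEDGMENTS or normalized in RHETORICAL:
        return "backchannel"
    return None  # e.g. a real question or a request: leave it to the classifier


print(golden_rule_label(True, "evet"))        # bot asked, so the answer matters
print(golden_rule_label(False, "tamam"))      # info + short acknowledgment
print(golden_rule_label(False, "ne zaman?"))  # not covered by the two rules
```

Note that the same word ("evet") lands in different classes depending on the bot turn, which is why the model takes the bot utterance as context.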
+
+ ---
+
+ ## Dataset
+
+ ### Dataset Statistics
+
+ | Split | Samples |
+ |-------|---------|
+ | **Train** | 52,287 |
+ | **Test** | 13,023 |
+ | **Total** | 65,310 |
+
+ ### Label Distribution (Train Split)
+
+ | Label | Count | Percentage |
+ |-------|-------|------------|
+ | **agent_response** | 35,223 | 67.4% |
+ | **backchannel** | 17,064 | 32.6% |
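
A quick arithmetic check ties these tables together. The sketch below uses only numbers stated in this README (including the v1.0 dataset size and the +9,082 expansion from the changelog) and shows that the label counts add up to the training split, not to the full dataset, and that the split is close to 80/20. Variable names are illustrative.

```python
train, test = 52_287, 13_023
label_counts = {"agent_response": 35_223, "backchannel": 17_064}
v1_samples, added_in_v2 = 56_228, 9_082  # from the changelog

total = train + test
print(total)                                # 65310
print(total == v1_samples + added_in_v2)    # True
print(sum(label_counts.values()) == train)  # True: counts cover the train split
for label, count in label_counts.items():
    print(label, f"{count / train:.1%}")
print(f"test share: {test / total:.1%}")
```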
+
+ ### Domain Coverage
+
+ - E-commerce (kargo, iade, teslimat)
+ - Banking (hesap, bakiye, kredi)
+ - Telecom (numara tasima, data, hat)
+ - Insurance (prim, police, teminat, kasko)
+ - General Support (sikayet, yonetici, eskalasyon)
+ - Identity Verification (TC, gorusuyorum, soyadi)
+
+ ---
+
+ ## Label Definitions
+
+ | Label | ID | Description |
+ |-------|-----|-------------|
+ | **agent_response** | 0 | User response requires LLM processing - questions, requests, confirmations to questions, corrections |
+ | **backchannel** | 1 | Simple acknowledgment - LLM skipped, filler returned (tamam, anladim, ok) |
+
+ ### Input Format
+
+ ```
+ [bot] <bot utterance> [sep] [user] <user response>
+ ```
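
A minimal helper for building this input string might look like the sketch below. The function name is hypothetical. The user-text-only fallback is an assumption based on the `fallback_behavior` note in `config.json` ("If no [bot] context provided, model uses user text only"), and the sketch does not verify how the tokenizer splits the lowercase `[bot]`, `[sep]`, and `[user]` markers.

```python
def format_turn(bot_text: str, user_text: str) -> str:
    """Build the classifier input in the "[bot] ... [sep] [user] ..." layout."""
    # Collapse runs of whitespace so stray spaces from ASR output do not leak in.
    bot = " ".join(bot_text.split())
    user = " ".join(user_text.split())
    if not bot:
        # Assumed fallback, based on the "fallback_behavior" note in config.json.
        return user
    return f"[bot] {bot} [sep] [user] {user}"


print(format_turn("  kargonuz   yolda ", "ne zaman gelir"))
# [bot] kargonuz yolda [sep] [user] ne zaman gelir
```

Lowercasing is left to the tokenizer, whose configuration in this commit sets `do_lower_case` to `true`.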
+
+ ### Example Classifications
+
+ **agent_response** (Send to LLM):
+ ```
+ [bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum
+ [bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim
+ [bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum
+ [bot] kargonuz yolda [sep] [user] ne zaman gelir
+ [bot] policeniz aktif [sep] [user] teminat limitini ogrenebilir miyim
+ ```
+
+ **backchannel** (Skip LLM, return filler):
+ ```
+ [bot] faturaniz 150 tl gorunuyor [sep] [user] tamam
+ [bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim
+ [bot] kaydinizi kontrol ediyorum [sep] [user] peki
+ [bot] policeniz yenilendi [sep] [user] tesekkurler
+ [bot] sifreni sms ile gonderdik [sep] [user] ok aldim
+ ```
+
+ ---
+
+ ## Training
+
+ ### Hyperparameters
+
+ | Parameter | Value |
+ |-----------|-------|
+ | **Base Model** | `dbmdz/bert-base-turkish-uncased` |
+ | **Max Sequence Length** | 128 tokens |
+ | **Batch Size** | 16 |
+ | **Learning Rate** | 3e-5 |
+ | **Epochs** | 4 |
+ | **Optimizer** | AdamW |
+ | **Weight Decay** | 0.01 |
+ | **Loss Function** | CrossEntropyLoss |
+ | **Hardware** | Apple Silicon (MPS) |
+
+ ---
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ### Quick Start
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ model_name = "hayatiali/turn-detector-v2"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ model.eval()
+
+ # Index order matches id2label in config.json (0 = agent_response, 1 = backchannel).
+ LABELS = ["agent_response", "backchannel"]
+
+ def predict(text):
+     """Classify one "[bot] ... [sep] [user] ..." string."""
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+     with torch.no_grad():
+         outputs = model(**inputs)
+         probs = torch.softmax(outputs.logits, dim=-1)[0]
+
+     scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
+     return {"label": max(scores, key=scores.get), "confidence": max(scores.values())}
+
+ # Bot asks question → agent_response
+ print(predict("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim"))
+ # Example output (confidence rounded): {'label': 'agent_response', 'confidence': 0.99}
+
+ # Bot gives info + User acknowledges → backchannel
+ print(predict("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam"))
+ # Example output (confidence rounded): {'label': 'backchannel', 'confidence': 0.98}
+ ```
+
+ ### Production Integration
+
+ ```python
+ import random
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+
+ class TurnDetector:
+     """Production-ready turn detection for voice assistants."""
+
+     LABELS = ["agent_response", "backchannel"]
+     FILLER_RESPONSES = ["hmm", "evet", "tamam", "anlıyorum"]
+
+     def __init__(self, model_path="hayatiali/turn-detector-v2"):
+         self.tokenizer = AutoTokenizer.from_pretrained(model_path)
+         self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
+         self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.model.to(self.device).eval()
+
+     def should_call_llm(self, bot_text: str, user_text: str) -> dict:
+         """
+         Determine whether the user response should go to the LLM.
+
+         Returns:
+             dict with 'call_llm' (bool), 'label', 'confidence', 'filler' (if backchannel)
+         """
+         text = f"[bot] {bot_text} [sep] [user] {user_text}"
+         inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+         inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+         with torch.no_grad():
+             probs = torch.softmax(self.model(**inputs).logits, dim=-1)[0].cpu()
+
+         label_idx = probs.argmax().item()
+         label = self.LABELS[label_idx]
+         confidence = probs[label_idx].item()
+
+         result = {
+             "call_llm": label == "agent_response",
+             "label": label,
+             "confidence": confidence,
+         }
+
+         if label == "backchannel":
+             # Pick a canned acknowledgment instead of calling the LLM.
+             result["filler"] = random.choice(self.FILLER_RESPONSES)
+
+         return result
+
+ # Usage
+ detector = TurnDetector()
+
+ # Case 1: Bot asks, user confirms → Send to LLM
+ result = detector.should_call_llm("siparis iptal etmek ister misiniz", "evet iptal et")
+ # Example: {'call_llm': True, 'label': 'agent_response', 'confidence': 0.99}
+
+ # Case 2: Bot informs, user acknowledges → Return filler
+ result = detector.should_call_llm("siparisiniz yola cikti", "tamam")
+ # Example: {'call_llm': False, 'label': 'backchannel', 'confidence': 0.97, 'filler': 'hmm'}
+ ```
+
+ ---
+
+ ## Limitations
+
+ | Limitation | Details |
+ |------------|---------|
+ | **Language** | Turkish only, may struggle with heavy dialects |
+ | **Context** | Classifies one bot/user turn pair at a time, no multi-turn memory |
+ | **Domain** | Trained on customer service, may need fine-tuning for other domains |
+ | **Edge Cases** | Ambiguous short responses may have lower confidence |
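
One way to soften the edge-case limitation is to skip the LLM only when the model is confident about a `backchannel` prediction. The sketch below is an assumption layered on top of the model, not something shipped in this repository: the function name is hypothetical and the 0.90 threshold is a placeholder that would need tuning on held-out conversations.

```python
def route_turn(label: str, confidence: float, min_backchannel_conf: float = 0.90) -> str:
    """Return "filler" or "llm" for one classified turn."""
    if label == "backchannel" and confidence >= min_backchannel_conf:
        return "filler"
    # agent_response, or a backchannel the model is unsure about.
    return "llm"


print(route_turn("backchannel", 0.97))     # filler
print(route_turn("backchannel", 0.60))     # llm
print(route_turn("agent_response", 0.99))  # llm
```

Falling back to the LLM on low confidence trades some of the latency savings for fewer turns answered with a filler by mistake.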
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{turn-detector-v2-2025,
+   title={turn-detector-v2: Turkish Turn Detection for Voice Assistants},
+   author={SiriusAI Tech Brain Team},
+   year={2025},
+   publisher={Hugging Face},
+   howpublished={\url{https://huggingface.co/hayatiali/turn-detector-v2}},
+   note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
+ }
+ ```
+
+ ---
+
+ ## Contact
+
+ - **Developer**: SiriusAI Tech Brain Team
+ - **Email**: info@siriusaitech.com
+ - **Repository**: [GitHub](https://github.com/sirius-tedarik)
+
+ ---
+
+ ## Changelog
+
+ ### v2.0 (Current)
+
+ **Semantic Rule Improvements:**
+ - If bot asks a question → always `agent_response` (731 rows corrected)
+ - Rhetorical questions ("really?", "is that so?") → remain as `backchannel`
+ - If user asks a real question ("when?", "how?") → `agent_response`
+
+ **Dataset Expansion (+9,082 samples):**
+
+ | Category | Added Patterns |
+ |----------|----------------|
+ | **Insurance** | premium, policy, coverage, comprehensive, interest, late fees |
+ | **Telecom** | number porting, data exhausted, line transfer, GB remaining |
+ | **E-commerce** | shipping cost, free shipping, returns, delivery |
+ | **Price/Budget** | expensive, budget, too much, will think about it, not suitable |
+ | **Identity Verification** | national ID, "am I speaking with...", surname, date of birth |
+ | **Objection/Complaint** | unacceptable, not satisfied, complaint, impossible |
+ | **Escalation** | manager, director, supervisor |
+ | **Hold Requests** | one moment, busy right now, not now, later |
+
+ **Metrics:** Macro F1: 0.9769, Accuracy: 97.94%
+
+ > Note: Metrics are lower than v1.0 (Macro F1 0.9924 → 0.9769), but v2.0 is scored against corrected labels.
+ > v1.0 had mislabeled data (bot asked question + "yes" = backchannel),
+ > which the model memorized. v2.0 ensures semantic consistency.
+
+ ### v1.0
+ - Initial release
+ - Dataset: 56,228 samples
+ - Macro F1: 0.9924, Accuracy: 99.3%
+
+ ---
+
+ **License**: SiriusAI Tech Premium License v1.0
+
+ **Commercial Use**: Requires Premium License. Contact: info@siriusaitech.com
config.json ADDED
@@ -0,0 +1,119 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "agent_response",
+     "1": "backchannel"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "agent_response": 0,
+     "backchannel": 1
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "problem_type": "single_label_classification",
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.4",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 32000,
+   "_metadata": {
+     "model_name": "turn-detection-v2",
+     "version": "1.0.0",
+     "published_at": "2025-12-31",
+     "author": "Fine-Tune Assistant",
+     "license": "Apache-2.0",
+     "huggingface_repo": "hayatiali/turn-detection-v2",
+     "huggingface_url": "https://huggingface.co/hayatiali/turn-detection-v2"
+   },
+   "_context_aware": {
+     "enabled": true,
+     "input_format": "[bot] {bot_message} [sep] [user] {user_message}",
+     "special_tokens": [
+       "[bot]",
+       "[sep]",
+       "[user]"
+     ],
+     "example_input": "[bot] sunucuya katilmak icin ne yapmaliyim [sep] [user] ya davet kodu alabilir miyim",
+     "fallback_behavior": "If no [bot] context provided, model uses user text only"
+   },
+   "_task": {
+     "type": "text-classification",
+     "name": "Turn Detection V2",
+     "description": "Classifies text into 2 categories: agent_response, backchannel",
+     "num_labels": 2
+   },
+   "_labels": {
+     "num_labels": 2,
+     "id2label": {
+       "0": "agent_response",
+       "1": "backchannel"
+     },
+     "label2id": {
+       "agent_response": 0,
+       "backchannel": 1
+     },
+     "label_descriptions": {
+       "agent_response": "Category: agent_response",
+       "backchannel": "Category: backchannel"
+     }
+   },
+   "_domain": {
+     "language": "Turkish (tr)",
+     "domain": "General",
+     "base_model": "dbmdz/bert-base-turkish-uncased"
+   },
+   "_training": {
+     "dataset": {
+       "name": "callcenter-turn-detection-classification",
+       "total_samples": 65310,
+       "train_samples": 52287,
+       "test_samples": 13023,
+       "label_distribution": {
+         "agent_response": "35223 (67.4%)",
+         "backchannel": "17064 (32.6%)"
+       }
+     },
+     "hyperparameters": {
+       "max_sequence_length": 128,
+       "batch_size": 16,
+       "learning_rate": 3e-05,
+       "epochs": 4,
+       "optimizer": "AdamW",
+       "weight_decay": 0.01,
+       "loss_function": "CrossEntropyLoss"
+     },
+     "hardware": "mps"
+   },
+   "_evaluation": {
+     "metrics": {
+       "macro_f1": 0.9769,
+       "micro_f1": 0.9794,
+       "mcc": 0.9544,
+       "accuracy": 97.94
+     },
+     "per_class": {
+       "agent_response": {
+         "accuracy": 99.57,
+         "samples": 8553
+       },
+       "backchannel": {
+         "accuracy": 94.83,
+         "samples": 4470
+       }
+     }
+   }
+ }
evaluation_results.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "overall": {
+     "macro_f1": 0.9769330455276665,
+     "micro_f1": 0.9794210243415495,
+     "mcc": 0.9544096525818544,
+     "accuracy": 97.94210243415495
+   },
+   "per_class": {
+     "agent_response": {
+       "accuracy": 99.56740325032153,
+       "correct": 8516,
+       "total": 8553
+     },
+     "backchannel": {
+       "accuracy": 94.83221476510067,
+       "correct": 4239,
+       "total": 4470
+     }
+   },
+   "labels": [
+     "agent_response",
+     "backchannel"
+   ],
+   "evaluated_at": "2025-12-31T22:16:17.601187"
+ }
label_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "labels": [
+     "agent_response",
+     "backchannel"
+   ],
+   "id2label": {
+     "0": "agent_response",
+     "1": "backchannel"
+   },
+   "label2id": {
+     "agent_response": 0,
+     "backchannel": 1
+   },
+   "num_labels": 2,
+   "base_model": "dbmdz/bert-base-turkish-uncased",
+   "trained_at": "2025-12-31T22:15:45.605311"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27154b013aebd7565d1d90b6de65fe54e24c66823b1c985605ff9cbda40bcd89
+ size 442499064
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_len": 512,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ce37056b653bec41c689d47ad2e5467eee5c03e8e20cf17c2a79dd05dfaa8f1
+ size 5841
vocab.txt ADDED
The diff for this file is too large to render. See raw diff