---

language: tr
tags:
- sentiment-analysis
- turkish
- bert
- text-classification
- fine-tuned
license: apache-2.0
base_model: codealchemist01/turkish-sentiment-analysis
datasets:
- winvoker/turkish-sentiment-analysis-dataset
- WhiteAngelss/Turkce-Duygu-Analizi-Dataset
- maydogan/Turkish_SentimentAnalysis_TRSAv1
- turkish-nlp-suite/MusteriYorumlari
- W4nkel/turkish-sentiment-dataset
metrics:
- accuracy
- f1
- precision
- recall
---


# Turkish Sentiment Analysis Model (Fine-tuned)

A fine-tuned version of the [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis) model, improved with additional balanced training data to enhance neutral and negative class performance.

## Model Details

- **Base Model:** [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis)
- **Task:** Text Classification (Sentiment Analysis)
- **Language:** Turkish
- **Labels:** positive, negative, neutral
- **Fine-tuning Type:** Continued fine-tuning on balanced dataset

## Training Data

This model was fine-tuned on a balanced combination of the original dataset and additional Turkish sentiment datasets:

### Original Dataset (from base model):
- `winvoker/turkish-sentiment-analysis-dataset` (440,641 samples)
- `WhiteAngelss/Turkce-Duygu-Analizi-Dataset` (440,641 samples)

### Additional Datasets for Fine-tuning:
- `maydogan/Turkish_SentimentAnalysis_TRSAv1` (150,000 samples)
- `turkish-nlp-suite/MusteriYorumlari` (73,920 samples)
- `W4nkel/turkish-sentiment-dataset` (4,800 samples)
- `mustfkeskin/turkish-movie-sentiment-analysis-dataset` (Kaggle, 83,227 samples)

### Final Balanced Dataset:
- **Total:** 556,888 samples
- **Positive:** 237,966 (42.7%)
- **Neutral:** 209,668 (37.6%)
- **Negative:** 109,254 (19.6%)

**Split Distribution:**
- **Training:** 445,510 samples
- **Validation:** 55,689 samples
- **Test:** 55,689 samples
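
The percentages and split sizes above can be re-derived from the raw counts. The sketch below is only a consistency check; the roughly 80/10/10 split ratio it reports is inferred from the numbers and is not stated explicitly in this card.

```python
# Class counts and split sizes as reported in this model card.
class_counts = {"positive": 237_966, "neutral": 209_668, "negative": 109_254}
split_sizes = {"train": 445_510, "validation": 55_689, "test": 55_689}

total = sum(class_counts.values())

# Share of each class in the combined dataset, in percent.
class_share = {name: 100 * n / total for name, n in class_counts.items()}

# Share of each split; the ~80/10/10 ratio is inferred, not documented.
split_share = {name: 100 * n / total for name, n in split_sizes.items()}

for name, pct in class_share.items():
    print(f"{name:<10} {pct:5.1f}%")
for name, pct in split_share.items():
    print(f"{name:<10} {pct:5.1f}%")
```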

## Training

### Fine-tuning Parameters:
- **Base Model:** codealchemist01/turkish-sentiment-analysis
- **Epochs:** 2
- **Learning Rate:** 1e-5 (lower than in the initial training, to suit continued fine-tuning)
- **Batch Size:** 12 (per device)
- **Gradient Accumulation:** 2 (effective batch size: 24)
- **Max Length:** 128 tokens
- **Optimizer:** AdamW
- **Mixed Precision (FP16):** Enabled
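
As a rough sketch, the settings above could map onto keyword arguments of the Hugging Face `TrainingArguments` class as shown below. The training script itself is not part of this card, so the mapping (including `optim="adamw_torch"` and single-device training) is an assumption. The step counts are approximate and ignore any rounding the trainer applies.

```python
import math

# Hyperparameters from this card, keyed by the TrainingArguments names they
# would most likely correspond to (an assumption, not the author's script).
training_kwargs = {
    "num_train_epochs": 2,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 12,
    "gradient_accumulation_steps": 2,
    "fp16": True,            # mixed precision; requires a CUDA device
    "optim": "adamw_torch",  # AdamW
}
MAX_LENGTH = 128         # tokenizer truncation length, applied at tokenization time
TRAIN_SAMPLES = 445_510  # training split size reported in this card
NUM_DEVICES = 1          # hypothetical: the card does not state the device count

# Samples consumed per optimizer update.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]

print(effective_batch, steps_per_epoch, total_steps)

# With transformers installed and a GPU available, these could be passed as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_kwargs)
```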

## Performance

### Test Set Results (55,689 samples):

**Overall Metrics:**
- **Accuracy:** 91.96%
- **Weighted F1:** 91.93%
- **Weighted Precision:** 91.93%
- **Weighted Recall:** 91.96%

### Per-Class Performance:

| Class    | Precision | Recall | F1-Score | Support |
|----------|-----------|--------|----------|---------|
| Negative | 90.65%    | 86.79% | 88.68%   | 10,926  |
| Neutral  | 90.91%    | 90.24% | 90.57%   | 20,967  |
| Positive | 93.41%    | 95.84% | 94.61%   | 23,796  |
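
The overall weighted metrics can be recovered from this table by weighting each class by its support. The sketch below does that and also computes a macro-averaged F1, which this card does not report; it is derived here purely from the table values.

```python
# Per-class results from the table above: (precision %, recall %, F1 %, support).
per_class = {
    "negative": (90.65, 86.79, 88.68, 10_926),
    "neutral": (90.91, 90.24, 90.57, 20_967),
    "positive": (93.41, 95.84, 94.61, 23_796),
}

total_support = sum(row[3] for row in per_class.values())


def weighted(column: int) -> float:
    """Support-weighted mean of one metric column (0=P, 1=R, 2=F1)."""
    return sum(row[column] * row[3] for row in per_class.values()) / total_support


def macro(column: int) -> float:
    """Unweighted mean of one metric column across the three classes."""
    return sum(row[column] for row in per_class.values()) / len(per_class)


weighted_precision = weighted(0)
weighted_recall = weighted(1)
weighted_f1 = weighted(2)
macro_f1 = macro(2)

print(f"weighted P/R/F1: {weighted_precision:.2f} / {weighted_recall:.2f} / {weighted_f1:.2f}")
print(f"macro F1: {macro_f1:.2f}")
```

Because the negative class has both the lowest F1 and the smallest support, the macro average comes out lower than the weighted one.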

## Improvements Over Base Model

### Key Improvements:
1. **Neutral Class Performance:**
   - Better recognition of neutral expressions
   - Improved handling of ambiguous texts
   - Neutral F1-score: **90.57%** (improved from base model's test performance)

2. **Better Class Balance:**
   - More balanced dataset (reduced class imbalance)
   - Negative class improved with more training examples
   - Neutral class significantly enhanced

3. **General Performance:**
   - Maintained high accuracy (91.96%)
   - Improved F1-scores across all classes
   - Better generalization on diverse Turkish texts

### Test Results Comparison (15-sample test):
- **Base Model Accuracy:** 66.7% (10/15)
- **Fine-tuned Model Accuracy:** 86.7% (13/15)
- **Improvement:** +20.0 percentage points

### Per-Class Test Results:
- **Neutral:** 0% → 80% (+80.0 percentage points)
- **Negative:** 100% → 80% (-20.0 percentage points; errors are now spread more evenly across classes)
- **Positive:** 100% → 100% (maintained)
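
The figures above are consistent with five samples per class; that breakdown is inferred from the percentages and is not stated in this card. Under that assumption, a short sketch reproduces the overall accuracies and expresses the change in percentage points:

```python
# Correct predictions per class on the 15-sample test.
# Five samples per class is an assumption inferred from the reported percentages.
SAMPLES_PER_CLASS = 5
correct = {
    "base": {"neutral": 0, "negative": 5, "positive": 5},
    "fine_tuned": {"neutral": 4, "negative": 4, "positive": 5},
}


def accuracy(counts: dict[str, int]) -> float:
    """Overall accuracy in percent across all classes."""
    return 100 * sum(counts.values()) / (SAMPLES_PER_CLASS * len(counts))


base_acc = accuracy(correct["base"])
tuned_acc = accuracy(correct["fine_tuned"])
delta_points = tuned_acc - base_acc

print(f"base {base_acc:.1f}%  fine-tuned {tuned_acc:.1f}%  change {delta_points:+.1f} points")
```

With only 15 samples, a single prediction shifts overall accuracy by about 6.7 points, so the full test set results are the more reliable measure.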

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "codealchemist01/turkish-sentiment-analysis-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example text
text = "Bu ürün normal, beklediğim gibi. Özel bir şey yok."

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_label_id = predictions.argmax().item()

# Map to label
id2label = {0: "negative", 1: "neutral", 2: "positive"}
predicted_label = id2label[predicted_label_id]
confidence = predictions[0][predicted_label_id].item()

print(f"Label: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```

## Limitations

- The model may not perform well on very short texts (< 3 words)
- Performance may vary across different domains (social media, news, reviews)
- Some ambiguous neutral expressions may still be misclassified
- Negative class performance may vary on different text types
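
One way to act on these limitations is to flag predictions for manual review when the input is very short or the model's confidence is low. The helper below is a minimal sketch: `needs_review` and the `min_confidence` threshold are hypothetical and not part of this model or its card.

```python
def needs_review(text: str, confidence: float,
                 min_words: int = 3, min_confidence: float = 0.6) -> bool:
    """Flag a prediction that falls under the limitations listed above.

    min_words mirrors the "< 3 words" limitation; min_confidence is a
    hypothetical threshold that should be tuned on held-out data.
    """
    too_short = len(text.split()) < min_words
    uncertain = confidence < min_confidence
    return too_short or uncertain


# Two-word input: flagged regardless of the confidence score.
short_flag = needs_review("Fena değil", confidence=0.91)
# Longer input with high confidence: not flagged.
long_flag = needs_review("Bu ürün normal, beklediğim gibi.", confidence=0.88)

print(short_flag, long_flag)
```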

## Citation

If you use this model, please cite:

```bibtex
@misc{turkish-sentiment-analysis-finetuned,
  title={Turkish Sentiment Analysis Model (Fine-tuned)},
  author={codealchemist01},
  year={2024},
  base_model={codealchemist01/turkish-sentiment-analysis},
  howpublished={\url{https://huggingface.co/codealchemist01/turkish-sentiment-analysis-finetuned}}
}
```

## License

Apache 2.0