---
language: tr
license: other
license_name: siriusai-premium-v1
license_link: LICENSE
tags:
- turkish
- text-classification
- bert
- nlp
- transformers
- siriusai
- production-ready
- enterprise
base_model: dbmdz/bert-base-turkish-uncased
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
- mcc
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: emotion-tr
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - type: f1
      value: 0.9744976471619214
      name: Macro F1
    - type: mcc
      value: 0.9610214790438847
---

# emotion-tr - Turkish Emotion Classification Model

<p align="center">
  <a href="https://huggingface.co/hayatiali/emotion-tr"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-emotion--tr-yellow" alt="Hugging Face"></a>
  <a href="https://huggingface.co/hayatiali/emotion-tr"><img src="https://img.shields.io/badge/Model-Production%20Ready-brightgreen" alt="Production Ready"></a>
  <img src="https://img.shields.io/badge/Language-Turkish-blue" alt="Turkish">
  <img src="https://img.shields.io/badge/Task-Text%20Classification-orange" alt="Text Classification">
</p>

This model classifies the **sentiment of Turkish text** as negative (`negatif`), neutral (`notr`), or positive (`pozitif`).

*Developed by SiriusAI Tech Brain Team*

---

## Mission

> **To provide advanced sentiment analysis capabilities for Turkish text, empowering businesses and researchers to understand emotional tones effectively.**

The `emotion-tr` model fine-tunes a Turkish **BERT** checkpoint (`dbmdz/bert-base-turkish-uncased`) for three-way sentiment classification. It labels each input as negative, neutral, or positive, which supports analysis of customer feedback, social media interactions, and other Turkish text.

### Why This Model Matters

- **High Accuracy**: Reaches **97.56% accuracy** on the 2,332-sample held-out test set.
- **Robust Performance**: Per-class accuracy ranges from 97.0% to 98.0% across the three sentiment categories.
- **Enterprise-Ready**: Designed for production environments, with reported inference times of roughly 10-15 ms on GPU and 40-50 ms on CPU.
- **Customizable**: Can be fine-tuned for specific applications beyond emotion classification.
- **Comprehensive Documentation**: Provides extensive guidance for integration and usage.

---

## Model Overview

| Property | Value |
|----------|-------|
| **Architecture** | BertForSequenceClassification |
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Task** | Text Classification |
| **Language** | Turkish (tr) |
| **Categories** | 3 labels |
| **Model Size** | ~110M parameters |
| **Inference Time** | ~10-15ms (GPU) / ~40-50ms (CPU) |

---

## Performance Metrics

### Final Evaluation Results

| Metric | Score | Description |
|--------|-------|-------------|
| **Macro F1** | **0.9745** | Unweighted mean of the per-class F1 scores |
| **MCC** | **0.9610** | Matthews Correlation Coefficient |
| **Accuracy** | **97.56%** | Share of the 2,332 test samples classified correctly (2,275) |
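
Macro F1 averages the F1 score of each class with equal weight, so a small class counts as much as a large one. The following is a minimal pure-Python sketch of that calculation. The `macro_f1` helper and the toy labels are illustrative assumptions, not the evaluation code used to produce the scores above.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data (hypothetical, not drawn from the test set)
y_true = ["negatif", "notr", "pozitif", "notr"]
y_pred = ["negatif", "notr", "notr", "notr"]
score = macro_f1(y_true, y_pred, ["negatif", "notr", "pozitif"])
print(round(score, 4))
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.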

### Per-Class Performance

| Category | Per-Class Accuracy (Recall) | Correct | Total |
|----------|----------|---------|-------|
| **negatif** | 97.0% | 700 | 722 |
| **notr** | 98.0% | 1,069 | 1,091 |
| **pozitif** | 97.5% | 506 | 519 |
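
As a consistency check, the overall accuracy can be recomputed from the per-class counts in the table above. Each per-class figure is the fraction of that class's test samples that were predicted correctly. This is a short arithmetic sketch; the variable names are illustrative.

```python
# (correct, total) counts copied from the per-class table
counts = {
    "negatif": (700, 722),
    "notr": (1069, 1091),
    "pozitif": (506, 519),
}

# Fraction of each class's samples that were predicted correctly
per_class = {label: correct / total for label, (correct, total) in counts.items()}

total_correct = sum(correct for correct, _ in counts.values())
total_samples = sum(total for _, total in counts.values())
overall_accuracy = total_correct / total_samples

for label, value in per_class.items():
    print(f"{label}: {value:.1%}")
print(f"overall: {total_correct}/{total_samples} = {overall_accuracy:.2%}")
```

The totals add up to the 2,332 test samples, and the ratio matches the reported overall accuracy.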

---

## Dataset

### Dataset Statistics

| Split | Samples | Purpose |
|-------|---------|---------|
| **Train** | 9,322 | Model training |
| **Test** | 2,332 | Model evaluation |
| **Total** | 11,654 | Complete dataset |

### Category Distribution

| Category | Samples | Percentage | Description |
|----------|---------|------------|-------------|
| **sentiment_3class** | 11,654 | 100.0% | Single three-class sentiment task covering the full dataset |

### Subcategory Breakdown

| Category | Subcategories |
|----------|---------------|
| **sentiment_3class** | pozitif, negatif, notr |

---

## Label Definitions

| Label | ID | Description | Turkish Examples |
|-------|-----|-------------|------------------|
| **negatif** | 0 | Indicates negative sentiment | "Bu çok kötü bir film." "Hizmet berbattı." |
| **notr** | 1 | Indicates neutral sentiment | "Bugün hava güzel." "Toplantı yapıldı." |
| **pozitif** | 2 | Indicates positive sentiment | "Harika bir deneyim!" "Çok memnun kaldım." |

### Important: Category Boundaries

The boundary between **notr** and **negatif** can be subtle. For instance, "Bu film sıradan" ("This film is ordinary") might be read as neutral, while "Bu film kötü" ("This film is bad") is clearly negative.
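
One way to handle such borderline inputs is to flag predictions where the two highest class scores are close together. The sketch below assumes a score dictionary like the one returned by the usage examples in this card. The `flag_ambiguous` helper and the `margin` value of 0.15 are hypothetical and would need tuning on real data.

```python
def flag_ambiguous(scores, margin=0.15):
    """Return (top_label, is_ambiguous) from the gap between the two highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top_label, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    return top_label, (top_score - runner_up_score) < margin


# Hypothetical score dictionaries, not real model output
borderline = {"negatif": 0.46, "notr": 0.44, "pozitif": 0.10}
clear_cut = {"negatif": 0.95, "notr": 0.04, "pozitif": 0.01}

print(flag_ambiguous(borderline))
print(flag_ambiguous(clear_cut))
```

Flagged items can then be routed to a human reviewer instead of being acted on automatically.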

---

## Training Procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Max Sequence Length** | 128 tokens |
| **Batch Size** | 16 |
| **Learning Rate** | 2e-5 |
| **Epochs** | 3 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Loss Function** | CrossEntropyLoss / Focal Loss |
| **Problem Type** | Single-label Classification |
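
For readers who want to reproduce a similar setup, the sketch below expresses the tabulated values as keyword arguments named after `transformers.TrainingArguments` fields and derives the number of optimizer steps they imply. It is an assumption that training used the `Trainer` API, a single device, and no gradient accumulation; the card does not say. The training-set size comes from the Dataset Statistics table.

```python
import math

# Keyword names follow transformers.TrainingArguments; values come from the table.
# Assumption: these could be passed as TrainingArguments(output_dir=..., **training_kwargs).
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

train_samples = 9322  # from the Dataset Statistics table

# Assumes the last partial batch is kept and there is no gradient accumulation
steps_per_epoch = math.ceil(train_samples / training_kwargs["per_device_train_batch_size"])
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]

print(steps_per_epoch, total_steps)
```

The maximum sequence length of 128 is not a training argument; it is applied at tokenization time, as in the usage examples.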

### Training Environment

| Resource | Specification |
|----------|---------------|
| **Hardware** | Apple Silicon (MPS) / CUDA GPU |
| **Framework** | PyTorch + Transformers |
| **Training Time** | Varies based on dataset size |

---

## Usage

### Installation

```bash
pip install transformers torch
```

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/emotion-tr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["negatif", "notr", "pozitif"]

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)[0]

    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    primary = max(scores, key=scores.get)
    return {"category": primary, "confidence": scores[primary], "all_scores": scores}

# Examples
print(predict("Bu film harika!"))
```

### Production Class

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class EmotionClassifier:
    LABELS = ["negatif", "notr", "pozitif"]

    def __init__(self, model_path="hayatiali/emotion-tr"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def predict(self, text: str) -> dict:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            logits = self.model(**inputs).logits
            probs = torch.softmax(logits, dim=-1)[0].cpu().tolist()

        # tolist() yields plain Python floats, so the result is JSON-serializable
        scores = dict(zip(self.LABELS, probs))
        top = max(scores, key=scores.get)
        return {"category": top, "confidence": scores[top], "scores": scores}
```

### Batch Inference

```python
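def predict_batch(texts: list, batch_size: int = 32) -> list:
    # Reuses tokenizer, model, and LABELS from Quick Start
    device = next(model.parameters()).device  # keep inputs on the model's device
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # padding=True pads each batch to its longest sequence
        inputs = tokenizer(batch, return_tensors="pt", truncation=True, max_length=128, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1).cpu().tolist()

        # One {label: probability} dict per input text, in input order
        for prob in probs:
            scores = dict(zip(LABELS, prob))
            results.append(scores)
    return results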
```

---

## Limitations & Known Issues

### ⚠️ Model Limitations

| Limitation | Details | Impact |
|------------|---------|--------|
| **Context Sensitivity** | The model may misclassify sentiments in ambiguous contexts | Potentially inaccurate predictions |
| **Domain Adaptability** | Performance may vary across different domains (e.g., social media vs. formal texts) | Requires further fine-tuning for specific applications |
| **Language Nuances** | Subtle linguistic features unique to Turkish may not be perfectly captured | May lead to classification errors in nuanced cases |

### ⚠️ Production Deployment Considerations

| Consideration | Details | Recommendation |
|---------------|---------|----------------|
| **Model Size** | The model is approximately 110M parameters | Ensure adequate resources for deployment |
| **Latency** | Inference time may vary with input length and server load | Optimize batch sizes for improved performance |
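
Because latency depends on input length and load, it is worth measuring on the target hardware rather than relying on the figures in this card. The following is a minimal timing sketch. The `measure_latency` helper is hypothetical, and the stand-in `predict_fn` below should be replaced with a real prediction function such as the `predict` function from Quick Start.

```python
import statistics
import time


def measure_latency(predict_fn, texts, warmup=3):
    """Return median and approximate 95th-percentile latency in milliseconds."""
    for text in texts[:warmup]:
        predict_fn(text)  # warm-up calls are not timed

    timings_ms = []
    for text in texts:
        start = time.perf_counter()
        predict_fn(text)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    return {"p50_ms": statistics.median(timings_ms), "p95_ms": timings_ms[p95_index]}


# Stand-in function so the sketch runs without downloading the model
stats = measure_latency(lambda text: text.lower(), ["Bu film harika!"] * 20)
print(stats)
```

On a CUDA GPU, kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock gives more reliable numbers.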

### Not Suitable For

- Legal document analysis
- Medical diagnosis based on text
- Any critical decision-making without human oversight

---

## Ethical Considerations

### Intended Use

- Sentiment analysis in customer feedback
- Emotional tone detection in social media posts
- Market research and analysis

### Risks

- **Bias in Data**: The model may reflect biases present in the training data, leading to skewed results.
- **Misinterpretation of Sentiments**: Incorrect sentiment classification could misguide businesses in decision-making.

### Recommendations

1. **Human Oversight**: Always accompany model predictions with human judgment.
2. **Monitoring**: Regularly assess model performance and retrain as necessary.
3. **Updates**: Stay informed about updates to the model and fine-tune based on new data.
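
The monitoring recommendation can be made concrete by periodically comparing model predictions against a small human-labeled sample. The sketch below is one possible approach, not a prescribed procedure. The `check_drift` helper and the `tolerance` of 0.03 are hypothetical, and the default baseline is the test accuracy reported in this card.

```python
def check_drift(human_labels, model_labels, baseline_accuracy=0.9756, tolerance=0.03):
    """Return (agreement, drifted), where drifted means agreement < baseline - tolerance."""
    if not human_labels or len(human_labels) != len(model_labels):
        raise ValueError("Label lists must be non-empty and of equal length")
    matches = sum(1 for h, m in zip(human_labels, model_labels) if h == m)
    agreement = matches / len(human_labels)
    return agreement, agreement < baseline_accuracy - tolerance


# Hypothetical review sample
human = ["pozitif", "notr", "negatif", "notr"]
model_out = ["pozitif", "notr", "notr", "notr"]
agreement, drifted = check_drift(human, model_out)
print(agreement, drifted)
```

A real review sample should be much larger than four items, since agreement on a handful of examples is too noisy to act on.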

---

## Technical Specifications

### Model Architecture

```
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings
    (encoder): BertEncoder (12 layers)
    (pooler): BertPooler
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(in_features=768, out_features=3)
)

Total Parameters: ~110M
```

### Input/Output

- **Input**: Turkish text (max 128 tokens)
- **Output**: 3-dimensional probability vector
- **Tokenizer**: BERTurk WordPiece (32k vocab)
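
The 3-dimensional probability vector is obtained by applying softmax to the three logits produced by the classifier head, with positions matching the label IDs (0 = negatif, 1 = notr, 2 = pozitif). The following pure-Python sketch shows that mapping. The logit values are hypothetical, and the usage examples in this card use `torch.softmax` instead.

```python
import math

LABELS = ["negatif", "notr", "pozitif"]  # index matches label ID


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 0.3, 2.9]  # hypothetical raw classifier outputs
probs = softmax(logits)
scores = dict(zip(LABELS, probs))
predicted = max(scores, key=scores.get)
print(predicted, scores)
```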

---

## Citation

```bibtex
@misc{emotion-tr-2025,
  title={emotion-tr - Turkish Text Classification Model},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/emotion-tr}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
```

---

## Model Card Authors

**SiriusAI Tech Brain Team**

## Contact

- **Email**: info@siriusaitech.com
- **Repository**: [GitHub](https://github.com/sirius-tedarik)

---

## Changelog

### v1.0 (Current)
- Initial release
- 3-category text classification
- Macro F1: 0.9744976471619214, MCC: 0.9610214790438847

---

**License**: SiriusAI Tech Premium License v1.0

**Commercial Use**: Requires Premium License. Contact: info@siriusaitech.com

**Free Use Allowed For**:
- Academic research and education
- Non-profit organizations (with approval)
- Evaluation (30 days)

**Disclaimer**: This model is designed for text classification applications. Always implement with appropriate safeguards and human oversight. Model predictions should inform decisions, not replace human judgment.