---
language: tr
license: other
license_name: siriusai-premium-v1
license_link: LICENSE
tags:
- turkish
- text-classification
- bert
- nlp
- transformers
- siriusai
- production-ready
- enterprise
base_model: dbmdz/bert-base-turkish-uncased
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
- mcc
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: emotion-tr
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - type: f1
      value: 0.9744976471619214
      name: Macro F1
    - type: mcc
      value: 0.9610214790438847
---

# emotion-tr - Turkish Emotion Classification Model

<p align="center">
  <a href="https://huggingface.co/hayatiali/emotion-tr"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-emotion--tr-yellow" alt="Hugging Face"></a>
  <a href="https://huggingface.co/hayatiali/emotion-tr"><img src="https://img.shields.io/badge/Model-Production%20Ready-brightgreen" alt="Production Ready"></a>
  <img src="https://img.shields.io/badge/Language-Turkish-blue" alt="Turkish">
  <img src="https://img.shields.io/badge/Task-Text%20Classification-orange" alt="Text Classification">
</p>

This model classifies the **emotional sentiment** of Turkish text.

*Developed by the SiriusAI Tech Brain Team*

---

## Mission

> **To provide advanced sentiment analysis capabilities for Turkish text, empowering businesses and researchers to understand emotional tones effectively.**

The `emotion-tr` model builds on the **BERT architecture** to deliver high-performance text classification tailored to Turkish. By classifying text as negative, neutral, or positive, it supports a deeper understanding of customer feedback, social media interactions, and other textual data in sentiment-driven applications.

### Why This Model Matters

- **High Accuracy**: Achieves over **97% accuracy** on the held-out test set.
- **Robust Performance**: Performs consistently across all three sentiment categories.
- **Enterprise-Ready**: Designed for production environments with low inference latency.
- **Customizable**: Can be fine-tuned for domain-specific applications.
- **Comprehensive Documentation**: Extensive guidance for integration and usage.

---

## Model Overview

| Property | Value |
|----------|-------|
| **Architecture** | BertForSequenceClassification |
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Task** | Text Classification |
| **Language** | Turkish (tr) |
| **Categories** | 3 labels |
| **Model Size** | ~110M parameters |
| **Inference Time** | ~10-15 ms (GPU) / ~40-50 ms (CPU) |

---

## Performance Metrics

### Final Evaluation Results

| Metric | Score | Description |
|--------|-------|-------------|
| **Macro F1** | **0.9745** | Unweighted mean of the per-class F1 scores |
| **MCC** | **0.9610** | Matthews Correlation Coefficient |
| **Accuracy** | **97.56%** | Share of test samples classified correctly |

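As a reference for how the headline metric is computed, the sketch below derives macro F1 from toy labels in pure Python; the labels are illustrative and unrelated to the model's actual predictions. In practice, `sklearn.metrics.f1_score(..., average="macro")` and `sklearn.metrics.matthews_corrcoef` compute the same quantities.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for lbl in labels:
        tp = sum(t == lbl and p == lbl for t, p in zip(y_true, y_pred))
        fp = sum(t != lbl and p == lbl for t, p in zip(y_true, y_pred))
        fn = sum(t == lbl and p != lbl for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example (illustrative labels only)
y_true = ["negatif", "notr", "pozitif", "notr"]
y_pred = ["negatif", "notr", "pozitif", "negatif"]
print(round(macro_f1(y_true, y_pred, ["negatif", "notr", "pozitif"]), 4))  # 0.7778
```

Because every class contributes equally regardless of its frequency, macro F1 penalizes weak performance on minority classes more than plain accuracy does.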
### Per-Class Performance

| Category | Accuracy | Correct | Total |
|----------|----------|---------|-------|
| **negatif** | 97.0% | 700 | 722 |
| **notr** | 98.0% | 1,069 | 1,091 |
| **pozitif** | 97.5% | 506 | 519 |

---

## Dataset

### Dataset Statistics

| Split | Samples | Purpose |
|-------|---------|---------|
| **Train** | 9,322 | Model training |
| **Test** | 2,332 | Model evaluation |
| **Total** | 11,654 | Complete dataset |

### Category Distribution

| Category | Samples | Percentage |
|----------|---------|------------|
| **sentiment_3class** | 11,654 | 100.0% |

### Subcategory Breakdown

| Category | Subcategories |
|----------|---------------|
| **sentiment_3class** | pozitif, negatif, notr |

---

## Label Definitions

| Label | ID | Description | Turkish Examples |
|-------|-----|-------------|------------------|
| **negatif** | 0 | Negative sentiment | "Bu çok kötü bir film." "Hizmet berbattı." |
| **notr** | 1 | Neutral sentiment | "Bugün hava güzel." "Toplantı yapıldı." |
| **pozitif** | 2 | Positive sentiment | "Harika bir deneyim!" "Çok memnun kaldım." |

### Important: Category Boundaries

The distinction between **notr** and **negatif** can be subtle: "Bu film sıradan" ("This film is ordinary") may be interpreted as neutral, while "Bu film kötü" ("This film is bad") is clearly negative.

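One practical way to handle such borderline cases is to route low-confidence predictions to human review. A minimal sketch, assuming the `{"category": ..., "confidence": ...}` output shape of the `predict` helper shown under Usage; the threshold is a hypothetical starting point and should be tuned on validation data:

```python
AMBIGUITY_THRESHOLD = 0.6  # hypothetical value; tune on a validation set

def route(prediction: dict) -> str:
    """Return the predicted label, or flag the text for human review."""
    if prediction["confidence"] < AMBIGUITY_THRESHOLD:
        return "human_review"
    return prediction["category"]

print(route({"category": "notr", "confidence": 0.48}))     # borderline case
print(route({"category": "negatif", "confidence": 0.93}))  # confident case
```

A well-chosen threshold trades a small amount of automation for fewer notr/negatif confusions reaching downstream systems.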
---

## Training Procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Base Model** | `dbmdz/bert-base-turkish-uncased` |
| **Max Sequence Length** | 128 tokens |
| **Batch Size** | 16 |
| **Learning Rate** | 2e-5 |
| **Epochs** | 3 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Loss Function** | CrossEntropyLoss / Focal Loss |
| **Problem Type** | Single-label Classification |

### Training Environment

| Resource | Specification |
|----------|---------------|
| **Hardware** | Apple Silicon (MPS) / CUDA GPU |
| **Framework** | PyTorch + Transformers |
| **Training Time** | Varies with dataset size |

---

## Usage

### Installation

```bash
pip install transformers torch
```

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/emotion-tr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["negatif", "notr", "pozitif"]

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]

    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    primary = max(scores, key=scores.get)
    return {"category": primary, "confidence": scores[primary], "all_scores": scores}

# Example
print(predict("Bu film harika!"))
```

### Production Class

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class EmotionClassifier:
    LABELS = ["negatif", "notr", "pozitif"]

    def __init__(self, model_path="hayatiali/emotion-tr"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def predict(self, text: str) -> dict:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            logits = self.model(**inputs).logits
        probs = torch.softmax(logits, dim=-1)[0].cpu().numpy()

        scores = dict(zip(self.LABELS, probs))
        return {
            "category": max(scores, key=scores.get),
            "confidence": float(max(scores.values())),
            "scores": scores,
        }
```

### Batch Inference

```python
# Assumes tokenizer, model, and LABELS from the Quick Start snippet, plus
# device = "cuda" if torch.cuda.is_available() else "cpu" and model.to(device).
def predict_batch(texts: list, batch_size: int = 32) -> list:
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True, max_length=128, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1).cpu().numpy()

        for prob in probs:
            results.append(dict(zip(LABELS, prob)))
    return results
```

---

## Limitations & Known Issues

### ⚠️ Model Limitations

| Limitation | Details | Impact |
|------------|---------|--------|
| **Context Sensitivity** | The model may misclassify sentiment in ambiguous contexts | Potentially inaccurate predictions |
| **Domain Adaptability** | Performance may vary across domains (e.g., social media vs. formal texts) | May require further fine-tuning for specific applications |
| **Language Nuances** | Subtle linguistic features unique to Turkish may not be fully captured | Possible classification errors in nuanced cases |

### ⚠️ Production Deployment Considerations

| Consideration | Details | Recommendation |
|---------------|---------|----------------|
| **Model Size** | ~110M parameters | Ensure adequate resources for deployment |
| **Latency** | Inference time varies with input length and server load | Use batching to improve throughput |

### Not Suitable For

- Legal document analysis
- Medical diagnosis based on text
- Any critical decision-making without human oversight

---

## Ethical Considerations

### Intended Use

- Sentiment analysis of customer feedback
- Emotional tone detection in social media posts
- Market research and analysis

### Risks

- **Bias in Data**: The model may reflect biases present in the training data, leading to skewed results.
- **Misinterpretation of Sentiments**: Incorrect sentiment classification could misguide business decisions.

### Recommendations

1. **Human Oversight**: Always pair model predictions with human judgment.
2. **Monitoring**: Regularly assess model performance and retrain as necessary.
3. **Updates**: Stay informed about model updates and fine-tune on new data as needed.

---

## Technical Specifications

### Model Architecture

```
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings
    (encoder): BertEncoder (12 layers)
    (pooler): BertPooler
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(in_features=768, out_features=3)
)

Total Parameters: ~110M
```

### Input/Output

- **Input**: Turkish text (max 128 tokens)
- **Output**: 3-dimensional probability vector over (negatif, notr, pozitif)
- **Tokenizer**: BERTurk WordPiece (32k vocab)

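The probability vector is obtained by applying softmax to the classifier's three logits. A self-contained sketch with hypothetical logit values (not produced by the model):

```python
import math

LABELS = ["negatif", "notr", "pozitif"]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([-1.2, 0.3, 2.9])  # hypothetical logits
print(dict(zip(LABELS, [round(p, 3) for p in probs])))
```

The three probabilities always sum to 1, and the predicted label is the index of the largest one.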
---

## Citation

```bibtex
@misc{emotion-tr-2025,
  title={emotion-tr - Turkish Text Classification Model},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/emotion-tr}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
```

---

## Model Card Authors

**SiriusAI Tech Brain Team**

## Contact

- **Email**: info@siriusaitech.com
- **Repository**: [GitHub](https://github.com/sirius-tedarik)

---

## Changelog

### v1.0 (Current)

- Initial release
- 3-category text classification
- Macro F1: 0.9745, MCC: 0.9610

---

**License**: SiriusAI Tech Premium License v1.0

**Commercial Use**: Requires a Premium License. Contact: info@siriusaitech.com

**Free Use Allowed For**:

- Academic research and education
- Non-profit organizations (with approval)
- Evaluation (30 days)

**Disclaimer**: This model is designed for text classification applications. Always deploy it with appropriate safeguards and human oversight. Model predictions should inform decisions, not replace human judgment.