Darmm Sentiment KK
Kazakh sentiment analysis model fine-tuned for 5-class sentiment classification.
Model Description
This model is based on bert-base-multilingual-cased and fine-tuned for sentiment classification of Kazakh text into five classes:
- very_positive
- positive
- neutral
- negative
- very_negative
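These class names map to output indices in the order used by the direct-usage example below; a minimal sketch of the mapping (the assumed order should be verified against `model.config.id2label` after loading the model):

```python
# Assumed index order (not read from the model config; verify with
# model.config.id2label after loading the model).
ID2LABEL = {
    0: "very_negative",
    1: "negative",
    2: "neutral",
    3: "positive",
    4: "very_positive",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

print(LABEL2ID["neutral"])  # -> 2
```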
Usage
Using transformers pipeline
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Darmm/sentiment-kk")

text = "Бұл фильм маған ұнамады"  # "I did not like this movie"
print(classifier(text))
```
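The pipeline returns a list of `{"label", "score"}` dicts (passing `top_k=None` should return scores for all five classes). A pure-Python helper for picking the top class, shown here with hypothetical scores rather than real model output:

```python
# Hypothetical scores for one input, in the shape the pipeline returns
# with top_k=None; the numbers are illustrative, not real model output.
scores = [
    {"label": "negative", "score": 0.62},
    {"label": "very_negative", "score": 0.21},
    {"label": "neutral", "score": 0.12},
    {"label": "positive", "score": 0.04},
    {"label": "very_positive", "score": 0.01},
]

top = max(scores, key=lambda d: d["score"])
print(f"{top['label']} ({top['score']:.0%})")  # negative (62%)
```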
Direct model usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Darmm/sentiment-kk"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Бұл фильм маған ұнамады"  # "I did not like this movie"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

labels = ["very_negative", "negative", "neutral", "positive", "very_positive"]
print(f"Predicted: {labels[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
```
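The softmax step above can be reproduced in plain Python, which makes it clear what the reported "confidence" is (the logits here are made up for illustration):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for the five classes, in the same order as `labels` above.
probs = softmax([0.1, 2.3, 0.2, -1.0, -0.5])
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax -> index 1
print(predicted, f"{probs[predicted]:.2%}")
```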
Training
The model was trained on the Darmm/darmm-sentiment-kk dataset with the following parameters:
- Base Model: bert-base-multilingual-cased
- Epochs: 3
- Batch Size: 16
- Learning Rate: 2e-5 (linear decay)
- Max Length: 256
- Train/Validation/Test Split: ~80/10/10
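For reference, the hyperparameters above collected into one config dict. This is a sketch, not the authors' training script; the key names merely follow the conventions of transformers' `TrainingArguments`:

```python
# Hyperparameters from this card, gathered into one place. Key names
# mirror transformers' TrainingArguments conventions for convenience.
TRAIN_CONFIG = {
    "model_name_or_path": "bert-base-multilingual-cased",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "max_length": 256,
}

print(TRAIN_CONFIG["learning_rate"])  # 2e-05
```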
Metrics
```json
{
  "eval_loss": 0.02658640407025814,
  "eval_accuracy": 0.9969512195121951,
  "eval_runtime": 0.544,
  "eval_samples_per_second": 602.947,
  "eval_steps_per_second": 38.603,
  "epoch": 3.0
}
```
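These numbers are internally consistent: runtime times throughput gives the evaluation-set size, and the reported accuracy corresponds to one misclassified example out of 328. This is an inference from the reported metrics, not something the card states explicitly:

```python
# Back out the eval-set size from the reported runtime and throughput.
eval_runtime = 0.544            # seconds
samples_per_second = 602.947

n_samples = round(eval_runtime * samples_per_second)
print(n_samples)        # 328
print(327 / n_samples)  # ~0.99695, matching eval_accuracy above
```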
Paper & Documentation
🇬🇧 English
Darmm: Kazakh Sentiment Analysis (5-Class)
Abstract
We present a Kazakh sentiment classification model based on bert-base-multilingual-cased, fine‑tuned on the Darmm/darmm-sentiment-kk dataset. The model predicts five sentiment classes and achieves high accuracy on the evaluation split.
1. Dataset
- Dataset: Darmm/darmm-sentiment-kk
- Labels: very_negative, negative, neutral, positive, very_positive
- Split: ~80/10/10 (train/validation/test)
2. Training
- Base model: bert-base-multilingual-cased
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Max length: 256
3. Results
- Accuracy: 0.9969512195121951
4. Limitations
- Performance may drop on domains not represented in the dataset.
- Short or ambiguous texts can reduce classification confidence.
🇰🇿 Kazakh
Darmm: Sentiment analysis for Kazakh (5 classes)
Abstract
This model is based on bert-base-multilingual-cased, trained on the Darmm/darmm-sentiment-kk dataset, and performs 5-class sentiment classification of Kazakh text.
1. Data
- Dataset: Darmm/darmm-sentiment-kk
- Classes: very_negative, negative, neutral, positive, very_positive
- Split: ~80/10/10 (train/validation/test)
2. Training
- Base model: bert-base-multilingual-cased
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Max length: 256
3. Results
- Accuracy: 0.9969512195121951
4. Limitations
- Quality may drop on texts from other domains.
- Confidence decreases on short or ambiguous texts.
🇷🇺 Russian
Darmm: Sentiment analysis for Kazakh (5 classes)
Abstract
A model based on bert-base-multilingual-cased, fine-tuned on Darmm/darmm-sentiment-kk for 5-class sentiment classification of Kazakh text.
1. Data
- Dataset: Darmm/darmm-sentiment-kk
- Classes: very_negative, negative, neutral, positive, very_positive
- Split: ~80/10/10 (train/validation/test)
2. Training
- Base model: bert-base-multilingual-cased
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Max length: 256
3. Results
- Accuracy: 0.9969512195121951
4. Limitations
- Quality drops on domains outside the training data.
- Short or ambiguous texts reduce confidence.
Limitations
- Domain coverage depends on collected sources and may be biased.
- Performance may vary outside common review-style text.
Intended use
- Sentiment analysis for Kazakh text across multiple domains.