Darmm Sentiment KK
Kazakh sentiment analysis model fine-tuned for 5-class sentiment classification.
Model Description
This model is based on bert-base-multilingual-cased and fine-tuned for sentiment classification of Kazakh text into five classes:
- very_positive
- positive
- neutral
- negative
- very_negative
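These class names map to output indices in the order used by the direct-usage example below; a minimal sketch of the mapping (the assumed order should be verified against `model.config.id2label` after loading the model):

```python
# Assumed index order (not read from the model config; verify with
# model.config.id2label after loading the model).
ID2LABEL = {
    0: "very_negative",
    1: "negative",
    2: "neutral",
    3: "positive",
    4: "very_positive",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

print(LABEL2ID["neutral"])  # -> 2
```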
Usage
Using transformers pipeline
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Darmm/sentiment-kk")

text = "Бұл фильм маған ұнамады"  # "I did not like this movie"
print(classifier(text))
```
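The pipeline returns a list of `{"label", "score"}` dicts (passing `top_k=None` should return scores for all five classes). A pure-Python helper for picking the top class, shown here with hypothetical scores rather than real model output:

```python
# Hypothetical scores for one input, in the shape the pipeline returns
# with top_k=None; the numbers are illustrative, not real model output.
scores = [
    {"label": "negative", "score": 0.62},
    {"label": "very_negative", "score": 0.21},
    {"label": "neutral", "score": 0.12},
    {"label": "positive", "score": 0.04},
    {"label": "very_positive", "score": 0.01},
]

top = max(scores, key=lambda d: d["score"])
print(f"{top['label']} ({top['score']:.0%})")  # negative (62%)
```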
Direct model usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Darmm/sentiment-kk"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Бұл фильм маған ұнамады"  # "I did not like this movie"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

labels = ["very_negative", "negative", "neutral", "positive", "very_positive"]
print(f"Predicted: {labels[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
```
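The softmax step above can be reproduced in plain Python, which makes it clear what the reported "confidence" is (the logits here are made up for illustration):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for the five classes, in the same order as `labels` above.
probs = softmax([0.1, 2.3, 0.2, -1.0, -0.5])
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax -> index 1
print(predicted, f"{probs[predicted]:.2%}")
```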
Training
The model was trained on the Darmm/darmm-sentiment-kk dataset with the following parameters:
- Base Model: bert-base-multilingual-cased
- Epochs: 3
- Batch Size: 16
- Learning Rate: 2e-5 (linear decay)
- Max Length: 256
- Train/Validation/Test Split: ~80/10/10
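For reference, the hyperparameters above collected into one config dict. This is a sketch, not the authors' training script; the key names merely follow the conventions of transformers' `TrainingArguments`:

```python
# Hyperparameters from this card, gathered into one place. Key names
# mirror transformers' TrainingArguments conventions for convenience.
TRAIN_CONFIG = {
    "model_name_or_path": "bert-base-multilingual-cased",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "max_length": 256,
}

print(TRAIN_CONFIG["learning_rate"])  # 2e-05
```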
Metrics
```json
{
  "eval_loss": 0.02658640407025814,
  "eval_accuracy": 0.9969512195121951,
  "eval_runtime": 0.544,
  "eval_samples_per_second": 602.947,
  "eval_steps_per_second": 38.603,
  "epoch": 3.0
}
```
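These numbers are internally consistent: runtime times throughput gives the evaluation-set size, and the reported accuracy corresponds to one misclassified example out of 328. This is an inference from the reported metrics, not something the card states explicitly:

```python
# Back out the eval-set size from the reported runtime and throughput.
eval_runtime = 0.544            # seconds
samples_per_second = 602.947

n_samples = round(eval_runtime * samples_per_second)
print(n_samples)        # 328
print(327 / n_samples)  # ~0.99695, matching eval_accuracy above
```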
Paper & Documentation
🇬🇧 English
Darmm: Kazakh Sentiment Analysis (5-Class)
Abstract
We present a Kazakh sentiment classification model based on bert-base-multilingual-cased, fine‑tuned on the Darmm/darmm-sentiment-kk dataset. The model predicts five sentiment classes and achieves high accuracy on the evaluation split.
1. Dataset
- Dataset: Darmm/darmm-sentiment-kk
- Labels: very_negative, negative, neutral, positive, very_positive
- Split: ~80/10/10 (train/validation/test)
2. Training
- Base model: bert-base-multilingual-cased
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Max length: 256
3. Results
- Accuracy: 0.9969512195121951
4. Limitations
- Performance may drop on domains not represented in the dataset.
- Short or ambiguous texts can reduce classification confidence.
🇰🇿 Kazakh
Darmm: Sentiment analysis for Kazakh (5 classes)
Abstract
This model is based on bert-base-multilingual-cased, trained on the Darmm/darmm-sentiment-kk dataset, and performs 5-class sentiment classification of Kazakh text.
1. Data
- Dataset: Darmm/darmm-sentiment-kk
- Classes: very_negative, negative, neutral, positive, very_positive
- Split: ~80/10/10 (train/validation/test)
2. Training
- Base model: bert-base-multilingual-cased
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Max length: 256
3. Results
- Accuracy: 0.9969512195121951
4. Limitations
- Quality may drop on texts from other domains.
- Confidence decreases on short or ambiguous texts.
🇷🇺 Russian
Darmm: Sentiment analysis for Kazakh (5 classes)
Abstract
A model based on bert-base-multilingual-cased, fine-tuned on Darmm/darmm-sentiment-kk for 5-class sentiment classification of Kazakh text.
1. Data
- Dataset: Darmm/darmm-sentiment-kk
- Classes: very_negative, negative, neutral, positive, very_positive
- Split: ~80/10/10 (train/validation/test)
2. Training
- Base model: bert-base-multilingual-cased
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Max length: 256
3. Results
- Accuracy: 0.9969512195121951
4. Limitations
- Quality drops on domains outside the training data.
- Short or ambiguous texts reduce confidence.
Limitations
- Domain coverage depends on collected sources and may be biased.
- Performance may vary outside common review-style text.
Intended use
- Sentiment analysis for Kazakh text across multiple domains.