Darmm Sentiment KK

Kazakh sentiment analysis model fine-tuned for 5-class sentiment classification.

Model Description

This model is based on bert-base-multilingual-cased and fine-tuned for sentiment classification of Kazakh text into five classes:

  • very_positive
  • positive
  • neutral
  • negative
  • very_negative

Usage

Using transformers pipeline

from transformers import pipeline

classifier = pipeline("text-classification", model="Darmm/sentiment-kk")
# Kazakh for "I did not like this film"
text = "Бұл фильм маған ұнамады"
print(classifier(text))
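
For a single string, the pipeline returns a list with one dictionary holding a `label` and a `score`. The following is a minimal sketch of how that output could be post-processed to drop low-confidence predictions. The `keep_confident` helper, the 0.6 threshold, and the sample results are hypothetical and not part of the model or the library. The label strings come from the model's configuration, so they may appear as generic names such as `LABEL_0` if the configuration does not define readable labels.

```python
from typing import Dict, List, Optional


def keep_confident(results: List[Dict], threshold: float = 0.6) -> List[Optional[str]]:
    """Return each predicted label, or None when its score is below the threshold."""
    return [r["label"] if r["score"] >= threshold else None for r in results]


# Hypothetical pipeline output for two inputs (illustrative values only)
sample_results = [
    {"label": "negative", "score": 0.93},
    {"label": "neutral", "score": 0.41},
]
print(keep_confident(sample_results))
```

A threshold like this is one way to route short or ambiguous texts to manual review instead of accepting a weak prediction.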

Direct model usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Darmm/sentiment-kk"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

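# Kazakh for "I did not like this film"
text = "Бұл фильм маған ұнамады"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)
    # Convert raw logits into class probabilities
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Index of the most probable class
predicted_class = torch.argmax(predictions, dim=-1).item()
# Index-to-label order as given in this card
labels = ["very_negative", "negative", "neutral", "positive", "very_positive"]
print(f"Predicted: {labels[predicted_class]}")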
print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
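
The softmax and argmax steps above can be sketched without any dependencies, which may help when checking the decoding logic in isolation. This is an illustration only: the `softmax` and `decode` helpers and the example logits are hypothetical, and the label order is copied from the snippet above.

```python
import math
from typing import List, Tuple

# Label order copied from the direct-usage example
LABELS = ["very_negative", "negative", "neutral", "positive", "very_positive"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits, not produced by the model
label, confidence = decode([-1.2, 3.4, 0.3, -0.8, -2.1])
print(f"Predicted: {label}")
print(f"Confidence: {confidence:.2%}")
```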

Training

The model was trained on the Darmm/darmm-sentiment-kk dataset with the following parameters:

  • Base Model: bert-base-multilingual-cased
  • Epochs: 3
  • Batch Size: 16
  • Learning Rate: 2e-5 (linear decay)
  • Max Length: 256
  • Train/Validation/Test Split: ~80/10/10
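
To make the "linear decay" note concrete, here is a minimal sketch of a learning rate that starts at 2e-5 and falls linearly to zero over training. It assumes no warmup phase, which this card does not specify, and the `linear_decay_lr` helper and the step counts are hypothetical.

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Learning rate after `step` optimizer steps, decaying linearly to zero."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training still remaining, clamped at zero
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


# Hypothetical run length: 3 epochs of 100 optimizer steps each
total = 3 * 100
for step in (0, 150, 300):
    print(step, linear_decay_lr(step, total))
```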

Metrics

{
  "eval_loss": 0.02658640407025814,
  "eval_accuracy": 0.9969512195121951,
  "eval_runtime": 0.544,
  "eval_samples_per_second": 602.947,
  "eval_steps_per_second": 38.603,
  "epoch": 3.0
}
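
The reported numbers also imply the size of the evaluation set. The sketch below derives it from the runtime and throughput, assuming both values are rounded figures from the same run and that evaluation used the batch size of 16 listed above.

```python
import math

eval_runtime = 0.544
eval_samples_per_second = 602.947
eval_steps_per_second = 38.603
eval_accuracy = 0.9969512195121951

# Samples and batches processed during evaluation
n_samples = round(eval_runtime * eval_samples_per_second)
n_steps = round(eval_runtime * eval_steps_per_second)

# Number of correct predictions implied by the accuracy
n_correct = round(eval_accuracy * n_samples)

print(n_samples, n_steps, n_correct, n_samples - n_correct)
# Consistency check: batches needed for n_samples at batch size 16
print(math.ceil(n_samples / 16))
```

Under these assumptions the evaluation set has 328 samples, of which 327 were classified correctly, so the accuracy figure rests on a single misclassified example.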

Paper & Documentation

🇬🇧 English

Darmm: Kazakh Sentiment Analysis (5-Class)

Abstract

We present a Kazakh sentiment classification model based on bert-base-multilingual-cased, fine-tuned on the Darmm/darmm-sentiment-kk dataset. The model predicts five sentiment classes and reaches an accuracy of about 0.997 on the evaluation split.

1. Dataset

  • Dataset: Darmm/darmm-sentiment-kk
  • Labels: very_negative, negative, neutral, positive, very_positive
  • Split: ~80/10/10 (train/validation/test)

2. Training

  • Base model: bert-base-multilingual-cased
  • Epochs: 3
  • Batch size: 16
  • Learning rate: 2e-5
  • Max length: 256

3. Results

  • Accuracy: 0.9969512195121951

4. Limitations

  • Performance may drop on domains not represented in the dataset.
  • Short or ambiguous texts can reduce classification confidence.

🇰🇿 Қазақша

Darmm: Қазақ тіліндегі sentiment талдауы (5 класс)

Аңдатпа

Бұл модель bert-base-multilingual-cased негізінде Darmm/darmm-sentiment-kk деректерінде оқытылып, қазақ тіліндегі 5 классты sentiment жіктеуін орындайды.

1. Деректер

  • Деректер жиыны: Darmm/darmm-sentiment-kk
  • Кластар: very_negative, negative, neutral, positive, very_positive
  • Бөлу: ~80/10/10 (train/validation/test)

2. Оқыту

  • Негізгі модель: bert-base-multilingual-cased
  • Эпохалар: 3
  • Batch size: 16
  • Learning rate: 2e-5
  • Max length: 256

3. Нәтижелер

  • Accuracy: 0.9969512195121951

4. Шектеулер

  • Домені басқа мәтіндерде сапа төмендеуі мүмкін.
  • Қысқа/екіұшты мәтіндерде сенімділік азаяды.

🇷🇺 Русский

Darmm: Анализ тональности на казахском (5 классов)

Аннотация

Модель на базе bert-base-multilingual-cased, дообученная на Darmm/darmm-sentiment-kk для 5‑классовой классификации тональности казахского текста.

1. Данные

  • Датасет: Darmm/darmm-sentiment-kk
  • Классы: very_negative, negative, neutral, positive, very_positive
  • Сплит: ~80/10/10 (train/validation/test)

2. Обучение

  • Базовая модель: bert-base-multilingual-cased
  • Эпохи: 3
  • Batch size: 16
  • Learning rate: 2e-5
  • Max length: 256

3. Результаты

  • Accuracy: 0.9969512195121951

4. Ограничения

  • Качество падает на доменах вне обучающих данных.
  • Короткие или неоднозначные тексты снижают уверенность.

Limitations

  • Domain coverage depends on collected sources and may be biased.
  • Performance may vary outside common review-style text.

Intended use

  • Sentiment analysis for Kazakh text across multiple domains.