metadata
language:
- hu
license: mit
tags:
- sentiment-analysis
- xlm-roberta
- hungarian
- text-classification
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
Sentiment
Fine-tuned xlm-roberta-base for Hungarian sentiment classification.
Model Details
- Base model: xlm-roberta-base
- Task: 3-class sentiment classification (negative / neutral / positive)
- Language: Hungarian
- Training data: ~37K sentences (stratified split from ~46K total)
- Class weighting: Balanced weights applied during training to handle class imbalance
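The card does not state how the balanced weights were derived. The following is a minimal sketch, assuming the common inverse-frequency formula w_c = N / (K * n_c), the same one scikit-learn uses for `class_weight="balanced"`. The function names and the toy label list are illustrative and do not come from the training code. The reported weights are taken from the Training Details section and are assumed to be in label-ID order.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: w_c = N / (K * n_c)."""
    counts = Counter(labels)
    total = len(labels)
    # Assumes every class occurs at least once in `labels`.
    return [total / (num_classes * counts[c]) for c in range(num_classes)]


def implied_class_shares(weights):
    """Invert the formula above: n_c / N = 1 / (K * w_c)."""
    k = len(weights)
    return [1.0 / (k * w) for w in weights]


# Toy label list (hypothetical, not the real training set).
toy_labels = [0] * 4 + [1] * 3 + [2] * 3
toy_weights = balanced_class_weights(toy_labels, num_classes=3)

# Weights reported in this card (assumed order: negative, neutral, positive).
reported = [0.8114, 1.1219, 1.1413]
shares = implied_class_shares(reported)
```

If the assumption holds, the reported weights imply that negative is the most frequent class (roughly 41% of the training sentences), with neutral and positive at roughly 29 to 30% each.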
Labels
| Label ID | Label | Description |
|---|---|---|
| 0 | negative | Negative sentiment |
| 1 | neutral | Neutral sentiment |
| 2 | positive | Positive sentiment |
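The table above can be expressed as a lookup for decoding raw model outputs. This is a small sketch that assumes the model's logits are ordered by label ID; `decode_prediction` and the example logits are hypothetical.

```python
# Label mapping copied from the table above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode_prediction(logits):
    """Return the label name of the highest-scoring class."""
    best_id = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best_id]


# Hypothetical logits for one sentence; the third entry is the largest.
example = decode_prediction([-1.2, 0.3, 2.1])
```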
Overall Results
| Metric | Value |
|---|---|
| Accuracy | 0.8442 |
| F1 (macro) | 0.8387 |
| F1 (weighted) | 0.8436 |
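To make the difference between the two F1 variants concrete, here is a dependency-free sketch of how macro and weighted F1 are conventionally computed. The toy labels are hypothetical and unrelated to the evaluation set, and the card does not say which library produced the reported numbers.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """Return (F1 per class, support per class)."""
    scores, supports = [], []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)  # number of true examples of class c
    return scores, supports


def macro_f1(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_f1(scores, supports):
    """Mean weighted by class support: frequent classes count more."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Toy data (hypothetical): 4 negative, 2 neutral, 2 positive examples.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
scores, supports = per_class_f1(y_true, y_pred, num_classes=3)
macro = macro_f1(scores)
weighted = weighted_f1(scores, supports)
```

Because weighted F1 gives more influence to frequent classes, a weighted score slightly above the macro score, as in the table above, is consistent with the larger classes being classified somewhat better than the smaller ones.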
Per-Language Results
| Language | Samples | Accuracy | F1 (macro) | F1 (weighted) |
|---|---|---|---|---|
| hun | 4603 | 0.8442 | 0.8387 | 0.8436 |
Usage
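from transformers import pipeline
# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline("text-classification", model="ringorsolya/Sentiment")
classifier("Ez egy fantasztikus nap!")  # "This is a fantastic day!"
# e.g. [{'label': 'positive', 'score': 0.95}] (score is illustrative)
classifier("Szörnyű volt a kiszolgálás.")  # "The service was terrible."
# e.g. [{'label': 'negative', 'score': 0.92}] (score is illustrative)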
Training Details
- Epochs: 5
- Batch size: 32
- Learning rate: 2e-05
- Weight decay: 0.01
- Warmup ratio: 0.1
- Max sequence length: 128
- FP16: True
- Class weights: [0.8114, 1.1219, 1.1413]
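The hyperparameters above can be collected into a configuration dictionary. This is a sketch only: the key names follow the keyword arguments of `transformers.TrainingArguments`, but the card does not confirm that the Hugging Face `Trainer` was used. The step counts assume about 37,000 training sentences, a single device, and no gradient accumulation, none of which is stated in the card.

```python
import math

# Values from the list above; key names match transformers.TrainingArguments
# keyword arguments (an assumption about how training was configured).
hparams = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 32,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "fp16": True,
}

# Applied when tokenizing, not a TrainingArguments field.
MAX_SEQ_LENGTH = 128

# Rough schedule arithmetic under the stated assumptions.
approx_train_size = 37_000  # "~37K sentences" from Model Details
steps_per_epoch = math.ceil(approx_train_size / hparams["per_device_train_batch_size"])
total_steps = steps_per_epoch * hparams["num_train_epochs"]
warmup_steps = math.ceil(total_steps * hparams["warmup_ratio"])
```

Under these assumptions the learning rate would warm up over roughly the first tenth of about 5,800 optimizer steps; the exact figures depend on the true dataset size and hardware setup.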