A 3-class sentiment classifier for Czech movie reviews, fine-tuned from Seznam/small-e-czech (13.5M params, Czech ELECTRA).
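```python
from transformers import pipeline

# Load the fine-tuned Czech sentiment classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="jakubmach/cs-sentiment-small-e-czech-v2")

# Czech for: "This movie was absolutely amazing, great actors and an excellent story!"
result = classifier("Tento film byl naprosto úžasný, skvělí herci a výborný příběh!")
print(result)  # [{'label': 'positive', 'score': 0.87}]
```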
| Parameter | Value |
|---|---|
| Base model | Seznam/small-e-czech (13.5M params) |
| Dataset | fewshot-goes-multilingual/cs_csfd-movie-reviews |
| Training samples | 2,000 (from 25K total) |
| Epochs | 5 |
| Learning rate | 1e-4 |
| Batch size | 16 |
| Max sequence length | 256 tokens |
| Scheduler | Cosine with 50 warmup steps |
| Weight decay | 0.01 |
| Hardware | CPU (2 vCPU, 16GB RAM) |
| Training time | ~2 hours |
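
The scheduler row can be made concrete with a little arithmetic. The sketch below derives the number of optimizer steps from the table and evaluates a linear-warmup, half-cosine-decay schedule at a few points. It assumes one optimizer step per batch (no gradient accumulation) and a single half-cosine cycle that decays to zero; neither is stated in the table, and the names `lr_at`, `TOTAL_STEPS`, and so on are illustrative only.

```python
import math

# Values taken from the training table above.
TRAIN_SAMPLES = 2_000
BATCH_SIZE = 16
EPOCHS = 5
PEAK_LR = 1e-4
WARMUP_STEPS = 50

# Assumption: one optimizer step per batch (no gradient accumulation).
STEPS_PER_EPOCH = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 125
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS                   # 625


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Assumption: a single half-cosine cycle that reaches zero at TOTAL_STEPS.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 25, 50, 338, 625):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Under these assumptions, warmup covers 50 of 625 steps (8% of training), so the peak rate of 1e-4 is reached partway through the first epoch.
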
Ratings from CSFD (0-5 stars) are mapped to three sentiment classes: positive, negative, and neutral.
| Method | Accuracy | F1 (macro) | F1 (weighted) |
|---|---|---|---|
| Random baseline | 33.3% | 0.315 | 0.349 |
| Majority class (negative) | 49.6% | 0.221 | 0.329 |
| Keyword heuristic | 33.3% | 0.336 | 0.344 |
| Exp 1 (lr=5e-5, 3ep, 1.5K samples) | 60.8% | 0.421 | 0.542 |
| Exp 2 (this model) (lr=1e-4, 5ep, 2K samples) | 67.7% | 0.488 | 0.619 |
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| positive | 0.586 | 0.799 | 0.676 | 844 |
| negative | 0.755 | 0.821 | 0.787 | 1,241 |
| neutral | 0.000 | 0.000 | 0.000 | 415 |
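
As a sanity check, the majority-class baseline row can be reproduced from the support column alone. This is a minimal sketch, assuming that F1 for a class that is never predicted is counted as 0; the variable and function names are illustrative.

```python
# Test-set class supports, copied from the per-class table above.
support = {"positive": 844, "negative": 1241, "neutral": 415}
total = sum(support.values())

# The baseline always predicts the most frequent class.
majority = max(support, key=support.get)


def baseline_f1(cls: str) -> float:
    """F1 for `cls` when every example is labeled with the majority class."""
    if cls != majority:
        return 0.0  # Assumption: never-predicted classes score 0.
    precision = support[cls] / total  # every prediction is `majority`
    recall = 1.0                      # every gold `majority` example is found
    return 2 * precision * recall / (precision + recall)


accuracy = support[majority] / total
f1 = {cls: baseline_f1(cls) for cls in support}
macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[cls] * support[cls] for cls in support) / total

print(f"majority={majority} accuracy={accuracy:.3f} "
      f"macro_f1={macro_f1:.3f} weighted_f1={weighted_f1:.3f}")
# prints: majority=negative accuracy=0.496 macro_f1=0.221 weighted_f1=0.329
```

These values match the "Majority class (negative)" row, which suggests the baselines and the per-class metrics were computed on the same 2,500-example evaluation set.
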
| | Pred: positive | Pred: negative | Pred: neutral |
|---|---|---|---|
| Gold: positive | 674 | 170 | 0 |
| Gold: negative | 222 | 1,019 | 0 |
| Gold: neutral | 254 | 161 | 0 |
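
The reported accuracy and F1 scores can be recomputed directly from the confusion matrix. The sketch below is plain Python rather than the author's evaluation code; it treats undefined ratios (a class with no predictions) as 0, which is an assumption about how the reported metrics were calculated.

```python
labels = ["positive", "negative", "neutral"]

# Rows are gold labels, columns are predictions, both in the order of `labels`.
confusion = [
    [674, 170, 0],
    [222, 1019, 0],
    [254, 161, 0],
]


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 when the denominator is 0 (assumed convention)."""
    return num / den if den else 0.0


total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(labels))) / total

metrics = {}
for i, label in enumerate(labels):
    true_pos = confusion[i][i]
    predicted = sum(row[i] for row in confusion)  # column sum
    gold = sum(confusion[i])                      # row sum (support)
    precision = safe_div(true_pos, predicted)
    recall = safe_div(true_pos, gold)
    f1 = safe_div(2 * precision * recall, precision + recall)
    metrics[label] = {"precision": precision, "recall": recall, "f1": f1, "support": gold}

macro_f1 = sum(m["f1"] for m in metrics.values()) / len(labels)
weighted_f1 = sum(m["f1"] * m["support"] for m in metrics.values()) / total

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} weighted_f1={weighted_f1:.3f}")
for label, m in metrics.items():
    print(f"{label:>8}: P={m['precision']:.3f} R={m['recall']:.3f} F1={m['f1']:.3f} n={m['support']}")
```

The empty "Pred: neutral" column is what drives macro F1 down: all 415 neutral reviews are assigned to positive (254) or negative (161), so the neutral class contributes an F1 of 0 to the average.
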
A larger base model (e.g. ufal/robeczech-base at 125M params) would likely push accuracy well above 75%.
License: CC-BY-4.0 (inherited from base model Seznam/small-e-czech)