Twitter Sentiment PL (base)

Twitter Sentiment PL (base) is a Polish-language sentiment analysis model fine-tuned from allegro/herbert-base-cased on a Polish translation of the TweetEval dataset (Barbieri et al., 2020). It predicts one of three sentiment classes for short, tweet-style Polish text.

Model Details

Developed by: bards.ai
Model type: Transformer encoder (BERT-style) fine-tuned for sequence classification
Language: Polish (pl)
License: CC BY 4.0 (inherited from the base model)
Finetuned from: allegro/herbert-base-cased
Labels: positive, negative, neutral

Intended Uses & Limitations

Intended uses

Sentiment analysis of Polish short-form social media text (tweets, comments, short posts).
Research and prototyping for Polish-language NLP applications.

Out-of-scope / limitations

The model was trained on a machine-translated version of TweetEval, so it inherits translation artifacts and may underperform on idiomatic Polish that differs in style from the translated training data.
Performance on long-form text, formal Polish (news, legal, medical), or non-Twitter domains is not guaranteed.
Like any sentiment model trained on social media, predictions may reflect biases present in the source data. Do not use as the sole signal in moderation, hiring, or other high-stakes decisions.
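
One way to act on the last point is to treat low-confidence predictions as undecided and hand them to a person instead of acting on them automatically. The following is a minimal sketch, assuming predictions arrive in the list-of-dicts shape shown under How to Use. The `route_prediction` helper and the 0.8 threshold are hypothetical and are not part of the model or of transformers.

```python
from typing import Dict, List

# Hypothetical cut-off; tune it on labelled data from your own domain.
REVIEW_THRESHOLD = 0.8


def route_prediction(prediction: Dict[str, object],
                     threshold: float = REVIEW_THRESHOLD) -> str:
    """Return the predicted label, or 'needs_review' if the score is below the threshold."""
    score = float(prediction["score"])
    if score < threshold:
        return "needs_review"
    return str(prediction["label"])


# Inputs shaped like pipeline output; these scores are made up for illustration.
predictions: List[Dict[str, object]] = [
    {"label": "positive", "score": 0.97},
    {"label": "negative", "score": 0.55},
]
routed = [route_prediction(p) for p in predictions]
print(routed)
```

A softmax score is not a calibrated probability, so the threshold is best chosen by checking how often predictions above it are actually correct on your own data.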

How to Use

With the pipeline API:

from transformers import pipeline

nlp = pipeline("sentiment-analysis", model="bardsai/twitter-sentiment-pl-base")
nlp("Nigdy przegrana nie sprawiła mi takiej radości. Szczęście i Opatrzność mają znaczenie Gratuluje @pzpn_pl")
# [{'label': 'positive', 'score': 0.9997233748435974}]

Or loading the model and tokenizer directly:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bardsai/twitter-sentiment-pl-base")
model = AutoModelForSequenceClassification.from_pretrained("bardsai/twitter-sentiment-pl-base")
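
Loading the two objects directly leaves tokenization and label decoding to you. The sketch below shows one possible way to go from raw text to a label and score. `classify` and `decode_logits` are hypothetical helper names, and the code assumes PyTorch is installed and that the label names come from the checkpoint's `id2label` config mapping.

```python
import math
from typing import Dict, List, Sequence, Tuple

MODEL_ID = "bardsai/twitter-sentiment-pl-base"


def decode_logits(logits: Sequence[float],
                  id2label: Dict[int, str]) -> Tuple[str, float]:
    """Softmax one row of logits and return (label, probability) for the top class."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(texts: List[str]) -> List[Tuple[str, float]]:
    """Tokenize a batch, run the model once, and decode each row of logits."""
    # Imported lazily so decode_logits stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference

    # Pad to the longest text in the batch and truncate over-long inputs.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [decode_logits(row, model.config.id2label) for row in logits.tolist()]
```

Calling `classify(["Świetny mecz!"])` downloads the checkpoint on first use and returns one `(label, probability)` pair per input text.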

Training

Base model: allegro/herbert-base-cased
Training data: TweetEval (sentiment subset) machine-translated into Polish.
Epochs: 10
Hardware: Single NVIDIA RTX 3090

Evaluation

Evaluated on the held-out test split (translated TweetEval, sentiment task) on an RTX 3090.

Metric	Value
F1 (macro)	0.658
Precision (macro)	0.655
Recall (macro)	0.662
Accuracy	0.662
Samples per second	129.9
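
The macro-averaged scores in the table weight each of the three classes equally, regardless of how many test examples each class has. As an illustration of that averaging (not the evaluation script used for this model, which the card does not include), the sketch below computes macro precision, recall, and F1 in plain Python. The `macro_scores` helper and the toy labels are made up for the example.

```python
from typing import Dict, List

LABELS = ["negative", "neutral", "positive"]


def macro_scores(y_true: List[str], y_pred: List[str],
                 labels: List[str] = LABELS) -> Dict[str, float]:
    """Compute per-class precision/recall/F1, then take the unweighted mean over classes."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "precision_macro": sum(precisions) / n,
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
    }


# Toy data, not taken from the real test split.
y_true = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "neutral", "positive"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Here macro F1 is the mean of the per-class F1 scores, which is one common convention; it is an assumption that the reported figure was computed the same way.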

License

This model is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, inherited from the base model allegro/herbert-base-cased, which is also distributed under CC BY 4.0.

You are free to share and adapt the model, including for commercial use, provided you give appropriate credit to:

HerBERT — Allegro ML Research and the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences.
Twitter Sentiment PL (base) — bards.ai.

Citation

If you use this model, please cite HerBERT and TweetEval:

@inproceedings{mroczkowski-etal-2021-herbert,
    title = "{H}er{BERT}: Efficiently Pretrained Transformer-based Language Model for {P}olish",
    author = "Mroczkowski, Robert and Rybak, Piotr and Wr{\'o}blewska, Alina and Gawlik, Ireneusz",
    booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "1--10",
}

@inproceedings{barbieri-etal-2020-tweeteval,
    title = "{T}weet{E}val: Unified Benchmark and Comparative Evaluation for Tweet Classification",
    author = "Barbieri, Francesco and Camacho-Collados, Jose and Espinosa Anke, Luis and Neves, Leonardo",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    pages = "1644--1650",
}

Changelog

2022-12-01 — Initial release
2023-07-19 — Improvement of translation quality
2026-05-25 — Model card updated: license metadata (CC BY 4.0) and structure aligned with Hugging Face model card guidelines

About bards.ai

At bards.ai we focus on providing machine learning expertise to our partners, particularly in NLP, computer vision and time series analysis. Our team is based in Wrocław, Poland.

If you use our model, we'd love to hear about it. For questions or collaboration, contact us at info@bards.ai.
