Do pretrained language models actually beat strong classical baselines on sentiment and sarcasm?
Simanta compared three classical models against three pretrained transformer language models (PTLMs) across two binary tasks: sentiment analysis and sarcasm detection. With six models trained per task, that gives twelve model artifacts in one controlled comparison.
Inputs: the text_classical preprocessing column with TF-IDF features. Targets: the Sentiment and Sarcasm labels.
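To make the TF-IDF step concrete, here is a minimal pure-Python sketch of one common weighting, term frequency times log inverse document frequency. The source does not say which TF-IDF variant or library was used, so the formula, the whitespace tokenizer, the `tfidf` helper, and the example rows are all assumptions for illustration; library implementations such as scikit-learn's `TfidfVectorizer` apply additional smoothing and normalization by default.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumed variant: relative term frequency and unsmoothed IDF.
    """
    # Naive lowercase whitespace tokenization (an assumption, not the
    # benchmark's actual preprocessing).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical rows standing in for a preprocessed text column.
docs = ["great movie", "terrible movie", "great acting"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, vec)
```

Terms that appear in fewer documents get larger weights, and a term that appears in every document gets a weight of zero under this unsmoothed variant. A classical model such as Logistic Regression would then be trained on these weights.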
PTLMs win clearly on sentiment, where RoBERTa reaches the strongest macro-F1. Sarcasm is much harder: classical Logistic Regression outperforms every PTLM in this benchmark.
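Since the comparison is stated in macro-F1, a short sketch of how that metric is computed may help: it is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The helper names and the label vectors below are hypothetical and are not results from the benchmark.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Convention used here: F1 is 0.0 when the class never occurs
    # in either the labels or the predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical binary labels (1 = sarcastic, 0 = not sarcastic).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(macro_f1(y_true, y_pred))  # 0.625
```

In this toy case the F1 is 0.5 for class 1 and 0.75 for class 0, so the macro average is 0.625. A model that predicts only the majority class scores poorly on this metric even when its accuracy looks acceptable.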