{% extends "base.html" %} {% block title %}Task 1 - Simanta Benchmark{% endblock %} {% block content %}
01

Classical-vs-PTLM Benchmark

Do pretrained language models actually beat strong classical baselines on sentiment and sarcasm?

What Simanta Did

Simanta compared three classical models against three pretrained transformer language models (PTLMs) on two binary tasks: sentiment analysis and sarcasm detection. Six models trained on each of the two tasks gives twelve model artifacts in one controlled comparison.

  • Classical models used the shared text_classical preprocessing column with TF-IDF features.
  • The classical set covered Logistic Regression, Linear SVM, and Random Forest.
  • The PTLM set covered ALBERT, RoBERTa, and DistilBERT with max length 64, 2 epochs, and learning rate 2e-5.
  • Every model was evaluated on both Sentiment and Sarcasm labels.
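
The setup listed above can be written down as a small experiment grid. The following is a minimal sketch, not the benchmark's actual code: the identifiers `Run`, `build_grid`, and the lowercase model keys are hypothetical names chosen for illustration, while the hyperparameter values and the `text_classical` column come from the list above.

```python
from dataclasses import dataclass

TASKS = ("sentiment", "sarcasm")
CLASSICAL = ("logistic_regression", "linear_svm", "random_forest")
PTLMS = ("albert", "roberta", "distilbert")

# Values stated on this page; the dictionary keys are illustrative names.
CLASSICAL_CONFIG = {"text_column": "text_classical", "features": "tfidf"}
PTLM_CONFIG = {"max_length": 64, "epochs": 2, "learning_rate": 2e-5}


@dataclass(frozen=True)
class Run:
    """One (task, model) pair in the comparison (hypothetical container)."""
    task: str
    family: str    # "classical" or "ptlm"
    model: str
    config: tuple  # sorted (key, value) pairs, kept as a tuple so Run is hashable


def build_grid():
    """Enumerate every model on every task: 2 tasks x (3 + 3) models."""
    classical_cfg = tuple(sorted(CLASSICAL_CONFIG.items()))
    ptlm_cfg = tuple(sorted(PTLM_CONFIG.items()))
    runs = []
    for task in TASKS:
        for model in CLASSICAL:
            runs.append(Run(task, "classical", model, classical_cfg))
        for model in PTLMS:
            runs.append(Run(task, "ptlm", model, ptlm_cfg))
    return runs


GRID = build_grid()
print(len(GRID), "model artifacts")
```

Keeping the shared settings in one place per model family makes it easy to check that every run within a family used the same configuration, which is what makes the comparison controlled.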
Type once and all six models for the selected task will reply independently.
Select a task, enter text, and compare the six model responses.

Evaluation Results

Sentiment Analysis

{% set rows = eval_tables.sentiment %} {% include "partials/eval_table.html" %}

Sarcasm Detection

{% set rows = eval_tables.sarcasm %} {% include "partials/eval_table.html" %}

Visualisations

Bar graph: baseline vs PTLM macro-F1 (averaged over 3 seeds).
Confusion matrices for the best baseline and best PTLM on each task.
Gap analysis: per-model macro-F1 gap between baselines and PTLMs.
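
For readers unfamiliar with the metric behind these plots, here is a minimal pure-Python sketch of macro-F1 for a binary task, its average over seeds, and a gap between two models. It is an illustration under stated assumptions, not the benchmark's evaluation code: the function names are hypothetical, the sign convention of the gap (PTLM minus baseline) is assumed, and the labels in the usage lines are toy data rather than benchmark results.

```python
from statistics import mean


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_for_class(y_true, y_pred, positive):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return mean(f1_for_class(y_true, y_pred, c) for c in labels)


def mean_over_seeds(scores):
    """Average one model's macro-F1 across its seeded runs."""
    return mean(scores)


def macro_f1_gap(ptlm_score, baseline_score):
    """Assumed convention: positive means the PTLM is ahead."""
    return ptlm_score - baseline_score


# Toy labels only, to show the call pattern.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 0, 0]
toy_score = macro_f1(toy_true, toy_pred)
```

Because macro-F1 weights both classes equally, a model cannot score well by predicting only the majority class, which matters when the two labels are imbalanced.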

Takeaway

PTLMs win clearly on sentiment, where RoBERTa reaches the strongest macro-F1. Sarcasm is much harder: classical Logistic Regression outperforms every PTLM in this benchmark.

{% endblock %} {% block scripts %} {% endblock %}