{% extends "base.html" %}
{% block title %}Task 1 - Simanta Benchmark{% endblock %}
{% block content %}
<section class="task-page task1-page">
  <div class="task-header">
    <span class="task-number">01</span>
    <h1>Classical-vs-PTLM Benchmark</h1>
    <p class="task-subtitle">Do pretrained language models actually beat strong classical baselines on sentiment and sarcasm?</p>
  </div>
  <div class="task-body">
    <section class="task-card">
      <h2><i class="fa-solid fa-book-open"></i> What Simanta Did</h2>
      <p>
        Simanta compared three classical models against three pretrained transformer language models (PTLMs) on two binary tasks:
        sentiment analysis and sarcasm detection. Each of the six models was trained separately for each task, which gives twelve
        model artifacts in one controlled comparison.
      </p>
      <ul class="task-list">
        <li>The classical models were trained on TF-IDF features built from the shared <code>text_classical</code> preprocessed text column.</li>
        <li>The classical set covered Logistic Regression, Linear SVM, and Random Forest.</li>
        <li>The PTLM set covered ALBERT, RoBERTa, and DistilBERT, fine-tuned with a maximum sequence length of 64 tokens, 2 epochs, and a learning rate of 2e-5.</li>
        <li>Every model was evaluated on both the <code>Sentiment</code> and <code>Sarcasm</code> labels.</li>
      </ul>
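      <p>
        The TF-IDF features used by the classical models can be illustrated with a minimal, dependency-free sketch. The page does not say
        which vectorizer or settings were used, so the code below assumes scikit-learn-style defaults: raw term counts, a smoothed
        idf of ln((1 + n) / (1 + df)) + 1, L2 normalisation, and a lowercase tokenizer that keeps words of two or more characters.
        The helpers <code>tokenize</code> and <code>tfidf_vectors</code> are hypothetical names, not part of the benchmark code.
      </p>
      <pre><code class="language-python">
```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase words of two or more characters,
    # similar to scikit-learn's default token pattern.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumes scikit-learn-style defaults; the page does not state the exact settings.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf, matching scikit-learn's smooth_idf=True formula.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts
        weights = {t: count * idf[t] for t, count in tf.items()}
        # L2-normalise so that document length does not dominate the weights.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors
```
</code></pre>
      <p>
        A term that appears in every document, such as "movie" in a corpus of reviews, receives the lowest idf, so rarer and more
        distinctive words dominate each vector. Linear classifiers such as Logistic Regression and Linear SVM are then trained on
        these sparse vectors once each term is mapped to a fixed column index.
      </p>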
    </section>
    <section class="chat-section task1-chat-section">
      <div class="chat-container">
        <div class="controls-row">
          <div class="control-group">
            <label for="task1TaskSelect"><i class="fa-solid fa-list-check"></i> Task</label>
            <select id="task1TaskSelect">
              <option value="sentiment" selected>Sentiment Analysis</option>
              <option value="sarcasm">Sarcasm Detection</option>
            </select>
          </div>
        </div>
        <div class="model-desc" id="task1ModelDesc">
          Type once and all six models for the selected task will reply independently.
        </div>
        <div class="chat-log" id="task1ChatLog">
          <div class="msg bot">
            <div class="msg-avatar"><i class="fa-solid fa-robot"></i></div>
            <div class="msg-bubble">Select a task, enter text, and compare the six model responses.</div>
          </div>
        </div>
        <div class="chat-input-bar">
          <textarea id="task1UserInput" placeholder="Type your text here..." rows="1"></textarea>
          <button id="task1SendBtn" title="Run all Task 1 models"><i class="fa-solid fa-paper-plane"></i></button>
        </div>
      </div>
    </section>
    <section class="task-card">
      <h2><i class="fa-solid fa-table"></i> Evaluation Results</h2>
      <div class="eval-grid">
        <div>
          <h3>Sentiment Analysis</h3>
          {% set rows = eval_tables.sentiment %}
          {% include "partials/eval_table.html" %}
        </div>
        <div>
          <h3>Sarcasm Detection</h3>
          {% set rows = eval_tables.sarcasm %}
          {% include "partials/eval_table.html" %}
        </div>
      </div>
    </section>
    <section class="task-card">
      <h2><i class="fa-solid fa-chart-column"></i> Visualisations</h2>
      <div class="task-figures">
        <figure>
          <img src="{{ url_for('static', filename='images/task1/BaselineVsPTLMsBarGraph.png') }}" alt="Baseline vs PTLM macro-F1 bar graph" />
          <figcaption>Baseline vs PTLM macro-F1 (averaged over 3 seeds).</figcaption>
        </figure>
        <figure>
          <img src="{{ url_for('static', filename='images/task1/ConfusionMatrix.png') }}" alt="Confusion matrices for best baseline and best PTLM" />
          <figcaption>Confusion matrices for the best baseline and best PTLM on each task.</figcaption>
        </figure>
        <figure>
          <img src="{{ url_for('static', filename='images/task1/GapAnalysis.png') }}" alt="Per-model macro-F1 gap analysis" />
          <figcaption>Per-model macro-F1 gap between baselines and PTLMs.</figcaption>
        </figure>
      </div>
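      <p>
        The bar graph and the gap analysis both report macro-F1. For a binary task this is the unweighted mean of the two per-class
        F1 scores, so the minority class counts as much as the majority class. The bar graph also averages it over 3 seeds. A minimal
        sketch of that computation follows. It assumes labels encoded as the integers 0 and 1 and one prediction list per training seed.
        The function names are hypothetical, and this is not the benchmark's own evaluation code.
      </p>
      <pre><code class="language-python">
```python
from statistics import mean


def f1_for_class(y_true: list[int], y_pred: list[int], cls: int) -> float:
    # Count true positives, false positives, and false negatives for one class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: list[int], y_pred: list[int], classes=(0, 1)) -> float:
    # Unweighted mean of the per-class F1 scores: each class counts equally.
    return mean(f1_for_class(y_true, y_pred, c) for c in classes)


def seed_averaged_macro_f1(y_true: list[int], preds_per_seed: list[list[int]]) -> float:
    # Hypothetical layout: one prediction list per training seed on the same test set.
    return mean(macro_f1(y_true, preds) for preds in preds_per_seed)
```
</code></pre>
      <p>
        The per-class formula 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall. Averaging the per-seed scores
        reduces the influence of any single run's randomness on the reported number.
      </p>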
    </section>
    <section class="task-card">
      <h2><i class="fa-solid fa-lightbulb"></i> Takeaway</h2>
      <p>
        PTLMs win clearly on sentiment, where RoBERTa reaches the highest macro-F1. Sarcasm is much harder, and on that task the
        classical Logistic Regression baseline outperforms every PTLM in this benchmark.
      </p>
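      <p>
        The takeaway can be read as a statement about macro-F1 gaps between the two model families. The sketch below shows one way
        to compute them. It assumes that a PTLM's gap is its macro-F1 minus the best baseline's macro-F1 on the same task, and the
        gap-analysis figure may define it differently. The function name and the input layout (a dictionary from model name to
        macro-F1, plus the set of baseline names) are hypothetical.
      </p>
      <pre><code class="language-python">
```python
def ptlm_gaps(scores: dict[str, float], baselines: set[str]) -> dict[str, float]:
    """Return each PTLM's macro-F1 minus the best baseline's macro-F1 for one task.

    Assumed definition: gap = PTLM macro-F1 - best baseline macro-F1.
    """
    best_baseline = max(scores[name] for name in baselines)
    # Positive gap: the PTLM beats every baseline; negative: some baseline wins.
    return {
        name: score - best_baseline
        for name, score in scores.items()
        if name not in baselines
    }
```
</code></pre>
      <p>
        Suppose the sarcasm scores were passed to this function. Logistic Regression outperforms every PTLM, and the strongest
        baseline scores at least as high as Logistic Regression, so every gap returned for that task would be negative.
      </p>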
    </section>
  </div>
</section>
{% endblock %}
{% block scripts %}
<script src="{{ url_for('static', filename='js/task1_chat.js') }}"></script>
{% endblock %}