{% extends "base.html" %} {% block title %}Task 3 — Dialect-Aware LoRA Adapters{% endblock %} {% block content %}
03

Dialect-Aware LoRA Sarcasm Detection

Can a single 1.1B-parameter LLM be cheaply specialised per English variety with LoRA adapters?

What Omkar Did

Omkar fine-tuned TinyLlama-1.1B-Chat-v1.0 with a separate LoRA adapter for each English variety in the BESSTIE sarcasm dataset (en-AU, en-IN, en-UK) and evaluated every adapter across all three test sets, producing a full 3×3 cross-variety evaluation matrix.

  • Base model: TinyLlama-1.1B-Chat-v1.0 (frozen) with PEFT LoRA adapters — r=16, alpha=32, dropout=0.05, all-linear target modules.
  • Task framing: the assistant answers "yes" or "no" to a single sarcasm question, with completion-only loss applied to that one label token.
  • Class balance: a WeightedRandomSampler rebalances training exposure without duplicating rows; evaluation uses a validation-tuned threshold on logit_yes − logit_no.
  • Training recipe: max-len 512, batch 8 × 2 grad-accum, 5 epochs, AdamW LR 2e-4 with cosine schedule, warmup ratio 0.05, weight decay 0.01, bf16, best-checkpoint by validation Macro-F1.
  • Every score is averaged over 3 random seeds [42, 123, 2024]. The best-performing adapter per variety is published on the Hugging Face Hub.
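
The validation-tuned threshold from the class-balance bullet can be illustrated with a minimal sketch. It assumes each validation example has already been reduced to a margin `logit_yes − logit_no` and a gold label (1 = sarcastic). The function names, the midpoint candidate grid, and the toy data are illustrative assumptions, not the project's actual code.

```python
from typing import List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for labels 0 and 1."""
    scores: List[float] = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def tune_threshold(margins: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Pick the cut-off on (logit_yes - logit_no) that maximises validation Macro-F1.

    An example is predicted sarcastic when its margin is strictly above the cut-off.
    """
    ordered = sorted(set(margins))
    # Candidates: one below the minimum, midpoints between neighbours, one above the maximum.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)

    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        preds = [1 if m > t else 0 for m in margins]
        score = macro_f1(labels, preds)
        if score > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation data (hypothetical, not taken from BESSTIE).
val_margins = [-2.0, -1.0, -0.5, 0.3, 1.5]
val_labels = [0, 0, 1, 1, 1]
threshold, score = tune_threshold(val_margins, val_labels)
```

The tuned threshold would then be frozen and applied unchanged to each test set, so the test labels never influence the cut-off.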

Send a message once and each variety-tuned LoRA adapter replies independently, so you can compare their predictions. (The first call per dialect lazy-loads the adapter, so expect a short delay.)

Cross-Variety Evaluation

Macro-F1 / Macro-P / Macro-R are reported as mean ± std over 3 seeds. Each row is the variety an adapter was trained on; the test variety varies across that row.
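
As a rough illustration of how each "mean ± std" cell could be produced, the sketch below aggregates per-seed scores into a 3×3 grid of formatted strings. It assumes the sample standard deviation (the page does not say whether sample or population std is used). The helper names and the placeholder scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

VARIETIES = ["en-AU", "en-IN", "en-UK"]


def summarise(scores: List[float]) -> str:
    """Format per-seed scores as 'mean ± std' (sample standard deviation, an assumption)."""
    return f"{mean(scores):.4f} ± {stdev(scores):.4f}"


def build_matrix(per_seed: Dict[Tuple[str, str], List[float]]) -> List[List[str]]:
    """Rows follow the training variety, columns the test variety."""
    return [
        [summarise(per_seed[(train, test)]) for test in VARIETIES]
        for train in VARIETIES
    ]


# Placeholder per-seed scores (NOT results from this page): the same three values in every cell.
toy = {(tr, te): [0.70, 0.75, 0.80] for tr in VARIETIES for te in VARIETIES}
matrix = build_matrix(toy)
```

With only three seeds the standard deviation is itself a noisy estimate, so a wide spread is best read as a sign of instability rather than as a precise interval.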

Sarcasm Detection

{% set rows = eval_tables.sarcasm %} {% include "partials/cross_variety_table.html" %}

Visualisations

Cross-variety Macro-F1 heatmap (mean over 3 seeds).
Macro-F1 by train/test variety with seed standard deviation.
Seed-averaged Macro-F1 across test varieties, with seed standard deviation.
Mean confusion matrices across seeds (3×3 grid).
Validation Macro-F1 vs. training step.

Takeaway

On its own dialect, the en-UK adapter leads with Macro-F1 0.7724 ± 0.0088, closely followed by en-AU at 0.7603 ± 0.0291. The en-IN adapter lags at 0.5964 ± 0.0817 with a much wider seed spread — sarcasm in Indian English is the hardest of the three for this 1.1B-parameter base.

Cross-variety transfer collapses just as it does in Ryan's RoBERTa sarcasm matrix: every off-diagonal cell drops well below the same-dialect score, with the worst pair (en-AU → en-IN) falling to 0.50. Sarcasm cues clearly remain dialect-specific even when the underlying model is a much larger LLM.

{% endblock %} {% block scripts %} {% endblock %}