distilbert-sst2-lora

Fine-tuned distilbert-base-uncased on the GLUE SST-2 sentiment classification task using LoRA (Low-Rank Adaptation) for parameter-efficient training.

This model is the sentiment pre-screening layer in a production insurance claims triage pipeline: negative-sentiment claims above a confidence threshold are routed for HUMAN_REVIEW before an LLM routing agent is invoked, reducing LLM calls by ~35% on high-volume batches.
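
The pre-screening rule described above can be sketched as a small routing function. This is a minimal illustration, not the production code: the threshold value, the `Prediction` type, and the `LLM_ROUTING` outcome name are hypothetical, since the card states only that a confidence threshold exists.

```python
from dataclasses import dataclass

# Hypothetical threshold: the card does not state the production value.
NEGATIVE_THRESHOLD = 0.80


@dataclass
class Prediction:
    label: str    # "NEGATIVE" or "POSITIVE"
    score: float  # classifier confidence in [0, 1]


def route_claim(pred: Prediction, threshold: float = NEGATIVE_THRESHOLD) -> str:
    """Return "HUMAN_REVIEW" for confident negatives, else "LLM_ROUTING"."""
    if pred.label == "NEGATIVE" and pred.score >= threshold:
        return "HUMAN_REVIEW"  # the LLM routing agent is never invoked
    return "LLM_ROUTING"       # hypothetical name for the LLM agent path


def llm_call_reduction(preds: list[Prediction]) -> float:
    """Fraction of a batch that is routed away from the LLM agent."""
    if not preds:
        return 0.0
    skipped = sum(route_claim(p) == "HUMAN_REVIEW" for p in preds)
    return skipped / len(preds)
```

Running `llm_call_reduction` over a batch of predictions gives the share of claims that skip the LLM call, which is the quantity the ~35% figure refers to.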

Model details

Detail	Value
Base model	distilbert-base-uncased (6 layers, 12 heads, 66M params)
Architecture	Transformer encoder — Multi-Head Self-Attention + FFN
Fine-tuning method	LoRA (PEFT)
LoRA rank (r)	8
LoRA alpha	16
LoRA target modules	q_lin, v_lin (query + value attention projections)
Trainable parameters	592,130 / 67,578,884 (0.88%)
Training dataset	GLUE SST-2 (67,349 examples)
Training steps	300
Batch size	64
Learning rate	2e-4
Optimizer	AdamW (weight decay 0.01)
Precision	FP32
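
The values in this table map fairly directly onto PEFT and Trainer configuration objects. The following is a sketch under assumptions rather than the actual training script: the output directory is hypothetical, the batch size is assumed to be per device on a single device, and settings the card does not state (LoRA dropout, warmup, scheduler) are left at library defaults.

```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,         # sequence classification on top of DistilBERT
    r=8,                                # LoRA rank
    lora_alpha=16,                      # effective scale is alpha / r = 2
    target_modules=["q_lin", "v_lin"],  # DistilBERT query and value projections
)

training_args = TrainingArguments(
    output_dir="lora-sst2-out",         # hypothetical path, not from this card
    max_steps=300,
    per_device_train_batch_size=64,     # assumes a single device
    learning_rate=2e-4,
    weight_decay=0.01,                  # Trainer's default optimizer is an AdamW variant
)
```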

Architecture notes

LoRA injects trainable low-rank matrices into the attention projections:

Original:  y = W₀x  (W₀ frozen)
LoRA:      y = W₀x + (alpha/r) · BAx
           B ∈ R^{768×8}, A ∈ R^{8×768}  (initialized: B=0, A~Normal)

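Within the adapted layers only the B and A matrices are trained, which cuts the trainable parameter count by ~99% relative to full fine-tuning (0.88% of parameters are trainable; see Model details). At inference, the LoRA weights are merged into W₀ via model.merge_and_unload(), so serving adds no extra latency over the base model.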
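
Merging works because (alpha/r) · BA has the same shape as W₀, so it can be folded into a single dense matrix. The sketch below checks this numerically with plain Python lists and toy dimensions (6 instead of 768, rank 2 instead of 8). The helper names are illustrative and are not part of PEFT.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(w0, a, b, x, alpha, r):
    """y = W0 x + (alpha / r) * B (A x), with the adapter kept separate."""
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y + scale * d for y, d in zip(base, delta)]


def merge(w0, a, b, alpha, r):
    """W = W0 + (alpha / r) * B A, folded into one dense matrix."""
    scale = alpha / r
    ba = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, ba)]


rng = random.Random(0)
d, r, alpha = 6, 2, 4  # toy sizes; alpha / r = 2, the same ratio as alpha=16, r=8
w0 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
a = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # A: r x d
b = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d)]  # B: d x r, nonzero as after training
x = [rng.gauss(0, 1) for _ in range(d)]

y_adapter = lora_forward(w0, a, b, x, alpha, r)
y_merged = matvec(merge(w0, a, b, alpha, r), x)
max_diff = max(abs(p - q) for p, q in zip(y_adapter, y_merged))
```

Here `max_diff` is at the level of floating-point rounding error, which is why the merged model needs no adapter-specific code at serving time. With B set to zero, as at initialization, `lora_forward` reduces to the frozen layer's output.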

Training infrastructure

The training pipeline was built and tested in both single-process and distributed configurations:

Single process: HuggingFace Trainer API with LoraConfig from PEFT
Distributed (DDP): PyTorch DistributedDataParallel with DistributedSampler, gloo backend for CPU / nccl for GPU clusters
Distributed (Accelerate): HuggingFace Accelerate with gather_for_metrics() for rank-aware evaluation
Cloud deployment: Containerised and deployed to AWS SageMaker and GCP Vertex AI inference endpoints for A/B cost benchmarking

See distributed-training-demo for the full distributed training code.
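
Rank-aware evaluation matters because DistributedSampler pads the dataset so that every rank receives the same number of examples, and those padded duplicates should not be counted twice in the metrics. The sketch below mimics that sharding logic in plain Python without shuffling. It is a conceptual simplification: the function names are illustrative, and the real gather_for_metrics() operates on batches of tensors rather than index lists.

```python
import math


def shard_indices(n_examples: int, world_size: int, rank: int) -> list[int]:
    """Indices one rank would see, mirroring DistributedSampler without shuffling.

    Assumes world_size <= n_examples, so a single wrap-around is enough padding.
    """
    per_rank = math.ceil(n_examples / world_size)
    total = per_rank * world_size
    indices = list(range(n_examples))
    # Pad by wrapping around to the start so every rank gets the same count.
    indices += indices[: total - n_examples]
    return indices[rank:total:world_size]


def gather_without_padding(shards: list[list[int]], n_examples: int) -> list[int]:
    """Interleave per-rank shards back into sampler order and drop the padded tail."""
    merged = [idx for step in zip(*shards) for idx in step]
    return merged[:n_examples]


# 872 validation examples split across 3 hypothetical ranks.
shards = [shard_indices(872, 3, rank) for rank in range(3)]
restored = gather_without_padding(shards, 872)
```

With 872 examples and 3 ranks, each rank gets 291 indices and the final index on the last rank is a repeat of example 0. Dropping that tail after gathering restores exactly 872 unique examples.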

Evaluation results

Evaluated on the full SST-2 validation split (872 examples):

Metric	Value
Accuracy	82.45%
F1 (weighted)	0.8246
Avg confidence	0.8269

Confusion matrix (rows: true label, columns: predicted label):

              NEGATIVE  POSITIVE
NEGATIVE  →   354       74      (FP: 17.3%)
POSITIVE  →   79        365     (FN: 17.8%)
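
The reported metrics can be recomputed from these four counts. The short check below treats each row as a true label and each column as a predicted label, and assumes the weighted F1 is the support-weighted mean of the per-class F1 scores (the usual definition of "weighted" averaging).

```python
# Counts from the confusion matrix (rows: true label, columns: predicted label).
tn, fp = 354, 74   # true NEGATIVE row
fn, tp = 79, 365   # true POSITIVE row


def f1(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 score written directly in terms of counts."""
    return 2 * true_pos / (2 * true_pos + false_pos + false_neg)


total = tn + fp + fn + tp
accuracy = (tn + tp) / total
fp_rate = fp / (tn + fp)   # share of true negatives predicted POSITIVE
fn_rate = fn / (fn + tp)   # share of true positives predicted NEGATIVE

f1_pos = f1(tp, fp, fn)
f1_neg = f1(tn, fn, fp)    # same formula with NEGATIVE as the target class
weighted_f1 = ((tn + fp) * f1_neg + (fn + tp) * f1_pos) / total

print(f"accuracy={accuracy:.4f} fp_rate={fp_rate:.3f} "
      f"fn_rate={fn_rate:.3f} weighted_f1={weighted_f1:.4f}")
```

The results agree with the table: accuracy of about 0.8245, weighted F1 of about 0.8246, and error rates of 17.3% and 17.8%.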

Failure modes

Systematic analysis of high-confidence mispredictions (confidence > 0.80, wrong class):

Failure type	Example	True	Predicted	Why
Negation blindness	"The film is not terrible"	POS	NEG	Negation token attention weight is low relative to "terrible", so the prediction follows "terrible"
Sarcasm	"Oh great, another superhero movie"	NEG	POS	Sarcastic positive surface form; no pragmatic layer
Mixed valence, recency	"Beautiful cinematography, but the story is a mess"	NEG	NEG	Last clause dominates via positional attention bias
Short inputs	"Awful."	NEG	POS (conf=0.81)	Insufficient context for attention heads; single token
Domain shift	Legal/medical vocabulary with clear sentiment	—	—	OOD vocabulary degrades confidence uniformly

These failure modes are used as evaluation test cases in the dual-layer evaluation framework (Ragas + LangSmith) to catch alignment regressions before production deployment.

Usage

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="ahnafthaqeef/distilbert-sst2-lora",
    device=-1,  # -1 runs on CPU; pass 0 to use the first GPU
)

result = pipe("The movie was surprisingly moving and well-acted.")
# [{'label': 'POSITIVE', 'score': 0.923}]

result = pipe("Barely watchable. The plot made no sense.")
# [{'label': 'NEGATIVE', 'score': 0.911}]

Loading the LoRA adapter separately (before merge):

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

tokenizer = AutoTokenizer.from_pretrained("ahnafthaqeef/distilbert-sst2-lora")
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model = PeftModel.from_pretrained(base_model, "ahnafthaqeef/distilbert-sst2-lora")
model = model.merge_and_unload()  # fuse LoRA weights for zero-overhead inference
model.eval()

enc = tokenizer("Great film, loved every minute.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits
label = ["NEGATIVE", "POSITIVE"][logits.argmax().item()]
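
The snippet above returns only a label. To apply a confidence threshold, the logits also need to be turned into a probability. One common choice, assumed here rather than confirmed as the method behind the reported confidence figures, is the maximum softmax probability. The sketch uses plain Python floats with made-up logits; on the tensors above, torch.softmax(logits, dim=-1) performs the same computation.

```python
import math

LABELS = ["NEGATIVE", "POSITIVE"]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def label_and_confidence(logits: list[float]) -> tuple[str, float]:
    """Return the predicted label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration only, not real model output.
label, confidence = label_and_confidence([-1.2, 1.3])
```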

Intended use

Sentiment pre-screening before expensive LLM routing calls
Insurance claims triage (negative-sentiment flag for HUMAN_REVIEW)
General binary sentiment classification on short English text

Out of scope: Non-English text, long documents (>512 tokens), nuanced multi-class sentiment, sarcasm detection.

Training code

Full training, distributed training, and evaluation code:

Upload script

from huggingface_hub import HfApi
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "claims-triage-agent/finetune/model"
REPO_ID = "ahnafthaqeef/distilbert-sst2-lora"

api = HfApi()
api.create_repo(REPO_ID, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model = PeftModel.from_pretrained(base_model, MODEL_DIR)

model.push_to_hub(REPO_ID)
tokenizer.push_to_hub(REPO_ID)
