distilbert-sst2-lora

Fine-tuned distilbert-base-uncased on the GLUE SST-2 sentiment classification task using LoRA (Low-Rank Adaptation) for parameter-efficient training.

This model is the sentiment pre-screening layer in a production insurance claims triage pipeline: negative-sentiment claims above a confidence threshold are routed for HUMAN_REVIEW before an LLM routing agent is invoked, reducing LLM calls by ~35% on high-volume batches.
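The pre-screening rule described above can be sketched as a small routing function. This is a minimal illustration rather than the production code: the name `route_claim`, the `LLM_ROUTING` label, and the 0.80 threshold are assumptions, and the input is assumed to be a single dict as returned by the `text-classification` pipeline.

```python
from typing import Literal

Route = Literal["HUMAN_REVIEW", "LLM_ROUTING"]

# Hypothetical threshold; the production value is not stated in this card.
NEGATIVE_THRESHOLD = 0.80


def route_claim(prediction: dict, threshold: float = NEGATIVE_THRESHOLD) -> Route:
    """Send confident negative predictions to a human, everything else to the LLM agent."""
    is_negative = prediction["label"] == "NEGATIVE"
    if is_negative and prediction["score"] > threshold:
        return "HUMAN_REVIEW"
    return "LLM_ROUTING"


# Example inputs shaped like pipeline outputs.
confident_negative = route_claim({"label": "NEGATIVE", "score": 0.911})
weak_negative = route_claim({"label": "NEGATIVE", "score": 0.62})
positive = route_claim({"label": "POSITIVE", "score": 0.923})
```

Only the first case skips the LLM call; low-confidence negatives and all positives still go to the routing agent.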

Model details

Detail Value
Base model distilbert-base-uncased (6 layers, 12 heads, 66M params)
Architecture Transformer encoder: multi-head self-attention + FFN
Fine-tuning method LoRA (PEFT)
LoRA rank (r) 8
LoRA alpha 16
LoRA target modules q_lin, v_lin (query + value attention projections)
Trainable parameters 592,130 / 67,578,884 (0.88%)
Training dataset GLUE SST-2 (67,349 examples)
Training steps 300
Batch size 64
Learning rate 2e-4
Optimizer AdamW (weight decay 0.01)
Precision FP32
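
The hyperparameters in the table map onto PEFT and Trainer configuration objects roughly as follows. This is a sketch under stated assumptions, not the original training script: the output directory is hypothetical, the batch size of 64 is assumed to be per device, and settings not listed in the table (such as LoRA dropout) are left at library defaults.

```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# LoRA settings taken from the table above.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,         # sequence classification head
    r=8,                                # LoRA rank
    lora_alpha=16,                      # scaling numerator (alpha / r = 2)
    target_modules=["q_lin", "v_lin"],  # DistilBERT query and value projections
)

training_args = TrainingArguments(
    output_dir="finetune/model",        # hypothetical path
    max_steps=300,
    per_device_train_batch_size=64,     # assumes the table's batch size is per device
    learning_rate=2e-4,
    weight_decay=0.01,                  # applied by the Trainer's default AdamW optimizer
)
```

Mixed precision is off by default in `TrainingArguments`, which matches the FP32 entry in the table.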

Architecture notes

LoRA injects trainable low-rank matrices into the attention projections:

Original:  y = W₀x  (W₀ frozen)
LoRA:      y = W₀x + (alpha/r) · BAx
           B ∈ R^{768×8}, A ∈ R^{8×768}  (initialized: B=0, A~Normal)

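Only the B and A matrices are trained, so roughly 99% of the model's parameters stay frozen. This shrinks gradient and optimizer-state memory accordingly, although the frozen weights and activations are still held in memory during training. At inference, the LoRA weights are merged into W₀ via model.merge_and_unload(), so serving incurs no extra latency.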
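
To make the update rule and the merge step concrete, the sketch below counts the adapter parameters for the configuration in this card and checks numerically that the merged weight gives the same output as the unmerged path. It uses toy dimensions and plain Python lists so it runs without PyTorch; the helper names are illustrative, and the count covers only the A and B matrices (any other trainable modules, such as a classification head, would be extra).

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_param_count(d_in, d_out, r, n_layers, n_targets):
    """Parameters in A (r x d_in) plus B (d_out x r), over all targeted projections."""
    return n_layers * n_targets * r * (d_in + d_out)


# r=8 on q_lin and v_lin in each of DistilBERT's 6 layers, hidden size 768.
adapter_params = lora_param_count(d_in=768, d_out=768, r=8, n_layers=6, n_targets=2)

# Toy-sized merge check (d=4, r=2).
random.seed(0)
d, r, alpha = 4, 2, 4
scale = alpha / r
W0 = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # r x d
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d)]  # d x r, nonzero as if trained
x = [[random.gauss(0, 1)] for _ in range(d)]                    # column vector

# Unmerged path: y = W0 x + scale * B (A x)
base = matmul(W0, x)
delta = matmul(B, matmul(A, x))
y_unmerged = [[b[0] + scale * dl[0]] for b, dl in zip(base, delta)]

# Merged path: fold the update into one weight matrix, then a single matmul.
BA = matmul(B, A)
W_merged = [[w + scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W0, BA)]
y_merged = matmul(W_merged, x)

max_diff = max(abs(a[0] - b[0]) for a, b in zip(y_unmerged, y_merged))
print(adapter_params, max_diff)
```

The two paths agree up to floating-point rounding, which is why merging removes the adapter's runtime cost without changing predictions.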

Training infrastructure

The training pipeline was built and tested in both single-process and distributed configurations:

  • Single process: HuggingFace Trainer API with LoraConfig from PEFT
  • Distributed (DDP): PyTorch DistributedDataParallel with DistributedSampler, gloo backend for CPU / nccl for GPU clusters
  • Distributed (Accelerate): HuggingFace Accelerate with gather_for_metrics() for rank-aware evaluation
  • Cloud deployment: Containerised and deployed to AWS SageMaker and GCP Vertex AI inference endpoints for A/B cost benchmarking
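
One reason rank-aware evaluation matters: when the evaluation set does not divide evenly across processes, a distributed sampler pads the index list so every rank gets the same number of examples, and some examples are then scored twice. The sketch below imitates the index assignment of `DistributedSampler(shuffle=False, drop_last=False)` in plain Python; it is a simplified stand-in, not the PyTorch implementation, and the world size of 3 is a hypothetical choice.

```python
import math


def shard_indices(n_examples, world_size, rank):
    """Approximate DistributedSampler's index assignment without shuffling."""
    per_rank = math.ceil(n_examples / world_size)
    total = per_rank * world_size
    indices = list(range(n_examples))
    indices += indices[: total - n_examples]  # pad by wrapping around to the start
    return indices[rank:total:world_size]     # strided split across ranks


n_examples, world_size = 872, 3  # SST-2 validation size, hypothetical 3 processes
shards = [shard_indices(n_examples, world_size, rank) for rank in range(world_size)]

n_seen = sum(len(shard) for shard in shards)
n_duplicates = n_seen - n_examples
print(f"{n_seen} predictions for {n_examples} examples ({n_duplicates} duplicate)")
```

Naively concatenating per-rank predictions would count the padded example twice; `gather_for_metrics()` in Accelerate is designed to drop such duplicates from the final batch before metrics are computed.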

See distributed-training-demo for the full distributed training code.

Evaluation results

Evaluated on 872 examples from the SST-2 validation split:

Metric Value
Accuracy 82.45%
F1 (weighted) 0.8246
Avg confidence 0.8269

Confusion matrix (rows: true label, columns: predicted label):

              NEGATIVE  POSITIVE
NEGATIVE  →   354       74      (FP: 17.3%)
POSITIVE  →   79        365     (FN: 17.8%)
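
As a cross-check, the headline metrics can be recomputed from the confusion matrix alone. The sketch below treats rows as true labels and columns as predictions, and uses the support-weighted F1 definition; variable names are illustrative.

```python
# Counts copied from the confusion matrix: confusion[true_label][predicted_label].
confusion = {
    "NEGATIVE": {"NEGATIVE": 354, "POSITIVE": 74},
    "POSITIVE": {"NEGATIVE": 79, "POSITIVE": 365},
}
labels = list(confusion)
total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[label][label] for label in labels) / total


def f1_for(label):
    """Per-class F1 from true positives, false positives and false negatives."""
    tp = confusion[label][label]
    fp = sum(confusion[other][label] for other in labels if other != label)
    fn = sum(confusion[label][other] for other in labels if other != label)
    return 2 * tp / (2 * tp + fp + fn)


# Weight each class's F1 by its number of true examples.
support = {label: sum(confusion[label].values()) for label in labels}
weighted_f1 = sum(f1_for(label) * support[label] for label in labels) / total

fp_rate = confusion["NEGATIVE"]["POSITIVE"] / support["NEGATIVE"]
fn_rate = confusion["POSITIVE"]["NEGATIVE"] / support["POSITIVE"]

print(f"accuracy={accuracy:.4f} weighted_f1={weighted_f1:.4f}")
print(f"fp_rate={fp_rate:.3f} fn_rate={fn_rate:.3f}")
```

The recomputed values agree with the reported accuracy, weighted F1, and per-row error rates.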

Failure modes

Systematic analysis of high-confidence mispredictions (confidence > 0.80, wrong class):

Failure type | Example | True | Predicted | Why
Negation blindness | "The film is not terrible" | POS | NEG | The negation token receives low attention weight relative to "terrible"
Sarcasm | "Oh great, another superhero movie" | NEG | POS | Sarcastic positive surface form; the model has no pragmatic layer
Mixed valence, recency | "Beautiful cinematography, but the story is a mess" | NEG | NEG | Last clause dominates via positional attention bias
Short inputs | "Awful." | NEG | POS (conf=0.81) | Single token; insufficient context for the attention heads
Domain shift | Legal/medical vocabulary with clear sentiment | n/a | n/a | OOD vocabulary degrades confidence uniformly

These failure modes are used as evaluation test cases in the dual-layer evaluation framework (Ragas + LangSmith) to catch alignment regressions before production deployment.
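
A regression check built on these cases might look like the sketch below. It is independent of Ragas and LangSmith and only illustrates the idea: `high_confidence_errors` and the `predict` callable (returning a label and a confidence) are hypothetical names, and the two cases are taken from the sarcasm and short-input rows of the table.

```python
from typing import Callable, List, Tuple

# (text, expected_label) pairs from the failure-mode table.
FAILURE_CASES = [
    ("Oh great, another superhero movie", "NEGATIVE"),
    ("Awful.", "NEGATIVE"),
]


def high_confidence_errors(
    predict: Callable[[str], Tuple[str, float]],
    cases: List[Tuple[str, str]],
    min_confidence: float = 0.80,
) -> List[Tuple[str, str, str, float]]:
    """Return cases the model gets wrong while reporting confidence above the cutoff."""
    errors = []
    for text, expected in cases:
        label, score = predict(text)
        if label != expected and score > min_confidence:
            errors.append((text, expected, label, score))
    return errors


def always_positive(text: str) -> Tuple[str, float]:
    """Stub predictor standing in for the real model, used only for this demo."""
    return "POSITIVE", 0.81


regressions = high_confidence_errors(always_positive, FAILURE_CASES)
print(f"{len(regressions)} high-confidence errors")
```

In practice `predict` would wrap the pipeline call, and a release gate could fail when the number of high-confidence errors grows between model versions.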

Usage

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="ahnafthaqeef/distilbert-sst2-lora",
    device=-1,  # -1 runs on CPU; use 0 for the first GPU
)

result = pipe("The movie was surprisingly moving and well-acted.")
# [{'label': 'POSITIVE', 'score': 0.923}]

result = pipe("Barely watchable. The plot made no sense.")
# [{'label': 'NEGATIVE', 'score': 0.911}]

Loading the LoRA adapter separately (before merge):

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

tokenizer = AutoTokenizer.from_pretrained("ahnafthaqeef/distilbert-sst2-lora")
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model = PeftModel.from_pretrained(base_model, "ahnafthaqeef/distilbert-sst2-lora")
model = model.merge_and_unload()  # fuse LoRA weights for zero-overhead inference
model.eval()

enc = tokenizer("Great film, loved every minute.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits
# SST-2 label ids: 0 = negative, 1 = positive
label = ["NEGATIVE", "POSITIVE"][logits.argmax(dim=-1).item()]
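
The snippet above yields only a label. To obtain a confidence score comparable to the pipeline's `score` field, the logits can be passed through a softmax. The sketch below does this in plain Python on hypothetical logit values; with the real model, the input would be something like `logits[0].tolist()`.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [-1.2, 1.3]  # hypothetical values, not real model output
probs = softmax(example_logits)

# Pick the most probable class and report its probability as the confidence.
best = max(range(len(probs)), key=probs.__getitem__)
label, confidence = ["NEGATIVE", "POSITIVE"][best], probs[best]
print(label, round(confidence, 3))
```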

Intended use

  • Sentiment pre-screening before expensive LLM routing calls
  • Insurance claims triage (negative-sentiment flag for HUMAN_REVIEW)
  • General binary sentiment classification on short English text

Out of scope: Non-English text, long documents (>512 tokens), nuanced multi-class sentiment, sarcasm detection.

Training code

Full training, distributed training, and evaluation code:

Upload script

from huggingface_hub import HfApi
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "claims-triage-agent/finetune/model"
REPO_ID = "ahnafthaqeef/distilbert-sst2-lora"

api = HfApi()
api.create_repo(REPO_ID, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model = PeftModel.from_pretrained(base_model, MODEL_DIR)

model.push_to_hub(REPO_ID)
tokenizer.push_to_hub(REPO_ID)