distilbert-sst2-lora
Fine-tuned distilbert-base-uncased on the GLUE SST-2 sentiment classification task using LoRA (Low-Rank Adaptation) for parameter-efficient training.
This model is the sentiment pre-screening layer in a production insurance claims triage pipeline: negative-sentiment claims above a confidence threshold are routed to HUMAN_REVIEW before the LLM routing agent is invoked, which reduces LLM calls by ~35% on high-volume batches.
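A minimal sketch of that pre-screening rule, assuming the classifier returns a `label`/`score` pair as the `text-classification` pipeline does. The threshold value (0.80) and the `LLM_ROUTER` destination name are hypothetical; the card does not state the production threshold or how the downstream agent is named.

```python
from dataclasses import dataclass

# Hypothetical threshold: the production value is not stated in this card.
NEGATIVE_CONFIDENCE_THRESHOLD = 0.80


@dataclass
class SentimentResult:
    label: str    # "NEGATIVE" or "POSITIVE", as emitted by the classifier
    score: float  # classifier confidence for `label`


def route_claim(result: SentimentResult, threshold: float = NEGATIVE_CONFIDENCE_THRESHOLD) -> str:
    """Send confidently negative claims to human review; everything else goes to the LLM.

    "LLM_ROUTER" is a hypothetical name for the downstream LLM routing agent.
    """
    if result.label == "NEGATIVE" and result.score > threshold:
        return "HUMAN_REVIEW"  # no LLM call is made for this claim
    return "LLM_ROUTER"


batch = [
    SentimentResult("NEGATIVE", 0.93),
    SentimentResult("POSITIVE", 0.88),
    SentimentResult("NEGATIVE", 0.55),  # negative but not confident enough to skip the LLM
]
routes = [route_claim(r) for r in batch]
print(routes)  # ['HUMAN_REVIEW', 'LLM_ROUTER', 'LLM_ROUTER']
```

Every claim that lands in `HUMAN_REVIEW` is one LLM call avoided, which is where the savings on high-volume batches come from.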
Model details
| Detail | Value |
|---|---|
| Base model | distilbert-base-uncased (6 layers, 12 heads, 66M params) |
| Architecture | Transformer encoder (Multi-Head Self-Attention + FFN) |
| Fine-tuning method | LoRA (PEFT) |
| LoRA rank (r) | 8 |
| LoRA alpha | 16 |
| LoRA target modules | q_lin, v_lin (query + value attention projections) |
| Trainable parameters | 592,130 / 67,578,884 (0.88%) |
| Training dataset | GLUE SST-2 (67,349 examples) |
| Training steps | 300 |
| Batch size | 64 |
| Learning rate | 2e-4 |
| Optimizer | AdamW (weight decay 0.01) |
| Precision | FP32 |
Architecture notes
LoRA injects trainable low-rank matrices into the attention projections:
Original: $y = W_0 x$ ($W_0$ frozen)
LoRA: $y = W_0 x + \frac{\alpha}{r} B A x$
$B \in \mathbb{R}^{768 \times 8}$, $A \in \mathbb{R}^{8 \times 768}$ (initialized: $B = 0$, $A$ Gaussian)
Only the $B$ and $A$ matrices are trained. This cuts the number of trainable parameters, and with it the gradient and optimizer-state memory, by roughly 99% compared with full fine-tuning. At inference, the LoRA weights are merged into $W_0$ via `model.merge_and_unload()`, so serving adds no extra overhead.
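A small NumPy sketch of the update rule above, using random stand-in matrices (not the trained weights) at DistilBERT's hidden size 768 with r = 8 and α = 16. It checks two properties: with B = 0 the adapter is a no-op, and folding (α/r)·BA into W₀ gives the same output as the unmerged forward pass, which is the idea behind `merge_and_unload()`. Bias terms are omitted, and the 0.01 scale on the random initializations is an arbitrary choice for the illustration.

```python
import numpy as np

d, r, alpha = 768, 8, 16   # hidden size, LoRA rank, LoRA alpha
scale = alpha / r          # = 2.0
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d, d)) / np.sqrt(d)  # stand-in for a frozen q_lin / v_lin weight
A = rng.standard_normal((r, d)) * 0.01         # A: random Gaussian init
B = np.zeros((d, r))                           # B: zero init
x = rng.standard_normal(d)


def lora_forward(x, W0, A, B):
    """y = W0 x + (alpha / r) * B A x, without materializing the d x d product B A."""
    return W0 @ x + scale * (B @ (A @ x))


# With B = 0 the adapted layer reproduces the frozen layer exactly.
assert np.allclose(lora_forward(x, W0, A, B), W0 @ x)

# Pretend training has moved B away from zero.
B = rng.standard_normal((d, r)) * 0.01

# Merging folds the low-rank update into one dense matrix: same output, one matmul.
W_merged = W0 + scale * (B @ A)
assert np.allclose(lora_forward(x, W0, A, B), W_merged @ x)

# Per adapted layer: r * (d_in + d_out) LoRA parameters versus d_in * d_out for the full weight.
lora_params = A.size + B.size
full_params = W0.size
print(lora_params, full_params)  # 12288 589824
```

Per adapted 768 × 768 projection, the adapter trains about 2% as many parameters as the full weight matrix.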
Training infrastructure
The training pipeline was built and tested in both single-process and distributed configurations:
- Single process: HuggingFace `Trainer` API with `LoraConfig` from PEFT
- Distributed (DDP): PyTorch `DistributedDataParallel` with `DistributedSampler`, `gloo` backend for CPU / `nccl` for GPU clusters
- Distributed (Accelerate): HuggingFace `Accelerate` with `gather_for_metrics()` for rank-aware evaluation
- Cloud deployment: containerised and deployed to AWS SageMaker and GCP Vertex AI inference endpoints for A/B cost benchmarking
See distributed-training-demo for the full distributed training code.
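A minimal single-process sketch of the DDP pattern listed above (process group, `DistributedSampler`, `DistributedDataParallel`). It is not the code from distributed-training-demo: a toy linear model and random data stand in for DistilBERT and SST-2 so it runs anywhere on CPU, and `train_one_rank` is a hypothetical name. Under `torchrun`, `rank` and `world_size` would come from the environment and `init_method="env://"` would replace the file-based rendezvous used here.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train_one_rank(rank: int, world_size: int, init_file: str) -> float:
    # gloo runs on CPU; on GPU clusters use "nccl" and move model/tensors to cuda:rank.
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        rank=rank,
        world_size=world_size,
    )
    torch.manual_seed(0)  # identical data and init on every rank

    # Toy stand-in data: 256 random 16-dim vectors with binary labels.
    dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))

    # Each rank iterates over its own disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = DDP(torch.nn.Linear(16, 2))  # gradients are all-reduced across ranks
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for features, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()
    return loss.item()


# Single-process demo (world_size=1).
with tempfile.TemporaryDirectory() as tmp:
    final_loss = train_one_rank(0, 1, os.path.join(tmp, "ddp_init"))
print(f"final batch loss: {final_loss:.4f}")
```

Calling `sampler.set_epoch()` matters: without it every epoch reuses the same shuffle order on every rank.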
Evaluation results
Evaluated on 872 examples from the SST-2 validation split:
| Metric | Value |
|---|---|
| Accuracy | 82.45% |
| F1 (weighted) | 0.8246 |
| Avg confidence | 0.8269 |
Confusion matrix:
| True / Predicted | NEGATIVE | POSITIVE | Error rate |
|---|---|---|---|
| NEGATIVE | 354 | 74 | 17.3% (FP) |
| POSITIVE | 79 | 365 | 17.8% (FN) |
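The headline metrics follow directly from the confusion matrix. A plain-Python check, with rows as true labels and columns as predicted labels, and F1 weighted by each class's support:

```python
# Confusion matrix: rows = true label, columns = predicted label.
LABELS = ["NEGATIVE", "POSITIVE"]
cm = [
    [354, 74],   # true NEGATIVE
    [79, 365],   # true POSITIVE
]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(LABELS))) / total


def per_class_f1(cm, k):
    """Return (F1, support) for class k."""
    tp = cm[k][k]
    predicted_k = sum(cm[i][k] for i in range(len(cm)))  # column sum
    support = sum(cm[k])                                 # row sum
    precision = tp / predicted_k
    recall = tp / support
    return 2 * precision * recall / (precision + recall), support


weighted_f1 = sum(f1 * support for f1, support in
                  (per_class_f1(cm, k) for k in range(len(LABELS)))) / total

fp_rate = cm[0][1] / sum(cm[0])  # true NEGATIVE predicted POSITIVE
fn_rate = cm[1][0] / sum(cm[1])  # true POSITIVE predicted NEGATIVE

print(f"accuracy={accuracy:.4f} weighted_f1={weighted_f1:.4f} FP={fp_rate:.1%} FN={fn_rate:.1%}")
# accuracy=0.8245 weighted_f1=0.8246 FP=17.3% FN=17.8%
```

The two classes are nearly balanced (428 vs 444 examples), which is why accuracy and weighted F1 are almost identical here.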
Failure modes
Systematic analysis of high-confidence mispredictions (confidence > 0.80, wrong class):
| Failure type | Example | True | Predicted | Why |
|---|---|---|---|---|
| Negation blindness | "The film is not terrible" | NEG | POS | Negation token attention weight is low relative to "terrible" |
| Sarcasm | "Oh great, another superhero movie" | NEG | POS | Sarcastic positive surface form; no pragmatic layer |
| Mixed valence, recency | "Beautiful cinematography, but the story is a mess" | NEG | NEG | Last clause dominates via positional attention bias |
| Short inputs | "Awful." | NEG | POS (conf=0.81) | Insufficient context for attention heads; single token |
| Domain shift | Legal/medical vocabulary with clear sentiment | varies | varies | OOD vocabulary degrades confidence uniformly |
These failure modes are used as evaluation test cases in the dual-layer evaluation framework (Ragas + LangSmith) to catch alignment regressions before production deployment.
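The failure-mode rows with concrete examples can also be replayed as a lightweight regression check. This is a plain-Python sketch, not the Ragas + LangSmith setup described above; `FAILURE_CASES`, `high_confidence_errors`, and `pipeline_classifier` are hypothetical names, and `classify` is any callable that returns a `(label, score)` pair. The domain-shift row is omitted because it has no single example text.

```python
from typing import Callable

# (failure type, text, expected label), taken from the failure-mode table.
FAILURE_CASES: list[tuple[str, str, str]] = [
    ("negation", "The film is not terrible", "NEGATIVE"),
    ("sarcasm", "Oh great, another superhero movie", "NEGATIVE"),
    ("mixed_valence", "Beautiful cinematography, but the story is a mess", "NEGATIVE"),
    ("short_input", "Awful.", "NEGATIVE"),
]


def high_confidence_errors(
    classify: Callable[[str], tuple[str, float]],
    cases: list[tuple[str, str, str]] = FAILURE_CASES,
    min_confidence: float = 0.80,
) -> list[tuple[str, str, str, str, float]]:
    """Return the cases the classifier gets wrong with confidence above `min_confidence`."""
    errors = []
    for kind, text, expected in cases:
        label, score = classify(text)
        if label != expected and score > min_confidence:
            errors.append((kind, text, expected, label, score))
    return errors


def pipeline_classifier(pipe) -> Callable[[str], tuple[str, float]]:
    """Adapt a transformers text-classification pipeline to the (label, score) interface."""
    def classify(text: str) -> tuple[str, float]:
        out = pipe(text)[0]  # the pipeline returns [{'label': ..., 'score': ...}]
        return out["label"], out["score"]
    return classify
```

With the pipeline from the Usage section, `high_confidence_errors(pipeline_classifier(pipe))` lists the cases that still fail; an empty list after retraining suggests those regressions are fixed, though four examples are far too few to establish that on their own.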
Usage
```python
from transformers import pipeline

# The repo stores a LoRA adapter, so the `peft` package must be installed.
pipe = pipeline(
    "text-classification",
    model="ahnafthaqeef/distilbert-sst2-lora",
    device=-1,  # CPU; use 0 for the first GPU
)

result = pipe("The movie was surprisingly moving and well-acted.")
# [{'label': 'POSITIVE', 'score': 0.923}]
result = pipe("Barely watchable. The plot made no sense.")
# [{'label': 'NEGATIVE', 'score': 0.911}]
```
Loading the LoRA adapter separately (before merge):
```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

REPO_ID = "ahnafthaqeef/distilbert-sst2-lora"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

# Base model with a 2-way classification head; the adapter is loaded on top of it.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model = PeftModel.from_pretrained(base_model, REPO_ID)
model = model.merge_and_unload()  # fuse LoRA weights for zero-overhead inference
model.eval()

enc = tokenizer("Great film, loved every minute.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits

probs = logits.softmax(dim=-1)[0]       # class probabilities for the single input
pred = int(probs.argmax())
label = ["NEGATIVE", "POSITIVE"][pred]  # SST-2 convention: 0 = negative, 1 = positive
confidence = probs[pred].item()
```
Intended use
- Sentiment pre-screening before expensive LLM routing calls
- Insurance claims triage (negative-sentiment flag for HUMAN_REVIEW)
- General binary sentiment classification on short English text
Out of scope: Non-English text, long documents (>512 tokens), nuanced multi-class sentiment, sarcasm detection.
Training code
Full training, distributed training, and evaluation code:
Upload script
```python
from huggingface_hub import HfApi
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "claims-triage-agent/finetune/model"  # local directory with the trained adapter and tokenizer
REPO_ID = "ahnafthaqeef/distilbert-sst2-lora"

# Requires an authenticated session (e.g. `huggingface-cli login` or the HF_TOKEN environment variable).
api = HfApi()
api.create_repo(REPO_ID, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model = PeftModel.from_pretrained(base_model, MODEL_DIR)

# Pushing a PeftModel uploads the adapter weights and adapter config, not the full base model.
model.push_to_hub(REPO_ID)
tokenizer.push_to_hub(REPO_ID)
```