Model Card for distilbert_cls-lora-IMDB
Model Details
Model Description
This model is a LoRA-adapted DistilBERT model fine-tuned for binary sentiment classification (POSITIVE / NEGATIVE) on the IMDB movie reviews dataset.
Instead of fine-tuning all parameters, Low-Rank Adaptation (LoRA) was applied to the attention projection layers, enabling efficient training with a small number of trainable parameters while preserving the original pretrained weights.
⚠️ Note: This repository contains LoRA adapter weights only, not a fully merged model.
- Developed by: Chetan Fernandis
- Model type: Transformer encoder (DistilBERT) + LoRA adapters
- Task: Sentiment Classification (Binary)
- Language(s): English
- License: Apache-2.0
- Finetuned from: distilbert-base-uncased
Model Sources
- Base Model: https://huggingface.co/distilbert-base-uncased
- Dataset: https://huggingface.co/datasets/imdb
- Repository: https://huggingface.co/ChetanFernandis/distilbert_cls-lora-IMDB
Evaluation Results
The model was evaluated on a held-out validation subset of the IMDB dataset using standard classification metrics.
Confusion Matrix
- NEGATIVE → NEGATIVE: 42
- NEGATIVE → POSITIVE: 11
- POSITIVE → NEGATIVE: 7
- POSITIVE → POSITIVE: 40
This indicates balanced performance across both sentiment classes.
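As a quick sanity check, the headline numbers in the summary below can be recomputed from these counts with plain Python:

```python
# Counts from the confusion matrix above (true label -> predicted label)
tn, fp = 42, 11   # NEGATIVE -> NEGATIVE, NEGATIVE -> POSITIVE
fn, tp = 7, 40    # POSITIVE -> NEGATIVE, POSITIVE -> POSITIVE

accuracy = (tn + tp) / (tn + fp + fn + tp)   # (42 + 40) / 100 = 0.82
precision_negative = tn / (tn + fn)          # 42 / 49 ≈ 0.86
recall_positive = tp / (tp + fn)             # 40 / 47 ≈ 0.85
print(accuracy, precision_negative, recall_positive)
```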
Classification Report
Summary
- Overall Accuracy: 82%
- Balanced F1-score: 0.82 for both classes
- Strong precision for NEGATIVE reviews
- Strong recall for POSITIVE reviews
These results demonstrate that LoRA fine-tuning achieves competitive sentiment classification performance while training only a small fraction of model parameters.
Uses
Direct Use
This model can be used for sentiment analysis on English text, classifying input sentences or paragraphs as:
- POSITIVE
- NEGATIVE
Example use cases:
- Movie review analysis
- User feedback classification
- Opinion mining
Out-of-Scope Use
- Not suitable for multilingual sentiment analysis
- Not intended for fine-grained sentiment (e.g., star ratings)
- Not designed for long documents beyond 512 tokens
Bias, Risks, and Limitations
- The model inherits biases from the IMDB dataset and the DistilBERT pretraining corpus
- Performance may degrade on:
- Informal language
- Sarcasm
- Domain-specific jargon
- Predictions should not be used for high-stakes decisions without human review
How to Get Started with the Model
⚠️ This is a LoRA adapter, so it must be loaded on top of the base model.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "ChetanFernandis/distilbert_cls-lora-IMDB"
)
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "ChetanFernandis/distilbert_cls-lora-IMDB"
)

# Inference
text = "This movie was absolutely fantastic!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(dim=-1).item()

label_map = {0: "NEGATIVE", 1: "POSITIVE"}
print(label_map[prediction])
```
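Optionally, the adapter can be merged into the base weights for standalone, adapter-free inference using peft's `merge_and_unload` (a short sketch; the output directory name is illustrative, and the published repository itself ships only the adapter):

```python
# Fold the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("distilbert_cls-imdb-merged")
tokenizer.save_pretrained("distilbert_cls-imdb-merged")
```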
---
## Training Details
Training Data
Dataset: IMDB Movie Reviews
Samples: 200 training / 100 validation (drawn from the IMDB test split)

```python
from datasets import load_dataset

# Load IMDB and take small shuffled subsets for training and validation
imdb_dataset = load_dataset("imdb")
train_ds = imdb_dataset['train'].shuffle(seed=42).select(range(200))
val_ds = imdb_dataset['test'].shuffle(seed=42).select(range(100))
```
Labels: Binary (Positive / Negative)
Training Procedure
Preprocessing
a. Text tokenized using AutoTokenizer
b. Truncation applied to max sequence length
c. Padding applied dynamically per batch (see the sketch below)
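A minimal sketch of this preprocessing, reusing the train_ds / val_ds splits from the Training Data section and assuming transformers' DataCollatorWithPadding for the dynamic per-batch padding (the function name tokenize_fn is illustrative):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_fn(batch):
    # Truncate to the model's maximum sequence length; padding is deferred
    # to the data collator so each batch is padded only as far as needed
    return tokenizer(batch["text"], truncation=True)

train_ds = train_ds.map(tokenize_fn, batched=True)
val_ds = val_ds.map(tokenize_fn, batched=True)

# Pads every batch dynamically to its longest sequence
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```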
Training Hyperparameters
a. Training regime: FP32 (full precision)
b. Batch size: 8
c. Gradient accumulation: 4
d. Epochs: 20
e. Optimizer: AdamW
f. LoRA rank (r): 4
g. LoRA alpha: 8
h. LoRA dropout: 0.1
i. Target modules: q_lin, k_lin, v_lin (see the configuration sketch after this list)
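These hyperparameters map onto a peft LoRA configuration along the following lines (a sketch: the TaskType.SEQ_CLS setting and the get_peft_model call follow the standard peft workflow and are not stated explicitly in the card):

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=4,                                          # LoRA rank
    lora_alpha=8,                                 # LoRA alpha
    lora_dropout=0.1,                             # LoRA dropout
    target_modules=["q_lin", "k_lin", "v_lin"],   # DistilBERT attention projections
)

model = get_peft_model(base_model, lora_config)
```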
Speeds, Sizes, Times
a. Trainable parameters: ~700K
b. Total parameters: ~67M
c. Trainable %: ~1%
d. Checkpoint size: ~3–4 MB (adapter only)
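These figures can be checked directly on the LoRA-wrapped model from the sketch above, either by counting parameters manually or via peft's built-in helper:

```python
# Count trainable vs. total parameters of the peft-wrapped model
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / total: {total:,} ({100 * trainable / total:.2f}%)")

# peft exposes the same information as a one-liner
model.print_trainable_parameters()
```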
Evaluation
Metrics
a. Accuracy
b. Loss (Cross-Entropy)
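The card does not state the exact training loop, but with the Hugging Face Trainer accuracy is typically reported through a small compute_metrics hook such as this sketch (cross-entropy loss is returned by the model itself):

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes at evaluation time
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}
```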
Results
The LoRA-adapted model achieves competitive sentiment classification performance compared to full fine-tuning, while significantly reducing memory usage and training cost.
Technical Specifications
Model Architecture
a. Base architecture: DistilBERT (6 layers, 12 heads)
b. Hidden size: 768
c. LoRA injected into: Attention Q, K, V projections (see the inspection sketch after this list)
d. Classification head: 2-class linear classifier
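To confirm where the adapters sit, the injected modules can be listed from a loaded peft model (a quick inspection sketch, reusing the model object from the usage example above):

```python
# Print the modules that carry LoRA weights (q_lin / k_lin / v_lin in each of the 6 layers)
for name, _ in model.named_modules():
    if "lora_A" in name or "lora_B" in name:
        print(name)
```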
Compute Infrastructure
Hardware
a. CPU-based training (no GPU required)
Software
a. transformers
b. peft
c. torch
d. datasets
Citation
If you use this model, please cite the base model and dataset:
```bibtex
@misc{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  year={2019},
  eprint={1910.01108},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

