Model Card for distilbert_cls-lora-IMDB

Model Details

Model Description

This model is a LoRA-adapted DistilBERT model fine-tuned for binary sentiment classification (POSITIVE / NEGATIVE) on the IMDB movie reviews dataset.

Instead of fine-tuning all parameters, Low-Rank Adaptation (LoRA) was applied to the attention projection layers, enabling efficient training with a small number of trainable parameters while preserving the original pretrained weights.
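For illustration, here is a minimal, self-contained sketch of the low-rank update LoRA applies to a frozen weight matrix. The shapes and scaling are illustrative only; in practice the adapter layers are managed by the peft library.

```python
import torch

# Illustrative shapes: a frozen projection weight W (d_out x d_in) plus two
# small trainable factors A (r x d_in) and B (d_out x r). Only A and B are trained.
d_in, d_out, r, alpha = 768, 768, 4, 8
W = torch.randn(d_out, d_in)        # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01     # trainable low-rank factor
B = torch.zeros(d_out, r)           # trainable low-rank factor, initialized to zero

# Effective weight at inference: the frozen weight plus a scaled low-rank update.
W_eff = W + (alpha / r) * (B @ A)
```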

โš ๏ธ Note: This repository contains LoRA adapter weights only, not a fully merged model.

  • Developed by: Chetan Fernandis
  • Model type: Transformer encoder (DistilBERT) + LoRA adapters
  • Task: Sentiment Classification (Binary)
  • Language(s): English
  • License: Apache-2.0
  • Finetuned from: distilbert-base-uncased

Evaluation Results

The model was evaluated on a held-out validation subset of the IMDB dataset using standard classification metrics.

Confusion Matrix

  • NEGATIVE → NEGATIVE: 42
  • NEGATIVE → POSITIVE: 11
  • POSITIVE → NEGATIVE: 7
  • POSITIVE → POSITIVE: 40

This indicates balanced performance across both sentiment classes.
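The summary figures below follow directly from this matrix; a small sketch of the arithmetic (rows are true labels, columns are predictions):

```python
# Confusion matrix from above: rows = true label, columns = predicted label,
# ordered [NEGATIVE, POSITIVE].
cm = [[42, 11],
      [7, 40]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total               # 0.82
precision_negative = cm[0][0] / (cm[0][0] + cm[1][0])  # ~0.86
recall_positive = cm[1][1] / (cm[1][0] + cm[1][1])     # ~0.85
print(accuracy, precision_negative, recall_positive)
```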

Classification Report

[Classification report figure: per-class precision, recall, and F1 for NEGATIVE and POSITIVE]

Summary

  • Overall Accuracy: 82%
  • Balanced F1-score: 0.82 for both classes
  • Strong precision for NEGATIVE reviews
  • Strong recall for POSITIVE reviews

These results demonstrate that LoRA fine-tuning achieves competitive sentiment classification performance while training only a small fraction of model parameters.


Uses

Direct Use

This model can be used for sentiment analysis on English text, classifying input sentences or paragraphs as:

  • POSITIVE
  • NEGATIVE

Example use cases:

  • Movie review analysis
  • User feedback classification
  • Opinion mining

Out-of-Scope Use

  • Not suitable for multilingual sentiment analysis
  • Not intended for fine-grained sentiment (e.g., star ratings)
  • Not designed for long documents beyond 512 tokens

Bias, Risks, and Limitations

  • The model inherits biases from the IMDB dataset and the DistilBERT pretraining corpus
  • Performance may degrade on:
    • Informal language
    • Sarcasm
    • Domain-specific jargon
  • Predictions should not be used for high-stakes decisions without human review

How to Get Started with the Model

โš ๏ธ This is a LoRA adapter, so it must be loaded on top of the base model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "ChetanFernandis/distilbert_cls-lora-IMDB"
)
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "ChetanFernandis/distilbert_cls-lora-IMDB"
)

# Inference
text = "This movie was absolutely fantastic!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(dim=-1).item()

label_map = {0: "NEGATIVE", 1: "POSITIVE"}
print(label_map[prediction])
```
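If adapter-free deployment is preferred, the LoRA weights can usually be folded into the base model with PEFT's `merge_and_unload`; the output directory name below is just an example.

```python
# Optional: merge the LoRA weights into the base model and save a standalone checkpoint.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("distilbert-imdb-merged")   # example path
tokenizer.save_pretrained("distilbert-imdb-merged")
```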

---

## Training Details
Training Data

Dataset: IMDB Movie Reviews

Samples: 200 training / 100 test

```python
from datasets import load_dataset

imdb_dataset = load_dataset("imdb")
train_ds = imdb_dataset["train"].shuffle(seed=42).select(range(200))
val_ds   = imdb_dataset["test"].shuffle(seed=42).select(range(100))
```


Labels: Binary (Positive / Negative)

Training Procedure
Preprocessing
 a. Text tokenized using AutoTokenizer
 b. Truncation applied to max sequence length
 c. Padding applied dynamically per batch (see the sketch below)
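A minimal tokenization sketch consistent with these steps, assuming the `tokenizer`, `train_ds`, and `val_ds` objects from the snippets elsewhere in this card:

```python
from transformers import DataCollatorWithPadding

def preprocess(batch):
    # Tokenize and truncate to the model's maximum sequence length (512 for DistilBERT).
    return tokenizer(batch["text"], truncation=True)

train_ds = train_ds.map(preprocess, batched=True)
val_ds = val_ds.map(preprocess, batched=True)

# Dynamic padding is applied per batch by the data collator rather than at tokenization time.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```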

Training Hyperparameters (see the configuration sketch below)
  a. Training regime: FP32 (full precision)
  b. Batch size: 8
  c. Gradient accumulation: 4
  d. Epochs: 20
  e. Optimizer: AdamW
  f. LoRA rank (r): 4
  g. LoRA alpha: 8
  h. LoRA dropout: 0.1
  i. Target modules: q_lin, k_lin, v_lin
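A configuration sketch consistent with these hyperparameters, assuming the `base_model` loaded in the usage snippet above (argument names may differ slightly across peft/transformers versions):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["q_lin", "k_lin", "v_lin"],
)
peft_model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="distilbert_cls-lora-IMDB",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=20,
    # AdamW is the Trainer's default optimizer; leaving fp16/bf16 unset keeps FP32 training.
)
```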

Speeds, Sizes, Times
  a. Trainable parameters: ~700K
  b. Total parameters: ~67M
  c. Trainable %: ~1%
  d. Checkpoint size: ~3–4 MB (adapter only)
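These counts can be verified with PEFT's built-in helper, assuming the `peft_model` from the configuration sketch above:

```python
# Prints trainable vs. total parameter counts and the trainable percentage,
# which should be roughly ~700K of ~67M (~1%) for this configuration.
peft_model.print_trainable_parameters()
```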

Evaluation
 Metrics
  a. Accuracy
  b. Loss (Cross-Entropy)
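A minimal accuracy metric for use with the Hugging Face Trainer (cross-entropy loss is computed by the model itself); the function name is only a convention:

```python
import numpy as np

def compute_metrics(eval_pred):
    # Accuracy from the argmax over the two class logits.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}
```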

Results
The LoRA-adapted model achieves competitive sentiment classification performance compared to full fine-tuning, while significantly reducing memory usage and training cost.

Technical Specifications
Model Architecture
 a. Base architecture: DistilBERT (6 layers, 12 heads)
 b. Hidden size: 768
 c. LoRA injected into: Attention Q, K, V projections
 d. Classification head: 2-class linear classifier

Compute Infrastructure
 Hardware
  a. CPU-based training (no GPU required)

Software
 a. transformers
 b. peft
 c. torch
 d. datasets

Citation

If you use this model, please cite the base model and dataset:

```bibtex
@misc{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  year={2019},
  eprint={1910.01108},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```