---
library_name: transformers
tags:
- text-classification
- sentiment-analysis
- lora
- peft
- distilbert
- imdb
---

# Model Card for distilbert_cls-lora-IMDB

## Model Details

### Model Description

This model is a **LoRA-adapted DistilBERT model** fine-tuned for **binary sentiment classification** (POSITIVE / NEGATIVE) on the **IMDB movie reviews dataset**.

Instead of fine-tuning all parameters, **Low-Rank Adaptation (LoRA)** was applied to the attention projection layers, enabling efficient training with a small number of trainable parameters while preserving the original pretrained weights.

⚠️ **Note:** This repository contains **LoRA adapter weights only**, not a fully merged model.

- **Developed by:** Chetan Fernandis
- **Model type:** Transformer encoder (DistilBERT) + LoRA adapters
- **Task:** Sentiment Classification (Binary)
- **Language(s):** English
- **License:** Apache-2.0
- **Finetuned from:** `distilbert-base-uncased`

---

## Model Sources

- **Base Model:** https://huggingface.co/distilbert-base-uncased
- **Dataset:** https://huggingface.co/datasets/imdb
- **Repository:** https://huggingface.co/ChetanFernandis/distilbert_cls-lora-IMDB

## Evaluation Results

The model was evaluated on a held-out validation subset of the **IMDB dataset** using standard classification metrics.

### Confusion Matrix

Entries are counts of (true label → predicted label) on 100 validation examples:

- **NEGATIVE → NEGATIVE:** 42
- **NEGATIVE → POSITIVE:** 11
- **POSITIVE → NEGATIVE:** 7
- **POSITIVE → POSITIVE:** 40

This indicates reasonably balanced performance across both sentiment classes.
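As a quick sanity check, the summary metrics reported below can be recomputed directly from these four counts:

```python
# Recompute summary metrics from the confusion-matrix counts above.
tn, fp = 42, 11   # true NEGATIVE: predicted NEGATIVE / POSITIVE
fn, tp = 7, 40    # true POSITIVE: predicted NEGATIVE / POSITIVE

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

precision_neg = tn / (tn + fn)  # of all NEGATIVE predictions, how many were right
recall_neg = tn / (tn + fp)     # of all true NEGATIVE reviews, how many were caught
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}")                                          # accuracy=0.82
print(f"NEGATIVE: precision={precision_neg:.2f} recall={recall_neg:.2f}")  # 0.86 / 0.79
print(f"POSITIVE: precision={precision_pos:.2f} recall={recall_pos:.2f}")  # 0.78 / 0.85
```

This matches the reported 82% accuracy, as well as the precision/recall asymmetry noted in the summary (higher precision on NEGATIVE, higher recall on POSITIVE).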


### Classification Report


![image](https://cdn-uploads.huggingface.co/production/uploads/6864251ed4dde09fbaabf6cf/sFiCggbBtFF9zYZeAJ1Wo.png)

### Summary

- **Overall Accuracy:** 82%
- **Balanced F1-score:** 0.82 for both classes
- Strong precision for **NEGATIVE** reviews
- Strong recall for **POSITIVE** reviews

These results demonstrate that **LoRA fine-tuning** achieves competitive sentiment classification performance while training only a small fraction of model parameters.

![image](https://cdn-uploads.huggingface.co/production/uploads/6864251ed4dde09fbaabf6cf/1lZVhPe2w0Y61MGhGBPMU.png)

## Uses

### Direct Use

This model can be used for **sentiment analysis** on English text, classifying input sentences or paragraphs as:

- `POSITIVE`
- `NEGATIVE`

Example use cases:
- Movie review analysis
- User feedback classification
- Opinion mining

---

### Out-of-Scope Use

- Not suitable for multilingual sentiment analysis
- Not intended for fine-grained sentiment (e.g., star ratings)
- Not designed for long documents beyond 512 tokens

---

## Bias, Risks, and Limitations

- The model inherits biases from the **IMDB dataset** and the **DistilBERT pretraining corpus**
- Performance may degrade on:
  - Informal language
  - Sarcasm
  - Domain-specific jargon
- Predictions should not be used for high-stakes decisions without human review

---

## How to Get Started with the Model

⚠️ This is a **LoRA adapter**, so it must be loaded on top of the base model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "ChetanFernandis/distilbert_cls-lora-IMDB"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "ChetanFernandis/distilbert_cls-lora-IMDB"
)

# Inference
text = "This movie was absolutely fantastic!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

outputs = model(**inputs)
prediction = outputs.logits.argmax(dim=-1).item()

label_map = {0: "NEGATIVE", 1: "POSITIVE"}
print(label_map[prediction])
```

---

## Training Details

### Training Data

- **Dataset:** IMDB Movie Reviews
- **Samples:** 200 training / 100 validation (drawn from the IMDB test split)
- **Labels:** Binary (POSITIVE / NEGATIVE)

```python
train_ds = imdb_dataset['train'].shuffle(seed=42).select(range(200))
val_ds   = imdb_dataset['test'].shuffle(seed=42).select(range(100))
```
### Training Procedure

#### Preprocessing

- Text tokenized using `AutoTokenizer`
- Truncation applied at the maximum sequence length
- Padding applied dynamically per batch

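The preprocessing steps above can be sketched as follows (a minimal example; the function and variable names are illustrative):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    # Truncate to the model's maximum sequence length (512 for DistilBERT);
    # no padding here -- the collator pads each batch dynamically.
    return tokenizer(batch["text"], truncation=True)

# Pads every batch to the length of its longest example.
collator = DataCollatorWithPadding(tokenizer=tokenizer)
```

Dynamic per-batch padding avoids padding every example to the full 512 tokens, which matters for CPU training speed.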
#### Training Hyperparameters

- Training regime: FP32 (full precision)
- Batch size: 8
- Gradient accumulation steps: 4
- Epochs: 20
- Optimizer: AdamW
- LoRA rank (r): 4
- LoRA alpha: 8
- LoRA dropout: 0.1
- Target modules: `q_lin`, `k_lin`, `v_lin`

#### Speeds, Sizes, Times

- Trainable parameters: ~700K
- Total parameters: ~67M
- Trainable fraction: ~1%
- Checkpoint size: ~3–4 MB (adapter only)

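The ~700K figure is consistent with a back-of-the-envelope count (assuming the trainable parameters comprise the LoRA matrices plus DistilBERT's `pre_classifier` layer and the 2-class output layer):

```python
# Back-of-the-envelope check of the ~700K trainable-parameter figure.
hidden, r, layers = 768, 4, 6

# Each adapted projection adds two low-rank matrices: A (r x hidden) and B (hidden x r).
lora_per_proj = 2 * r * hidden
lora_total = lora_per_proj * 3 * layers  # q_lin, k_lin, v_lin in every layer

# Newly initialised classification head, also trained:
# pre_classifier (hidden x hidden + bias) plus a 2-class linear layer.
head = (hidden * hidden + hidden) + (hidden * 2 + 2)

trainable = lora_total + head
print(trainable)  # → 702722, i.e. ~700K, roughly 1% of DistilBERT's ~67M parameters
```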
### Evaluation Metrics

- Accuracy
- Cross-entropy loss

### Results

The LoRA-adapted model achieves sentiment classification performance competitive with full fine-tuning, while significantly reducing memory usage and training cost.

## Technical Specifications

### Model Architecture

- Base architecture: DistilBERT (6 transformer layers, 12 attention heads)
- Hidden size: 768
- LoRA injected into: attention Q, K, V projections (`q_lin`, `k_lin`, `v_lin`)
- Classification head: 2-class linear classifier

### Compute Infrastructure

- Hardware: CPU only (no GPU required)

### Software

- `transformers`
- `peft`
- `torch`
- `datasets`

## Citation

If you use this model, please cite the base model and dataset:

```bibtex
@misc{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  year={2019},
  eprint={1910.01108},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```