---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- distilbert
- imdb
- mlops
datasets:
- stanfordnlp/imdb
base_model: distilbert-base-uncased
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: mlops-group-sentiment
  results:
  - task:
      type: text-classification
      name: Sentiment Classification
    dataset:
      type: stanfordnlp/imdb
      name: IMDB
    metrics:
    - type: accuracy
      value: 0.90
      name: Test Accuracy
    - type: f1
      value: 0.90
      name: Test F1 (weighted)
---

# mlops-group-sentiment

A `distilbert-base-uncased` model fine-tuned on the IMDB movie reviews dataset
for binary sentiment classification (positive / negative).

This model is the final artifact of an MLOps group project at IIT Jodhpur
(Course CSL7040), demonstrating an end-to-end production ML pipeline: version
control on GitHub, GPU training on Kaggle, experiment tracking on Weights &
Biases, container packaging via Docker, and deployment to the Hugging Face Hub.

## How to Use

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="pujaniitj/mlops-group-sentiment")
result = classifier("This movie was fantastic!")
print(result)
# [{'label': 'positive', 'score': 0.9876}]
```
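
The pipeline's final step takes the two logits from the classification head, applies a softmax, and reports the highest-probability label with that probability as `score`. The pure-Python sketch below mimics only that post-processing step, using the label mapping listed in this card. `postprocess` is a hypothetical helper, not a `transformers` API, and the logits in the usage line are made-up values, not model outputs.

```python
import math

# Label mapping as listed in this card (0 -> negative, 1 -> positive).
ID2LABEL = {0: "negative", 1: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits):
    """Turn two head logits into a pipeline-style [{'label', 'score'}] result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": ID2LABEL[best], "score": probs[best]}]


# Hypothetical logits, chosen for illustration only.
print(postprocess([-2.1, 2.3]))
```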

## Intended Use

**Primary use case**: Classifying English-language movie reviews as positive
or negative sentiment.

**Out-of-scope uses**:
- Non-English text (the model was trained only on English IMDB reviews)
- Domain shift, e.g. tweets, product reviews, news articles, or customer support
  transcripts. Performance is expected to degrade outside the movie-review domain.
- Fine-grained sentiment beyond binary positive/negative (e.g. 5-star ratings)
- High-stakes decisions or content moderation without human review

## Model Description

- **Base architecture**: DistilBERT (`distilbert-base-uncased`)
- **Task head**: Fine-tuned classification head (2 output labels)
- **Parameters**: ~66 million
- **Tokenizer**: WordPiece (DistilBERT default)
- **Max sequence length**: 256 tokens (the base model's positional embeddings allow up to 512)
- **Labels**: `0 → negative`, `1 → positive`
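
The ~66 million figure can be checked by hand. The sketch below counts weights layer by layer, assuming the standard `distilbert-base-uncased` configuration (30,522-token vocabulary, 512 positions, 6 layers, hidden size 768, feed-forward size 3,072) and the head used by `DistilBertForSequenceClassification` (a 768×768 `pre_classifier` followed by a 768×2 classifier). DistilBERT has no token-type embeddings and no pooler, so neither appears in the count.

```python
# Values from the public distilbert-base-uncased configuration.
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
DIM = 768
HIDDEN_DIM = 3072
N_LAYERS = 6
NUM_LABELS = 2


def linear(n_in, n_out):
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


# Token embeddings + position embeddings + embedding LayerNorm.
embeddings = VOCAB_SIZE * DIM + MAX_POSITIONS * DIM + layer_norm(DIM)
per_layer = (
    4 * linear(DIM, DIM)           # q, k, v and output projections
    + layer_norm(DIM)              # post-attention LayerNorm
    + linear(DIM, HIDDEN_DIM)      # feed-forward up-projection
    + linear(HIDDEN_DIM, DIM)      # feed-forward down-projection
    + layer_norm(DIM)              # post-FFN LayerNorm
)
backbone = embeddings + N_LAYERS * per_layer
head = linear(DIM, DIM) + linear(DIM, NUM_LABELS)  # pre_classifier + classifier
total = backbone + head

print(f"backbone: {backbone:,}")  # backbone: 66,362,880
print(f"head:     {head:,}")      # head:     592,130
print(f"total:    {total:,}")     # total:    66,955,010
```

The new classification head adds under 1% to the parameter count; almost all of the ~66M weights come from the pretrained encoder.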

## Training Data

- **Dataset**: [IMDB Movie Reviews](https://huggingface.co/datasets/stanfordnlp/imdb)
- **Train size**: 25,000 reviews (12,500 positive + 12,500 negative; perfectly balanced)
- **Test size**: 25,000 reviews (same balance)
- **Train/validation split**: the train set is split 90/10 into training (22,500 reviews) and validation (2,500 reviews) subsets, with `seed=42`
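
In the Hugging Face `datasets` library a split like this is typically a call such as `Dataset.train_test_split(test_size=0.1, seed=42)`; the card does not say which call the project used. The pure-Python sketch below only illustrates a seeded, reproducible 90/10 split of example indices. It is an assumption-level stand-in and will not select the same reviews as the project's split.

```python
import random

N_TRAIN = 25_000      # size of the IMDB train split
VAL_FRACTION = 0.1    # 90/10 train/validation split
SEED = 42

n_val = round(N_TRAIN * VAL_FRACTION)
n_fit = N_TRAIN - n_val

# Seeded shuffle of example indices: the same seed always yields the same
# partition, but NOT the same indices that `datasets` would choose.
indices = list(range(N_TRAIN))
random.Random(SEED).shuffle(indices)
val_idx, fit_idx = indices[:n_val], indices[n_val:]

print(n_fit, n_val)  # 22500 2500
```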

## Training Procedure

### Hyperparameters

| Setting              | Value  |
|----------------------|--------|
| Learning rate        | 3e-5   |
| Train batch size     | 16     |
| Eval batch size      | 32     |
| Epochs               | 3      |
| Max sequence length  | 256    |
| Warmup ratio         | 0.1    |
| Weight decay         | 0.01   |
| Optimizer            | AdamW  |
| Mixed precision      | fp16   |
| Seed                 | 42     |
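
The table does not list step counts or the schedule shape, so the sketch below works them out under stated assumptions: the 22,500-example training subset, an effective batch size of 16 (no gradient accumulation and no doubling across the two T4s), and the linear warmup-then-decay schedule that the Hugging Face `Trainer` uses by default (`lr_scheduler_type="linear"`), with warmup steps rounded up from `warmup_ratio`. If 16 was the per-device batch size on both GPUs, the effective batch doubles and every step count roughly halves.

```python
import math

# Values from the hyperparameter table.
PEAK_LR = 3e-5
BATCH_SIZE = 16
EPOCHS = 3
WARMUP_RATIO = 0.1
# Assumption: 90% of the 25,000-review train split is used for training.
N_FIT = 22_500

# Assumption: effective batch size of 16; the last partial batch is kept.
steps_per_epoch = math.ceil(N_FIT / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Assumed schedule: linear warmup to PEAK_LR, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, remaining)


print(steps_per_epoch, total_steps, warmup_steps)  # 1407 4221 423
```

Under these assumptions the learning rate peaks at 3e-5 a little over a quarter of the way into the first epoch and then decays to zero by the end of epoch 3.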

### Training Environment

- **Platform**: Kaggle Notebook
- **Hardware**: 2× NVIDIA Tesla T4 GPUs
- **Training time**: ~17 minutes

### Experiment Tracking

Two configurations were trained and compared via Weights & Biases:

| Run  | Learning rate | Test F1 | Test Accuracy | Test Loss |
|------|---------------|---------|---------------|-----------|
| v1 (this model) | 3e-5 | ~0.90 | ~0.90 | ~0.70 |
| v2 (discarded)  | 5e-5 | ~0.91 | ~0.91 | ~0.85 |

> Values are rounded approximations; the exact figures are in the W&B run
> summaries.

**Why v1 was selected**: While v2 achieved a marginally higher F1 (~0.5%),
it showed clear signs of overfitting — its eval loss climbed sharply across
epochs while v1's remained more stable. v1 also delivers ~25% faster inference,
making it the better choice for a production deployment.
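
The overfitting argument rests on how each run's eval loss evolves per epoch, not only on its final score. A minimal sketch of that check, assuming the per-epoch eval losses have been exported from W&B as plain lists; `overfit_gap` and `best_epoch` are hypothetical helpers, and no project numbers are reproduced here.

```python
def overfit_gap(eval_losses):
    """How far the final eval loss sits above the best (lowest) one.

    A gap near zero means the loss stayed close to its minimum; a large gap
    means it climbed after its best epoch, a common sign of overfitting.
    """
    if not eval_losses:
        raise ValueError("need at least one eval loss")
    return eval_losses[-1] - min(eval_losses)


def best_epoch(eval_losses):
    """1-based epoch with the lowest eval loss."""
    return min(range(len(eval_losses)), key=eval_losses.__getitem__) + 1
```

Comparing `overfit_gap` across runs turns "climbed sharply" versus "more stable" into a single number per run, and `best_epoch` shows where each run should have stopped.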

## Evaluation Results

Evaluation on the held-out IMDB test set (25,000 reviews):

| Metric              | Value |
|---------------------|-------|
| Accuracy            | ~0.90 |
| F1 (weighted)       | ~0.90 |
| Precision (weighted)| ~0.90 |
| Recall (weighted)   | ~0.90 |
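
Assuming "weighted" follows the scikit-learn convention (`average="weighted"`), each per-class precision, recall, and F1 is averaged with weights proportional to that class's support in the true labels. A dependency-free sketch of that definition; `weighted_metrics` is a hypothetical helper, not the project's evaluation code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)                     # true count per class
    predicted = Counter(y_pred)                   # predicted count per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = tp[label] / predicted[label] if predicted[label] else 0.0
        r = tp[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n                        # class share of y_true
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(tp.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

One consequence of this definition: support-weighted recall reduces to the number of correct predictions divided by the total, which is exactly accuracy, so identical Accuracy and Recall rows are expected rather than a coincidence.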

## Limitations and Biases

- **Domain**: Only trained on movie reviews. Expect degraded performance on
  other domains.
- **Length**: Inputs are truncated to 256 tokens (~200 words). Longer reviews
  may lose tail information that matters for sentiment.
- **Language**: English only.
- **Demographic biases**: IMDB reviewers historically skew toward certain
  demographics (e.g., predominantly male, English-speaking). The model may
  inherit these biases — e.g., it may misclassify reviews using vernacular or
  cultural references underrepresented in IMDB.
- **Sarcasm and irony**: Like most BERT-based classifiers, the model can
  struggle with sarcastic or ironic text where the surface sentiment opposes
  the intended meaning.

## Project Resources

- **GitHub repository**: https://github.com/pujaniitj/mlops-group-project-iitj
- **W&B experiment dashboard**: https://wandb.ai/pujaniitj-iit-jodpur/MLops_group_8
- **Training notebook (v1)**: https://www.kaggle.com/code/pujaniitj/mlops-group-8-imdb-v1
- **Training notebook (v2)**: https://www.kaggle.com/code/pujaniitj/mlops-group-8-imdb-v2

## Acknowledgments

- **Base model**: [DistilBERT](https://huggingface.co/distilbert-base-uncased)
  by Sanh et al. (Hugging Face)
- **Dataset**: [IMDB](https://huggingface.co/datasets/stanfordnlp/imdb)
  by Maas et al. (Stanford NLP)
- **Training infrastructure**: [Kaggle Notebooks](https://www.kaggle.com)
- **Experiment tracking**: [Weights & Biases](https://wandb.ai)