mlops-group-sentiment

A distilbert-base-uncased model fine-tuned on the IMDB movie reviews dataset for binary sentiment classification (positive / negative).

This model is the final artifact of an MLOps group project at IIT Jodhpur (Course CSL7040), demonstrating an end-to-end production ML pipeline: version control on GitHub, GPU training on Kaggle, experiment tracking on Weights & Biases, container packaging via Docker, and deployment to the Hugging Face Hub.

How to Use

from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="pujaniitj/mlops-group-sentiment")
result = classifier("This movie was fantastic!")
print(result)
# Example output (the exact score will vary):
# [{'label': 'positive', 'score': 0.9876}]
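
To score several reviews at once, the same pipeline can be called with a list. The helper below is a minimal sketch: `load_classifier` and `classify_reviews` are hypothetical names, not part of the project, and passing `truncation=True, max_length=256` is an assumption chosen to match the 256-token limit listed under Model Description.

```python
from typing import Callable, Iterable, List, Tuple


def load_classifier(model_id: str = "pujaniitj/mlops-group-sentiment"):
    """Build the Hugging Face pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("sentiment-analysis", model=model_id)


def classify_reviews(
    texts: Iterable[str], classifier: Callable, max_length: int = 256
) -> List[Tuple[str, str, float]]:
    """Classify a batch of reviews and return (text, label, score) tuples."""
    batch = list(texts)
    # The text-classification pipeline forwards these keyword arguments
    # to the tokenizer, so over-long reviews are cut instead of failing.
    results = classifier(batch, truncation=True, max_length=max_length)
    return [(text, r["label"], float(r["score"])) for text, r in zip(batch, results)]
```

A typical call would be `classify_reviews(["Great film.", "Dull and overlong."], load_classifier())`. Taking the classifier as an argument keeps the helper easy to test with a stub.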

Intended Use

Primary use case: Classifying English-language movie reviews as positive or negative sentiment.

Out-of-scope uses:

  • Non-English text (the model was trained only on English IMDB reviews)
  • Domain shift: tweets, product reviews, news articles, customer support transcripts, and similar text. Performance will degrade outside the movie-review domain.
  • Fine-grained sentiment (anything beyond binary positive/negative, e.g. 5-star ratings)
  • High-stakes decisions or content moderation without human review
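
One way to keep a human in the loop, as the last item calls for, is to send low-confidence predictions to manual review. This is only a sketch: the function name and the 0.9 threshold are illustrative assumptions, not values from the project, and a high softmax score is no guarantee that a prediction is correct.

```python
from typing import Dict, List, Tuple


def split_for_review(
    predictions: List[Dict], threshold: float = 0.9
) -> Tuple[List[Dict], List[Dict]]:
    """Separate pipeline outputs into (confident, needs_review) lists.

    Each prediction is a dict with "label" and "score" keys, as returned
    by the sentiment-analysis pipeline. The threshold is hypothetical and
    should be tuned on labelled data from the target application.
    """
    confident, needs_review = [], []
    for prediction in predictions:
        if prediction["score"] >= threshold:
            confident.append(prediction)
        else:
            needs_review.append(prediction)
    return confident, needs_review
```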

Model Description

  • Base architecture: DistilBERT (distilbert-base-uncased)
  • Difference from base: adds a fine-tuned classification head (2 output labels)
  • Parameters: ~67 million (the ~66 million of distilbert-base-uncased plus the classification head)
  • Tokenizer: WordPiece (DistilBERT default)
  • Max sequence length: 256 tokens
  • Labels: 0 → negative, 1 → positive
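
The label mapping above can be applied to raw model outputs as follows. This is a minimal sketch in plain Python, assuming the model returns two logits per input in the order given by the label IDs; `logits_to_prediction` is a hypothetical helper name.

```python
import math
from typing import Dict, Sequence, Tuple

# Mapping taken from the "Labels" entry above.
ID2LABEL: Dict[int, str] = {0: "negative", 1: "positive"}


def logits_to_prediction(logits: Sequence[float]) -> Tuple[str, float]:
    """Convert a pair of logits into (label, probability) via softmax."""
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the class with the highest probability (ties go to the lower ID).
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```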

Training Data

  • Dataset: IMDB Movie Reviews
  • Train size: 25,000 reviews (12,500 positive + 12,500 negative — perfectly balanced)
  • Test size: 25,000 reviews (same balance)
  • Train/Validation split: 90/10 of the train set (22,500 training / 2,500 validation reviews), with seed=42
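
A seeded 90/10 split of this kind can be sketched with the standard library alone. The function name is hypothetical, and this will not reproduce the exact reviews the authors held out: the card does not say which splitting routine was used (with the `datasets` library, `Dataset.train_test_split(test_size=0.1, seed=42)` would be a plausible equivalent).

```python
import random
from typing import List, Tuple


def split_indices(
    n_examples: int, val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) from a seeded shuffle."""
    indices = list(range(n_examples))
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_examples * val_fraction))
    return indices[n_val:], indices[:n_val]
```

For the 25,000-review train set this yields 22,500 training and 2,500 validation indices, and the same seed always gives the same split.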

Training Procedure

Hyperparameters

| Setting | Value |
| --- | --- |
| Learning rate | 3e-5 |
| Train batch size | 16 |
| Eval batch size | 32 |
| Epochs | 3 |
| Max sequence length | 256 |
| Warmup ratio | 0.1 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Mixed precision | fp16 |
| Seed | 42 |
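
The settings above map naturally onto `transformers.TrainingArguments`. The sketch below is not the project's training script: the output path is hypothetical, and it assumes the listed batch sizes are per device, which the card does not state. It also derives the schedule length, assuming 22,500 training reviews (90% of the 25,000-review train set).

```python
import math
from typing import Tuple

# Keyword names are real TrainingArguments parameters; values come from
# the hyperparameter list. AdamW is the Trainer default optimizer, so no
# optimizer argument is passed.
TRAINING_KWARGS = {
    "output_dir": "outputs/v1",         # hypothetical path
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,  # assumption: 16 is per device
    "per_device_eval_batch_size": 32,   # assumption: 32 is per device
    "num_train_epochs": 3,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "fp16": True,                       # needs a CUDA GPU
    "seed": 42,
}


def build_training_args():
    """Create TrainingArguments (imported lazily: heavy dependency)."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)


def schedule_steps(n_train: int, n_devices: int = 1) -> Tuple[int, int]:
    """Return (total optimizer steps, warmup steps) for the run."""
    effective_batch = TRAINING_KWARGS["per_device_train_batch_size"] * n_devices
    steps_per_epoch = math.ceil(n_train / effective_batch)
    total = steps_per_epoch * TRAINING_KWARGS["num_train_epochs"]
    warmup = math.ceil(total * TRAINING_KWARGS["warmup_ratio"])
    return total, warmup
```

Under these assumptions, `schedule_steps(22500, n_devices=2)` gives 2,112 optimizer steps, of which 212 are warmup; on a single device the figures are 4,221 and 423.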

Training Environment

  • Platform: Kaggle Notebook
  • Hardware: 2× NVIDIA Tesla T4 GPUs
  • Training time: ~17 minutes

Experiment Tracking

Two configurations were trained and compared via Weights & Biases:

| Run | Learning rate | Test F1 | Test Accuracy | Test Loss |
| --- | --- | --- | --- | --- |
| v1 (this model) | 3e-5 | ~0.90 | ~0.90 | ~0.70 |
| v2 (discarded) | 5e-5 | ~0.91 | ~0.91 | ~0.85 |

The values above are approximate; the exact decimals are recorded in the W&B run summaries.

Why v1 was selected: v2 achieved a marginally higher F1 (~0.5%), but it showed clear signs of overfitting: its eval loss climbed sharply across epochs while v1's remained more stable, and its test loss was higher (~0.85 vs ~0.70). v1 also delivers ~25% faster inference, making it the better choice for a production deployment.

Evaluation Results

Evaluation on the held-out IMDB test set (25,000 reviews):

| Metric | Value |
| --- | --- |
| Accuracy | ~0.90 |
| F1 (weighted) | ~0.90 |
| Precision (weighted) | ~0.90 |
| Recall (weighted) | ~0.90 |
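
For readers unfamiliar with support-weighted averaging, the sketch below shows how these four metrics are defined. It is a plain-Python illustration with a hypothetical function name, not the project's evaluation code; the card does not say which tooling was used (scikit-learn's `precision_recall_fscore_support(..., average="weighted")` computes the same three weighted quantities).

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    pairs = list(zip(y_true, y_pred))
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        true_pos = sum(1 for t, p in pairs if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of the data.
        weight = count / n
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    accuracy = sum(1 for t, p in pairs if t == p) / n
    return {"accuracy": accuracy, **totals}
```

Weighted recall always equals accuracy, and on a perfectly balanced test set such as IMDB's the weighted and macro averages coincide, which is why the four values in the table sit so close together.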

Limitations and Biases

  • Domain: Only trained on movie reviews. Expect degraded performance on other domains.
  • Length: Inputs are truncated to 256 tokens (~200 words). Longer reviews may lose tail information that matters for sentiment.
  • Language: English only.
  • Demographic biases: IMDB reviewers historically skew toward certain demographics (e.g., predominantly male, English-speaking). The model may inherit these biases — e.g., it may misclassify reviews using vernacular or cultural references underrepresented in IMDB.
  • Sarcasm and irony: Like most BERT-based classifiers, the model can struggle with sarcastic or ironic text where the surface sentiment opposes the intended meaning.

Project Resources

Acknowledgments
