language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
tags:
  - text-classification
  - sentiment-analysis
  - distilbert
  - imdb
  - mlops
datasets:
  - stanfordnlp/imdb
base_model: distilbert-base-uncased
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: mlops-group-sentiment
    results:
      - task:
          type: text-classification
          name: Sentiment Classification
        dataset:
          type: stanfordnlp/imdb
          name: IMDB
        metrics:
          - type: accuracy
            value: 0.9
            name: Test Accuracy
          - type: f1
            value: 0.9
            name: Test F1 (weighted)

mlops-group-sentiment

A distilbert-base-uncased model fine-tuned on the IMDB movie reviews dataset for binary sentiment classification (positive / negative).

This model is the final artifact of an MLOps group project at IIT Jodhpur (Course CSL7040), demonstrating an end-to-end production ML pipeline: version control on GitHub, GPU training on Kaggle, experiment tracking on Weights & Biases, container packaging via Docker, and deployment to the Hugging Face Hub.

How to Use

from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="pujaniitj/mlops-group-sentiment")
result = classifier("This movie was fantastic!")
print(result)
# [{'label': 'positive', 'score': 0.9876}]
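For several reviews at once, or for reviews longer than the model's input window, the same pipeline can be called with a list and explicit truncation. The sketch below assumes the text-classification pipeline forwards `truncation` and `max_length` to the tokenizer, and uses 256 to match the maximum sequence length listed under Model Description. `load_classifier` and `classify_reviews` are hypothetical helper names, not part of this repository.

```python
from typing import Callable, Dict, List

MODEL_ID = "pujaniitj/mlops-group-sentiment"
MAX_LENGTH = 256  # matches the training-time maximum sequence length


def load_classifier() -> Callable:
    """Build the Hugging Face pipeline (downloads the model on first use)."""
    # Imported lazily so classify_reviews has no hard dependency on transformers.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_ID)


def classify_reviews(texts: List[str], classifier: Callable) -> List[Dict]:
    """Classify a batch of reviews, truncating each one to MAX_LENGTH tokens."""
    # Assumption: the pipeline passes these keyword arguments to its tokenizer.
    return classifier(texts, truncation=True, max_length=MAX_LENGTH)
```

Calling `classify_reviews(["Great film.", "Dull and overlong."], load_classifier())` should return one `{'label': ..., 'score': ...}` dictionary per review.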

Intended Use

Primary use case: Classifying English-language movie reviews as positive or negative sentiment.

Out-of-scope uses:

Non-English text (the model was trained only on English IMDB reviews)
Out-of-domain text, e.g. tweets, product reviews, news articles, or customer support transcripts. Performance will likely degrade outside the movie-review domain.
Fine-grained sentiment (beyond binary pos/neg, e.g. 5-star ratings)
High-stakes decisions or content moderation without human review

Model Description

Base architecture: DistilBERT (distilbert-base-uncased)
Difference from base: adds a fine-tuned sequence-classification head (2 output labels)
Parameters: ~66 million
Tokenizer: WordPiece (DistilBERT default)
Max sequence length: 256 tokens
Labels: 0 → negative, 1 → positive
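To make the label mapping concrete, here is a minimal sketch of how two raw output logits turn into a label and a score. The pipeline performs this step internally; `decode_logits` is a hypothetical helper written only for illustration, and the logits in the usage note are made up.

```python
import math
from typing import Dict, Sequence

# Label mapping stated in this model card: 0 -> negative, 1 -> positive.
ID2LABEL = {0: "negative", 1: "positive"}


def decode_logits(logits: Sequence[float]) -> Dict[str, object]:
    """Convert two raw logits into the predicted label and its softmax probability."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability (the first index wins on a tie).
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}
```

For example, `decode_logits([-1.2, 2.3])` yields the label `positive`, because the logit at index 1 is the larger of the two.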

Training Data

Dataset: IMDB Movie Reviews
Train size: 25,000 reviews (12,500 positive + 12,500 negative — perfectly balanced)
Test size: 25,000 reviews (same balance)
Train/Validation split: 90/10 of the train set (22,500 / 2,500 reviews), with seed=42
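The seeded 90/10 split can be sketched in plain Python as follows. This is an illustration of the idea only: the training notebook most likely relies on the `datasets` library for splitting, so the exact indices produced here will not match the original split, and this version is not stratified by label. `split_indices` is a hypothetical helper name.

```python
import random
from typing import List, Tuple


def split_indices(
    n_examples: int, val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle example indices with a fixed seed and carve off a validation slice."""
    indices = list(range(n_examples))
    # A local RNG keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_examples * val_fraction))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(25_000)
print(len(train_idx), len(val_idx))  # 22500 2500
```

Fixing the seed makes the split reproducible across runs, which matters when two configurations are compared on the same validation set.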

Training Procedure

Hyperparameters

Setting	Value
Learning rate	3e-5
Train batch size	16
Eval batch size	32
Epochs	3
Max sequence length	256
Warmup ratio	0.1
Weight decay	0.01
Optimizer	AdamW
Mixed precision	fp16
Seed	42
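As a rough guide, the table maps onto Hugging Face `TrainingArguments` as sketched below. Two assumptions are made here: that the batch sizes in the table are per-device values, and that training used the `Trainer` API (where AdamW is the default optimizer, so it is not set explicitly). `output_dir` is a hypothetical path, and the maximum sequence length is a tokenizer setting rather than a training argument, so it does not appear.

```python
from typing import Any, Dict

# Hyperparameters from the table, keyed by the matching TrainingArguments names.
# Assumption: the batch sizes listed in the table are per-device values.
TRAINING_KWARGS: Dict[str, Any] = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 3,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "fp16": True,  # mixed precision; needs a CUDA GPU
    "seed": 42,
}


def build_training_args(output_dir: str = "./outputs"):
    """Create TrainingArguments from the table values (output_dir is hypothetical)."""
    # Imported lazily so the dictionary above can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Keeping the values in a plain dictionary also makes it easy to log the same configuration to an experiment tracker.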

Training Environment

Platform: Kaggle Notebook
Hardware: 2× NVIDIA Tesla T4 GPU
Training time: ~17 minutes

Experiment Tracking

Two configurations were trained and compared via Weights & Biases:

Run	Learning rate	Test F1	Test Accuracy	Test Loss
v1 (this model)	3e-5	~0.90	~0.90	~0.70
v2 (discarded)	5e-5	~0.91	~0.91	~0.85

Note: the values in this table are rounded approximations and should be replaced with the exact decimals from the W&B run summary before the final model card is published.

Why v1 was selected: While v2 achieved a marginally higher F1 (~0.5%), it showed clear signs of overfitting — its eval loss climbed sharply across epochs while v1's remained more stable. v1 also delivers ~25% faster inference, making it the better choice for a production deployment.
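The trade-off described above can be written as a simple selection rule: among runs whose F1 is close to the best, prefer the one with the lowest loss. The sketch below is a hypothetical rule for illustration, not the procedure the authors followed; it uses the approximate values from the comparison table and does not model per-epoch loss curves or inference speed. The tolerance of 0.015 is an arbitrary choice.

```python
from typing import Dict, List

# Approximate values copied from the comparison table in this section.
RUNS: List[Dict] = [
    {"name": "v1", "learning_rate": 3e-5, "test_f1": 0.90, "test_loss": 0.70},
    {"name": "v2", "learning_rate": 5e-5, "test_f1": 0.91, "test_loss": 0.85},
]


def select_run(runs: List[Dict], f1_tolerance: float = 0.015) -> Dict:
    """Among runs within f1_tolerance of the best F1, return the one with the lowest loss."""
    best_f1 = max(run["test_f1"] for run in runs)
    candidates = [run for run in runs if best_f1 - run["test_f1"] <= f1_tolerance]
    return min(candidates, key=lambda run: run["test_loss"])


print(select_run(RUNS)["name"])  # v1
```

With a tolerance of zero the rule reduces to picking the highest F1, which would select v2 instead.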

Evaluation Results

Evaluation on the held-out IMDB test set (25,000 reviews):

Metric	Value
Accuracy	~0.90
F1 (weighted)	~0.90
Precision (weighted)	~0.90
Recall (weighted)	~0.90
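For reference, the weighted metrics in this table average each per-class score by that class's share of the test set. The following is a minimal pure-Python sketch of that computation; it should agree with scikit-learn's `average="weighted"` option, though that equivalence is stated here as an expectation rather than something verified against the project's evaluation code. The labels in the usage note are toy values, not model outputs.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1 over the classes in y_true."""
    n = len(y_true)
    support = Counter(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        weight = count / n
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"accuracy": accuracy, **totals}
```

On toy input such as `weighted_metrics([0, 0, 1, 1], [0, 1, 1, 1])`, weighted recall equals accuracy. This holds in general, which is one reason the accuracy and weighted-recall rows of a report like this one tend to coincide.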

Limitations and Biases

Domain: Only trained on movie reviews. Expect degraded performance on other domains.
Length: Inputs are truncated to 256 tokens (~200 words). Longer reviews may lose tail information that matters for sentiment.
Language: English only.
Demographic biases: IMDB reviewers historically skew toward certain demographics (e.g., predominantly male, English-speaking). The model may inherit these biases — e.g., it may misclassify reviews using vernacular or cultural references underrepresented in IMDB.
Sarcasm and irony: Like most BERT-based classifiers, the model can struggle with sarcastic or ironic text where the surface sentiment opposes the intended meaning.

Project Resources

GitHub repository: https://github.com/pujaniitj/mlops-group-project-iitj
W&B experiment dashboard: https://wandb.ai/pujaniitj-iit-jodpur/MLops_group_8
Training notebook (v1): https://www.kaggle.com/code/pujaniitj/mlops-group-8-imdb-v1
Training notebook (v2): https://www.kaggle.com/code/pujaniitj/mlops-group-8-imdb-v2

Acknowledgments

Base model: DistilBERT by Sanh et al. (Hugging Face)
Dataset: IMDB by Maas et al. (Stanford NLP)
Training infrastructure: Kaggle Notebooks
Experiment tracking: Weights & Biases