---
language: en
license: mit
tags:
- text-classification
- sentiment-analysis
- distilbert
- imdb
- transformers
- mlops
datasets:
- amarshiv86/sentiment-analysis-imdb-dataset
metrics:
- accuracy
- f1
- roc_auc
base_model: distilbert-base-uncased
---
# 🎭 Sentiment Analysis: IMDB Reviews
A binary sentiment classifier fine-tuned on IMDB movie reviews, predicting
**POSITIVE** or **NEGATIVE** sentiment with confidence scores.
---
## 📊 Model Performance
| Metric | Score |
|-----------|--------|
| Accuracy | 0.894 |
| F1 Score | 0.893 |
| ROC-AUC | 0.960 |
| Precision | 0.884 |
| Recall | 0.902 |
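
As a quick consistency check, the reported F1 score can be recomputed from the precision and recall in the table. This is only a sketch: it assumes all three values refer to the same (positive) class and that F1 is the standard binary harmonic mean, and it works from the rounded figures shown above.

```python
# Values copied from the performance table (already rounded to 3 decimals).
precision = 0.884
recall = 0.902

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.3f}")
```

The result agrees with the table's F1 to three decimal places.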
### Confusion Matrix
![Confusion Matrix](artifacts/confusion_matrix.png)
---
## 🤖 Model Details
| Property | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Task | Binary text classification |
| Labels | `NEGATIVE` (0), `POSITIVE` (1) |
| Max token length | 256 |
| Training samples | 5,000 (IMDB subset) |
| Epochs | 2 |
| Batch size | 16 |
| Learning rate | 2e-5 |
| Framework | HuggingFace Transformers + Trainer API |
| Experiment tracking | MLflow |
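
To give a sense of the training budget these settings imply, the number of optimizer steps can be estimated from the table. This is a rough sketch under assumptions not stated in this card: a single device, no gradient accumulation, and the final partial batch being kept.

```python
import math

# Hyperparameters copied from the Model Details table.
train_samples = 5_000
batch_size = 16
epochs = 2

# Assumes one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```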
---
## 🚀 How to Use
```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_PATH = "amarshiv86/sentiment-analysis-imdb-model"

# The weights and tokenizer live in the repo's model/ subfolder,
# so pass it via `subfolder` rather than appending it to the repo ID.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, subfolder="model")
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH, subfolder="model")

# truncation=True clips long reviews to the tokenizer's maximum input length.
clf = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer, truncation=True)

results = clf([
    "This movie was absolutely fantastic, loved every minute!",
    "Terrible film, complete waste of time.",
])
for r in results:
    print(f"{r['label']}: {r['score']:.1%} confidence")
```
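
The confidence score returned by the pipeline is a probability derived from the model's two output logits. The sketch below illustrates that conversion in plain Python, assuming the pipeline applies its default softmax for a two-label model. The helper `logits_to_prediction` and the example logits are hypothetical; only the label mapping comes from the Model Details table.

```python
import math

# Label mapping from the Model Details table.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def logits_to_prediction(logits):
    """Convert two raw logits into a (label, confidence) pair via softmax."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Made-up logits for illustration, not real model output.
label, score = logits_to_prediction([-1.2, 2.3])
print(f"{label}: {score:.1%} confidence")
```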
---
## πŸ” MLOps Pipeline
The model is retrained automatically via GitHub Actions whenever files under `src/` or `params.yaml` change:
```
GitHub Push
     ↓
GitHub Actions
     ↓
prepare.py → train.py → evaluate.py
                ↓            ↓
          model files   metrics.json
                        confusion_matrix.png
                ↓
HuggingFace Hub (this repo)
```
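
The trigger rule can be stated as a small predicate. This is an illustration only: the actual filtering is presumably done by the GitHub Actions workflow itself, and the function name `should_retrain` is hypothetical. Paths are assumed to be relative to the repository root and to use forward slashes.

```python
def should_retrain(changed_files):
    """Return True if any changed path is under src/ or is params.yaml."""
    return any(
        path == "params.yaml" or path.startswith("src/")
        for path in changed_files
    )

print(should_retrain(["src/train.py", "README.md"]))  # True
print(should_retrain(["README.md"]))                  # False
```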
---
## πŸ“ Repository Structure
```
amarshiv86/sentiment-analysis-imdb-model
├── model/
│   ├── model.safetensors       # fine-tuned weights (268 MB)
│   ├── config.json             # model architecture config
│   ├── tokenizer.json          # tokenizer vocab
│   └── tokenizer_config.json   # tokenizer settings
├── artifacts/
│   └── confusion_matrix.png    # evaluation plot
└── metrics.json                # latest eval metrics
```
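
Once `metrics.json` has been downloaded, its contents can be printed in a readable form. The sketch below assumes the file is a flat JSON object mapping metric names to numbers. The key names in the example string are hypothetical (borrowed from the metric names in this card's metadata), and the real file may be structured differently.

```python
import json

def format_metrics(raw_json):
    """Format a flat JSON object of metric names to numbers, sorted by name."""
    metrics = json.loads(raw_json)
    return [f"{name}: {value:.3f}" for name, value in sorted(metrics.items())]

# Hypothetical file contents; the real keys in metrics.json may differ.
example = '{"accuracy": 0.894, "f1": 0.893, "roc_auc": 0.960}'
for line in format_metrics(example):
    print(line)
```

To use it on the real file, pass the file's text, for example `format_metrics(open("metrics.json").read())`.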
---
## 📄 Dataset
Trained on a 5,000-sample subset of the IMDB dataset.
Full processed dataset: [amarshiv86/sentiment-analysis-imdb-dataset](https://huggingface.co/datasets/amarshiv86/sentiment-analysis-imdb-dataset)
---
## 📄 License
MIT: free to use, modify, and distribute.