	---
	language: en
	license: mit
	tags:
	- text-classification
	- sentiment-analysis
	- distilbert
	- imdb
	- transformers
	- mlops
	datasets:
	- amarshiv86/sentiment-analysis-imdb-dataset
	metrics:
	- accuracy
	- f1
	- roc_auc
	base_model: distilbert-base-uncased
	---

	# 🎭 Sentiment Analysis — IMDB Reviews

	A binary sentiment classifier fine-tuned from `distilbert-base-uncased` on IMDB movie reviews.
	It labels each review POSITIVE or NEGATIVE and returns a confidence score for the prediction.

	---

	## 📊 Model Performance

	| Metric    | Score |
	|-----------|-------|
	| Accuracy  | 0.894 |
	| F1 Score  | 0.893 |
	| ROC-AUC   | 0.960 |
	| Precision | 0.884 |
	| Recall    | 0.902 |
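
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported values. The function name `f1_from_precision_recall` is illustrative and not part of this repository.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the performance table.
f1 = f1_from_precision_recall(0.884, 0.902)
print(f"F1 = {f1:.3f}")
```

Rounded to three decimals, the result agrees with the F1 score reported in the table.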

	### Confusion Matrix
	![Confusion Matrix](artifacts/confusion_matrix.png)

	---

	## 🤖 Model Details

	| Property            | Value                                  |
	|---------------------|----------------------------------------|
	| Base model          | `distilbert-base-uncased`              |
	| Task                | Binary text classification             |
	| Labels              | `NEGATIVE` (0), `POSITIVE` (1)         |
	| Max token length    | 256                                    |
	| Training samples    | 5,000 (IMDB subset)                    |
	| Epochs              | 2                                      |
	| Batch size          | 16                                     |
	| Learning rate       | 2e-5                                   |
	| Framework           | HuggingFace Transformers + Trainer API |
	| Experiment tracking | MLflow                                 |
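
To put these hyperparameters in perspective, the number of optimizer steps can be estimated from the sample count, batch size, and epoch count. This is a rough sketch that assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept. None of these assumptions are stated in this card, and `total_optimizer_steps` is a hypothetical helper.

```python
import math

# Values taken from the Model Details table.
TRAIN_SAMPLES = 5_000
BATCH_SIZE = 16
EPOCHS = 2


def total_optimizer_steps(samples: int, batch_size: int, epochs: int) -> int:
    """Estimate optimizer steps, counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(samples / batch_size)
    return steps_per_epoch * epochs


steps = total_optimizer_steps(TRAIN_SAMPLES, BATCH_SIZE, EPOCHS)
print(f"Estimated optimizer steps: {steps}")
```

Under those assumptions the run is short, a few hundred steps per epoch, which fits the small training subset.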

	---

	## 🚀 How to Use

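	```python
	from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

	MODEL_PATH = "amarshiv86/sentiment-analysis-imdb-model"

	# The weights and tokenizer files live in the repo's `model/` subfolder,
	# so pass it via `subfolder` instead of appending it to the repo id.
	tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, subfolder="model")
	model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH, subfolder="model")

	# truncation=True clips reviews that exceed the tokenizer's maximum length.
	clf = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer, truncation=True)

	results = clf([
	    "This movie was absolutely fantastic, loved every minute!",
	    "Terrible film, complete waste of time.",
	])

	# Each result is a dict holding the predicted label and its score.
	for r in results:
	    print(f"{r['label']}: {r['score']:.1%} confidence")
	```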
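
For a single-label classifier like this one, the confidence reported by the pipeline is typically the softmax probability of the predicted class. The following is a minimal pure-Python sketch of that mapping. The logit values are made up for illustration, and `label_with_confidence` is a hypothetical helper, not part of this repo or of Transformers.

```python
import math

# Label mapping as listed in the Model Details table.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits in (NEGATIVE, POSITIVE) order, for illustration only.
label, score = label_with_confidence([-1.2, 2.3])
print(f"{label}: {score:.1%} confidence")
```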

	---

	## 🔁 MLOps Pipeline

	The model is retrained automatically by GitHub Actions whenever files under `src/` or `params.yaml` change:

	```
	GitHub Push
	     ↓
	GitHub Actions
	     ↓
	prepare.py → train.py → evaluate.py
	                ↓            ↓
	           model files  metrics.json
	                        confusion_matrix.png
	                ↓
	    HuggingFace Hub (this repo)
	```
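
The trigger rule can be read as a simple path filter. The sketch below mimics it in Python for illustration only: the real filter lives in the GitHub Actions workflow, which is not shown in this card, and `should_retrain` is a hypothetical helper.

```python
from pathlib import PurePosixPath

WATCHED_DIR = "src"           # any file under this directory triggers a run
WATCHED_FILE = "params.yaml"  # this exact top-level file triggers a run


def should_retrain(changed_paths):
    """Return True if any changed path falls under src/ or is params.yaml."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path == PurePosixPath(WATCHED_FILE):
            return True
        # Require at least one component below src/ so a bare "src" is ignored.
        if len(path.parts) > 1 and path.parts[0] == WATCHED_DIR:
            return True
    return False


print(should_retrain(["README.md"]))
print(should_retrain(["README.md", "src/train.py"]))
```

A documentation-only push leaves the model untouched, while any change to the training code or parameters starts the prepare, train, evaluate sequence.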

	---

	## 📁 Repository Structure

	```
	amarshiv86/sentiment-analysis-imdb-model
	├── model/
	│   ├── model.safetensors       # fine-tuned weights (268 MB)
	│   ├── config.json             # model architecture config
	│   ├── tokenizer.json          # tokenizer vocab
	│   └── tokenizer_config.json   # tokenizer settings
	├── artifacts/
	│   └── confusion_matrix.png    # evaluation plot
	└── metrics.json                # latest eval metrics
	```
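
For readers who want to inspect `metrics.json` after downloading it, a small sketch follows. The file's schema is not documented in this card, so the flat name-to-number layout and the key names used in the demo are assumptions. Adjust them to the actual file.

```python
import json
import tempfile
from pathlib import Path


def summarize_metrics(path):
    """Return 'name: value' lines for each numeric entry in a flat JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    lines = []
    for name, value in sorted(data.items()):
        # Skip non-numeric entries; bool is excluded because it subclasses int.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"{name}: {value:.3f}")
    return lines


# Demo with a stand-in file. The keys are hypothetical; the values are the
# accuracy and F1 score from the performance table in this card.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "metrics.json"
    demo.write_text(json.dumps({"accuracy": 0.894, "f1": 0.893}), encoding="utf-8")
    summary = summarize_metrics(demo)

print("\n".join(summary))
```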

	---

	## 📄 Dataset

	Trained on a 5,000-sample subset of the IMDB dataset.
	Full processed dataset: [amarshiv86/sentiment-analysis-imdb-dataset](https://huggingface.co/datasets/amarshiv86/sentiment-analysis-imdb-dataset)

	---

	## 📄 License

	MIT — free to use, modify, and distribute.