---
language: en
license: mit
tags:
- text-classification
- sentiment-analysis
- distilbert
- imdb
- transformers
- mlops
datasets:
- amarshiv86/sentiment-analysis-imdb-dataset
metrics:
- accuracy
- f1
- roc_auc
base_model: distilbert-base-uncased
---

# Sentiment Analysis: IMDB Reviews

A binary sentiment classifier fine-tuned on IMDB movie reviews, predicting
**POSITIVE** or **NEGATIVE** sentiment with confidence scores.

---

## Model Performance

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.894  |
| F1 Score  | 0.893  |
| ROC-AUC   | 0.960  |
| Precision | 0.884  |
| Recall    | 0.902  |
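
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. The following is a minimal sketch in plain Python that uses only the rounded values from the table; the helper name `f1_from` is illustrative and not part of this repository.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the metrics table.
precision, recall = 0.884, 0.902

f1 = f1_from(precision, recall)
print(f"{f1:.3f}")  # 0.893
```

The recomputed value matches the reported F1 to three decimal places, which suggests the table rows come from the same evaluation run.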

### Confusion Matrix
![Confusion matrix](artifacts/confusion_matrix.png)

---

## Model Details

| Property | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Task | Binary text classification |
| Labels | `NEGATIVE` (0), `POSITIVE` (1) |
| Max token length | 256 |
| Training samples | 5,000 (IMDB subset) |
| Epochs | 2 |
| Batch size | 16 |
| Learning rate | 2e-5 |
| Framework | HuggingFace Transformers + Trainer API |
| Experiment tracking | MLflow |
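
The sample count, batch size, and epoch count in the table imply roughly how many optimizer steps a training run takes. This is a back-of-the-envelope sketch that assumes all 5,000 samples are used for training, a single device, no gradient accumulation, and that the final partial batch is kept; this card states none of these. The `hparams` dictionary is illustrative and does not reflect the actual layout of `params.yaml`.

```python
import math

# Hyperparameters copied from the Model Details table (dictionary layout is hypothetical).
hparams = {
    "train_samples": 5_000,
    "batch_size": 16,
    "epochs": 2,
    "learning_rate": 2e-5,
    "max_length": 256,
}

# One optimizer step per batch; the last, smaller batch still counts as a step.
steps_per_epoch = math.ceil(hparams["train_samples"] / hparams["batch_size"])
total_steps = steps_per_epoch * hparams["epochs"]

print(steps_per_epoch, total_steps)  # 313 626
```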

---

## How to Use

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_PATH = "amarshiv86/sentiment-analysis-imdb-model"

# The model files live in the repo's `model/` subfolder, so pass `subfolder`
# instead of appending it to the repo id (a repo id cannot contain a second slash).
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, subfolder="model")
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH, subfolder="model")

# truncation=True clips reviews that exceed the model's maximum input length.
clf = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer, truncation=True)

results = clf([
    "This movie was absolutely fantastic, loved every minute!",
    "Terrible film, complete waste of time.",
])

for r in results:
    print(f"{r['label']}: {r['score']:.1%} confidence")
```
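
The pipeline returns one dictionary per input with a `label` and a `score`, where `score` is the probability assigned to the predicted label. For thresholding or ranking, it can be convenient to express every prediction as the probability of `POSITIVE`. This is a minimal sketch that assumes the model config exposes the label names `NEGATIVE` and `POSITIVE` as listed in Model Details; `positive_probability` is a hypothetical helper, and the sample outputs are made up for illustration.

```python
def positive_probability(prediction: dict) -> float:
    """Map a {'label', 'score'} prediction to P(POSITIVE) for a binary classifier."""
    label, score = prediction["label"], prediction["score"]
    if label == "POSITIVE":
        return score
    if label == "NEGATIVE":
        return 1.0 - score  # the two class probabilities sum to 1
    raise ValueError(f"unexpected label: {label!r}")


# Made-up outputs in the shape the pipeline returns (not real model results).
sample = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.75},
]

probs = [positive_probability(p) for p in sample]
print([round(p, 2) for p in probs])  # [0.98, 0.25]
```

If the loaded config reports generic names such as `LABEL_0` and `LABEL_1` instead, the helper raises a `ValueError` rather than guessing the mapping.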

---

## MLOps Pipeline

Automatically retrained via GitHub Actions whenever `src/` or `params.yaml` changes:

```
GitHub Push
     ↓
GitHub Actions
     ↓
prepare.py → train.py → evaluate.py
                ↓            ↓
          model files   metrics.json
                        confusion_matrix.png
                ↓
     HuggingFace Hub (this repo)
```
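
The trigger rule can be illustrated outside of GitHub Actions. The sketch below is not the repository's workflow file. It is a hypothetical Python helper, `should_retrain`, that shows which changed paths would count as a retraining trigger under the rule stated above, assuming `src/` and `params.yaml` sit at the repository root.

```python
from pathlib import PurePosixPath


def should_retrain(changed_paths):
    """Return True if any changed path is params.yaml or lies under src/."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path == PurePosixPath("params.yaml"):
            return True
        if path.parts and path.parts[0] == "src":
            return True
    return False


print(should_retrain(["src/train.py"]))      # True
print(should_retrain(["README.md"]))         # False
print(should_retrain(["docs/params.yaml"]))  # False
```

In the real pipeline this filtering would normally be expressed as path filters in the workflow definition; the function only makes the rule concrete.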

---

## Repository Structure

```
amarshiv86/sentiment-analysis-imdb-model
├── model/
│   ├── model.safetensors        # fine-tuned weights (268 MB)
│   ├── config.json              # model architecture config
│   ├── tokenizer.json           # tokenizer vocab
│   └── tokenizer_config.json    # tokenizer settings
├── artifacts/
│   └── confusion_matrix.png     # evaluation plot
└── metrics.json                 # latest eval metrics
```

---

## Dataset

Trained on a 5,000-sample subset of the IMDB dataset.
Full processed dataset: [amarshiv86/sentiment-analysis-imdb-dataset](https://huggingface.co/datasets/amarshiv86/sentiment-analysis-imdb-dataset)

---

## License

MIT: free to use, modify, and distribute.