A binary sentiment classifier fine-tuned on IMDB movie reviews, predicting POSITIVE or NEGATIVE sentiment with confidence scores.

Evaluation metrics:

| Metric | Score |
|---|---|
| Accuracy | 0.894 |
| F1 Score | 0.893 |
| ROC-AUC | 0.960 |
| Precision | 0.884 |
| Recall | 0.902 |
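Accuracy, precision, recall and F1 follow the standard binary-classification definitions. The sketch below assumes POSITIVE (label 1) is the positive class, which the card does not state explicitly. `binary_metrics` is a hypothetical helper for illustration, not code from this repository's `evaluate.py`. ROC-AUC is omitted because it is computed from the positive-class probabilities rather than from hard predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (1 = POSITIVE, 0 = NEGATIVE)."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Consistency check on the table: the reported precision and recall
# should reproduce the reported F1 score.
p, r = 0.884, 0.902
print(round(2 * p * r / (p + r), 3))  # 0.893
```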

Model and training details:

| Property | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Task | Binary text classification |
| Labels | NEGATIVE (0), POSITIVE (1) |
| Max token length | 256 |
| Training samples | 5,000 (IMDB subset) |
| Epochs | 2 |
| Batch size | 16 |
| Learning rate | 2e-5 |
| Framework | HuggingFace Transformers + Trainer API |
| Experiment tracking | MLflow |
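The sample count, batch size and epoch count above determine how many optimizer steps the run takes. A small sketch of that arithmetic, assuming a single device, no gradient accumulation, and the final partial batch of each epoch being kept (the Trainer default, `dataloader_drop_last=False`); `total_training_steps` is a hypothetical helper:

```python
import math

def total_training_steps(num_samples, batch_size, epochs):
    """Optimizer steps when the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs

steps = total_training_steps(num_samples=5_000, batch_size=16, epochs=2)
print(steps)  # 626 (313 steps per epoch x 2 epochs)
```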
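Example usage with the Transformers pipeline. The weights and tokenizer files live in the `model/` subfolder of this repository, so they are loaded with `subfolder="model"`:

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "amarshiv86/sentiment-analysis-imdb-model"

# Weights and tokenizer files are stored under model/ in the Hub repo
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, subfolder="model")
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, subfolder="model")

# truncation=True keeps long reviews within the model's maximum input length
clf = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer, truncation=True)

results = clf([
    "This movie was absolutely fantastic, loved every minute!",
    "Terrible film, complete waste of time.",
])
for r in results:
    print(f"{r['label']}: {r['score']:.1%} confidence")
```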
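The confidence score printed above is a probability obtained from the model's two output logits: for a single-label classifier with more than one output, the Transformers text-classification pipeline applies a softmax by default. The sketch below reproduces that post-processing in plain Python for one example. The logit values are made up for illustration, and the label mapping is taken from the table above (0 = NEGATIVE, 1 = POSITIVE).

```python
import math

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # label ids from the table above

def softmax(logits):
    # Subtract the largest logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits):
    """Return (label, confidence) for one example's [NEGATIVE, POSITIVE] logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # first index on ties
    return ID2LABEL[best], probs[best]

label, score = to_prediction([-1.7, 2.1])  # illustrative logits, not real model output
print(f"{label}: {score:.1%} confidence")  # POSITIVE: 97.8% confidence
```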
The model is retrained automatically via GitHub Actions whenever `src/` or `params.yaml` changes:
```
GitHub Push
    ↓
GitHub Actions
    ↓
prepare.py → train.py → evaluate.py
                ↓            ↓
           model files  metrics.json
                        confusion_matrix.png
    ↓
HuggingFace Hub (this repo)
```
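The retraining trigger amounts to a path filter on the files changed by a push. The workflow file is not shown in this card, so the following is only a sketch of the rule stated above, assuming `params.yaml` sits at the repository root; `should_retrain` is a hypothetical helper, not part of the repository.

```python
from pathlib import PurePosixPath

WATCHED_DIRS = ("src",)           # any change under src/ triggers a run (per the card)
WATCHED_FILES = ("params.yaml",)  # assumed to live at the repository root

def should_retrain(changed_paths):
    """Return True if any changed path is under src/ or is the root params.yaml."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if str(path) in WATCHED_FILES:
            return True
        # Compare whole path components so that e.g. "srcfoo/x.py" does not match
        if path.parts and path.parts[0] in WATCHED_DIRS:
            return True
    return False

print(should_retrain(["README.md", "src/train.py"]))   # True
print(should_retrain(["README.md", "docs/notes.md"]))  # False
```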
```
amarshiv86/sentiment-analysis-imdb-model
├── model/
│   ├── model.safetensors        # fine-tuned weights (268 MB)
│   ├── config.json              # model architecture config
│   ├── tokenizer.json           # tokenizer vocab
│   └── tokenizer_config.json    # tokenizer settings
├── artifacts/
│   └── confusion_matrix.png     # evaluation plot
└── metrics.json                 # latest eval metrics
```
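Because `metrics.json` sits at the repository root, the latest evaluation numbers can be inspected without loading the model. A minimal standard-library sketch, assuming the file is a flat JSON object mapping metric names to numbers (the card does not document its schema); `parse_metrics`, `load_metrics` and `format_metrics` are hypothetical helpers.

```python
import json
from pathlib import Path

def parse_metrics(text):
    """Parse a flat JSON object, keeping only numeric (non-boolean) values."""
    data = json.loads(text)
    return {
        name: float(value)
        for name, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }

def load_metrics(path):
    # Read a local copy of metrics.json (e.g. one downloaded from the Hub)
    return parse_metrics(Path(path).read_text(encoding="utf-8"))

def format_metrics(metrics):
    # One "name: value" line per metric, sorted by name, three decimals
    return "\n".join(f"{name}: {value:.3f}" for name, value in sorted(metrics.items()))
```

With `huggingface_hub` installed, `hf_hub_download(repo_id="amarshiv86/sentiment-analysis-imdb-model", filename="metrics.json")` returns a local file path that can be passed to `load_metrics`.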
Trained on a 5,000-sample subset of the IMDB dataset. Full processed dataset: amarshiv86/sentiment-analysis-imdb-dataset
License: MIT. Free to use, modify, and distribute.