MLOPS_group-v4 — SMS Spam Classification
This repository is part of the MLOps Group 36 Project for the PGD AI Programme, IIT Jodhpur.
The project implements an end-to-end MLOps pipeline for SMS spam classification using DistilBERT, with GitHub, Kaggle, Weights & Biases, Hugging Face Hub, Docker, and GitHub Actions.
Contributor
Anu Kumar
Roll Number: G25AIT2016
Project Links
| Resource | Link |
|---|---|
| GitHub Repository | https://github.com/g25ait2032-prog/mlops-group36-iitj |
| Kaggle Notebook - G25AIT2016 | https://www.kaggle.com/code/anukumarkg25ait2016/mlops-group36-data-preprocessing-g25ait2016 |
| W&B Run - G25AIT2016 | https://wandb.ai/g25ait2032-iit-jodhpur/MLOPS_Group/runs/j5fk4zll |
| W&B Project Dashboard | https://wandb.ai/g25ait2032-iit-jodhpur/MLOPS_Group |
| Hugging Face Model | https://huggingface.co/Aukrk/MLOPS_group-v4 |
Model Details
| Item | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Task | Binary text classification |
| Classes | ham, spam |
| Dataset | UCI SMS Spam Collection |
| Framework | Hugging Face Transformers |
| Output labels | 0 = ham, 1 = spam |
Contribution Summary
This repository is linked to the G25AIT2016 Task 2 workflow.
The completed contribution includes:
- Loading the UCI SMS Spam dataset
- Cleaning and normalising SMS text
- Removing missing and duplicate records
- Creating stratified train, validation, and test splits
- Creating id2label.json and label2id.json
- Running data sanity checks
- Logging data-preparation metrics to W&B
- Publishing this Hugging Face model repository for project traceability
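The preparation steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual notebook code: the helper names (`clean_text`, `prepare`, `stratified_split`, `has_leakage`), the normalisation rules (lower-casing and whitespace collapsing), the 70/15/15 ratios, and the seed are all assumptions.

```python
import random
import re
from collections import defaultdict


def clean_text(text):
    """Lower-case and collapse whitespace (assumed normalisation rules)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def prepare(rows):
    """Drop missing and duplicate records from (label, text) pairs."""
    seen, cleaned = set(), []
    for label, text in rows:
        if not label or not text or not text.strip():
            continue  # missing record
        text = clean_text(text)
        if text in seen:
            continue  # duplicate message after normalisation
        seen.add(text)
        cleaned.append((label, text))
    return cleaned


def stratified_split(rows, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split per class so each subset keeps a similar ham/spam balance."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[0]].append(row)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


def has_leakage(*splits):
    """Return True if any text appears in more than one split."""
    seen = set()
    for split in splits:
        texts = {text for _, text in split}
        if seen & texts:
            return True
        seen |= texts
    return False
```

Deduplicating after normalisation, rather than before, means two messages that differ only in case or spacing count as one record, which also keeps near-identical texts from landing in different splits.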
Dataset Preparation Summary
| Metric | Value |
|---|---|
| Raw samples | 5,574 |
| Duplicates removed | 415 |
| Cleaned samples | 5,159 |
| Train rows | 3,611 |
| Validation rows | 774 |
| Test rows | 774 |
| Sanity checks passed | 21 / 21 |
| Leakage check | Passed |
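The figures in the table are internally consistent, which a few lines of arithmetic confirm. The sketch below only re-checks the published counts; the 70/15/15 split ratio is inferred from them and is not stated in the source.

```python
raw, duplicates, cleaned = 5574, 415, 5159
train, val, test = 3611, 774, 774

# Removing the duplicates from the raw set yields the cleaned count,
# so no additional rows were dropped as missing.
assert raw - duplicates == cleaned

# The three splits partition the cleaned set with no rows left over.
assert train + val + test == cleaned

# Split shares in percent, rounded to one decimal place (inferred, not documented).
shares = [round(100 * n / cleaned, 1) for n in (train, val, test)]
print(shares)  # [70.0, 15.0, 15.0]
```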
Label Mapping
{
  "0": "ham",
  "1": "spam"
}
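The contribution list mentions id2label.json and label2id.json. A minimal sketch of how such files could be written with the standard library follows; the function name `write_label_maps` and the output directory are assumptions, and the project's own notebook may do this differently.

```python
import json
from pathlib import Path

ID2LABEL = {0: "ham", 1: "spam"}


def write_label_maps(out_dir):
    """Write id2label.json and label2id.json into out_dir (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    label2id = {label: idx for idx, label in ID2LABEL.items()}
    # json.dumps turns the integer keys into strings, matching the mapping above.
    (out / "id2label.json").write_text(json.dumps(ID2LABEL, indent=2))
    (out / "label2id.json").write_text(json.dumps(label2id, indent=2))
    return label2id
```

Deriving `label2id` from `ID2LABEL` rather than writing both by hand keeps the two files from drifting apart.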
How to Use
from transformers import pipeline

# Load the classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="Aukrk/MLOPS_group-v4",
)

text = "Congratulations! You have won a free iPhone. Click here now."
result = classifier(text)
print(result)  # a list with one dict holding "label" and "score"
Load Model Directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("Aukrk/MLOPS_group-v4")
model = AutoModelForSequenceClassification.from_pretrained("Aukrk/MLOPS_group-v4")
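When the model is loaded directly, a forward pass returns raw logits rather than a label. The sketch below shows the post-processing step in plain Python so the arithmetic is visible: a softmax over the two logits followed by a lookup in the label mapping. The helper name `logits_to_prediction` is hypothetical, and the logits used here are illustrative values, not real model output. With PyTorch, the input row would typically come from something like `model(**inputs).logits[0].tolist()`.

```python
import math

ID2LABEL = {0: "ham", 1: "spam"}


def logits_to_prediction(logits):
    """Convert one row of raw logits into (label, probability)."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Illustrative logits, not real model output.
label, prob = logits_to_prediction([-1.2, 2.3])
print(label, round(prob, 3))
```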
Example Inputs
| Text | Expected Output |
|---|---|
| Congratulations! You have won a free prize. Click here now. | spam |
| Can we meet tomorrow at 5 PM? | ham |
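The expected outputs in the table can double as a smoke test. The sketch below checks any callable that maps text to a label; `check_examples` and the keyword-based `stub_classifier` are hypothetical stand-ins so the snippet runs without downloading the model. With the real pipeline, one could pass `lambda t: classifier(t)[0]["label"]`, assuming the model config exposes the ham and spam label names rather than generic ones.

```python
EXAMPLES = [
    ("Congratulations! You have won a free prize. Click here now.", "spam"),
    ("Can we meet tomorrow at 5 PM?", "ham"),
]


def check_examples(classify, examples=EXAMPLES):
    """Return the (text, expected, got) triples where the label differs."""
    failures = []
    for text, expected in examples:
        got = classify(text)
        if got != expected:
            failures.append((text, expected, got))
    return failures


def stub_classifier(text):
    """Toy keyword rule standing in for the real model (illustration only)."""
    keywords = ("won", "free", "prize", "click")
    return "spam" if any(k in text.lower() for k in keywords) else "ham"


print(check_examples(stub_classifier))  # an empty list means every example matched
```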
W&B Traceability
The G25AIT2016 W&B run records data-preparation metrics such as:
- Raw sample count
- Duplicate removal count
- Cleaned sample count
- Train / validation / test split sizes
- Sanity check status
- Leakage check status
W&B Run: https://wandb.ai/g25ait2032-iit-jodhpur/MLOPS_Group/runs/j5fk4zll
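A minimal sketch of how such values could be sent to W&B follows. The metric key names and the helpers `build_data_prep_metrics` and `log_to_wandb` are assumptions, not the names used in the actual run; the values are taken from the Dataset Preparation Summary. The `wandb` import is deferred so the snippet runs without the package unless logging is requested.

```python
def build_data_prep_metrics():
    """Collect the data-preparation figures under hypothetical key names."""
    return {
        "raw_samples": 5574,
        "duplicates_removed": 415,
        "cleaned_samples": 5159,
        "train_rows": 3611,
        "val_rows": 774,
        "test_rows": 774,
        "sanity_checks_passed": 21,
        "sanity_checks_total": 21,
        "leakage_check_passed": True,
    }


def log_to_wandb(metrics, project="MLOPS_Group"):
    """Log the metrics to a W&B run (needs the wandb package and a login)."""
    import wandb  # imported lazily so the rest of the snippet has no dependency

    run = wandb.init(project=project, job_type="data-prep")
    run.log(metrics)
    run.finish()


metrics = build_data_prep_metrics()
print(metrics["train_rows"] + metrics["val_rows"] + metrics["test_rows"])  # 5159
```

Calling `log_to_wandb(metrics)` would create a new run in the configured account, so it is left out of the snippet's top level.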
Model Context
This model repository is published under the G25AIT2016 Hugging Face account for Group 36 project traceability.
The model artefact follows the Group 36 DistilBERT SMS spam classification workflow and is linked with the data-preparation contribution completed by Anu Kumar - G25AIT2016.
Limitations
- The dataset is relatively small and focused on SMS messages.
- The model may not generalise well to long emails, non-English messages, or modern scam formats.
- Boundary cases mixing normal conversation and promotional text may be misclassified.
- This is an academic MLOps demonstration and should not be used as the only spam detection control in production.
Intended Use
This repository is intended for:
- Academic MLOps demonstration
- SMS spam classification testing
- Hugging Face deployment evidence
- W&B traceability evidence
- GitHub Actions / Docker inference integration
Not Intended For
- Production-grade fraud detection
- Legal, financial, or safety-critical filtering
- Detecting all phishing or scam variants without further validation
Authors
MLOps Group 36
PGD AI Programme, IIT Jodhpur
Contributor for this repository:
Anu Kumar - G25AIT2016