|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-classification |
|
|
- spam-detection |
|
|
- sms |
|
|
license: apache-2.0 |
|
|
--- |
|
|
|
|
|
# Model Card for `alusci/distilbert-smsafe`
|
|
|
|
|
A lightweight DistilBERT model fine-tuned for spam detection in SMS messages. The model classifies input messages as either **spam** or **ham** (not spam), using a custom dataset of real-world OTP (One-Time Password) and spam SMS messages. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** [alusci](https://huggingface.co/alusci) |
|
|
- **Model type:** Transformer-based binary classifier |
|
|
- **Language(s):** English |
|
|
- **License:** Apache 2.0 |
|
|
- **Finetuned from model:** `distilbert-base-uncased` |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository:** [https://huggingface.co/alusci/distilbert-smsafe](https://huggingface.co/alusci/distilbert-smsafe) |
|
|
|
|
|
--- |
|
|
|
|
|
## Uses
|
|
|
|
|
### Direct Use |
|
|
|
|
|
- Detect whether an SMS message is spam or ham (legitimate messages such as OTPs).
|
|
- Useful in prototypes, educational settings, or lightweight filtering applications. |
|
|
|
|
|
```python |
|
|
from transformers import pipeline |
|
|
|
|
|
classifier = pipeline("text-classification", model="alusci/distilbert-smsafe")

# The pipeline returns a list of dicts, e.g. [{'label': 'LABEL_1', 'score': 0.99}]
result = classifier("Your verification code is 123456. Please do not share it with anyone.")

# Optional: map the raw labels to human-readable terms
label_map = {"LABEL_0": "ham", "LABEL_1": "spam"}
print(f"Label: {label_map[result[0]['label']]} - Score: {result[0]['score']:.2f}")
|
|
``` |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
- Not intended for email spam detection or multilingual message filtering. |
|
|
- Not suitable for production environments without further testing and evaluation. |
|
|
|
|
|
--- |
|
|
|
|
|
## Bias, Risks, and Limitations
|
|
|
|
|
- The model may reflect dataset biases (e.g., message structure, language patterns). |
|
|
- It may misclassify legitimate OTPs or non-standard spam content. |
|
|
- Risk of false positives in edge cases. |
|
|
|
|
|
### Recommendations |
|
|
|
|
|
- Evaluate on your own SMS dataset before deployment. |
|
|
- Consider combining with rule-based or heuristic systems in production. |
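As an illustration of the second recommendation, a hybrid filter might short-circuit on cheap, high-precision rules before falling back to the model. The patterns, threshold, and stand-in scoring function below are made-up examples for the sketch, not part of this model:

```python
import re
from typing import Callable, Optional

# Hypothetical rule layer: obvious spam patterns checked first (examples only).
SPAM_PATTERNS = [
    re.compile(r"(?i)\bclaim your prize\b"),
    re.compile(r"(?i)\bfree entry\b"),
]

def rule_based_label(text: str) -> Optional[str]:
    """Return 'spam' if an obvious pattern matches, else None (defer to model)."""
    if any(p.search(text) for p in SPAM_PATTERNS):
        return "spam"
    return None

def hybrid_classify(text: str, spam_score: Callable[[str], float],
                    threshold: float = 0.5) -> str:
    """Apply rules first; otherwise threshold the model's spam probability."""
    label = rule_based_label(text)
    if label is not None:
        return label
    return "spam" if spam_score(text) >= threshold else "ham"

# Stand-in for the model's spam probability (replace with the pipeline above):
fake_score = lambda text: 0.1
print(hybrid_classify("FREE ENTRY! Claim your prize now!", fake_score))   # spam (rule)
print(hybrid_classify("Your verification code is 123456.", fake_score))   # ham (model)
```

In production, `spam_score` would wrap the pipeline call and extract the probability of `LABEL_1`.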
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details
|
|
|
|
|
### Training Data |
|
|
|
|
|
- Dataset used: [`alusci/sms-otp-spam-dataset`](https://huggingface.co/datasets/alusci/sms-otp-spam-dataset) |
|
|
- Binary labels for spam and non-spam OTP messages |
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
- **Epochs:** 5 |
|
|
- **Batch Size:** 16 (assumed) |
|
|
- **Loss Function:** CrossEntropyLoss |
|
|
- **Optimizer:** AdamW |
|
|
- **Tokenizer:** `distilbert-base-uncased` |
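For reference, the CrossEntropyLoss objective above reduces, per example, to the negative log-softmax probability of the true class. A minimal pure-Python sketch of that computation (an illustration, not the actual training code):

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class."""
    # log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

# A confident, correct prediction yields a small loss:
print(round(cross_entropy([2.0, 0.5], 0), 4))  # 0.2014
```

During fine-tuning, this loss is averaged over the batch and minimized with AdamW.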
|
|
|
|
|
--- |
|
|
|
|
|
## Evaluation
|
|
|
|
|
### Metrics |
|
|
|
|
|
- Accuracy, Precision, Recall, F1-score on held-out validation set |
|
|
- Binary classification labels:
  - `LABEL_0` → ham
  - `LABEL_1` → spam
|
|
|
|
|
### Results |
|
|
|
|
|
**Evaluation metrics after 5 epochs:** |
|
|
|
|
|
- **Loss:** 0.2962 |
|
|
- **Accuracy:** 91.35% |
|
|
- **Precision:** 90.26% |
|
|
- **Recall:** 100.00% |
|
|
- **F1-score:** 94.88% |
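The reported F1-score is consistent with the precision and recall above, since F1 is their harmonic mean:

```python
precision, recall = 0.9026, 1.0000

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2%}")  # 94.88%, matching the reported F1-score
```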
|
|
|
|
|
**Performance:** |
|
|
|
|
|
- **Evaluation runtime:** 4.37 seconds |
|
|
- **Samples/sec:** 457.27 |
|
|
- **Steps/sec:** 9.15 |
|
|
|
|
|
--- |
|
|
|
|
|
## Environmental Impact
|
|
|
|
|
- **Hardware Type:** Apple Silicon GPU (MPS backend, local Mac)
|
|
- **Hours used:** <1 hour (small dataset) |
|
|
- **Cloud Provider:** None (trained locally) |
|
|
- **Carbon Emitted:** Minimal due to local and efficient hardware |
|
|
|
|
|
--- |
|
|
|
|
|
## Technical Specifications
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
|
|
- **Base:** DistilBERT |
|
|
- **Objective:** Binary classification head on pooled output |
|
|
- **Parameters:** ~66M (same as `distilbert-base-uncased`)
|
|
|
|
|
--- |
|
|
|
|
|
## Model Card Contact
|
|
|
|
|
For questions or feedback, please contact via [Hugging Face profile](https://huggingface.co/alusci). |