---
library_name: transformers
tags:
- text-classification
- spam-detection
- sms
license: apache-2.0
---
# Model Card for `alusci/distilbert-smsafe`
A lightweight DistilBERT model fine-tuned for spam detection in SMS messages. The model classifies each input message as either **spam** or **ham** (not spam). It was trained on a custom dataset of real-world OTP (One-Time Password) and spam SMS messages.
---
## Model Details
### Model Description
- **Developed by:** [alusci](https://huggingface.co/alusci)
- **Model type:** Transformer-based binary classifier
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** `distilbert-base-uncased`
### Model Sources
- **Repository:** [https://huggingface.co/alusci/distilbert-smsafe](https://huggingface.co/alusci/distilbert-smsafe)
---
## Uses
### Direct Use
- Detect whether an SMS message is spam or ham (for example, a legitimate OTP message).
- Useful in prototypes, educational settings, or lightweight filtering applications.
```python
from transformers import pipeline
classifier = pipeline("text-classification", model="alusci/distilbert-smsafe")
result = classifier("Your verification code is 123456. Please do not share it with anyone.")
# Optional: map the label to human-readable terms
label_map = {"LABEL_0": "ham", "LABEL_1": "spam"}
print(f"Label: {label_map[result[0]['label']]} - Score: {result[0]['score']:.2f}")
```
### Out-of-Scope Use
- Not intended for email spam detection or multilingual message filtering.
- Not suitable for production environments without further testing and evaluation.
---
## Bias, Risks, and Limitations
- The model may reflect dataset biases (e.g., message structure, language patterns).
- It may misclassify legitimate OTPs or non-standard spam content.
- Edge cases carry a risk of false positives, where a legitimate message is flagged as spam.
### Recommendations
- Evaluate on your own SMS dataset before deployment.
- Consider combining with rule-based or heuristic systems in production.
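
One way to combine the model with a rule-based check is sketched below. It is only an illustration: the regular expressions, the `spam_threshold` value, and the names `looks_like_otp` and `classify_with_rules` are hypothetical and are not part of this model or of `transformers`. The `classifier` argument is any callable that returns output in the same shape as the pipeline from the Direct Use example.

```python
import re
from typing import Callable, Dict, List

# Hypothetical heuristics for illustration; tune them on your own SMS data.
CODE_PATTERN = re.compile(r"\b\d{4,8}\b")
KEYWORD_PATTERN = re.compile(
    r"\b(verification code|one-time password|otp|passcode)\b", re.IGNORECASE
)
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)

LABEL_MAP = {"LABEL_0": "ham", "LABEL_1": "spam"}


def looks_like_otp(text: str) -> bool:
    """Return True if the text has a short numeric code, an OTP keyword, and no link."""
    has_code = CODE_PATTERN.search(text) is not None
    has_keyword = KEYWORD_PATTERN.search(text) is not None
    has_url = URL_PATTERN.search(text) is not None
    return has_code and has_keyword and not has_url


def classify_with_rules(
    text: str,
    classifier: Callable[[str], List[Dict]],
    spam_threshold: float = 0.9,  # assumed value, not taken from this model card
) -> Dict:
    """Run the model, then override low-confidence spam verdicts on OTP-like texts."""
    prediction = classifier(text)[0]
    label = LABEL_MAP.get(prediction["label"], prediction["label"])
    score = prediction["score"]
    if label == "spam" and score < spam_threshold and looks_like_otp(text):
        return {"label": "ham", "score": score, "source": "rule_override"}
    return {"label": label, "score": score, "source": "model"}
```

To use it with the real model, pass the `pipeline("text-classification", model="alusci/distilbert-smsafe")` object as `classifier`. The `source` field records whether the final label came from the model or from the rule, which makes the overrides easy to audit.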
---
## Training Details
### Training Data
- Dataset used: [`alusci/sms-otp-spam-dataset`](https://huggingface.co/datasets/alusci/sms-otp-spam-dataset)
- Binary labels for spam and non-spam OTP messages
### Training Procedure
- **Epochs:** 5
- **Batch Size:** 16 (assumed)
- **Loss Function:** CrossEntropyLoss
- **Optimizer:** AdamW
- **Tokenizer:** `distilbert-base-uncased`
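
For readers unfamiliar with the loss listed above, the sketch below shows what cross-entropy computes for a single example, assuming a classification head that outputs one logit per class. The logit values are made up for illustration, and the helper names are hypothetical. This is plain Python, not the training code used for this model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-probability that the model assigns to the target class."""
    return -math.log(softmax(logits)[target])


# Made-up logits for one message: index 0 is ham, index 1 is spam.
logits = [-1.2, 2.3]
probs = softmax(logits)

loss_if_spam = cross_entropy(logits, target=1)  # small: the model favors spam
loss_if_ham = cross_entropy(logits, target=0)   # large: the model would be wrong

print(f"P(ham)={probs[0]:.3f}  P(spam)={probs[1]:.3f}")
print(f"loss if true label is spam: {loss_if_spam:.3f}")
print(f"loss if true label is ham:  {loss_if_ham:.3f}")
```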
---
## Evaluation
### Metrics
- Accuracy, Precision, Recall, and F1-score on a held-out validation set
- Binary classification labels:
  - `LABEL_0` → ham
  - `LABEL_1` → spam
### Results
**Evaluation metrics after 5 epochs:**
- **Loss:** 0.2962
- **Accuracy:** 91.35%
- **Precision:** 90.26%
- **Recall:** 100.00%
- **F1-score:** 94.88%
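
As a quick consistency check, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_from_precision_recall` is illustrative and not part of the model's code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported above, expressed as fractions.
precision = 0.9026
recall = 1.0

f1 = f1_from_precision_recall(precision, recall)
print(f"F1-score: {f1:.2%}")  # F1-score: 94.88%
```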
**Performance:**
- **Evaluation runtime:** 4.37 seconds
- **Samples/sec:** 457.27
- **Steps/sec:** 9.15
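
The throughput figures above also give a rough sense of the evaluation set size. Multiplying each rate by the runtime gives an approximate sample count and step count. These are estimates derived from rounded figures, not values reported by the author.

```python
# Figures reported above.
runtime_s = 4.37
samples_per_s = 457.27
steps_per_s = 9.15

# rate * time = count; results are approximate because the inputs are rounded.
approx_samples = runtime_s * samples_per_s
approx_steps = runtime_s * steps_per_s
approx_samples_per_step = approx_samples / approx_steps

print(round(approx_samples), round(approx_steps), round(approx_samples_per_step))
# 1998 40 50
```

Read this way, the numbers suggest a validation set of roughly 2,000 messages processed in about 40 steps.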
---
## Environmental Impact
- **Hardware Type:** Apple Silicon MPS GPU (Mac)
- **Hours used:** <1 hour (small dataset)
- **Cloud Provider:** None (trained locally)
- **Carbon Emitted:** Minimal due to local and efficient hardware
---
## Technical Specifications
### Model Architecture and Objective
- **Base:** DistilBERT
- **Objective:** Binary classification head on pooled output
- **Parameters:** ~66M (about the same as `distilbert-base-uncased`, plus a small classification head)
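
The ~66M figure can be reproduced with a back-of-the-envelope count. The sketch below assumes the published default configuration of `distilbert-base-uncased` (vocabulary of 30,522, hidden size 768, 6 layers, feed-forward size 3,072, 512 positions). It also assumes the head used by `DistilBertForSequenceClassification` in `transformers`: a 768×768 pre-classifier followed by a 768×2 classifier. Neither assumption is confirmed by this model card.

```python
# Assumed configuration: published defaults for distilbert-base-uncased.
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
DIM = 768
HIDDEN_DIM = 3072
N_LAYERS = 6
NUM_LABELS = 2


def linear(n_in: int, n_out: int) -> int:
    """Parameters in a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Parameters in a LayerNorm: scale plus shift."""
    return 2 * dim


# Word and position embeddings, followed by a LayerNorm.
embeddings = VOCAB_SIZE * DIM + MAX_POSITIONS * DIM + layer_norm(DIM)

# One transformer layer: four attention projections (query, key, value, output),
# a two-layer feed-forward block, and a LayerNorm after each sub-block.
attention = 4 * linear(DIM, DIM)
feed_forward = linear(DIM, HIDDEN_DIM) + linear(HIDDEN_DIM, DIM)
per_layer = attention + layer_norm(DIM) + feed_forward + layer_norm(DIM)

encoder = embeddings + N_LAYERS * per_layer

# Assumed classification head: pre-classifier plus classifier.
head = linear(DIM, DIM) + linear(DIM, NUM_LABELS)

total = encoder + head
print(f"encoder={encoder:,}  head={head:,}  total={total:,}")
# encoder=66,362,880  head=592,130  total=66,955,010
```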
---
## Model Card Contact
For questions or feedback, please reach out via the author's [Hugging Face profile](https://huggingface.co/alusci).