---
library_name: transformers
tags:
- text-classification
- spam-detection
- sms
license: apache-2.0
---

# πŸ›‘οΈ Model Card for `alusci/distilbert-smsafe`

A lightweight DistilBERT model fine-tuned for spam detection in SMS messages. It classifies each input message as either **spam** or **ham** (not spam) and was trained on a custom dataset of real-world OTP (One-Time Password) and spam SMS messages.

---

## Model Details

### Model Description

- **Developed by:** [alusci](https://huggingface.co/alusci)
- **Model type:** Transformer-based binary classifier
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** `distilbert-base-uncased`

### Model Sources

- **Repository:** [https://huggingface.co/alusci/distilbert-smsafe](https://huggingface.co/alusci/distilbert-smsafe)

---

## πŸ› οΈ Uses

### Direct Use

- Classify an SMS message as spam or ham; in the training data, the ham class consists of legitimate OTP messages.
- Useful in prototypes, educational settings, or lightweight filtering applications.

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="alusci/distilbert-smsafe")
result = classifier("Your verification code is 123456. Please do not share it with anyone.")

# Optional: map the label to human-readable terms
label_map = {"LABEL_0": "ham", "LABEL_1": "spam"}
print(f"Label: {label_map[result[0]['label']]} - Score: {result[0]['score']:.2f}")
```

### Out-of-Scope Use

- Not intended for email spam detection or multilingual message filtering.
- Not suitable for production environments without further testing and evaluation.

---

## 🧪 Bias, Risks, and Limitations

- The model may reflect dataset biases (e.g., message structure, language patterns).
- It may misclassify legitimate OTPs or non-standard spam content.
- Risk of false positives in edge cases.

### Recommendations

- Evaluate on your own SMS dataset before deployment.
- Consider combining with rule-based or heuristic systems in production.
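
One way to combine the model with a rule-based check is sketched below. This is only an illustration: the regular expressions, the `final_label` helper, and the `spam_threshold` value are hypothetical and not part of this model or its training. The helper assumes the label has already been mapped to `"spam"` or `"ham"`, as in the Direct Use example.

```python
import re

# Hypothetical heuristic: an OTP-related keyword followed by a 4-8 digit code.
OTP_PATTERN = re.compile(
    r"\b(?:otp|code|verification|passcode)\b.*?\b\d{4,8}\b", re.IGNORECASE
)
# Hypothetical heuristic: a link makes an OTP-style message suspicious.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)


def final_label(message: str, model_label: str, model_score: float,
                spam_threshold: float = 0.9) -> str:
    """Combine the model's prediction with simple rules; returns 'spam' or 'ham'."""
    looks_like_otp = bool(OTP_PATTERN.search(message)) and not URL_PATTERN.search(message)
    if model_label == "spam" and looks_like_otp and model_score < spam_threshold:
        # Low-confidence spam verdict on an OTP-shaped message: trust the rule.
        return "ham"
    # Otherwise keep the model's decision unchanged.
    return model_label
```

The rule only overrides low-confidence spam verdicts, so it targets false positives on OTP-shaped messages without weakening confident spam predictions. Any threshold should be tuned on your own data.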

---

## 📚 Training Details

### Training Data

- Dataset used: [`alusci/sms-otp-spam-dataset`](https://huggingface.co/datasets/alusci/sms-otp-spam-dataset)
- Binary labels for spam and non-spam OTP messages

### Training Procedure

- **Epochs:** 5
- **Batch Size:** 16 (assumed)
- **Loss Function:** CrossEntropyLoss
- **Optimizer:** AdamW
- **Tokenizer:** `distilbert-base-uncased`
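
To make the loss function concrete, the sketch below computes softmax cross-entropy for a single message from its two class logits (`LABEL_0`, `LABEL_1`). It is a plain-Python illustration of the quantity, not the training code used for this model; the logit values in the usage lines are made up.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example; `target` is the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# Illustrative logits only: the model strongly favours class 0 (ham).
logits = [2.0, -1.0]
print(f"loss if true label is ham:  {cross_entropy(logits, 0):.4f}")
print(f"loss if true label is spam: {cross_entropy(logits, 1):.4f}")
```

The loss is small when the true class has the larger logit and grows as the model becomes confidently wrong. During training the per-example values are typically averaged over the batch.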

---

## 📈 Evaluation

### Metrics

- Accuracy, precision, recall, and F1-score, computed on a held-out validation set
- Binary classification labels:  
  - `LABEL_0` → ham  
  - `LABEL_1` → spam

### Results

**Evaluation metrics after 5 epochs:**

- **Loss:** 0.2962  
- **Accuracy:** 91.35%  
- **Precision:** 90.26%  
- **Recall:** 100.00%  
- **F1-score:** 94.88%
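
As a quick consistency check, the reported F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The variable names below are illustrative.

```python
# Reported metrics from the results above, expressed as fractions.
precision = 0.9026
recall = 1.0

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1-score: {f1:.2%}")  # prints "F1-score: 94.88%"
```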

**Performance:**

- **Evaluation runtime:** 4.37 seconds  
- **Samples/sec:** 457.27  
- **Steps/sec:** 9.15
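
The throughput figures above also give a rough idea of the evaluation setup. The arithmetic below is an inference from the reported numbers, not something stated elsewhere in this card: it suggests about 2,000 validation messages processed in about 40 steps, or roughly 50 messages per evaluation step.

```python
# Reported evaluation throughput.
runtime_s = 4.37
samples_per_s = 457.27
steps_per_s = 9.15

approx_samples = runtime_s * samples_per_s         # about 1998 messages
approx_steps = runtime_s * steps_per_s             # about 40 steps
approx_eval_batch = samples_per_s / steps_per_s    # about 50 messages per step

print(f"samples ~ {approx_samples:.0f}, steps ~ {approx_steps:.0f}, "
      f"batch ~ {approx_eval_batch:.0f}")
```

Because the reported values are rounded, these estimates are approximate. The implied evaluation batch size is separate from the training batch size listed under Training Procedure.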

---

## 🌱 Environmental Impact

- **Hardware Type:** Apple Silicon MPS GPU (Mac)
- **Hours used:** <1 hour (small dataset)
- **Cloud Provider:** None (trained locally)
- **Carbon Emitted:** Minimal due to local and efficient hardware

---

## 🔧 Technical Specifications

### Model Architecture and Objective

- **Base:** DistilBERT
- **Objective:** Binary classification head on the pooled first-token (`[CLS]`) output
- **Parameters:** ~66M (roughly the same as `distilbert-base-uncased`)

---

## 📬 Model Card Contact

For questions or feedback, please reach out via the [Hugging Face profile](https://huggingface.co/alusci).