---
license: mit
datasets:
- codesignal/sms-spam-collection
language:
- en
library_name: transformers
pipeline_tag: text-classification
---


## **Model Overview**
This model is a fine-tuned version of BERT designed to classify SMS messages as either spam or not spam. It was developed as part of a technical test for the startup **IntiGo**.

### **Model Details**
- **Model Name:** BERT Fine-Tuned for SMS Spam Classification
- **Library:** [Transformers](https://huggingface.co/transformers/)
- **Language:** English
- **Pipeline Tag:** `text-classification`

### **License**
This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## **Datasets**
- **Training Dataset:** [codesignal/sms-spam-collection](https://huggingface.co/datasets/codesignal/sms-spam-collection)

## **Fine-Tuning Procedure**
This model was fine-tuned on the SMS Spam Collection dataset, a collection of SMS messages each labeled as "spam" or "ham" (not spam).

### **Metrics**
- **Precision:** 0.99
- **Recall:** 0.81
- **F1 Score:** 0.96

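These metrics were computed on the validation set. Assuming spam is the positive class, the high precision means that messages flagged as spam are almost always spam, while the recall of 0.81 means that roughly one in five spam messages is missed. The reported F1 score is higher than the harmonic mean of the reported precision and recall (about 0.89), so it may have been computed with a different averaging scheme, for example averaged across both classes.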
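
As a quick sanity check on how the three metrics relate, the sketch below computes the F1 score as the harmonic mean of precision and recall, using the values listed above. The helper name `f1_score` is illustrative and not part of the model's code; the card does not state which averaging was used for the reported figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the Metrics section
reported_precision = 0.99
reported_recall = 0.81

implied_f1 = f1_score(reported_precision, reported_recall)
print(f"F1 implied by precision and recall: {implied_f1:.3f}")
```

A single-class (binary) F1 always lies between the precision and the recall, so this calculation is a cheap consistency check for any set of reported metrics.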

### **Usage**
You can use this model to classify SMS messages as spam or not spam. Raw text is tokenized and passed to the model, which returns one logit per class; the class with the highest logit is the predicted label.

#### Example:
```python
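import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "Amenallah2001/intigo-technical-test"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # make sure dropout is disabled for inference

# Example input
text = "Congratulations! You've won a free ticket to Bahamas. Call now!"

# Tokenize (truncating to the model's maximum length) and run a forward pass
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed at inference time
    logits = model(**inputs).logits

# Pick the highest-scoring class for the single input message
predicted_class = logits.argmax(dim=-1).item()

# Map the class index to a label (0 = ham, 1 = spam)
label_map = {0: "ham", 1: "spam"}
print(f"Prediction: {label_map[predicted_class]}")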
```
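
Taking the argmax of the logits discards how confident the model is. A minimal sketch of an alternative, assuming index 1 corresponds to "spam" as in the label map above, converts the two logits to a spam probability with a softmax and applies a configurable threshold. The function names, the threshold value, and the example logits are hypothetical and chosen only for illustration.

```python
import math


def spam_probability(logits):
    """Softmax probability of index 1 (assumed to be 'spam') from two raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def classify(logits, threshold=0.5):
    """Return (label, spam probability); label is 'spam' only at or above the threshold."""
    p = spam_probability(logits)
    return ("spam" if p >= threshold else "ham"), p


# Hypothetical logits; real ones could come from outputs.logits[0].tolist()
example_logits = [-1.2, 2.3]
label, prob = classify(example_logits, threshold=0.9)
print(f"{label} (spam probability {prob:.3f})")
```

Raising the threshold trades recall for precision, which can matter when wrongly blocking a legitimate message is more costly than letting a spam message through.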

### **Intended Use**
This model is intended for detecting spam in SMS messages. It can be integrated into systems that require spam detection, such as messaging apps or SMS gateways.

### **Limitations**
- **Data Imbalance:** The dataset used for training was imbalanced, which could affect the model’s performance in real-world scenarios with different distributions of spam and non-spam messages.
- **Language Support:** This model was fine-tuned on English text only and may not perform well on SMS messages in other languages.
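
One common way to compensate for class imbalance during training is to weight each class inversely to its frequency. The sketch below is not taken from this model's training code (the card does not say whether any weighting was applied), and the label counts are hypothetical placeholders rather than the real dataset statistics.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Hypothetical label distribution, for illustration only
labels = ["ham"] * 900 + ["spam"] * 100

weights = inverse_frequency_weights(labels)
print(weights)
```

With these weights the rarer class contributes more per example; they could, for instance, be supplied as the `weight` argument of `torch.nn.CrossEntropyLoss` when fine-tuning.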

### **Ethical Considerations**
When using this model, be mindful of privacy concerns and ensure that the deployment complies with relevant regulations, especially in handling user-generated content.