---
license: mit
datasets:
- codesignal/sms-spam-collection
language:
- en
library_name: transformers
pipeline_tag: text-classification
---
## **Model Overview**
This model is a fine-tuned version of BERT designed to classify SMS messages as either spam or not spam. It was developed as part of a technical test for the startup **IntiGo**.
### **Model Details**
- **Model Name:** BERT Fine-Tuned for SMS Spam Classification
- **Library:** [Transformers](https://huggingface.co/transformers/)
- **Language:** English
- **Pipeline Tag:** `text-classification`
### **License**
This model is released under the [MIT License](https://opensource.org/licenses/MIT).
## **Datasets**
- **Training Dataset:** [codesignal/sms-spam-collection](https://huggingface.co/datasets/codesignal/sms-spam-collection)
## **Fine-Tuning Procedure**
This model was fine-tuned on the SMS Spam Collection dataset, which consists of SMS messages labeled as "spam" or "ham" (not spam).
### **Metrics**
- **Precision:** 0.99
- **Recall:** 0.81
- **F1 Score:** 0.96
These metrics were computed on the validation set. The high precision means that few legitimate messages are flagged as spam, while the lower recall indicates that some spam messages go undetected.
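As a quick consistency check on the figures listed above, F1 can be recomputed as the harmonic mean of precision and recall. For the listed values this gives roughly 0.89 rather than 0.96. The model card does not say how the metrics were averaged; one possible explanation (an assumption, not confirmed by the source) is that precision and recall refer to the spam class while the F1 score is averaged across both classes. The helper name `f1_score` below is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics list in this model card
reported_precision = 0.99
reported_recall = 0.81

f1_from_reported = f1_score(reported_precision, reported_recall)
print(f"F1 from listed precision and recall: {f1_from_reported:.3f}")
```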
### **Usage**
You can use this model to classify SMS messages into spam or not spam. The model accepts raw text input and outputs a label prediction.
#### Example:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "Amenallah2001/intigo-technical-test"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # make sure dropout is disabled for inference

# Example input
text = "Congratulations! You've won a free ticket to Bahamas. Call now!"

# Tokenize (truncating to the model's maximum length) and classify
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for inference
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()

# Map the class index to a label (0 = ham, 1 = spam)
label_map = {0: "ham", 1: "spam"}
print(f"Prediction: {label_map[predicted_class]}")
```
### **Intended Use**
This model is intended for detecting spam in SMS messages. It can be integrated into systems that require spam detection, such as messaging apps or SMS gateways.
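When integrating the model into a messaging app or gateway, it may be useful to flag a message only when the spam probability exceeds a tunable threshold, rather than always taking the argmax. The following is a minimal sketch in plain Python, assuming the two logits for one message have already been obtained from the model (for example via `outputs.logits[0].tolist()`) and that index 1 is the spam class, as in the usage example. The function names, the default threshold, and the sample logits are hypothetical.

```python
import math

SPAM_INDEX = 1  # assumes the mapping {0: "ham", 1: "spam"} from the usage example


def spam_probability(logits):
    """Softmax probability of the spam class for one message's logits."""
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    return exps[SPAM_INDEX] / sum(exps)


def is_spam(logits, threshold=0.5):
    """Flag a message as spam when its spam probability reaches the threshold."""
    return spam_probability(logits) >= threshold


# Hypothetical logits for a single message, for illustration only
example_logits = [0.0, 2.0]
print(is_spam(example_logits))                  # default threshold
print(is_spam(example_logits, threshold=0.95))  # stricter threshold
```

Raising the threshold trades recall for precision: fewer legitimate messages are flagged, at the cost of letting more spam through.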
### **Limitations**
- **Data Imbalance:** The dataset used for training was imbalanced, which could affect the model’s performance in real-world scenarios with different distributions of spam and non-spam messages.
- **Language Support:** This model was fine-tuned on English text only and may not perform well on SMS messages in other languages.
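To illustrate the data imbalance point, the sketch below shows how precision can drop when spam is rarer in deployment than in the evaluation data, even if the classifier itself is unchanged. It uses the recall listed in this model card together with a false positive rate and spam prevalences that are purely hypothetical; none of these deployment figures come from the source.

```python
def precision_at_prevalence(recall, false_positive_rate, spam_prevalence):
    """Expected precision given recall, false positive rate, and spam share."""
    true_positives = recall * spam_prevalence
    false_positives = false_positive_rate * (1.0 - spam_prevalence)
    flagged = true_positives + false_positives
    if flagged == 0:
        return 0.0
    return true_positives / flagged


RECALL = 0.81        # recall listed in this model card
ASSUMED_FPR = 0.001  # hypothetical false positive rate, for illustration only

# Hypothetical shares of spam in incoming traffic
for prevalence in (0.13, 0.05, 0.01):
    precision = precision_at_prevalence(RECALL, ASSUMED_FPR, prevalence)
    print(f"spam share {prevalence:.2f}: expected precision {precision:.3f}")
```

Under these assumptions, the lower the share of spam in real traffic, the larger the fraction of flagged messages that are actually legitimate.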
### **Ethical Considerations**
When using this model, be mindful of privacy concerns and ensure that the deployment complies with relevant regulations, especially in handling user-generated content.