---
license: mit
datasets:
- codesignal/sms-spam-collection
language:
- en
library_name: transformers
pipeline_tag: text-classification
---

## **Model Overview**

This model is a fine-tuned version of BERT that classifies SMS messages as either spam or not spam. It was developed as part of a technical test for the startup **IntiGo**.

### **Model Details**

- **Model Name:** BERT Fine-Tuned for SMS Spam Classification
- **Library:** [Transformers](https://huggingface.co/transformers/)
- **Language:** English
- **Pipeline Tag:** `text-classification`

### **License**

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## **Datasets**

- **Training Dataset:** [codesignal/sms-spam-collection](https://huggingface.co/datasets/codesignal/sms-spam-collection)

## **Fine-Tuning Procedure**

This model was fine-tuned on the SMS Spam Collection dataset, which contains SMS messages labeled as "spam" or "ham" (not spam).

### **Metrics**

- **Precision:** 0.99
- **Recall:** 0.81
- **F1 Score:** 0.96

These metrics were computed on the validation set. The high precision means that messages the model flags as spam are almost always spam (few false positives). The lower recall means that roughly one in five spam messages goes undetected (false negatives). The model therefore favors precision over recall rather than balancing the two.

### **Usage**

You can use this model to classify SMS messages as spam or not spam. The tokenizer accepts raw text, and the model returns logits for the two classes. The example below converts those logits into a label.

#### Example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer from the Hugging Face Hub
model_name = "Amenallah2001/intigo-technical-test"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example input
text = "Congratulations! You've won a free ticket to Bahamas. Call now!"
# Tokenize and classify (truncate messages longer than the model's maximum input length)
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model(**inputs)
logits = outputs.logits
predicted_class = logits.argmax(dim=-1).item()

# Output prediction (label_map assumes class 0 = ham and class 1 = spam)
label_map = {0: "ham", 1: "spam"}
print(f"Prediction: {label_map[predicted_class]}")
```

### **Intended Use**

This model is intended for detecting spam in SMS messages. It can be integrated into systems that require spam detection, such as messaging apps or SMS gateways.

### **Limitations**

- **Data Imbalance:** The training dataset is imbalanced. As a result, the model may perform differently in real-world settings where the ratio of spam to non-spam messages differs from the training data.
- **Language Support:** This model was fine-tuned on English text only and may not perform well on SMS messages in other languages.

### **Ethical Considerations**

When using this model, be mindful of privacy concerns and ensure that the deployment complies with relevant regulations, especially when handling user-generated content.
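### **Adjusting the Spam Threshold**

The Limitations section notes that the training data is imbalanced, and the Intended Use section mentions deployment in messaging apps or SMS gateways. In such settings it can help to act on a spam probability rather than the raw argmax. That way, the cut-off can be tuned to the target traffic. The following is a minimal sketch under two assumptions. First, the two logits are ordered `[ham, spam]`, as in the Usage example's `label_map`. Second, `spam_probability`, `classify`, and the threshold values are hypothetical helpers for illustration and are not part of this model's API.

```python
import math
from typing import Sequence


def spam_probability(logits: Sequence[float]) -> float:
    """Return P(spam) from two-class logits ordered [ham, spam] (assumed order)."""
    ham, spam = logits
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(ham, spam)
    e_ham = math.exp(ham - m)
    e_spam = math.exp(spam - m)
    return e_spam / (e_ham + e_spam)


def classify(logits: Sequence[float], threshold: float = 0.5) -> str:
    """Hypothetical helper: label a message as spam only if P(spam) >= threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return "spam" if spam_probability(logits) >= threshold else "ham"


if __name__ == "__main__":
    # Made-up logits for illustration, not real model output.
    example_logits = [-1.2, 2.3]
    print(f"P(spam) = {spam_probability(example_logits):.4f}")
    print(classify(example_logits, threshold=0.5))
    print(classify(example_logits, threshold=0.99))
```

With `threshold=0.5`, this matches the argmax decision in the Usage example, apart from exact ties. A higher threshold reduces false positives at the cost of recall, and a lower one does the opposite. With the model loaded as in the Usage example, `outputs.logits[0].tolist()` gives the two logits for a single message.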
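### **Checking the F1 Score**

The Metrics section reports precision, recall, and F1 separately. For a single class, F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R). The sketch below applies that formula to the reported precision (0.99) and recall (0.81) and gives about 0.89 rather than the reported 0.96. This suggests the reported F1 may have been computed in a different way. For example, it may be an average over the ham and spam classes, or it may come from a separate evaluation run. The card does not state which averaging was used.

```python
def f1_score(precision: float, recall: float) -> float:
    """Single-class F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the Metrics section.
    print(round(f1_score(0.99, 0.81), 3))
```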