Amenallah2001/intigo-technical-test

Text Classification

feature-extraction


Amenallah2001 committed on Aug 20, 2024

Commit 871eeeb (verified) · 1 Parent(s): 69ca90b

Create README.md

Files changed (1)

README.md +77 -0

README.md ADDED

	@@ -0,0 +1,77 @@

+---
+license: mit
+datasets:
+- codesignal/sms-spam-collection
+language:
+- en
+library_name: transformers
+pipeline_tag: text-classification
+---
+## **Model Overview**
+This model is a fine-tuned version of BERT designed to classify SMS messages as either spam or not spam. It was developed as part of a technical test for the startup **IntiGo**.
+### **Model Details**
+- **Model Name:** BERT Fine-Tuned for SMS Spam Classification
+- **Library:** [Transformers](https://huggingface.co/transformers/)
+- **Language:** English
+- **Pipeline Tag:** `text-classification`
+### **License**
+This model is released under the [MIT License](https://opensource.org/licenses/MIT).
+## **Datasets**
+- **Training Dataset:** [codesignal/sms-spam-collection](https://huggingface.co/datasets/codesignal/sms-spam-collection)
+## **Fine-Tuning Procedure**
+This model was fine-tuned on the SMS Spam Collection dataset, which consists of SMS messages labeled as "spam" or "ham" (not spam).
+### **Metrics**
+- **Precision:** 0.99
+- **Recall:** 0.81
+- **F1 Score:** 0.96
+These metrics were computed on the validation set. Taking spam as the positive class, the high precision means that few legitimate messages are flagged as spam, while the recall of 0.81 means that roughly one in five spam messages goes undetected.
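As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below is illustrative only: the helper name `f1_score` is hypothetical, and it assumes the reported precision and recall describe a single class (spam). Under that assumption the harmonic mean comes out near 0.89 rather than 0.96, so the reported F1 may have been computed with a different averaging scheme (for example, weighted across both classes); the source does not say which.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from the metrics list above, treated as single-class scores.
single_class_f1 = f1_score(0.99, 0.81)
print(f"F1 from P=0.99, R=0.81: {single_class_f1:.3f}")
```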
+### **Usage**
+You can use this model to classify SMS messages as spam or not spam. The tokenizer converts raw text into model inputs, and the model returns logits from which the predicted label is derived.
+#### Example:
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+# Load the fine-tuned model and its tokenizer from the Hub
+model_name = "your-model-name"  # placeholder: replace with the model's Hub repo ID
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+# Example input
+text = "Congratulations! You've won a free ticket to Bahamas. Call now!"
+
+# Tokenize (truncating over-long input) and run inference without tracking gradients
+inputs = tokenizer(text, return_tensors="pt", truncation=True)
+with torch.no_grad():
+    logits = model(**inputs).logits
+predicted_class = logits.argmax(dim=-1).item()
+
+# Map the class index to a label (assumes index 0 is ham and index 1 is spam)
+label_map = {0: "ham", 1: "spam"}
+print(f"Prediction: {label_map[predicted_class]}")
+```
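The example above takes the argmax of the logits, which discards how confident the model is. A minimal sketch of turning a pair of logits into a spam probability and applying a decision threshold is shown below. It is not part of the original model card: the function names `softmax` and `classify`, the `threshold` parameter, and the sample logits are all hypothetical, and the label order (index 0 is ham, index 1 is spam) is carried over from the `label_map` in the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5, labels=("ham", "spam")):
    """Return (label, spam_probability) for a two-class logit pair.

    Assumes index 1 is the spam class. Raising `threshold` flags fewer
    messages as spam (favoring precision); lowering it flags more
    (favoring recall).
    """
    spam_prob = softmax(logits)[1]
    label = labels[1] if spam_prob >= threshold else labels[0]
    return label, spam_prob


# Hypothetical logits, e.g. obtained from `logits[0].tolist()` in the example above
label, spam_prob = classify([-1.2, 2.3])
print(f"Prediction: {label} (spam probability {spam_prob:.3f})")
```

With the default threshold of 0.5, this gives the same label as the argmax; other thresholds trade precision against recall.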
+### **Intended Use**
+This model is intended for detecting spam in SMS messages. It can be integrated into systems that require spam detection, such as messaging apps or SMS gateways.
+### **Limitations**
+- **Data Imbalance:** The dataset used for training was imbalanced, which could affect the model’s performance in real-world scenarios with different distributions of spam and non-spam messages.
+- **Language Support:** This model was fine-tuned on English text only and may not perform well on SMS messages in other languages.
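One common way to account for the data imbalance noted above is to weight each class inversely to its frequency when computing the training loss. The sketch below shows only that calculation; the model card does not state that this technique was used. The helper name `class_weights` is hypothetical, and the label counts are made up for illustration rather than taken from the actual dataset.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Illustrative distribution only (not the real dataset counts)
example_labels = ["ham"] * 87 + ["spam"] * 13
weights = class_weights(example_labels)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

The rarer class receives the larger weight, so errors on spam messages contribute more to the loss than errors on ham messages.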
+### **Ethical Considerations**
+When using this model, be mindful of privacy concerns and ensure that the deployment complies with relevant regulations, especially in handling user-generated content.