Amenallah2001 committed on
Commit
871eeeb
·
verified ·
1 Parent(s): 69ca90b

Create README.md

Files changed (1)
  1. README.md +77 -0
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ license: mit
+ datasets:
+ - codesignal/sms-spam-collection
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-classification
+ ---
+ ## **Model Overview**
+ This model is a fine-tuned version of BERT designed to classify SMS messages as either spam or not spam. It was developed as part of a technical test for the startup **IntiGo**.
+
+ ### **Model Details**
+ - **Model Name:** BERT Fine-Tuned for SMS Spam Classification
+ - **Library:** [Transformers](https://huggingface.co/transformers/)
+ - **Language:** English
+ - **Pipeline Tag:** `text-classification`
+
+ ### **License**
+ This model is released under the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## **Datasets**
+ - **Training Dataset:** [codesignal/sms-spam-collection](https://huggingface.co/datasets/codesignal/sms-spam-collection)
+
+ ## **Fine-Tuning Procedure**
+ This model was fine-tuned on the SMS Spam Collection dataset, a collection of SMS messages labeled as "spam" or "ham" (not spam).
+
+ ### **Metrics**
+ - **Precision:** 0.99
+ - **Recall:** 0.81
+ - **F1 Score:** 0.96
+
+ These metrics were computed on the validation set. Taking spam as the positive class, the high precision means that few legitimate messages are flagged as spam, while the lower recall means that roughly one in five spam messages is missed.
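
For reference, the F1 score of a single class is the harmonic mean of its precision and recall, `F1 = 2 * P * R / (P + R)`. The short sketch below applies that formula to the precision and recall listed above; the function name `f1_from_precision_recall` is illustrative and not part of any library. The result is about 0.89, which is lower than the listed F1 score of 0.96. The card does not state how its F1 score was computed, so the listed value may come from a different averaging scheme (for example, a weighted average over both classes); that is an assumption, not something the source confirms.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Both metrics are zero, so F1 is defined as zero
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in the Metrics section
precision, recall = 0.99, 0.81
f1 = f1_from_precision_recall(precision, recall)
print(f"F1 from listed precision and recall: {f1:.3f}")
```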
+
+ ### **Usage**
+ You can use this model to classify SMS messages as spam or not spam. Raw text is first converted to token IDs by the tokenizer; the model then returns logits, from which the predicted label is taken.
+
+ #### Example:
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load the model and tokenizer ("your-model-name" is a placeholder for the model's Hub ID)
+ model_name = "your-model-name"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Example input
+ text = "Congratulations! You've won a free ticket to Bahamas. Call now!"
+
+ # Tokenize (truncating overly long messages) and classify without tracking gradients
+ inputs = tokenizer(text, return_tensors="pt", truncation=True)
+ with torch.no_grad():
+     outputs = model(**inputs)
+ logits = outputs.logits
+ # Pick the highest-scoring class along the label dimension
+ predicted_class = logits.argmax(dim=-1).item()
+
+ # Output prediction
+ label_map = {0: "ham", 1: "spam"}
+ print(f"Prediction: {label_map[predicted_class]}")
+ ```
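
The example above prints only the predicted label. When a confidence score is also useful, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python so that it runs without downloading a model. The helper name `logits_to_prediction` and the sample logits are hypothetical, and the label order is the one assumed in the usage example, not verified against the model's configuration.

```python
import math

# Label order taken from the usage example (an assumption, not read from the model config)
LABEL_MAP = {0: "ham", 1: "spam"}


def logits_to_prediction(logits):
    """Convert a list of raw logits into a (label, probability) pair via softmax."""
    # Subtract the largest logit before exponentiating for numerical stability
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Hypothetical logits for one message
label, confidence = logits_to_prediction([-2.0, 3.0])
print(f"Prediction: {label} ({confidence:.2%})")
```

With the real model, the input list could be obtained as `outputs.logits[0].tolist()`.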
+
+ ### **Intended Use**
+ This model is intended for detecting spam in SMS messages. It can be integrated into systems that require spam detection, such as messaging apps or SMS gateways.
+
+ ### **Limitations**
+ - **Data Imbalance:** The training dataset is imbalanced, so the model’s performance may differ in real-world settings where the ratio of spam to non-spam messages is different.
+ - **Language Support:** This model was fine-tuned on English text only and may not perform well on SMS messages in other languages.
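
One common way to account for class imbalance during fine-tuning is to weight the loss by inverse class frequency. The card does not say whether this was done for this model, so the sketch below only illustrates the calculation. The label counts and the helper name `inverse_frequency_weights` are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 900 ham and 100 spam messages
labels = ["ham"] * 900 + ["spam"] * 100
weights = inverse_frequency_weights(labels)
print(weights)
```

The rarer class receives the larger weight, so a misclassified spam message contributes more to a weighted loss than a misclassified ham message.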
+
+ ### **Ethical Considerations**
+ When using this model, be mindful of privacy concerns and ensure that the deployment complies with relevant regulations, especially when handling user-generated content.