udit-k committed
Commit e6a29ab · verified
1 Parent(s): 64319dc

Update README.md

Files changed (1)
  1. README.md +22 -5
README.md CHANGED
@@ -17,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->
17
 
18
  # HamSpamBERT
19
 
20
- This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
21
  It achieves the following results on the evaluation set:
22
  - Loss: 0.0072
23
  - Accuracy: 0.9991
@@ -25,19 +25,36 @@ It achieves the following results on the evaluation set:
25
  - Recall: 0.9933
26
  - F1: 0.9966
27
 
28
  ## Model description
29
 
30
- More information needed
 
 
31
 
32
  ## Intended uses & limitations
33
 
34
- More information needed
35
 
36
  ## Training and evaluation data
37
 
38
- More information needed
 
39
 
40
- ## Training procedure
41
 
42
  ### Training hyperparameters
43
 
 
17
 
18
  # HamSpamBERT
19
 
20
+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the [Spam-Ham](https://huggingface.co/datasets/SalehAhmad/Spam-Ham) dataset.
21
  It achieves the following results on the evaluation set:
22
  - Loss: 0.0072
23
  - Accuracy: 0.9991
 
25
  - Recall: 0.9933
26
  - F1: 0.9966
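
The README does not say how these metrics were computed. As a rough illustration only, the sketch below uses the standard definitions of accuracy, precision, recall, and F1 for a binary classifier, assuming spam (label `1`) is the positive class. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the model's evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a non-empty list of labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data for illustration only (1 = spam, 0 = ham); not the real evaluation set.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a dataset where most messages are ham, accuracy alone can look high even when spam is missed, which is why recall and F1 are reported alongside it.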
27
 
28
+ ```python
29
+ from transformers import pipeline, BertTokenizer, BertForSequenceClassification
30
+
31
+ tokenizer = BertTokenizer.from_pretrained("udit-k/HamSpamBERT")
32
+ model = BertForSequenceClassification.from_pretrained("udit-k/HamSpamBERT")
33
+
34
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
35
+ text = "Call this number to win FREE IPL FINAL tickets!!!"
36
+ result = classifier(text)
37
+ print(result)
38
+ ```
39
+ ```
40
+ [{'label': 'LABEL_1', 'score': 0.9999189376831055}]
41
+ ```
42
+
43
  ## Model description
44
 
45
+ This model is a fine-tuned version of the [BERT](https://huggingface.co/bert-base-uncased) model (`bert-base-uncased`) on the [Spam-Ham](https://huggingface.co/datasets/SalehAhmad/Spam-Ham) dataset for spam detection, framed as binary text classification. The labels are:
46
+ - `LABEL_0`: Ham (not spam)
47
+ - `LABEL_1`: Spam
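
Because the pipeline returns the generic names `LABEL_0` and `LABEL_1`, a small post-processing step can make the output easier to read. The following is a minimal sketch of that idea, assuming the result format shown in the usage snippet (a list of dicts with `label` and `score` keys). The helper `readable_predictions` and the constant `LABEL_NAMES` are hypothetical names introduced here for illustration.

```python
# Mapping taken from the model description: LABEL_0 is ham, LABEL_1 is spam.
LABEL_NAMES = {"LABEL_0": "ham", "LABEL_1": "spam"}


def readable_predictions(results):
    """Return copies of pipeline results with 'label' replaced by a readable name.

    Labels that are not in LABEL_NAMES are passed through unchanged.
    """
    return [
        {**result, "label": LABEL_NAMES.get(result["label"], result["label"])}
        for result in results
    ]


# Example input copied from the output shown in the README's usage snippet.
sample = [{"label": "LABEL_1", "score": 0.9999189376831055}]
print(readable_predictions(sample))
```

The same helper can be applied directly to the value returned by `classifier(text)`.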
48
 
49
  ## Intended uses & limitations
50
 
51
+ This model can be used to detect spam in text messages. Its primary limitation is the small dataset: it was trained on about 4,700 rows and evaluated on about 1,200 rows, so the reported metrics may not carry over to other kinds of messages.
52
 
53
  ## Training and evaluation data
54
 
55
+ - Training corpus: 80% of the dataset
56
+ - Evaluation corpus: 20% of the dataset
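
The README gives only the 80/20 proportions, not how the split was made. One common way to produce such a split is a seeded shuffle followed by a cut, sketched below in plain Python. The function `split_rows`, the seed value, and the placeholder rows are assumptions for illustration and do not describe the author's actual procedure.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and split them into (train, eval) lists."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the dataset rows.
rows = [f"message {i}" for i in range(100)]
train_rows, eval_rows = split_rows(rows)
print(len(train_rows), len(eval_rows))
```

For an imbalanced spam/ham dataset, a stratified split that preserves the class ratio in both parts may be preferable to a plain shuffle.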
57
 
 
58
 
59
  ### Training hyperparameters
60