ivanmartinezmurillo committed
Commit 9b0e57f · verified · 1 Parent(s): 199413d

Create README.md

Files changed (1)
  1. README.md +81 -0
README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - BSC-LT/mRoBERTa
+ pipeline_tag: text-classification
+ library_name: transformers
+ ---
+
+ # mRoBERTa_FT1_DFT1_fraude_phishing
+
+ ## Description
+ This model is fine-tuned from `BSC-LT/mRoBERTa` for **binary phishing detection** in English text.
+ It predicts whether a given **SMS or email message** is **phishing** or **not phishing**.
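
A minimal inference sketch follows. It assumes the checkpoint is published under the repository ID that appears in the citation URL of this card (`gplsi/mRoBERTa_FT1_DFT1_fraude_phishing`) and that its config keeps the default `LABEL_0`/`LABEL_1` names; neither is confirmed here. `classify` and `readable_label` are hypothetical helper names.

```python
# Class ids follow the Results tables: 0 = not phishing, 1 = phishing.
ID2NAME = {0: "not phishing", 1: "phishing"}

# Assumption: repository ID taken from the citation URL in this card.
MODEL_ID = "gplsi/mRoBERTa_FT1_DFT1_fraude_phishing"


def readable_label(raw_label: str) -> str:
    """Map a default-style label such as 'LABEL_1' to a readable name.

    Assumption: the model config uses the default 'LABEL_<id>' names.
    Labels that do not match that pattern are returned unchanged.
    """
    prefix = "LABEL_"
    if raw_label.startswith(prefix) and raw_label[len(prefix):].isdigit():
        return ID2NAME.get(int(raw_label[len(prefix):]), raw_label)
    return raw_label


def classify(texts, model_id: str = MODEL_ID):
    """Return (readable label, score) pairs for a list of messages."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True cuts inputs that exceed the model's maximum length.
    results = classifier(list(texts), truncation=True)
    return [(readable_label(r["label"]), r["score"]) for r in results]
```

Calling `classify(["Your account is locked, verify your details now"])` would download the checkpoint on first use and return one `(label, score)` pair per message.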
+
+
+ ## Dataset
+ The dataset used for fine-tuning contains **SMS and email texts** labeled as phishing or not phishing.
+
+ - **Training set**: 9,422 instances
+ - **Test set**: 2,357 instances
+
+ ## Training Parameters
+ - learning_rate: 2e-5
+ - num_train_epochs: 2
+ - per_device_train_batch_size: 8
+ - per_device_eval_batch_size: 8
+ - overwrite_output_dir: true
+ - logging_strategy: steps
+ - logging_steps: 10
+ - seed: 852
+ - fp16: true
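
As a rough sanity check on these settings, the number of optimizer steps can be estimated from the training-set size. The sketch below assumes a single device and no gradient accumulation, neither of which is stated in this card; `training_config` and `estimate_steps` are illustrative names.

```python
import math

# Values copied from the Training Parameters list.
training_config = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "overwrite_output_dir": True,
    "logging_strategy": "steps",
    "logging_steps": 10,
    "seed": 852,
    "fp16": True,
}

TRAIN_INSTANCES = 9422  # training-set size from the Dataset section


def estimate_steps(num_examples, config, num_devices=1, grad_accum=1):
    """Estimate (steps per epoch, total steps).

    Assumptions: num_devices and grad_accum are not given in the card, and
    the last, smaller batch of each epoch is kept rather than dropped.
    """
    effective_batch = config["per_device_train_batch_size"] * num_devices * grad_accum
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch, steps_per_epoch * config["num_train_epochs"]


steps_per_epoch, total_steps = estimate_steps(TRAIN_INSTANCES, training_config)
print(steps_per_epoch, total_steps)  # 1178 2356
```

The dictionary keys mirror argument names of `transformers.TrainingArguments` (as of the 4.x releases), so it could likely be unpacked into that class together with an `output_dir`; note that `fp16: true` generally requires a CUDA-capable GPU.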
+
+ ## Results
+
+ ### Combined dataset (SMS + emails)
+ **Confusion Matrix** (rows: actual class, columns: predicted class)
+
+ | | Predicted 0 | Predicted 1 |
+ |---|---|---|
+ | Actual 0 (Not phishing) | 1793 | 16 |
+ | Actual 1 (Phishing) | 18 | 530 |
+
+ | Class | Precision | Recall | F1-score | Support |
+ |-------|-----------|--------|----------|---------|
+ | 0 (Not phishing) | 0.9901 | 0.9912 | 0.9906 | 1809 |
+ | 1 (Phishing) | 0.9707 | 0.9672 | 0.9689 | 548 |
+
+ - Accuracy: **0.9856**
+ - Macro Avg F1: **0.9798**
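
The combined-dataset figures can be re-derived from the confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, which is consistent with the Support column (1793 + 16 = 1809); `per_class_metrics` is an illustrative helper, not part of the original evaluation code.

```python
# Combined-dataset confusion matrix: rows = actual class, columns = predicted class.
matrix = [[1793, 16],
          [18, 530]]


def per_class_metrics(cm):
    """Return {class_id: (precision, recall, f1, support)} for a square matrix."""
    metrics = {}
    for k in range(len(cm)):
        true_pos = cm[k][k]
        predicted_k = sum(row[k] for row in cm)  # column sum
        actual_k = sum(cm[k])                    # row sum = support
        precision = true_pos / predicted_k
        recall = true_pos / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        metrics[k] = (precision, recall, f1, actual_k)
    return metrics


metrics = per_class_metrics(matrix)
total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[k][k] for k in range(len(matrix))) / total
macro_f1 = sum(m[2] for m in metrics.values()) / len(metrics)

for class_id, (p, r, f1, support) in metrics.items():
    print(class_id, round(p, 4), round(r, 4), round(f1, 4), support)
print("accuracy", round(accuracy, 4), "macro F1", round(macro_f1, 4))
```

Run on the matrix above, this reproduces the reported per-class values, the accuracy of 0.9856, and the macro F1 of 0.9798.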
+ ---
+
+ ### Only Emails
+ **Confusion Matrix** (rows: actual class, columns: predicted class)
+
+ | | Predicted 0 | Predicted 1 |
+ |---|---|---|
+ | Actual 0 (Not phishing) | 823 | 12 |
+ | Actual 1 (Phishing) | 14 | 313 |
+
+ | Class | Precision | Recall | F1-score | Support |
+ |-------|-----------|--------|----------|---------|
+ | 0 (Not phishing) | 0.9833 | 0.9856 | 0.9845 | 835 |
+ | 1 (Phishing) | 0.9631 | 0.9572 | 0.9601 | 327 |
+
+ - Accuracy: **0.9776**
+ - Macro Avg F1: **0.9723**
+ ---
+
+ ### Only SMS
+ **Confusion Matrix** (rows: actual class, columns: predicted class)
+
+ | | Predicted 0 | Predicted 1 |
+ |---|---|---|
+ | Actual 0 (Not phishing) | 969 | 5 |
+ | Actual 1 (Phishing) | 6 | 215 |
+
+ | Class | Precision | Recall | F1-score | Support |
+ |-------|-----------|--------|----------|---------|
+ | 0 (Not phishing) | 0.9939 | 0.9949 | 0.9944 | 974 |
+ | 1 (Phishing) | 0.9773 | 0.9729 | 0.9751 | 221 |
+
+ - Accuracy: **0.9908**
+ - Macro Avg F1: **0.9847**
+ ---
+
+ ## Reference
+ ```bibtex
+ @misc{gplsi-mroberta-fraudephishing,
+   author = {Martínez-Murillo, Iván and Bonora, Mar and Sepúlveda-Torres, Robiert},
+   title = {mRoBERTa_FT1_DFT1_fraude_phishing: Fine-tuned model for phishing detection},
+   year = {2025},
+   howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT1_DFT1_fraude_phishing}},
+   note = {Accessed: 2025-10-03}
+ }
+ ```