---
license: apache-2.0
language:
- en
base_model:
- BSC-LT/mRoBERTa
pipeline_tag: text-classification
library_name: transformers
---

# mRoBERTa_FT1_DFT1_fraude_phishing

## Description
This model is a fine-tuned version of `BSC-LT/mRoBERTa` for **binary phishing detection** in English text.  
Given an **SMS or email message**, it predicts whether the message is **phishing** or **not phishing**.  
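
A minimal inference sketch follows, assuming the checkpoint is published under the repository id given in the Reference section and loads through the standard `transformers` text-classification pipeline. The label names are an assumption: the card does not state the `id2label` mapping, so the sketch supposes the default `LABEL_0` / `LABEL_1` names with the class ids used in the Results tables (0 = not phishing, 1 = phishing). The helpers `readable_label` and `classify` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Repository id taken from the URL in the Reference section.
MODEL_ID = "gplsi/mRoBERTa_FT1_DFT1_fraude_phishing"

# Assumption: default label names, with class ids as in the Results tables.
READABLE = {"LABEL_0": "not phishing", "LABEL_1": "phishing"}


def readable_label(raw_label: str) -> str:
    """Map a raw pipeline label to a readable one; unknown labels pass through."""
    return READABLE.get(raw_label, raw_label)


def classify(texts: List[str]) -> List[Dict[str, object]]:
    """Classify messages with the fine-tuned checkpoint (downloads the model)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # truncation=True cuts messages longer than the model's maximum length.
    outputs = classifier(texts, truncation=True)
    return [
        {"text": text, "label": readable_label(out["label"]), "score": out["score"]}
        for text, out in zip(texts, outputs)
    ]


# Example call (requires transformers, a backend such as PyTorch, and network access):
# classify(["Your account is locked. Verify your password at http://example.com"])
```

If the checkpoint's configuration defines its own label names, `readable_label` returns them unchanged, so checking `classifier.model.config.id2label` before relying on the mapping is advisable.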


## Dataset
The fine-tuning dataset contains **SMS and email texts** labeled as phishing or not phishing. Its 11,779 instances are divided roughly 80/20 between training and test:  

- **Training set**: 9,422 instances  
- **Test set**: 2,357 instances  

## Training Parameters
- learning_rate: 2e-5  
- num_train_epochs: 2  
- per_device_train_batch_size: 8  
- per_device_eval_batch_size: 8  
- overwrite_output_dir: true  
- logging_strategy: steps  
- logging_steps: 10  
- seed: 852  
- fp16: true  
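
The parameter names above match fields of `transformers.TrainingArguments`, so one plausible way to reproduce the configuration is sketched below. This is an illustration under stated assumptions, not the authors' training script: the output directory is hypothetical, and the step count assumes a single device with no gradient accumulation, neither of which the card specifies.

```python
import math

# Values copied from the Training Parameters list.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "overwrite_output_dir": True,
    "logging_strategy": "steps",
    "logging_steps": 10,
    "seed": 852,
    "fp16": True,  # mixed precision; needs a supported GPU
}

TRAIN_SIZE = 9422  # training instances reported in the Dataset section


def optimizer_steps(train_size: int, batch_size: int, epochs: int) -> int:
    """Total update steps, assuming one device and no gradient accumulation."""
    return math.ceil(train_size / batch_size) * epochs


def build_training_arguments(output_dir: str = "./phishing_out"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    # Imported lazily so the step arithmetic runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


print(optimizer_steps(TRAIN_SIZE, 8, 2))  # 2356
```

Under those assumptions, `logging_steps: 10` would produce a log entry about every 10 of the 2,356 update steps.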

## Results

### Combined dataset (SMS + emails)
**Confusion Matrix**  

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 1793              | 16            |
| **True Phishing**     | 18                | 530           |


| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9901 | 0.9912 | 0.9906 | 1809 |
| 1 (Phishing)     | 0.9707 | 0.9672 | 0.9689 | 548  |

- Accuracy: **0.9856**  
- Macro Avg F1: **0.9798**  
---

### Only Emails
**Confusion Matrix**  

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 823               | 12            |
| **True Phishing**     | 14                | 313           |


| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9833 | 0.9856 | 0.9845 | 835 |
| 1 (Phishing)     | 0.9631 | 0.9572 | 0.9601 | 327 |

- Accuracy: **0.9776**  
- Macro Avg F1: **0.9723**  
---

### Only SMS
**Confusion Matrix**  

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 969               | 5             |
| **True Phishing**     | 6                 | 215           |


| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9939 | 0.9949 | 0.9944 | 974 |
| 1 (Phishing)     | 0.9773 | 0.9729 | 0.9751 | 221 |

- Accuracy: **0.9908**  
- Macro Avg F1: **0.9847**  
---
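
The per-class scores, accuracy, and macro-averaged F1 in each results subsection can be recomputed from that subsection's confusion matrix. The sketch below uses the standard definitions and takes the four cell counts as input. The function name `classification_metrics` is hypothetical, and the example call uses the combined SMS + emails matrix.

```python
from typing import Dict


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Binary metrics from confusion-matrix counts (positive class = phishing)."""
    precision_pos = tp / (tp + fp)
    recall_pos = tp / (tp + fn)
    # For the negative class the roles of the cells are mirrored.
    precision_neg = tn / (tn + fn)
    recall_neg = tn / (tn + fp)
    f1_pos = f1(precision_pos, recall_pos)
    f1_neg = f1(precision_neg, recall_neg)
    return {
        "precision_pos": precision_pos,
        "recall_pos": recall_pos,
        "f1_pos": f1_pos,
        "precision_neg": precision_neg,
        "recall_neg": recall_neg,
        "f1_neg": f1_neg,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "macro_f1": (f1_pos + f1_neg) / 2,  # unweighted mean over both classes
    }


# Combined dataset: TN=1793, FP=16, FN=18, TP=530
combined = classification_metrics(tn=1793, fp=16, fn=18, tp=530)
print(round(combined["accuracy"], 4))  # 0.9856
print(round(combined["macro_f1"], 4))  # 0.9798
```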

## Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

## Reference
```bibtex
@misc{gplsi-mroberta-fraudephishing,
  author       = {Martínez-Murillo, Iván and Consuegra-Ayala, Juan Pablo and Bonora, Mar and Sepúlveda-Torres, Robiert},
  title        = {mRoBERTa_FT1_DFT1_fraude_phishing: Fine-tuned model for phishing detection},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT1_DFT1_fraude_phishing}},
  note         = {Accessed: 2025-10-03}
}
```