---
license: apache-2.0
language:
- en
base_model:
- BSC-LT/mRoBERTa
pipeline_tag: text-classification
library_name: transformers
---
# mRoBERTa_FT1_DFT1_fraude_phishing
## Description
This model is a fine-tuned version of `BSC-LT/mRoBERTa` for **binary phishing detection** in English texts.
It predicts whether a given **SMS or email message** is **phishing** or **not phishing**.
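
The following is a minimal inference sketch, not an official usage recipe. It assumes `transformers` and `torch` are installed and that the checkpoint can be downloaded from the repository named in the Reference section. The helper names `readable_label` and `classify` are hypothetical, and the `LABEL_0`/`LABEL_1` naming is an assumption about how the pipeline reports classes; the id-to-class mapping itself (0 = not phishing, 1 = phishing) is taken from the Results tables.

```python
# Repository id taken from the URL in the Reference section.
MODEL_ID = "gplsi/mRoBERTa_FT1_DFT1_fraude_phishing"

# Class ids as listed in the Results tables of this card.
ID2NAME = {0: "not phishing", 1: "phishing"}


def readable_label(raw_label: str) -> str:
    """Map a pipeline label such as 'LABEL_1' to a readable class name.

    Assumption: the checkpoint may expose generic 'LABEL_<id>' names.
    Any other label string is returned unchanged.
    """
    suffix = raw_label.rsplit("_", 1)[-1]
    if suffix.isdigit() and int(suffix) in ID2NAME:
        return ID2NAME[int(suffix)]
    return raw_label


def classify(texts):
    """Classify a list of SMS or email texts; returns (label, score) pairs."""
    # Imported lazily so that readable_label stays usable without transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # truncation=True keeps long emails within the model's maximum input length.
    results = classifier(texts, truncation=True)
    return [(readable_label(r["label"]), r["score"]) for r in results]
```

Calling `classify(["Your account is locked, verify your details now"])` would return one `(label, score)` pair per input text.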
## Dataset
The dataset used for fine-tuning contains **SMS and email texts** labeled as phishing or not phishing.
- **Training set**: 9,422 instances
- **Test set**: 2,357 instances
## Training Parameters
- learning_rate: 2e-5
- num_train_epochs: 2
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- overwrite_output_dir: true
- logging_strategy: steps
- logging_steps: 10
- seed: 852
- fp16: true
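
As a rough sketch, the parameters above could be passed to the `transformers` `TrainingArguments` class as shown below. The output directory, the helper names, and the single-device, no-gradient-accumulation setup used for the step estimate are assumptions that the card does not state. The sketch also assumes a `transformers` version that accepts all of these keyword names, and `fp16` requires a CUDA-capable GPU.

```python
import math

# Values copied from the Training Parameters list.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "overwrite_output_dir": True,
    "logging_strategy": "steps",
    "logging_steps": 10,
    "seed": 852,
    "fp16": True,
}

TRAIN_SIZE = 9422  # training instances reported in the Dataset section


def optimizer_steps(num_examples, batch_size, epochs):
    """Estimate optimizer steps, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def build_training_arguments(output_dir="./outputs"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    # Imported lazily: constructing the arguments with fp16=True needs a GPU.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


total_steps = optimizer_steps(
    TRAIN_SIZE,
    TRAINING_KWARGS["per_device_train_batch_size"],
    TRAINING_KWARGS["num_train_epochs"],
)
print(total_steps)  # 2356 under the single-device assumption
```

Under these assumptions, logging every 10 steps yields roughly 235 log entries over the two epochs.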
## Results
### Combined dataset (SMS + emails)
**Confusion Matrix**

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 1793              | 16            |
| **True Phishing**     | 18                | 530           |

**Classification Report**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9901 | 0.9912 | 0.9906 | 1809 |
| 1 (Phishing) | 0.9707 | 0.9672 | 0.9689 | 548 |

- Accuracy: **0.9856**
- Macro Avg F1: **0.9798**
---
### Only Emails
**Confusion Matrix**

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 823               | 12            |
| **True Phishing**     | 14                | 313           |

**Classification Report**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9833 | 0.9856 | 0.9845 | 835 |
| 1 (Phishing) | 0.9631 | 0.9572 | 0.9601 | 327 |

- Accuracy: **0.9776**
- Macro Avg F1: **0.9723**
---
### Only SMS
**Confusion Matrix**

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 969               | 5             |
| **True Phishing**     | 6                 | 215           |

**Classification Report**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9939 | 0.9949 | 0.9944 | 974 |
| 1 (Phishing) | 0.9773 | 0.9729 | 0.9751 | 221 |

- Accuracy: **0.9908**
- Macro Avg F1: **0.9847**
---
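
The per-class scores in this section can be derived from the confusion matrices. The sketch below recomputes the combined (SMS + emails) figures in plain Python; the function name `per_class_report` and its return layout are illustrative choices, not part of the original evaluation code.

```python
def per_class_report(tn, fp, fn, tp):
    """Derive per-class precision/recall/F1, accuracy and macro F1
    from a 2x2 confusion matrix (rows = true class, columns = predicted)."""

    def prf(hits, false_alarms, misses):
        precision = hits / (hits + false_alarms)
        recall = hits / (hits + misses)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Class 1 (phishing): TP are hits, FP are false alarms, FN are misses.
    phishing = prf(tp, fp, fn)
    # Class 0 (not phishing): the roles swap, so TN are hits,
    # FN are its false alarms and FP are its misses.
    not_phishing = prf(tn, fn, fp)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    macro_f1 = (phishing[2] + not_phishing[2]) / 2
    return {
        "not_phishing": not_phishing,
        "phishing": phishing,
        "accuracy": accuracy,
        "macro_f1": macro_f1,
    }


# Combined SMS + email confusion matrix from the Results section.
report = per_class_report(tn=1793, fp=16, fn=18, tp=530)
print([round(v, 4) for v in report["not_phishing"]])  # [0.9901, 0.9912, 0.9906]
print([round(v, 4) for v in report["phishing"]])      # [0.9707, 0.9672, 0.9689]
print(round(report["accuracy"], 4))                   # 0.9856
print(round(report["macro_f1"], 4))                   # 0.9798
```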
## Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.
## Reference
```bibtex
@misc{gplsi-mroberta-fraudephishing,
author = {Martínez-Murillo, Iván and Consuegra-Ayala, Juan Pablo and Bonora, Mar and Sepúlveda-Torres, Robiert},
title = {mRoBERTa_FT1_DFT1_fraude_phishing: Fine-tuned model for phishing detection},
year = {2025},
howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT1_DFT1_fraude_phishing}},
note = {Accessed: 2025-10-03}
}
```