---
license: apache-2.0
language:
- en
base_model:
- BSC-LT/mRoBERTa
pipeline_tag: text-classification
library_name: transformers
---

# mRoBERTa_FT1_DFT1_fraude_phishing

## Description

This model is fine-tuned from `BSC-LT/mRoBERTa` for **binary phishing detection** in English text.

Given an **SMS or email message**, it predicts whether the message is **phishing** or **not phishing**.
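
A minimal inference sketch, assuming the checkpoint is available on the Hugging Face Hub under the identifier that appears in the reference URL of this card and that it loads with the standard `text-classification` pipeline. The helper names `load_classifier` and `classify_messages` are hypothetical. The label strings returned by the pipeline depend on the model's `id2label` configuration, which this card does not state, so the sketch passes them through unchanged.

```python
from typing import Callable, Dict, List

# Assumption: Hub identifier taken from the URL in the Reference section.
MODEL_ID = "gplsi/mRoBERTa_FT1_DFT1_fraude_phishing"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_messages(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier on a batch and pair each text with its top prediction."""
    # truncation=True clips inputs longer than the model's maximum sequence length.
    predictions = classifier(texts, truncation=True)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]


# Example call (needs network access to fetch the model):
#   clf = load_classifier()
#   for row in classify_messages(["Are we still meeting for lunch at noon?"], clf):
#       print(row["label"], round(row["score"], 3))
```

Keeping the classifier as a parameter makes it easy to swap in a locally stored copy of the model or a stub for testing.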

## Dataset

The dataset used for fine-tuning contains **SMS and email texts** labeled as phishing or not phishing. It has 11,779 instances in total, split approximately 80/20:

- **Training set**: 9,422 instances
- **Test set**: 2,357 instances

## Training Parameters

- learning_rate: 2e-5
- num_train_epochs: 2
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- overwrite_output_dir: true
- logging_strategy: steps
- logging_steps: 10
- seed: 852
- fp16: true
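
The parameter names above correspond to fields of the `transformers` `TrainingArguments` class, so the configuration could be expressed roughly as follows. This is a sketch under that assumption, not the authors' training script: `build_training_args` and the default `output_dir` are hypothetical, every argument not listed keeps the library default, `fp16=True` requires a CUDA-capable GPU, and field availability can vary across library versions.

```python
# Hyperparameters copied from the list above.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "overwrite_output_dir": True,
    "logging_strategy": "steps",
    "logging_steps": 10,
    "seed": 852,
    "fp16": True,
}


def build_training_args(output_dir: str = "./mroberta-phishing"):
    """Create TrainingArguments from the listed values.

    output_dir is a placeholder path (assumption); the card does not state one.
    """
    # Imported lazily so the dictionary above can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

The resulting object would be passed to a `Trainer` together with the model and the tokenized training set.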

## Results

### Combined dataset (SMS + emails)

**Confusion Matrix**

| | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 1793 | 16 |
| **True Phishing** | 18 | 530 |

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9901 | 0.9912 | 0.9906 | 1809 |
| 1 (Phishing) | 0.9707 | 0.9672 | 0.9689 | 548 |

- Accuracy: **0.9856**
- Macro Avg F1: **0.9798**
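
The per-class figures follow directly from the confusion matrix: precision divides the diagonal entry by its column sum, recall divides it by its row sum, and F1 is their harmonic mean. The sketch below uses plain Python with hypothetical function names (it is not the authors' evaluation script) and, applied to the combined matrix, reproduces the values reported above to four decimals.

```python
from typing import Dict, List


def class_metrics(matrix: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Per-class precision, recall, F1 and support.

    matrix[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(matrix)
    metrics = {}
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        support = sum(matrix[k])  # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return metrics


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    per_class = class_metrics(matrix)
    return sum(m["f1"] for m in per_class.values()) / len(per_class)


# Combined test set: rows are true classes (0 = not phishing, 1 = phishing).
combined = [[1793, 16], [18, 530]]

for label, m in class_metrics(combined).items():
    print(label, round(m["precision"], 4), round(m["recall"], 4), round(m["f1"], 4), m["support"])
print("accuracy", round(accuracy(combined), 4))
print("macro F1", round(macro_f1(combined), 4))
```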

---

### Only Emails

**Confusion Matrix**

| | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 823 | 12 |
| **True Phishing** | 14 | 313 |

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9833 | 0.9856 | 0.9845 | 835 |
| 1 (Phishing) | 0.9631 | 0.9572 | 0.9601 | 327 |

- Accuracy: **0.9776**
- Macro Avg F1: **0.9723**

---

### Only SMS

**Confusion Matrix**

| | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 969 | 5 |
| **True Phishing** | 6 | 215 |

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9939 | 0.9949 | 0.9944 | 974 |
| 1 (Phishing) | 0.9773 | 0.9729 | 0.9751 | 221 |

- Accuracy: **0.9908**
- Macro Avg F1: **0.9847**

---

## Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

## Reference

```bibtex
@misc{gplsi-mroberta-fraudephishing,
  author       = {Martínez-Murillo, Iván and Consuegra-Ayala, Juan Pablo and Bonora, Mar and Sepúlveda-Torres, Robiert},
  title        = {mRoBERTa_FT1_DFT1_fraude_phishing: Fine-tuned model for phishing detection},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT1_DFT1_fraude_phishing}},
  note         = {Accessed: 2025-10-03}
}
```