---
license: apache-2.0
language:
- en
base_model:
- BSC-LT/mRoBERTa
pipeline_tag: text-classification
library_name: transformers
---
# mRoBERTa_FT1_DFT1_fraude_phishing
## Description
This model is a fine-tuned version of `BSC-LT/mRoBERTa` for **binary phishing detection** in English text.
It predicts whether a given **SMS or email message** is **phishing** or **not phishing**.
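
A minimal inference sketch, assuming the checkpoint is published as `gplsi/mRoBERTa_FT1_DFT1_fraude_phishing` (the repository named in the Reference section) and loads with the standard `transformers` text-classification pipeline. The label strings the pipeline returns are not documented in this card, so `POSITIVE_LABELS` and the `is_phishing` helper are assumptions for illustration; check the checkpoint's `config.id2label` before relying on them.

```python
MODEL_ID = "gplsi/mRoBERTa_FT1_DFT1_fraude_phishing"

# Assumed names for the phishing class (class 1 in the results tables).
# Verify them against the checkpoint's config.id2label.
POSITIVE_LABELS = ("LABEL_1", "1", "phishing")


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def is_phishing(prediction, positive_labels=POSITIVE_LABELS):
    """Return True if one pipeline prediction dict carries a phishing label."""
    label = str(prediction["label"]).lower()
    return label in {name.lower() for name in positive_labels}
```

Calling `load_classifier()(["Your parcel is on hold. Confirm your details to release it."])` should return one dict per input message with `label` and `score` keys, each of which can be passed to `is_phishing`.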
## Dataset
The dataset used for fine-tuning contains **SMS and email texts** labeled as phishing or not phishing.
- **Training set**: 9,422 instances
- **Test set**: 2,357 instances
## Training Parameters
- learning_rate: 2e-5
- num_train_epochs: 2
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- overwrite_output_dir: true
- logging_strategy: steps
- logging_steps: 10
- seed: 852
- fp16: true
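
The parameters above map onto fields of `transformers.TrainingArguments`. The sketch below shows one way they could be passed, assuming a `transformers` version that accepts all of these field names; the `output_dir` value and the `build_training_arguments` helper are hypothetical and are not taken from the original training script.

```python
# Values copied from the parameter list above; keys match TrainingArguments fields.
TRAINING_PARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "overwrite_output_dir": True,
    "logging_strategy": "steps",
    "logging_steps": 10,
    "seed": 852,
    "fp16": True,
}


def build_training_arguments(output_dir="./mroberta-phishing"):
    """Create TrainingArguments from the listed values (output_dir is a placeholder)."""
    # Imported lazily; fp16=True generally requires a CUDA-capable device.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_PARAMS)
```

The resulting object can be handed to a `Trainer` together with the model, tokenizer, and datasets, none of which are specified in this card.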
## Results
### Combined dataset (SMS + emails)
**Confusion Matrix**
| | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 1793 | 16 |
| **True Phishing** | 18 | 530 |

**Classification Report**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9901 | 0.9912 | 0.9906 | 1809 |
| 1 (Phishing) | 0.9707 | 0.9672 | 0.9689 | 548 |
- Accuracy: **0.9856**
- Macro Avg F1: **0.9798**
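
The per-class and aggregate figures follow directly from the confusion matrix. As a sketch of that arithmetic (the `report` function is illustrative, not the evaluation script used by the authors), the combined-set counts can be turned back into precision, recall, F1, accuracy, and macro F1; the same function can be applied to the email and SMS matrices.

```python
def report(tn, fp, fn, tp):
    """Derive classification metrics from a 2x2 confusion matrix."""

    def prf(hits, false_pos, false_neg):
        precision = hits / (hits + false_pos)
        recall = hits / (hits + false_neg)
        f1 = 2 * precision * recall / (precision + recall)
        return {"precision": precision, "recall": recall, "f1": f1}

    # For class 0, true negatives are the hits and the error roles swap.
    not_phishing = prf(tn, fn, fp)
    phishing = prf(tp, fp, fn)
    return {
        "not_phishing": not_phishing,
        "phishing": phishing,
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
        "macro_f1": (not_phishing["f1"] + phishing["f1"]) / 2,
    }


# Counts taken from the combined (SMS + emails) confusion matrix above.
combined = report(tn=1793, fp=16, fn=18, tp=530)
for name, value in combined.items():
    print(name, value)
```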
---
### Only Emails
**Confusion Matrix**
| | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 823 | 12 |
| **True Phishing** | 14 | 313 |

**Classification Report**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9833 | 0.9856 | 0.9845 | 835 |
| 1 (Phishing) | 0.9631 | 0.9572 | 0.9601 | 327 |
- Accuracy: **0.9776**
- Macro Avg F1: **0.9723**
---
### Only SMS
**Confusion Matrix**
| | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 969 | 5 |
| **True Phishing** | 6 | 215 |

**Classification Report**

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9939 | 0.9949 | 0.9944 | 974 |
| 1 (Phishing) | 0.9773 | 0.9729 | 0.9751 | 221 |
- Accuracy: **0.9908**
- Macro Avg F1: **0.9847**
---
## Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.
## Reference
```bibtex
@misc{gplsi-mroberta-fraudephishing,
author = {Martínez-Murillo, Iván and Consuegra-Ayala, Juan Pablo and Bonora, Mar and Sepúlveda-Torres, Robiert},
title = {mRoBERTa_FT1_DFT1_fraude_phishing: Fine-tuned model for phishing detection},
year = {2025},
howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT1_DFT1_fraude_phishing}},
note = {Accessed: 2025-10-03}
}
```