---
license: apache-2.0
language:
- en
base_model:
- BSC-LT/mRoBERTa
pipeline_tag: text-classification
library_name: transformers
---

# mRoBERTa_FT1_DFT1_fraude_phishing

## Description

This model is fine-tuned from `BSC-LT/mRoBERTa` for **binary phishing detection** in English texts. It predicts whether a given **SMS or email message** is **phishing** or **not phishing**.

## Dataset

The dataset used for fine-tuning contains **SMS and email texts** labeled as phishing or not phishing.

- **Training set**: 9,422 instances
- **Test set**: 2,357 instances

## Training Parameters

- learning_rate: 2e-5
- num_train_epochs: 2
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- overwrite_output_dir: true
- logging_strategy: steps
- logging_steps: 10
- seed: 852
- fp16: true

## Results

### Combined dataset (SMS + emails)

**Confusion Matrix**

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 1793              | 16            |
| **True Phishing**     | 18                | 530           |

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9901 | 0.9912 | 0.9906 | 1809 |
| 1 (Phishing) | 0.9707 | 0.9672 | 0.9689 | 548 |

- Accuracy: **0.9856**
- Macro Avg F1: **0.9798**

---

### Only Emails

**Confusion Matrix**

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 823               | 12            |
| **True Phishing**     | 14                | 313           |

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9833 | 0.9856 | 0.9845 | 835 |
| 1 (Phishing) | 0.9631 | 0.9572 | 0.9601 | 327 |

- Accuracy: **0.9776**
- Macro Avg F1: **0.9723**

---

### Only SMS

**Confusion Matrix**

|                       | Pred Not Phishing | Pred Phishing |
| --------------------- | ----------------- | ------------- |
| **True Not Phishing** | 969               | 5             |
| **True Phishing**     | 6                 | 215           |

| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9939 | 0.9949 | 0.9944 | 974 |
| 1 (Phishing) | 0.9773 | 0.9729 | 0.9751 | 221 |

- Accuracy: **0.9908**
- Macro Avg F1: **0.9847**

---

## Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

## Reference

```bibtex
@misc{gplsi-mroberta-fraudephishing,
  author       = {Martínez-Murillo, Iván and Consuegra-Ayala, Juan Pablo and Bonora, Mar and Sepúlveda-Torres, Robiert},
  title        = {mRoBERTa_FT1_DFT1_fraude_phishing: Fine-tuned model for phishing detection},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT1_DFT1_fraude_phishing}},
  note         = {Accessed: 2025-10-03}
}
```
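
## Recomputing the reported metrics

The per-class scores, accuracy, and macro F1 in the Results section can be derived from the confusion matrices. The sketch below does this for the combined (SMS + emails) matrix. The names `Counts` and `summarize` are illustrative assumptions, not part of the released model or its evaluation code, and the rounding to four decimals is assumed from the precision of the reported tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Binary confusion-matrix cells, with phishing as the positive class."""
    tn: int  # true not phishing, predicted not phishing
    fp: int  # true not phishing, predicted phishing
    fn: int  # true phishing, predicted not phishing
    tp: int  # true phishing, predicted phishing


def _prf(hit: int, false_alarm: int, miss: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one class."""
    precision = hit / (hit + false_alarm)
    recall = hit / (hit + miss)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def summarize(c: Counts, digits: int = 4) -> dict[str, float]:
    """Derive the metrics reported in the Results tables from raw counts."""
    # For class 0 the roles flip: TN are its hits, FN its false alarms, FP its misses.
    p0, r0, f0 = _prf(c.tn, c.fn, c.fp)
    p1, r1, f1 = _prf(c.tp, c.fp, c.fn)
    total = c.tn + c.fp + c.fn + c.tp
    values = {
        "precision_0": p0, "recall_0": r0, "f1_0": f0,
        "precision_1": p1, "recall_1": r1, "f1_1": f1,
        "accuracy": (c.tn + c.tp) / total,
        "macro_f1": (f0 + f1) / 2,
    }
    return {name: round(value, digits) for name, value in values.items()}


# Combined dataset confusion matrix, copied from the Results section.
COMBINED = Counts(tn=1793, fp=16, fn=18, tp=530)

if __name__ == "__main__":
    for name, value in summarize(COMBINED).items():
        print(f"{name}: {value}")
```

For the combined matrix this reproduces the reported accuracy (0.9856) and macro F1 (0.9798). The same function can be applied to the email-only and SMS-only matrices by changing the four counts.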