---
license: apache-2.0
language:
- en
base_model:
- BSC-LT/mRoBERTa
pipeline_tag: text-classification
library_name: transformers
---

# mRoBERTa_FT1_DFT1_fraude_phishing

## Description
This model is fine-tuned from `BSC-LT/mRoBERTa` for binary phishing detection in English text.
Given an SMS or email message, it predicts whether the message is phishing (class 1) or not phishing (class 0).

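Below is a minimal inference sketch. It assumes the model loads through the `transformers` `pipeline` API under the repository id `gplsi/mRoBERTa_FT1_DFT1_fraude_phishing`, which is the id in the citation URL in the Reference section; the hosted repository may use another name. This card does not document how the model names its output labels. For that reason, the hypothetical helper `to_phishing_flag` accepts either generic labels such as `LABEL_1` or a readable name. Following the results tables, it treats class 1 as phishing.

```python
MODEL_ID = "gplsi/mRoBERTa_FT1_DFT1_fraude_phishing"  # assumed repository id


def to_phishing_flag(prediction: dict) -> bool:
    """Hypothetical helper: map one pipeline prediction to True (phishing) or False.

    Assumes class 1 is phishing and that labels are either generic
    ("LABEL_0"/"LABEL_1") or readable names ("phishing"/"not phishing").
    """
    label = str(prediction["label"]).strip().lower()
    if label.startswith("label_"):
        return label.rsplit("_", 1)[1] == "1"
    return label in {"1", "phishing"}


def classify(messages: list[str]) -> list[bool]:
    """Run the fine-tuned classifier on raw SMS or email texts."""
    # Imported here so the label helper above can be used without transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # truncation=True keeps long emails within the model's maximum input length.
    predictions = classifier(messages, truncation=True)
    return [to_phishing_flag(p) for p in predictions]
```

Calling `classify(["Your account is locked. Confirm your password at http://example.com/verify"])` returns one boolean per input message. The first call downloads the model weights.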

## Dataset
The dataset used for fine-tuning contains SMS and email texts labeled as phishing or not phishing.

- Training set: 9,422 instances
- Test set: 2,357 instances

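The split proportions follow directly from these counts. The test set is about 20% of the 11,779 labeled instances. Its size also matches the supports reported in the results tables, both overall (1,809 + 548) and by message type (1,162 emails and 1,195 SMS).

$$
\frac{2357}{9422 + 2357} = \frac{2357}{11779} \approx 0.200,
\qquad
2357 = 1809 + 548 = \underbrace{(835 + 327)}_{\text{emails}} + \underbrace{(974 + 221)}_{\text{SMS}}
$$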
## Training Parameters
- learning_rate: 2e-5
- num_train_epochs: 2
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- overwrite_output_dir: true
- logging_strategy: steps
- logging_steps: 10
- seed: 852
- fp16: true

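The parameter names above match fields of `transformers.TrainingArguments`, so the configuration can plausibly be reconstructed as shown in the sketch below. It rests on a few assumptions:

- It targets a `transformers` 4.x release that still accepts `overwrite_output_dir`.
- The output directory name is a hypothetical placeholder.
- `fp16=True` typically requires a GPU.

The model, tokenizer and `Trainer` setup are not described in this card and are left out.

```python
# Hyperparameters as listed under "Training Parameters".
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "overwrite_output_dir": True,
    "logging_strategy": "steps",
    "logging_steps": 10,
    "seed": 852,
    "fp16": True,  # mixed-precision training; typically needs a GPU
}


def build_training_args(output_dir: str = "mroberta-phishing-ft"):
    """Build TrainingArguments from the listed hyperparameters.

    `output_dir` is a hypothetical placeholder; the card does not name one.
    transformers is imported lazily so the dict above can be inspected without it.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Assume a single device and no gradient accumulation (both assumptions). Then one epoch over the 9,422 training instances takes ⌈9422 / 8⌉ = 1,178 optimizer steps. Two epochs give 2,356 steps, and with `logging_steps: 10` that yields about 235 logged training losses.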
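## Results

All results are computed on the 2,357-instance test set. Class 0 is not phishing and class 1 is phishing. In each confusion matrix, rows are true labels and columns are predicted labels.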

### Combined dataset (SMS + emails)
Confusion Matrix

| | Pred Not Phishing | Pred Phishing |
| --- | --- | --- |
| True Not Phishing | 1793 | 16 |
| True Phishing | 18 | 530 |


| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9901 | 0.9912 | 0.9906 | 1809 |
| 1 (Phishing) | 0.9707 | 0.9672 | 0.9689 | 548 |

- Accuracy: 0.9856
- Macro Avg F1: 0.9798

---

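Each per-class score can be recomputed from its confusion matrix with the standard definitions:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 is the harmonic mean of precision and recall
- macro F1 is the unweighted mean of the two class F1 scores

The plain-Python sketch below applies these definitions to the combined-dataset matrix. The function name and argument layout are illustrative; they are not taken from the authors' evaluation code.

```python
def binary_report(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Recompute the card's metrics from a 2x2 confusion matrix.

    Rows are true labels, columns are predicted labels; class 1 is phishing.
    """

    def prf(tp_c: int, fp_c: int, fn_c: int) -> tuple[float, float, float]:
        precision = tp_c / (tp_c + fp_c)
        recall = tp_c / (tp_c + fn_c)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # For class 0 (not phishing) the roles swap: its true positives are the TNs,
    # its false positives are the FNs and its false negatives are the FPs.
    p0, r0, f0 = prf(tn, fn, fp)
    p1, r1, f1 = prf(tp, fp, fn)
    total = tn + fp + fn + tp
    return {
        "precision": (p0, p1),
        "recall": (r0, r1),
        "f1": (f0, f1),
        "support": (tn + fp, fn + tp),
        "accuracy": (tn + tp) / total,
        "macro_f1": (f0 + f1) / 2,
    }


# Combined dataset (SMS + emails) confusion matrix.
report = binary_report(tn=1793, fp=16, fn=18, tp=530)
for name, value in report.items():
    print(name, value)
```

Rounded to four decimals, the printed values reproduce the combined-dataset table. For example, phishing precision is 530 / 546 ≈ 0.9707 and accuracy is 2323 / 2357 ≈ 0.9856. The same function accepts the email and SMS matrices.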
### Emails only
Confusion Matrix

| | Pred Not Phishing | Pred Phishing |
| --- | --- | --- |
| True Not Phishing | 823 | 12 |
| True Phishing | 14 | 313 |


| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9833 | 0.9856 | 0.9845 | 835 |
| 1 (Phishing) | 0.9631 | 0.9572 | 0.9601 | 327 |

- Accuracy: 0.9776
- Macro Avg F1: 0.9723

---

### SMS only
Confusion Matrix

| | Pred Not Phishing | Pred Phishing |
| --- | --- | --- |
| True Not Phishing | 969 | 5 |
| True Phishing | 6 | 215 |


| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 0 (Not phishing) | 0.9939 | 0.9949 | 0.9944 | 974 |
| 1 (Phishing) | 0.9773 | 0.9729 | 0.9751 | 221 |

- Accuracy: 0.9908
- Macro Avg F1: 0.9847

---

## Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

## Reference
```bibtex
@misc{gplsi-mroberta-fraudephishing,
  author       = {Martínez-Murillo, Iván and Consuegra-Ayala, Juan Pablo and Bonora, Mar and Sepúlveda-Torres, Robiert},
  title        = {mRoBERTa\_FT1\_DFT1\_fraude\_phishing: Fine-tuned model for phishing detection},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT1_DFT1_fraude_phishing}},
  note         = {Accessed: 2025-10-03}
}
```