gplsi/Aitana-ClearLangDetection-R-1.0

Text Classification


ivanmartinezmurillo committed on Oct 3, 2025

Commit fa8d7de (verified) · 1 Parent(s): 4bdf289

Create README.md

Files changed (1)

README.md ADDED +60 -0

	@@ -0,0 +1,60 @@

+---
+license: apache-2.0
+language:
+- en
+- es
+base_model:
+- BSC-LT/mRoBERTa
+pipeline_tag: text-classification
+library_name: transformers
+---
+# mRoBERTa_FT3_DFT3_fraude_spam
+## Description
+This model is fine-tuned from `BSC-LT/mRoBERTa` for **binary spam detection** in English and Spanish texts.
+It predicts whether a given **SMS or email message** is **spam** or **not spam**.
+The model was trained using the Hugging Face `Trainer` API with the same configuration as the phishing detection model.
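A minimal inference sketch for the classifier described above, not taken from the commit itself. It assumes the repository id from the card's citation URL (the hosting repository may use a different name) and that label ids follow the results table (0 = not spam, 1 = spam). `readable_label` and `classify` are hypothetical helper names.

```python
from typing import Dict, List

# Assumption: repository id taken from the citation URL in this card;
# the repository that hosts the weights may be named differently.
MODEL_ID = "gplsi/mRoBERTa_FT3_DFT3_fraude_spam"

# Assumption: label ids follow the results table (0 = not spam, 1 = spam).
ID2NAME = {0: "not spam", 1: "spam"}


def readable_label(raw_label: str) -> str:
    """Map a raw pipeline label such as 'LABEL_1' to a readable class name.

    Labels without a known numeric suffix are returned unchanged.
    """
    suffix = raw_label.rsplit("_", 1)[-1]
    if suffix.isdigit():
        return ID2NAME.get(int(suffix), raw_label)
    return raw_label


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict[str, object]]:
    """Run the text-classification pipeline and attach readable labels."""
    # Imported lazily: the first call downloads the model weights.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    results = classifier(texts, truncation=True)
    return [
        {"text": text, "label": readable_label(res["label"]), "score": res["score"]}
        for text, res in zip(texts, results)
    ]
```

Calling `classify(["You won a free prize, click here!", "Nos vemos mañana a las 10."])` would return one dictionary per message with its predicted label and score. If the published configuration already defines human-readable label names, `readable_label` passes them through unchanged.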
+## Dataset
+The dataset used for fine-tuning contains **SMS and email texts** labeled as spam or not spam in both English and Spanish.
+- **Training set**: 15,893 instances
+- **Test set**: 1,766 instances
+## Training Parameters
+- learning_rate: 2e-5
+- num_train_epochs: 2
+- per_device_train_batch_size: 8
+- per_device_eval_batch_size: 8
+- overwrite_output_dir: true
+- logging_strategy: steps
+- logging_steps: 10
+- seed: 852
+- fp16: true
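The parameters listed above map directly onto `transformers.TrainingArguments`. The sketch below shows that mapping and estimates the number of optimizer steps the run would take. It assumes a single device and no gradient accumulation, neither of which the card states. The `output_dir` value and the helper names are hypothetical.

```python
import math

# Hyperparameters exactly as listed in the card.
TRAINING_PARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "overwrite_output_dir": True,
    "logging_strategy": "steps",
    "logging_steps": 10,
    "seed": 852,
    "fp16": True,
}

TRAIN_SIZE = 15_893  # training instances reported in the Dataset section


def build_training_args(output_dir: str = "./spam-classifier"):
    """Create TrainingArguments from the card's values.

    output_dir is a hypothetical path; the card does not name one.
    """
    # Imported lazily: fp16=True requires a CUDA-capable device.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_PARAMS)


def optimizer_steps(train_size: int, batch_size: int, epochs: int, devices: int = 1) -> int:
    """Estimate total optimizer steps, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(train_size / (batch_size * devices))
    return steps_per_epoch * epochs


total_steps = optimizer_steps(
    TRAIN_SIZE,
    TRAINING_PARAMS["per_device_train_batch_size"],
    TRAINING_PARAMS["num_train_epochs"],
)
print(total_steps)  # 3974 under the single-device assumption
```

With `logging_steps: 10`, a run of this length would log roughly every 0.25% of training, which is frequent enough to spot divergence early in a two-epoch fine-tune.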
+## Results (Test set)
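+**Confusion Matrix** (rows: actual class, columns: predicted class)
+
+| | Predicted: Not spam | Predicted: Spam |
+|---|---|---|
+| Actual: Not spam | 1506 | 4 |
+| Actual: Spam | 8 | 248 |
+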
+| Class | Precision | Recall | F1-score | Support |
+|-------|-----------|--------|----------|---------|
+| 0 (Not spam) | 0.9947 | 0.9974 | 0.9960 | 1510 |
+| 1 (Spam)     | 0.9841 | 0.9688 | 0.9764 | 256  |
+- Accuracy: **0.9932**
+- Macro Avg F1: **0.9862**
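The reported figures can be re-derived from the confusion matrix. The sketch below does this in plain Python, assuming rows are actual classes and columns are predicted classes, which is consistent with the Support column (1506 + 4 = 1510 and 8 + 248 = 256). The function names are illustrative.

```python
from typing import List, Tuple

# Confusion matrix from the card: rows are actual classes, columns are predictions.
CONFUSION = [[1506, 4], [8, 248]]


def per_class_metrics(cm: List[List[int]], cls: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one class of a square confusion matrix."""
    true_positives = cm[cls][cls]
    predicted_as_cls = sum(row[cls] for row in cm)  # column sum
    actually_cls = sum(cm[cls])                     # row sum (support)
    precision = true_positives / predicted_as_cls
    recall = true_positives / actually_cls
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all instances that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_metrics(cm, c)[2] for c in range(len(cm))) / len(cm)


for cls, name in enumerate(["Not spam", "Spam"]):
    p, r, f = per_class_metrics(CONFUSION, cls)
    print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"accuracy={accuracy(CONFUSION):.4f} macro_f1={macro_f1(CONFUSION):.4f}")
```

The printed values agree with the table to four decimal places. Of the 12 errors, 8 are spam messages classified as not spam, which is why recall on the spam class (0.9688) is the lowest figure reported.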
+---
+## Reference
+```bibtex
+@misc{gplsi-mroberta-fraudespam,
+  author       = {Bonora, Mar and Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván},
+  title        = {mRoBERTa_FT3_DFT3_fraude_spam: Fine-tuned model for spam detection},
+  year         = {2025},
+  howpublished = {\url{https://huggingface.co/gplsi/mRoBERTa_FT3_DFT3_fraude_spam}},
+  note         = {Accessed: 2025-10-03}
+}
+```