Aitana-MarcasFalsas-R-1.0

Model Description

This model is fine-tuned from BSC-LT/MrBert and distributed via Hugging Face for fake brand detection in Spanish and English texts. It performs binary classification, assigning each input to one of two categories:

  • fake: The text refers to counterfeit, fraudulent, impersonated, or non-legitimate brands.
  • not-fake: The text refers to legitimate, authentic, or non-deceptive brands.

The model is designed to support brand protection, marketplace moderation, and fraud detection systems in Spanish and English language content.
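Independently of how the model is served, the mapping from the classification head's two logits to the fake / not-fake labels can be sketched in plain Python. Note that the id-to-label order below is an assumption for illustration; the authoritative mapping ships in the model's config.json:

```python
import math

# Hypothetical id-to-label mapping; the actual order is defined in the
# model's config.json and may differ.
ID2LABEL = {0: "fake", 1: "not-fake"}

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Map a two-way logit vector to a label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

# Example: logits favouring the first class.
label, prob = predict_label([2.3, -1.1])
print(label, round(prob, 4))  # → fake 0.9677
```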

Training Details

Training Data

The dataset consists of Spanish and English texts annotated with binary labels (fake / not-fake).

Dataset Size

  • Training set: 3,879,861 instances
  • Validation set: 431,096 instances
  • Test set: 760,759 instances

Class Distribution

Training set

  • fake: 2,155,743
  • not-fake: 2,155,214

Test distribution

  • fake: 380,115
  • not-fake: 380,644
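Both splits are close to perfectly balanced between the two classes. A quick check of the per-class shares, using only the counts reported above:

```python
# Class counts as reported in this model card.
train = {"fake": 2_155_743, "not-fake": 2_155_214}
test = {"fake": 380_115, "not-fake": 380_644}

def balance(counts):
    """Return each class's share of its split, as fractions."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

print({k: round(v, 4) for k, v in balance(test).items()})
# → {'fake': 0.4997, 'not-fake': 0.5003}
```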

Dataset Structure

DatasetDict({
    train: Dataset({
        features: ['text', 'labels', 'features', '__index_level_0__'],
        num_rows: 3879861
    })
    validation: Dataset({
        features: ['text', 'labels', 'features', '__index_level_0__'],
        num_rows: 431096
    })
    test: Dataset({
        features: ['text', 'labels', 'features'],
        num_rows: 760759
    })
})

Training Hyperparameters

  Parameter                     Value
  learning_rate                 2e-5
  num_train_epochs              2
  per_device_train_batch_size   16
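These hyperparameters correspond to a Transformers `TrainingArguments` configuration along the following lines. This is a sketch only: the output directory is an illustrative assumption, and any settings not listed above (evaluation strategy, warmup, weight decay, etc.) are left at library defaults rather than taken from the original training code:

```python
from transformers import TrainingArguments

# Configuration fragment matching the reported hyperparameters.
# output_dir is an illustrative assumption, not the authors' path.
training_args = TrainingArguments(
    output_dir="./aitana-marcasfalsas",       # hypothetical
    learning_rate=2e-5,                       # as reported
    num_train_epochs=2,                       # as reported
    per_device_train_batch_size=16,           # as reported
)
```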

Evaluation

Test Set

760,759 instances

Confusion Matrix

                   Predicted fake   Predicted not-fake
  True fake               354,237               25,878
  True not-fake            34,116              346,528

Classification Report

  Class      Precision   Recall   F1-score   Support
  fake          0.9122   0.9319     0.9219   380,115
  not-fake      0.9305   0.9104     0.9203   380,644

Accuracy: 0.9211
Macro Avg F1: 0.9211
Weighted Avg F1: 0.9211
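The reported figures for the fake class can be re-derived directly from the confusion matrix counts above; a quick sanity check:

```python
# Confusion-matrix counts from the table above
# (rows: true class, columns: predicted class).
tp = 354_237  # true fake, predicted fake
fn = 25_878   # true fake, predicted not-fake
fp = 34_116   # true not-fake, predicted fake
tn = 346_528  # true not-fake, predicted not-fake

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(round(precision, 4), round(recall, 4), round(f1, 4), round(accuracy, 4))
# → 0.9122 0.9319 0.9219 0.9211
```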

Technical Specifications

Hardware and Software

For training, we used custom code developed to fine-tune the model with the Hugging Face Transformers library.

Compute Infrastructure

This model was trained on NVIDIA DGX systems equipped with A100 GPUs, which enabled efficient large-scale training. For this model, we used two A100 GPUs.

Additional Information

Author

The model has been developed by the Language and Information Systems Group (GPLSI) and the Centro de Inteligencia Digital (CENID), both part of the University of Alicante (UA), as part of their ongoing research in Natural Language Processing (NLP).

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

Acknowledgments

We would like to express our gratitude to all individuals and institutions that have contributed to the development of this work.

We also acknowledge the financial, technical, and scientific support of the Ministerio para la Transformación Digital y de la Función Pública, funded by the EU – NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA, whose contribution has been essential to the completion of this research.

License

Apache License, Version 2.0

Disclaimer

This model has been developed and fine-tuned specifically for a binary classification task. The authors are not responsible for potential errors, misinterpretations, or inappropriate use of the model beyond its intended purpose.


Reference

@misc{martinez-mrbert-fakebrands,
  author       = {Martínez-Murillo, Iván and Sepúlveda-Torres, Robiert and Consuegra-Ayala, Juan Pablo and Galeano, Santiago and Miró Maestre, María and Grande, Eduardo and Canal-Esteve, Miquel and Estevanell-Valladares, Ernesto L. and Yáñez-Romero, Fabio and Bonora, Mar and Gutierrez, Yoan and Abreu Salas, José Ignacio and Lloret, Elena and Montoyo, Andrés and Muñoz-Guillena and Palomar, Manuel},
  title        = {Aitana-MarcasFalsas-R-1.0: Fine-tuned model for fake brand detection in Spanish and English},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/gplsi/Aitana-MarcasFalsas-R-1.0}},
  note         = {Accessed: 2026-02-17}
}

Copyright © 2026 Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA). Distributed under the Apache License 2.0.
