Aitana-MarcasFalsas-R-1.0

Model Description

This model is fine-tuned from BSC-LT/MrBert and distributed via Hugging Face for fake brand detection in Spanish and English texts. It performs binary classification, assigning each input to one of two categories:

  • fake: The text refers to counterfeit, fraudulent, impersonated, or non-legitimate brands.
  • not-fake: The text refers to legitimate, authentic, or non-deceptive brands.

The model is designed to support brand protection, marketplace moderation, and fraud detection systems in Spanish and English language content.
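Independently of how the model is served, the mapping from the classification head's two logits to the fake / not-fake labels can be sketched in plain Python. Note that the id-to-label order below is an assumption for illustration; the authoritative mapping ships in the model's config.json:

```python
import math

# Hypothetical id-to-label mapping; the actual order is defined in the
# model's config.json and may differ.
ID2LABEL = {0: "fake", 1: "not-fake"}

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Map a two-way logit vector to a label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

# Example: logits favouring the first class.
label, prob = predict_label([2.3, -1.1])
print(label, round(prob, 4))  # → fake 0.9677
```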

Training Details

Training Data

The dataset consists of Spanish and English texts annotated with binary labels (fake / not-fake).

Dataset Size

  • Training set: 3,879,861 instances
  • Validation set: 431,096 instances
  • Test set: 760,759 instances

Class Distribution

Training set

  • fake: 2,155,743
  • not-fake: 2,155,214

Test distribution

  • fake: 380,115
  • not-fake: 380,644
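Both splits are close to perfectly balanced between the two classes. A quick check of the per-class shares, using only the counts reported above:

```python
# Class counts as reported in this model card.
train = {"fake": 2_155_743, "not-fake": 2_155_214}
test = {"fake": 380_115, "not-fake": 380_644}

def balance(counts):
    """Return each class's share of its split, as fractions."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

print({k: round(v, 4) for k, v in balance(test).items()})
# → {'fake': 0.4997, 'not-fake': 0.5003}
```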

Dataset Structure

DatasetDict({
    train: Dataset({
        features: ['text', 'labels', 'features', '__index_level_0__'],
        num_rows: 3879861
    })
    validation: Dataset({
        features: ['text', 'labels', 'features', '__index_level_0__'],
        num_rows: 431096
    })
    test: Dataset({
        features: ['text', 'labels', 'features'],
        num_rows: 760759
    })
})

Training Hyperparameters

  Parameter                     Value
  learning_rate                 2e-5
  num_train_epochs              2
  per_device_train_batch_size   16
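These hyperparameters correspond to a Transformers `TrainingArguments` configuration along the following lines. This is a sketch only: the output directory is an illustrative assumption, and any settings not listed above (evaluation strategy, warmup, weight decay, etc.) are left at library defaults rather than taken from the original training code:

```python
from transformers import TrainingArguments

# Configuration fragment matching the reported hyperparameters.
# output_dir is an illustrative assumption, not the authors' path.
training_args = TrainingArguments(
    output_dir="./aitana-marcasfalsas",       # hypothetical
    learning_rate=2e-5,                       # as reported
    num_train_epochs=2,                       # as reported
    per_device_train_batch_size=16,           # as reported
)
```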

Evaluation

Test Set

760,759 instances

Confusion Matrix

                   Predicted fake   Predicted not-fake
  True fake               354,237               25,878
  True not-fake            34,116              346,528

Classification Report

  Class      Precision   Recall   F1-score   Support
  fake          0.9122   0.9319     0.9219   380,115
  not-fake      0.9305   0.9104     0.9203   380,644

Accuracy: 0.9211
Macro Avg F1: 0.9211
Weighted Avg F1: 0.9211
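The reported figures for the fake class can be re-derived directly from the confusion matrix counts above; a quick sanity check:

```python
# Confusion-matrix counts from the table above
# (rows: true class, columns: predicted class).
tp = 354_237  # true fake, predicted fake
fn = 25_878   # true fake, predicted not-fake
fp = 34_116   # true not-fake, predicted fake
tn = 346_528  # true not-fake, predicted not-fake

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(round(precision, 4), round(recall, 4), round(f1, 4), round(accuracy, 4))
# → 0.9122 0.9319 0.9219 0.9211
```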

Technical Specifications

Hardware and Software

For training, we used custom code developed to fine-tune the model with the Hugging Face Transformers library.

Compute Infrastructure

This model was trained on NVIDIA DGX systems equipped with A100 GPUs, which enabled efficient large-scale training. For this model, we used two A100 GPUs.

Additional Information

Author

The model has been developed by the Language and Information Systems Group (GPLSI) and the Centro de Inteligencia Digital (CENID), both part of the University of Alicante (UA), as part of their ongoing research in Natural Language Processing (NLP).

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública, co-financed by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

Acknowledgments

We would like to express our gratitude to all individuals and institutions that have contributed to the development of this work.

We also acknowledge the financial, technical, and scientific support of the Ministerio para la Transformación Digital y de la Función Pública, funded by the EU – NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA, whose contribution has been essential to the completion of this research.

License

Apache License, Version 2.0

Disclaimer

This model has been developed and fine-tuned specifically for a binary classification task. The authors are not responsible for potential errors, misinterpretations, or inappropriate use of the model beyond its intended purpose.


Reference

@misc{martinez-mrbert-fakebrands,
  author       = {Martínez-Murillo, Iván and Sepúlveda-Torres, Robiert and Consuegra-Ayala, Juan Pablo and Galeano, Santiago and Miró Maestre, María and Grande, Eduardo and Canal-Esteve, Miquel and Estevanell-Valladares, Ernesto L. and Yáñez-Romero, Fabio and Bonora, Mar and Gutierrez, Yoan and Abreu Salas, José Ignacio and Lloret, Elena and Montoyo, Andrés and Muñoz-Guillena and Palomar, Manuel},
  title        = {Aitana-MarcasFalsas-R-1.0: Fine-tuned model for fake brand detection in Spanish and English},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/gplsi/Aitana-MarcasFalsas-R-1.0}},
  note         = {Accessed: 2026-02-17}
}

Copyright © 2026 Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA). Distributed under the Apache License 2.0.
