---
library_name: transformers
base_model: dccuchile/bert-base-spanish-wwm-uncased
tags:
- generated_from_trainer
- hate towards LGBT communities
- BETO
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: LGBeTO_detection_Model
  results: []
license: cc-by-4.0
language:
- es
pipeline_tag: text-classification
---

# LGBeTO_detection_Model

This is the LGBeTO model, a fine-tuned version of [dccuchile/bert-base-spanish-wwm-uncased](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased) (Cañete et al., 2023).
It achieves the following results on the evaluation set:

- Accuracy: 0.835
- F1: 0.8533
- Precision: 0.8205
- Recall: 0.8889
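
Metrics of this form are typically computed with a `compute_metrics` callback passed to the Hugging Face `Trainer`. The helper below is a sketch using scikit-learn, assuming binary labels (hateful vs. non-hateful); it is not necessarily the authors' exact evaluation script.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Return accuracy, F1, precision, and recall for binary classification."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # pick the highest-scoring class per example
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```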

## Authors

- **Developed by:** Claudia Martínez-Araneda, Mariella Gutiérrez V., Pedro Gómez M., Diego Maldonado M., Alejandra Segura N., Christian Vidal-Castro
- **Model type:** BERT-based model for text classification and sentiment analysis.
- **Language(s) (NLP):** Spanish
- **License:** CC BY 4.0
- **Finetuned from model:** BETO (Cañete et al., 2023)

Cite as:

    @misc{claudia_martínez-araneda_2025,
        author       = { Claudia Martínez-Araneda and Mariella Gutiérrez V. and Pedro Gómez M. and Diego Maldonado M. and Alejandra Segura N. and Christian Vidal-Castro },
        title        = { LGBeTO_detection_Model (Revision a8b5b38) },
        year         = 2025,
        url          = { https://huggingface.co/LaProfeClaudis/LGBeTO_detection_Model },
        doi          = { 10.57967/hf/5406 },
        publisher    = { Hugging Face }
    }

## Model description

LGBeTO was designed to detect discriminatory or hateful language directed toward the LGBTQIA+ community, aiming to support safer and more inclusive online environments.
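
A minimal usage sketch with the `transformers` pipeline API; the repository id is taken from the citation above, and the example text is illustrative:

```python
from transformers import pipeline

def load_classifier(model_id: str = "LaProfeClaudis/LGBeTO_detection_Model"):
    """Build a text-classification pipeline for LGBeTO (downloads the weights)."""
    return pipeline("text-classification", model=model_id)

if __name__ == "__main__":
    classifier = load_classifier()
    # Returns a list like [{"label": ..., "score": ...}] for each input text.
    print(classifier("comentario de ejemplo"))
```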

## Intended uses & limitations

This model was created for a study conducted strictly for academic and research purposes. The target of hate speech has been anonymised, and there is no intent to harm the perpetrators
in any way. We prioritise protecting the privacy and confidentiality of vulnerable individuals. We carefully remove identifying data, such as user IDs, phone numbers, and addresses, to safeguard privacy before 
sharing the data with our annotators. All data collected comes from public sources. 

As authors, we affirm our deep respect for all individuals and explicitly state that we have no intention of prejudicing, biasing, or disrespecting the LGBTQIA+ community or any group. Our work seeks to contribute constructively to inclusive 
and ethical research in artificial intelligence.


## Training and evaluation data

LGBeTO was fine-tuned on comments collected from digital media, including Twitter, Instagram, websites, and YouTube.
The dataset is available in the Zenodo repository.

Cite as:
Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A.,
& Vidal-Castro, C. (2025). LGBTQIAphobia dataset (augmented and balanced) [Data set]. Zenodo. 
https://doi.org/10.5281/zenodo.15385622

## Training procedure

- **Step 1:** Load the dataset
- **Step 2:** Tokenization and model generation
- **Step 3:** Train/validation split
- **Step 4:** Training configuration
- **Step 5:** Training and evaluation


### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- num_epochs: 3
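
These hyperparameters map directly onto `TrainingArguments`; the sketch below mirrors the reported configuration, with `output_dir` as an assumption (AdamW with the stated betas/epsilon is the Trainer's default optimizer):

```python
from transformers import TrainingArguments

def make_training_args(output_dir: str = "lgbeto-out") -> TrainingArguments:
    """Build the reported training configuration."""
    return TrainingArguments(
        output_dir=output_dir,          # assumption, not reported in the card
        learning_rate=5e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        seed=42,
        num_train_epochs=3,
    )
```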

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.4655        | 1.0   | 50   | 0.5517          | 0.755    | 0.7538 | 0.8242    | 0.6944 |
| 0.1928        | 2.0   | 100  | 0.4830          | 0.825    | 0.8523 | 0.7829    | 0.9352 |
| 0.0718        | 3.0   | 150  | 0.5393          | 0.835    | 0.8533 | 0.8205    | 0.8889 |


### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1