---
datasets:
- jhovany/Homomex2024
language:
- es
metrics:
- accuracy
base_model:
- LaProfeClaudis/LGBeTO_detection_Model
pipeline_tag: text-classification
---

# Model Card for BERTrans

Binary text classifier for Spanish that labels a text as trans-related ("trans") or not ("no trans").

## Model Details

### Model Description

BERTrans classifies Spanish text as either about trans topics (label 1) or not (label 0).

- **Developed by:** Carlos Villalobos
- **Model type:** Binary text classifier
- **Language:** Spanish
- **License:** Free
- **Finetuned from model:** LaProfeClaudis/LGBeTO_detection_Model

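A minimal usage sketch, assuming the model is published under the repository ID `carevies/BERTrans` (taken from the citation URL) and that its configuration uses the default `LABEL_0` / `LABEL_1` names. The card only states that 1 means trans and 0 means no trans, so the label names are an assumption to check against the model's `config.json`. The helper names `load_classifier` and `to_class_name` are hypothetical.

```python
# Assumption: default Hugging Face label names. The card only says 1 = trans, 0 = no trans.
LABEL_NAMES = {"LABEL_0": "no trans", "LABEL_1": "trans"}


def load_classifier(model_id="carevies/BERTrans"):
    """Build a text-classification pipeline (downloads the model on first call)."""
    # Imported lazily so to_class_name() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def to_class_name(prediction):
    """Map one pipeline prediction dict to the class names used in this card."""
    raw = prediction["label"]
    # Fall back to the raw label if the config uses different names.
    return LABEL_NAMES.get(raw, raw)


# Example (requires network access and the transformers package):
# classifier = load_classifier()
# result = classifier("Las mujeres trans merecen los mismos derechos.")[0]
# print(to_class_name(result), result["score"])
```

For a single input string the pipeline returns a list with one dictionary holding `label` and `score`, which is why the example indexes `[0]`.
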
## Training Details

### Training Data

Starting from 10,963 LGBT tweets, those that discuss trans topics (3,747) were selected using the following keyword list:

Keywords: "personas trans", "población trans", "transgénero", "transexual", "transexuales", "travesti",
"travestis", "transvesti", "transvestis", "cambio de sexo", "reasignación de sexo", "sexo asignado",
"reasignación de género", "género autopercibido", "cirugía de cambio de sexo", "disforia de género",
"identidad trans", "identidad de género", "derechos trans", "derechos de los trans", "transfobia",
"discriminación trans", "odio trans", "violencia trans", "feminicidio trans", "personas no binarias",
"no binario", "no binaria", "no binarie", "género no binario", "género fluido", "genderqueer", "queer",
"tercer género", "magistrade", "pronombres no binarios", "representación trans", "visibilidad trans",
"marchas trans", "orgullo trans", "movimiento trans", "activismo trans", "colectivos trans", "ONG trans",
"Pride", "Marcha del Orgullo", "Orgullo Gay", "expresión de género", "reconocimiento legal trans",
"cambio de identidad de género", "ley de identidad de género", "mujeres trans", "hombres trans",
"infancias trans", "salud trans", "hormonización trans", "terapia de reemplazo hormonal",
"Clínica Condesa", "Grupo Eon", "Inteligencia Transgenérica", "Frente Pro Derechos Transgénero y Transexuales",
"Red de Trabajo Trans", "Coalisión T47", "Almas Cautivas", "Impulso Trans", "Kenya Cuevas", "Paolita Suárez",
"Casa de las Muñecas Tiresias", "trabajadoras sexuales trans", "transincluyente", "transexcluyente",
"trans en prisión", "TERF", "migración trans", "diversidad sexual",

"trans", "transgénero", "transgéneros", "transexual", "transexualidad", "transexuales", "travesti",
"travestista", "trasvestista", "travestis", "transvesti", "transvestis", "reasignación", "transfeminicidio",
"autopercibido", "disforia", "transfobia", "transfóbica", "genderqueer", "queer", "magistrade", "binario",
"transincluyente", "transexcluyente", "TERF", "muxe", "LGBT", "LGBT+", "LGBTI", "LGBTI+", "LGBTT", "LGBTT+",
"LGBTTT", "LGBTTT+", "LGBTTTI", "LGBTTTI+", "LGBTTTIQ", "LGBTTTIQ+", "LGBTTTIQA", "LGBTTTIQA+", "LGBTQ",
"LGBTQ+", "LGBTQI", "LGBTQI+", "LGBTQIA", "LGBTQIA+", "Drag"

- Trans tweets (3,747) were labeled 1.
- Non-trans tweets (7,216) were labeled 0.

The model was then further fine-tuned on 520 synthetic sentences written to resolve ambiguities such as "maíz transgénico" (transgenic corn) or "película de transformers" (Transformers movie).

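The keyword-based labeling described above could be sketched as follows. The card does not say how keywords were matched, so this is an assumption: it uses case-insensitive whole-word matching on a small subset of the list, and the names `build_pattern` and `label_tweet` are hypothetical. Plain substring matching on "trans" would also hit words like "transgénico", which is the kind of ambiguity the synthetic sentences target.

```python
import re

# Small illustrative subset of the keyword list in this card.
KEYWORDS = [
    "trans", "transgénero", "personas trans", "mujeres trans",
    "transfobia", "no binarie", "LGBT+", "muxe",
]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word or phrase."""
    # Longest first so multi-word phrases are tried before their shorter parts.
    alternatives = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    # (?<!\w) and (?!\w) act as word boundaries that also work for keywords ending in "+".
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE)


PATTERN = build_pattern(KEYWORDS)


def label_tweet(text):
    """Return 1 (trans) if any keyword occurs in the text, else 0 (no trans)."""
    return 1 if PATTERN.search(text) else 0


print(label_tweet("Apoyo total a las mujeres trans"))    # keyword present
print(label_tweet("El maíz transgénico es polémico"))    # no whole-word keyword
```

Whole-word matching keeps "transgénico" and "Transformers" out of the positive class at labeling time, at the cost of missing inflected forms that are not listed explicitly.
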
### Tweets training

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.042100 | 0.036335 | 0.994529 | 0.991928 |
| 2 | 0.049300 | 0.032004 | 0.994529 | 0.991928 |
| 3 | 0.031900 | 0.027946 | 0.995137 | 0.992832 |

#### Evaluation

| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| no trans | 1.00 | 0.99 | 1.00 | 1089 |
| trans | 0.99 | 1.00 | 0.99 | 556 |
| accuracy | | | 1.00 | 1645 |
| macro avg | 0.99 | 1.00 | 0.99 | 1645 |
| weighted avg | 1.00 | 1.00 | 1.00 | 1645 |

### Synthetic fine-tuning

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.146800 | 0.148145 | 0.948718 | 0.953488 |
| 2 | 0.048600 | 0.015481 | 1.000000 | 1.000000 |
| 3 | 0.064000 | 0.020288 | 0.987179 | 0.988764 |

#### Evaluation

| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| no trans | 0.97 | 1.00 | 0.99 | 33 |
| trans | 1.00 | 0.98 | 0.99 | 45 |
| accuracy | | | 0.99 | 78 |
| macro avg | 0.99 | 0.99 | 0.99 | 78 |
| weighted avg | 0.99 | 0.99 | 0.99 | 78 |

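As a worked check on the synthetic evaluation figures, the sketch below recomputes precision, recall, F1, and accuracy from a confusion matrix. The counts are not published in this card; they are inferred from the reported supports (33 no trans, 45 trans), which are consistent with exactly one trans sentence being predicted as no trans. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute positive-class precision, recall, F1, and overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Inferred counts for the "trans" class on the 78 evaluation sentences
# (assumption: 44 of 45 trans sentences and all 33 no-trans sentences are correct).
metrics = binary_metrics(tp=44, fp=0, fn=1, tn=33)
print({name: round(value, 6) for name, value in metrics.items()})
# {'precision': 1.0, 'recall': 0.977778, 'f1': 0.988764, 'accuracy': 0.987179}
```

Under that assumption the accuracy (77/78) and F1 (88/89) match the epoch 3 row of the synthetic fine-tuning table, and the rounded values match the "trans" row of the report.
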
## Citation

**BibTeX:**

    @misc{villalobos2025bertrans,
      author = {Villalobos, Carlos},
      title = {BERTrans},
      year = {2025},
      publisher = {Hugging Face},
      url = {https://huggingface.co/carevies/BERTrans}
    }

**APA:**

Villalobos, C. (2025). BERTrans [Language model]. Hugging Face. https://huggingface.co/carevies/BERTrans