Create README.md
README.md
---
datasets:
- jhovany/Homomex2024
language:
- es
metrics:
- accuracy
base_model:
- LaProfeClaudis/LGBeTO_detection_Model
pipeline_tag: text-classification
---
# Model Card for Model ID

Trans / non-trans text classifier for Spanish.

## Model Details

### Model Description

A binary classifier that labels Spanish-language text as trans-related ("trans", label 1) or not ("no trans", label 0).

- **Developed by:** Carlos Villalobos
- **Model type:** Binary text classifier
- **Language:** Spanish
- **License:** Free
- **Finetuned from model:** LaProfeClaudis/LGBeTO_detection_Model

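Since the card declares `pipeline_tag: text-classification`, the model can presumably be loaded with the `transformers` pipeline. The sketch below is a minimal illustration under assumptions: the repository id is not stated in this card, so `model_id` is left as a parameter; the checkpoint is assumed to expose the default `LABEL_0` / `LABEL_1` names; and `label_name`, `load_classifier`, and `classify` are hypothetical helpers, not part of the model.

```python
from typing import Callable, Dict, List, Tuple

# Label meaning taken from the Training Data section: 1 = trans, 0 = no trans.
LABEL_NAMES: Dict[int, str] = {0: "no trans", 1: "trans"}


def label_name(raw_label: str) -> str:
    """Map a default 'LABEL_<n>' name to a readable one; pass anything else through."""
    prefix = "LABEL_"
    if raw_label.startswith(prefix):
        suffix = raw_label[len(prefix):]
        if suffix.isdigit() and int(suffix) in LABEL_NAMES:
            return LABEL_NAMES[int(suffix)]
    return raw_label


def load_classifier(model_id: str):
    """Build a text-classification pipeline. `model_id` must be the real repo id."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(
    classifier: Callable[[List[str]], List[dict]], texts: List[str]
) -> List[Tuple[str, str, float]]:
    """Return (text, readable label, score) for each input text."""
    predictions = classifier(texts)
    return [(t, label_name(p["label"]), p["score"]) for t, p in zip(texts, predictions)]
```

A call would look like `classify(load_classifier("<repo id of this model>"), ["Marcha por los derechos trans"])`. If the checkpoint defines its own label names in its config, `label_name` simply returns them unchanged.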
## Training Details

### Training Data

Starting from 10,963 LGBT tweets, those that discuss "trans" topics (3,747) were selected using the following keyword list:

Keywords: "personas trans", "población trans", "transgénero", "transexual", "transexuales", "travesti",
"travestis", "transvesti", "transvestis", "cambio de sexo", "reasignación de sexo", "sexo asignado",
"reasignación de género", "género autopercibido", "cirugía de cambio de sexo", "disforia de género",
"identidad trans", "identidad de género", "derechos trans", "derechos de los trans", "transfobia",
"discriminación trans", "odio trans", "violencia trans", "feminicidio trans", "personas no binarias",
"no binario", "no binaria", "no binarie", "género no binario", "género fluido", "genderqueer", "queer",
"tercer género", "magistrade", "pronombres no binarios", "representación trans", "visibilidad trans",
"marchas trans", "orgullo trans", "movimiento trans", "activismo trans", "colectivos trans", "ONG trans",
"Pride", "Marcha del Orgullo", "Orgullo Gay", "expresión de género", "reconocimiento legal trans",
"cambio de identidad de género", "ley de identidad de género", "mujeres trans", "hombres trans",
"infancias trans", "salud trans", "hormonización trans", "terapia de reemplazo hormonal",
"Clínica Condesa", "Grupo Eon", "Inteligencia Transgenérica", "Frente Pro Derechos Transgénero y Transexuales",
"Red de Trabajo Trans", "Coalisión T47", "Almas Cautivas", "Impulso Trans", "Kenya Cuevas", "Paolita Suárez",
"Casa de las Muñecas Tiresias", "trabajadoras sexuales trans", "transincluyente", "transexcluyente",
"trans en prisión", "TERF", "migración trans", "diversidad sexual",

"trans", "transgénero", "transgéneros", "transexual", "transexualidad", "transexuales", "travesti",
"travestista", "trasvestista", "travestis", "transvesti", "transvestis", "reasignación", "transfeminicidio",
"autopercibido", "disforia", "transfobia", "transfóbica", "genderqueer", "queer", "magistrade", "binario",
"transincluyente", "transexcluyente", "TERF", "muxe", "LGBT", "LGBT+", "LGBTI", "LGBTI+", "LGBTT", "LGBTT+",
"LGBTTT", "LGBTTT+", "LGBTTTI", "LGBTTTI+", "LGBTTTIQ", "LGBTTTIQ+", "LGBTTTIQA", "LGBTTTIQA+", "LGBTQ",
"LGBTQ+", "LGBTQI", "LGBTQI+", "LGBTQIA", "LGBTQIA+", "Drag"

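The card does not say how the keywords were matched against the tweets. The following is a minimal sketch of one plausible approach, assuming case-insensitive matching on whole words or phrases; `KEYWORDS` is a short illustrative subset of the list above, and `build_pattern` and `label_tweets` are hypothetical names.

```python
import re
import unicodedata
from typing import Iterable, List, Tuple

# Illustrative subset of the keyword list; the real list is much longer.
KEYWORDS = ["personas trans", "transgénero", "transfobia", "no binarie", "TERF", "trans"]


def build_pattern(keywords: Iterable[str]) -> "re.Pattern[str]":
    """Compile one regex that matches any keyword as a whole word or phrase."""
    # Longest first, so multi-word phrases are tried before the bare "trans".
    ordered = sorted(set(keywords), key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # (?<!\w) and (?!\w) reject matches embedded inside a longer word.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


def label_tweets(tweets: Iterable[str], pattern: "re.Pattern[str]") -> List[Tuple[str, int]]:
    """Label a tweet 1 if it contains any keyword, else 0."""
    labeled = []
    for tweet in tweets:
        # NFC normalization makes accented characters compare consistently.
        text = unicodedata.normalize("NFC", tweet)
        labeled.append((tweet, 1 if pattern.search(text) else 0))
    return labeled


pattern = build_pattern(KEYWORDS)
examples = [
    "Marcha por los derechos de las personas trans",
    "Basta de TRANSFOBIA en las escuelas",
    "Hoy gana mi equipo de futbol",
]
labels = [label for _, label in label_tweets(examples, pattern)]
```

Compiling the whole list into a single alternation keeps the scan to one pass per tweet, which matters little at about 11,000 tweets but scales better than looping over every keyword.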
- The trans tweets (3,747) were labeled 1.
- The non-trans tweets (7,216) were labeled 0.

The model was then fine-tuned on 520 synthetic phrases written to resolve ambiguities such as "maíz transgénico" (transgenic corn) or "película de transformers" (Transformers movie).

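To see why such counterexamples help, consider a naive surface check on the substring "trans". The sketch below is illustrative only: the sentences in `SYNTHETIC` are invented for this example (only the two quoted expressions come from the text above), and `naive_contains_trans` is a hypothetical helper, not the author's method.

```python
from typing import List, Tuple

# Hypothetical labeled phrases in the spirit of the synthetic set (1 = trans, 0 = no trans).
SYNTHETIC: List[Tuple[str, int]] = [
    ("El maíz transgénico se cultiva en varios países", 0),
    ("Ayer vimos la película de transformers", 0),
    ("Las mujeres trans exigen acceso a la salud", 1),
]


def naive_contains_trans(text: str) -> bool:
    """Surface-level check: flags any text containing the substring 'trans'."""
    return "trans" in text.lower()


# Every phrase is flagged by the naive check, including the unrelated ones.
flagged = [text for text, _ in SYNTHETIC if naive_contains_trans(text)]
false_positives = [
    text for text, label in SYNTHETIC if naive_contains_trans(text) and label == 0
]
```

All three phrases are flagged, two of them wrongly, so labeled examples of this kind give the classifier an explicit signal that the surface form alone is not enough.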
### Tweets training

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.042100 | 0.036335 | 0.994529 | 0.991928 |
| 2 | 0.049300 | 0.032004 | 0.994529 | 0.991928 |
| 3 | 0.031900 | 0.027946 | 0.995137 | 0.992832 |

### Evaluation (tweets)

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| no trans | 1.00 | 0.99 | 1.00 | 1089 |
| trans | 0.99 | 1.00 | 0.99 | 556 |
| accuracy | | | 1.00 | 1645 |
| macro avg | 0.99 | 1.00 | 0.99 | 1645 |
| weighted avg | 1.00 | 1.00 | 1.00 | 1645 |

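The rounded report hides how few errors are involved. As a worked check, the sketch below uses confusion counts inferred from the published figures; they are an assumption, not numbers released by the author. It assumes the report was computed on the same 1,645 validation tweets as the epoch 3 row of the tweets training log, and that the F1 column there is the binary F1 of the "trans" class. Under those assumptions, 8 misclassified tweets reproduce every reported value.

```python
from typing import Tuple

# Inferred counts (assumption): "trans" is the positive class.
TP, FN = 554, 2    # trans tweets predicted trans / predicted no trans (support 556)
TN, FP = 1083, 6   # no-trans tweets predicted no trans / predicted trans (support 1089)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard per-class precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (TP + TN) / (TP + TN + FP + FN)
trans = precision_recall_f1(TP, FP, FN)
# For the "no trans" class the roles swap: its false positives are the trans FNs.
no_trans = precision_recall_f1(TN, FN, FP)

print(f"accuracy={accuracy:.6f}  trans F1={trans[2]:.6f}")
print("trans    P/R/F1:", [round(v, 2) for v in trans])
print("no trans P/R/F1:", [round(v, 2) for v in no_trans])
```

With these counts the accuracy is 1637/1645 and the trans F1 is 1108/1116, matching the six-decimal values logged for epoch 3.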
### Synthetic fine-tuning

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.146800 | 0.148145 | 0.948718 | 0.953488 |
| 2 | 0.048600 | 0.015481 | 1.000000 | 1.000000 |
| 3 | 0.064000 | 0.020288 | 0.987179 | 0.988764 |

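In this run the validation loss is lowest at epoch 2 and rises slightly at epoch 3. The card does not state which checkpoint was published, so the sketch below only illustrates how one might pick an epoch from such a log; `EpochLog` and `SYNTHETIC_RUN` are hypothetical names holding the values from the table.

```python
from typing import List, NamedTuple


class EpochLog(NamedTuple):
    epoch: int
    train_loss: float
    val_loss: float
    accuracy: float
    f1: float


# Values copied from the synthetic fine-tuning table.
SYNTHETIC_RUN: List[EpochLog] = [
    EpochLog(1, 0.146800, 0.148145, 0.948718, 0.953488),
    EpochLog(2, 0.048600, 0.015481, 1.000000, 1.000000),
    EpochLog(3, 0.064000, 0.020288, 0.987179, 0.988764),
]

# Select the epoch with the lowest validation loss.
best = min(SYNTHETIC_RUN, key=lambda log: log.val_loss)

# With 78 validation phrases, one misclassified phrase moves accuracy by 1/78.
accuracy_step = 1 / 78

print(f"best epoch: {best.epoch} (val_loss={best.val_loss})")
print(f"accuracy change per example: {accuracy_step:.6f}")
```

The gap between epochs 2 and 3 equals a single validation phrase, so the difference should be read with caution. If training used the Hugging Face `Trainer` (an assumption), setting `load_best_model_at_end=True` in `TrainingArguments` performs this selection automatically.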
### Evaluation (synthetic phrases)

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| no trans | 0.97 | 1.00 | 0.99 | 33 |
| trans | 1.00 | 0.98 | 0.99 | 45 |
| accuracy | | | 0.99 | 78 |
| macro avg | 0.99 | 0.99 | 0.99 | 78 |
| weighted avg | 0.99 | 0.99 | 0.99 | 78 |

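The two average rows differ only in how the classes are weighted: the macro average treats both classes equally, while the weighted average scales each class by its support. The sketch below recomputes both, assuming the report corresponds to a single error (one trans phrase predicted as no trans). That count is inferred from the rounded figures and is not stated by the author.

```python
from typing import Dict

# Per-class values implied by one trans phrase misclassified as no trans (assumption).
CLASSES: Dict[str, Dict[str, float]] = {
    "no trans": {"precision": 33 / 34, "recall": 33 / 33, "support": 33},
    "trans": {"precision": 44 / 44, "recall": 44 / 45, "support": 45},
}


def macro_avg(metric: str) -> float:
    """Unweighted mean over classes."""
    return sum(c[metric] for c in CLASSES.values()) / len(CLASSES)


def weighted_avg(metric: str) -> float:
    """Mean over classes, weighted by each class's support."""
    total = sum(c["support"] for c in CLASSES.values())
    return sum(c[metric] * c["support"] for c in CLASSES.values()) / total


for metric in ("precision", "recall"):
    print(metric, f"macro={macro_avg(metric):.4f}", f"weighted={weighted_avg(metric):.4f}")
```

The support-weighted recall is the same quantity as overall accuracy (77 of 78 correct), which is why those two cells agree in the report.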
## Citation

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]
