LaProfeClaudis committed on
Commit 1889b6c · verified · 1 Parent(s): 7ab244b

Update README.md

Files changed (1)
  1. README.md +18 -14
README.md CHANGED
@@ -21,7 +21,7 @@ pipeline_tag: text-classification

  # LGBeTO_detection_Model

- This model is LGBeTO model. Corresponding to a fine-tuned version of [dccuchile/bert-base-spanish-wwm-uncased](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased) (Cañete et al., 2023).
  It achieves the following results on the evaluation set:

  - Accuracy: 0.835
@@ -29,6 +29,13 @@ It achieves the following results on the evaluation set:
  - Precision: 0.8205
  - Recall: 0.8889


  ## Model description

@@ -36,19 +43,17 @@ LGBeTO was designed to detect discriminatory or hateful language directed toward

  ## Intended uses & limitations

- This model was created for a study that was conducted strictly for academic and research purposes. The target of hate speech has been anonymised, and there is no intent to harm the perpetrators
- in any way. We prioritize protecting the privacy and confidentiality of vulnerable individuals.
- We carefully remove identifying data, such as user IDs, phone numbers, and addresses, to safeguard privacy before
  sharing the data with our annotators. All data collected comes from public sources.

- As authors, we affirm our deep respect for all individuals and explicitly state that we have no intention of prejudicing,
- biasing, or disrespecting the LGBTQIA+ community or any group. Our work seeks to contribute constructively to inclusive
  and ethical research in artificial intelligence.


  ## Training and evaluation data

- LGBeTO was fine-tuned using comments collected from digital media, such as Twitter, Instagram, websites, and YouTube comments
  The dataset is available in the Zenodo Repository.

  Cite as:
@@ -58,11 +63,11 @@ https://doi.org/10.5281/zenodo.15385622

  ## Training procedure

- - step 1: Load the dataSet
- - step 2: Tokenization and model generation
- - step 3: Split train-validation
- - step 4: Training configuration
- - step 5: Training/Evaluation


  ### Training hyperparameters
@@ -73,7 +78,6 @@ The following hyperparameters were used during training:
  - eval_batch_size: 16
  - seed: 42
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
  - num_epochs: 3

  ### Training results
@@ -82,7 +86,7 @@ The following hyperparameters were used during training:
  |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
  | 0.4655 | 1.0 | 50 | 0.5517 | 0.755 | 0.7538 | 0.8242 | 0.6944 |
  | 0.1928 | 2.0 | 100 | 0.4830 | 0.825 | 0.8523 | 0.7829 | 0.9352 |
- | 0.0718 | 3.0 | 150 | 0.5393 | 0.835 | 0.8533 | 0.8205 | 0.8889 |


  ### Framework versions
 

  # LGBeTO_detection_Model

+ This is LGBeTO model. Corresponding to a fine-tuned version of [dccuchile/bert-base-spanish-wwm-uncased](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased)(Cañete et al., 2023).
  It achieves the following results on the evaluation set:

  - Accuracy: 0.835
 
  - Precision: 0.8205
  - Recall: 0.8889

+ ## Authors
+
+ - **Developed by:** Claudia Martínez-Araneda, Mariella Gutiérrez V., Pedro Gómez M., Diego Maldonado M., Alejandra Segura N., Christian Vidal-Castro
+ - **Model type:** BERT-based sentiment analysis, BERT-based text classification.
+ - **Language(s) (NLP):** Spanish
+ - **License:** CC BY 4.0
+ - **Finetuned from model:** BETO (Cañete et al., 2023)

  ## Model description

 

  ## Intended uses & limitations

+ This model was created for a study conducted strictly for academic and research purposes. The target of hate speech has been anonymised, and there is no intent to harm the perpetrators
+ in any way. We prioritise protecting the privacy and confidentiality of vulnerable individuals. We carefully remove identifying data, such as user IDs, phone numbers, and addresses, to safeguard privacy before
  sharing the data with our annotators. All data collected comes from public sources.

+ As authors, we affirm our deep respect for all individuals and explicitly state that we have no intention of prejudicing, biasing, or disrespecting the LGBTQIA+ community or any group. Our work seeks to contribute constructively to inclusive
  and ethical research in artificial intelligence.


  ## Training and evaluation data

+ LGBeTO was fine-tuned using comments collected from digital media, such as Twitter, Instagram, websites, and YouTube comments.
  The dataset is available in the Zenodo Repository.

  Cite as:
 

  ## Training procedure

+ - **step 1:** Load the dataSet
+ - **step 2:** Tokenization and model generation
+ - **step 3:** Split train-validation
+ - **step 4:** Training configuration
+ - **step 5:** Training/Evaluation
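
The README names the five steps but does not show how they were implemented. As a rough illustration of step 3 only, the sketch below performs a seeded, stratified train/validation split in plain Python. The helper name `stratified_split`, the 80/20 ratio, and the toy examples are assumptions made for illustration; only the seed value 42 comes from the hyperparameters listed in this README, and the authors' actual split may differ.

```python
import random
from collections import defaultdict


def stratified_split(examples, val_fraction=0.2, seed=42):
    """Split (text, label) pairs into train/validation sets, keeping label ratios.

    Hypothetical helper: the 80/20 ratio is an assumption, not the authors' setting.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    train, val = [], []
    for label in sorted(by_label):  # fixed label order keeps the result deterministic
        items = by_label[label]
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])

    # Shuffle again so examples are not grouped by label.
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


# Toy data standing in for the real comments (hypothetical).
examples = [(f"comentario {i}", i % 2) for i in range(100)]
train, val = stratified_split(examples)
```

Splitting per label keeps the class balance of the validation set close to that of the full dataset, which matters when precision and recall are reported alongside accuracy.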


  ### Training hyperparameters
 
  - eval_batch_size: 16
  - seed: 42
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - num_epochs: 3

  ### Training results
 
  |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
  | 0.4655 | 1.0 | 50 | 0.5517 | 0.755 | 0.7538 | 0.8242 | 0.6944 |
  | 0.1928 | 2.0 | 100 | 0.4830 | 0.825 | 0.8523 | 0.7829 | 0.9352 |
+ **| 0.0718 | 3.0 | 150 | 0.5393 | 0.835 | 0.8533 | 0.8205 | 0.8889 |**
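
The table's header row is not shown in this diff. Assuming the last three columns are F1, precision, and recall (the final row's precision and recall match the values listed at the top of the README), the F1 column can be sanity-checked as the harmonic mean of the other two. The following is a minimal sketch of that check; the column assignment and the helper name `f1_from_pr` are assumptions.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1), copied from the table rows above.
# Column meaning is assumed, since the header row is elided in the diff.
rows = [
    (1.0, 0.8242, 0.6944, 0.7538),
    (2.0, 0.7829, 0.9352, 0.8523),
    (3.0, 0.8205, 0.8889, 0.8533),
]

for epoch, precision, recall, reported in rows:
    computed = f1_from_pr(precision, recall)
    print(f"epoch {epoch}: computed F1 = {computed:.4f}, reported = {reported}")
```

Under that column assumption, the computed values agree with the reported ones to within rounding of the four-decimal inputs.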


  ### Framework versions