Toimil committed on
Commit 3f79a1f · verified · 1 Parent(s): 13eb6b5

Update README.md

Files changed (1)
  1. README.md +44 -3
README.md CHANGED
@@ -1,3 +1,44 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - es
+ base_model:
+ - pysentimiento/robertuito-base-uncased
+ datasets:
+ - manueltonneau/spanish-hate-speech-superset
+ tags:
+ - RoBERTuito
+ - hate_speech
+ - misogyny
+ - RoBERTa
+ pipeline_tag: fill-mask
+ library_name: transformers
+ widget:
+ - text: "Ella es una<mask>"
+ ---
+ # misoRoBERTuito-4e
+
+ misoRoBERTuito-4e is a domain-adapted version of the [RoBERTuito](https://huggingface.co/pysentimiento/robertuito-base-uncased) language model, specialized for the misogyny domain.
+
+ It was adapted using a guided lexical masking strategy during masked language model (MLM) pretraining.
+ Instead of masking tokens at random, we prioritized masking words that appear in a [misogyny-specific lexicon](https://github.com/fmplaza/hate-speech-spanish-lexicons/blob/master/misogyny_lexicon.txt).
+ The base corpus used for domain adaptation was the [Spanish Hate Speech Superset](https://huggingface.co/datasets/manueltonneau/spanish-hate-speech-superset).
+
+
+ We trained the model for four epochs with a batch size of 8 and a learning rate of 2e-5 on an NVIDIA GeForce RTX 5090 GPU.
+
+ ## Usage
+
+ ```python
+ from transformers import pipeline
+ pipe = pipeline("fill-mask", model="citiusLTL/misoRoBERTuito-4e")
+ predictions = pipe("Ella es una<mask>")
+ print(predictions)
+ ```
+
+ ## Load model directly
+ ```python
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+ tokenizer = AutoTokenizer.from_pretrained("citiusLTL/misoRoBERTuito-4e")
+ model = AutoModelForMaskedLM.from_pretrained("citiusLTL/misoRoBERTuito-4e")
+ ```
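
The README in this diff describes guided lexical masking only at a high level, so the following is a minimal sketch of one way such a strategy could work, not the authors' training code. It assumes word-level tokens (a real MLM setup would mask subword tokens), a 15% masking budget, and a rule that masks lexicon words first and fills any remaining budget with random positions. The helper `guided_mask`, the toy lexicon, and the example sentence are all hypothetical.

```python
import random

MASK_TOKEN = "<mask>"  # RoBERTa-style mask token, as in the README's usage example


def guided_mask(tokens, lexicon, mask_prob=0.15, rng=None):
    """Mask about mask_prob of the tokens, choosing lexicon words first.

    Hypothetical helper: the selection rule and the 15% budget are
    assumptions, not the procedure used to train misoRoBERTuito-4e.
    """
    rng = rng or random.Random()
    # At least one token is masked so every example yields a training signal.
    budget = max(1, round(len(tokens) * mask_prob))

    # Split positions into lexicon hits and everything else.
    hits = [i for i, tok in enumerate(tokens) if tok.lower() in lexicon]
    rest = [i for i, tok in enumerate(tokens) if tok.lower() not in lexicon]
    rng.shuffle(hits)
    rng.shuffle(rest)

    # Lexicon positions come first; random positions fill any remaining budget.
    chosen = set((hits + rest)[:budget])
    masked = [MASK_TOKEN if i in chosen else tok for i, tok in enumerate(tokens)]
    # Labels keep the original word at masked positions and None elsewhere.
    labels = [tok if i in chosen else None for i, tok in enumerate(tokens)]
    return masked, labels


# Toy lexicon and sentence, made up for this example.
toy_lexicon = {"histérica"}
sentence = "ella es una histérica según él".split()
masked, labels = guided_mask(sentence, toy_lexicon, rng=random.Random(0))
print(masked)
```

With a six-word sentence the budget is a single token, so the one lexicon word is the position that gets masked. Under plain random masking, any of the six positions would be equally likely.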