Updated README
README.md
#### Hyperparameters

The model was trained with the default configuration of BERT and the Trainer from Hugging Face. However, due to limited compute resources, we kept the number of transformer layers at 6.
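
In code, that configuration amounts to something like the following minimal sketch (an assumption: every value other than `num_hidden_layers` stays at the library default, which is all the card states):

```
from transformers import BertConfig, BertForMaskedLM

# Default BERT configuration, except for the number of transformer layers.
# All other values (hidden size, attention heads, vocab size, ...) are
# assumed to be the library defaults.
config = BertConfig(num_hidden_layers=6)
model = BertForMaskedLM(config)

print(model.config.num_hidden_layers)  # 6
```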

# How to use:

1) The model can be used directly with the pipeline for masked language modeling as follows:
```
from transformers import pipeline

the_mask_pipe = pipeline(
    "fill-mask",
    model="jean-paul/KinyaBERT-small",
    tokenizer="jean-paul/KinyaBERT-small",
)
the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.")

[...,
 {'sequence': 'ejo ndikwiga nagize inyota baje kunsura.', 'score': 0.07670339196920395, 'token': 8797, 'token_str': 'inyota'},
 {'sequence': 'ejo ndikwiga nagize amahirwe baje kunsura.', 'score': 0.07234629988670349, 'token': 1501, 'token_str': 'amahirwe'},
 {'sequence': 'ejo ndikwiga nagize abana baje kunsura.', 'score': 0.05717536434531212, 'token': 526, 'token_str': 'abana'}]
```
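
The `fill-mask` pipeline returns the top-scoring candidates for the `[MASK]` token (five by default, controllable via `top_k`); the listing above shows the tail of that list.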

2) Direct use from the transformers library to get features using AutoModel:

```
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-small")
model = AutoModelForMaskedLM.from_pretrained("jean-paul/KinyaBERT-small")

input_text = "Ejo ndikwiga nagize abashyitsi baje kunsura."
encoded_input = tokenizer(input_text, return_tensors='pt')
output = model(**encoded_input)  # output.logits: masked-LM scores over the vocabulary
```
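
Since `AutoModelForMaskedLM` returns vocabulary logits rather than embeddings, a minimal sketch for extracting feature vectors (assuming the bare encoder's hidden states are what is wanted) would load the base model instead:

```
from transformers import AutoTokenizer, AutoModel

# AutoModel loads the bare BERT encoder; loading an MLM checkpoint this way
# simply drops the language-modeling head. The last hidden layer can then
# serve as token-level features.
tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-small")
encoder = AutoModel.from_pretrained("jean-paul/KinyaBERT-small")

encoded = tokenizer("Ejo ndikwiga nagize abashyitsi baje kunsura.", return_tensors="pt")
features = encoder(**encoded).last_hidden_state  # shape: (1, sequence_length, hidden_size)
```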

__Note__: We used the Hugging Face implementation to pretrain BERT from scratch, i.e., both the BERT model itself and the classes needed for pretraining.
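
For reference, a from-scratch pretraining setup with those classes looks roughly like the sketch below. This is illustrative only: the corpus, training arguments, and masking probability are hypothetical placeholders, not the values used for this model.

```
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-small")

# Fresh (untrained) 6-layer BERT, as described under Hyperparameters;
# the vocab size must match the tokenizer's.
model = BertForMaskedLM(BertConfig(vocab_size=len(tokenizer), num_hidden_layers=6))

# Dynamic masking for the masked-language-modeling objective.
# The 15% masking probability is the library default, assumed here.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Placeholder corpus; the actual training data is the Kinyarwanda dataset.
texts = ["Ejo ndikwiga nagize abashyitsi baje kunsura."]
train_dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="kinyabert-small"),  # hypothetical output path
    data_collator=collator,
    train_dataset=train_dataset,
)
# trainer.train()
```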