jean-paul committed on
Commit 4575f11 · 1 Parent(s): 9db6f6e

Updated README

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -8,7 +8,7 @@ A Pretrained model on the Kinyarwanda language dataset using a masked language m
  #### Hyperparameters
  The model was trained with the default configuration of BERT and Trainer from the Huggingface. However, due to some resource computation issues, we kept the number of transformer layers to 6.
  # How to use:
- The model can be used directly with the pipeline for masked language modeling as follows:
+ 1) The model can be used directly with the pipeline for masked language modeling as follows:
  ```
  from transformers import pipeline
  the_mask_pipe = pipeline(
@@ -23,5 +23,20 @@ the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.")
  {'sequence': 'ejo ndikwiga nagize inyota baje kunsura.', 'score': 0.07670339196920395, 'token': 8797, 'token_str': 'inyota'},
  {'sequence': 'ejo ndikwiga nagize amahirwe baje kunsura.', 'score': 0.07234629988670349, 'token': 1501, 'token_str': 'amahirwe'},
  {'sequence': 'ejo ndikwiga nagize abana baje kunsura.', 'score': 0.05717536434531212, 'token': 526, 'token_str': 'abana'}]
+ ```
+
+ 2) Direct use from the transformer library to get features using AutoModel
+
+ ```
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-small")
+
+ model = AutoModelForMaskedLM.from_pretrained("jean-paul/KinyaBERT-small")
+
+ input_text = "Ejo ndikwiga nagize abashyitsi baje kunsura."
+ encoded_input = tokenizer(input_text, return_tensors='pt')
+ output = model(**encoded_input)
+
  ```
  __Note__: We used the huggingface implementations for pretraining BERT from scratch, both the BERT model and the classes needed to do it.
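
Item 2 in this change describes getting features, but the snippet loads `AutoModelForMaskedLM`, whose output carries vocabulary logits rather than sentence features. The following is a minimal sketch of one way to obtain a fixed-size feature vector instead. It assumes the checkpoint also loads with `AutoModel` (the encoder without the masked-language-model head) and that mean pooling over tokens is an acceptable summary; neither choice comes from the README. `mean_pool` and `sentence_features` are hypothetical helper names introduced only for illustration.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask == 1), skipping padding.

    Hypothetical helper: token_vectors is a list of equal-length float lists,
    attention_mask is a list of 0/1 flags of the same length.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def sentence_features(text, model_name="jean-paul/KinyaBERT-small"):
    """Return one mean-pooled feature vector for `text` (hypothetical helper).

    Assumption: the checkpoint loads with AutoModel. Calling this downloads
    the model, so it needs network access plus torch and transformers.
    """
    # Imports are deferred so mean_pool stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)  # encoder only, no MLM head
    model.eval()

    encoded = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)

    # last_hidden_state has shape (batch, sequence_length, hidden_size).
    token_vectors = output.last_hidden_state[0].tolist()
    mask = encoded["attention_mask"][0].tolist()
    return mean_pool(token_vectors, mask)
```

Calling `sentence_features("Ejo ndikwiga nagize abashyitsi baje kunsura.")` would return a plain Python list with one float per hidden dimension. Mean pooling is only one option; taking the first (`[CLS]`) token vector is a common alternative.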