Updated README
README.md
#### Hyperparameters

The model was trained with the default configuration of BERT and the Trainer from Hugging Face. However, due to limited compute resources, we kept the number of transformer layers at 6.
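
In code, that configuration amounts to something like the following minimal sketch (an assumption: every value other than `num_hidden_layers` stays at the library default, which is all the card states):

```
from transformers import BertConfig, BertForMaskedLM

# Default BERT configuration, except for the number of transformer layers.
# All other values (hidden size, attention heads, vocab size, ...) are
# assumed to be the library defaults.
config = BertConfig(num_hidden_layers=6)
model = BertForMaskedLM(config)

print(model.config.num_hidden_layers)  # 6
```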

# How to use:

1) The model can be used directly with the pipeline for masked language modeling as follows:
```
from transformers import pipeline

the_mask_pipe = pipeline(
    "fill-mask",
    model="jean-paul/KinyaBERT-small",
    tokenizer="jean-paul/KinyaBERT-small",
)
the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.")

[...,
 {'sequence': 'ejo ndikwiga nagize inyota baje kunsura.', 'score': 0.07670339196920395, 'token': 8797, 'token_str': 'inyota'},
 {'sequence': 'ejo ndikwiga nagize amahirwe baje kunsura.', 'score': 0.07234629988670349, 'token': 1501, 'token_str': 'amahirwe'},
 {'sequence': 'ejo ndikwiga nagize abana baje kunsura.', 'score': 0.05717536434531212, 'token': 526, 'token_str': 'abana'}]
```
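
The `fill-mask` pipeline returns the top-scoring candidates for the `[MASK]` token (five by default, controllable via `top_k`); the listing above shows the tail of that list.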

2) Direct use from the transformers library to get features using AutoModel:

```
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-small")
model = AutoModelForMaskedLM.from_pretrained("jean-paul/KinyaBERT-small")

input_text = "Ejo ndikwiga nagize abashyitsi baje kunsura."
encoded_input = tokenizer(input_text, return_tensors='pt')
output = model(**encoded_input)  # output.logits: masked-LM scores over the vocabulary
```
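
Since `AutoModelForMaskedLM` returns vocabulary logits rather than embeddings, a minimal sketch for extracting feature vectors (assuming the bare encoder's hidden states are what is wanted) would load the base model instead:

```
from transformers import AutoTokenizer, AutoModel

# AutoModel loads the bare BERT encoder; loading an MLM checkpoint this way
# simply drops the language-modeling head. The last hidden layer can then
# serve as token-level features.
tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-small")
encoder = AutoModel.from_pretrained("jean-paul/KinyaBERT-small")

encoded = tokenizer("Ejo ndikwiga nagize abashyitsi baje kunsura.", return_tensors="pt")
features = encoder(**encoded).last_hidden_state  # shape: (1, sequence_length, hidden_size)
```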

__Note__: We used the Hugging Face implementation to pretrain BERT from scratch, i.e., both the BERT model itself and the classes needed for pretraining.
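
For reference, a from-scratch pretraining setup with those classes looks roughly like the sketch below. This is illustrative only: the corpus, training arguments, and masking probability are hypothetical placeholders, not the values used for this model.

```
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-small")

# Fresh (untrained) 6-layer BERT, as described under Hyperparameters;
# the vocab size must match the tokenizer's.
model = BertForMaskedLM(BertConfig(vocab_size=len(tokenizer), num_hidden_layers=6))

# Dynamic masking for the masked-language-modeling objective.
# The 15% masking probability is the library default, assumed here.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Placeholder corpus; the actual training data is the Kinyarwanda dataset.
texts = ["Ejo ndikwiga nagize abashyitsi baje kunsura."]
train_dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="kinyabert-small"),  # hypothetical output path
    data_collator=collator,
    train_dataset=train_dataset,
)
# trainer.train()
```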