# NepaliBERT (Phase 1)
NepaliBERT is a state-of-the-art language model for Nepali based on the BERT architecture. It is trained with a masked language modeling (MLM) objective.
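To make the MLM objective concrete, here is a minimal sketch of the masking step from the original BERT recipe: about 15% of tokens are selected, and of those 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. These ratios are an assumption taken from the BERT paper; this README does not state which ratios NepaliBERT used. The function name `mask_tokens` and the toy vocabulary are hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) using BERT-style masking.

    labels[i] holds the original token at positions the model must
    predict, and None at every other position.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model is trained to recover this token
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK_TOKEN)         # 80%: replace with [MASK]
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(token)              # 10%: keep the original token
        else:
            masked.append(token)
            labels.append(None)  # position is ignored by the loss
    return masked, labels


# Toy example with placeholder tokens (assumed, not the real vocabulary).
toy_vocab = ["a", "b", "c", "d", "e"]
masked, labels = mask_tokens(["a", "b", "c", "d"], toy_vocab, seed=0)
print(masked)
print(labels)
```

In real training this step is usually handled by a data collator on token IDs rather than on strings; the sketch only illustrates the idea.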
# Loading the model and tokenizer
1. Clone the model repository:
```
git lfs install
git clone https://huggingface.co/Rajan/NepaliBERT
```
2. Load the tokenizer:
```
from transformers import BertTokenizer

# Path to the cloned repository, which contains the vocabulary file
vocab_file_dir = './NepaliBERT/'
# strip_accents=False keeps Devanagari combining marks (vowel signs, virama) intact
tokenizer = BertTokenizer.from_pretrained(vocab_file_dir,
                                          strip_accents=False,
                                          clean_text=False)
```
3. Load the model:
```
from transformers import BertForMaskedLM

# Load the pretrained weights together with the masked language modeling head
model = BertForMaskedLM.from_pretrained('./NepaliBERT')
```
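The `strip_accents=False` argument in the tokenizer step matters for Devanagari text. The sketch below uses only the standard library to approximate what accent stripping does (NFD normalization, then dropping nonspacing combining marks) and shows that it removes a vowel sign from a Nepali word. The helper name `strip_accents` is hypothetical and is not the tokenizer's own function.

```python
import unicodedata


def strip_accents(text):
    """Drop nonspacing combining marks (category Mn) after NFD normalization."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# "नेपाली" (Nepali), written with escapes so the code points are explicit.
word = "\u0928\u0947\u092a\u093e\u0932\u0940"
stripped = strip_accents(word)

# The vowel sign U+0947 is a nonspacing mark, so it is removed and the word changes.
print(word, "->", stripped)
print(len(word), "code points ->", len(stripped), "code points")
```

Because several Devanagari vowel signs and the virama are nonspacing marks, stripping them changes the spelling of words, which is why the tokenizer is loaded with accent stripping disabled.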
The easiest way to check whether the language model has learned anything interesting is the `FillMaskPipeline`.
Pipelines are simple wrappers around tokenizers and models. The 'fill-mask' pipeline takes a sequence containing a masked token (here, `[MASK]`) and returns a list of the most probable filled sequences with their probabilities.
```
from transformers import pipeline

# Wrap the model and tokenizer loaded above in a fill-mask pipeline
fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer
)
```
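Once the pipeline exists, it can be called on a sentence that contains exactly one mask token. The helper below is a minimal sketch: `top_predictions` is a hypothetical name, and it assumes the pipeline returns a list of dicts with `token_str` and `score` keys, which is the usual output for a single masked position.

```python
def top_predictions(fill_mask, text, k=3):
    """Run a fill-mask pipeline on `text` and return the k best (token, score) pairs.

    Assumes `text` contains exactly one mask token, so the pipeline
    returns a flat list of prediction dicts.
    """
    results = fill_mask(text)
    # Sort explicitly by score so the ranking does not depend on output order.
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], 4)) for r in ranked[:k]]
```

For example, `top_predictions(fill_mask, f"म नेपाली {tokenizer.mask_token} बोल्छु।")` asks the model to fill the gap in an illustrative sentence meaning roughly "I speak the Nepali ___". Using `tokenizer.mask_token` avoids hard-coding the mask string.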
For more information, visit the [GitHub repository 🤗](https://github.com/R4j4n/NepaliBERT).