Update README.md
README.md
CHANGED
@@ -7,7 +7,6 @@ tags:
 - Bangla Base Bert
 - Bangla Bert language model
 - Bangla Bert
-license: MIT
 datasets:
 - BanglaLM dataset
 ---
@@ -16,10 +15,11 @@ Here we published a pretrained Bangla bert language model as **bert-base-bangla*
 Here we described [bert-base-bangla](https://github.com/Kowsher/bert-base-bangla) which is a pretrained Bangla language model based on mask language modeling described in [BERT](https://arxiv.org/abs/1810.04805) and the GitHub [repository](https://github.com/google-research/bert)
 ## Corpus Details
 We trained the Bangla bert language model using BanglaLM dataset from kaggle [BanglaLM](https://www.kaggle.com/gakowsher/bangla-language-model-dataset). There is 3 version of dataset which is almost 40GB.
-After downloading the dataset, we went on the way
-
+After downloading the dataset, we went on the way to mask LM.
+
 
 **Bangla Base BERT Tokenizer**
+
 ```py
 from transformers import AutoTokenizer, AutoModel
 bnbert_tokenizer = AutoTokenizer.from_pretrained("Kowsher/bert-base-test")