---
license: apache-2.0
language: fa
widget:
- text: "از هر دستی بگیری از همون [MASK] میدی"
- text: "این آخرین باره بهت [MASK] میگم"
- text: "چرا آن جوان بیچاره را به سخره [MASK]"
- text: "آخه محسن [MASK] هم شد خواننده؟"
- text: "پسر عجب [MASK] زد"
tags:
- bert-fa
- bert-persian
model-index:
- name: dal-bert
  results: []
---

DAL-BERT: Another pre-trained language model for Persian
---

DAL-BERT is a transformer-based model trained on more than 80 gigabytes of Persian text, including both formal and informal (conversational) contexts. The architecture of this model follows the original BERT [[Devlin et al.](https://arxiv.org/abs/1810.04805)].

How to Use the Model
---
```python
from transformers import BertForMaskedLM, BertTokenizer, pipeline

# Load the pre-trained model and its tokenizer from the Hugging Face Hub
model = BertForMaskedLM.from_pretrained('sharif-dal/dal-bert')
tokenizer = BertTokenizer.from_pretrained('sharif-dal/dal-bert')

# Build a fill-mask pipeline that predicts the token hidden by [MASK]
fill_sentence = pipeline('fill-mask', model=model, tokenizer=tokenizer)

# "Write your sentence here and replace the target word with [MASK]"
fill_sentence('اینجا جمله مورد نظر خود را بنویسید و کلمه موردنظر را [MASK] کنید')
```
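
The pipeline returns a ranked list of candidate fills. Below is a minimal sketch of inspecting them with one of the widget sentences above; the output keys (`token_str`, `score`, `sequence`) are the standard ones of the `transformers` fill-mask pipeline:

```python
# Fill the mask in one of the example sentences above
# ("this is the last time I'm telling you [MASK]")
predictions = fill_sentence('این آخرین باره بهت [MASK] میگم')

for p in predictions:
    # Each candidate carries the predicted token, its probability score,
    # and the fully completed sentence
    print(f"{p['token_str']}\t{p['score']:.4f}\t{p['sequence']}")
```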
The Training Data
---
The model was trained on a mixture of newspapers, news agency websites, technology-related sources, user comments, magazines, literary criticism, and blogs.

Evaluation
---

| Training Loss | Epoch | Step    |
|:-------------:|:-----:|:-------:|
| 2.1855        | 13    | 7649486 |
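
As a rough point of reference, a masked-language-modelling loss of 2.1855 corresponds to a training perplexity of about exp(2.1855) ≈ 8.9; this is a back-of-the-envelope conversion, not a figure reported with the model.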
Contributors
---
- Arman Malekzadeh [[GitHub](https://github.com/arm-on)]
- Amirhossein Ramazani, Master's Student in AI @ Sharif University of Technology [[LinkedIn](https://www.linkedin.com/in/amirhossein-ramazani/)] [[GitHub](https://github.com/amirhossein1376)]