Instructions to use ViraIntelligentDataMining/AriaBERT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers

How to use ViraIntelligentDataMining/AriaBERT with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="ViraIntelligentDataMining/AriaBERT")

# Load model directly (weights are downloaded from the Hub on first use)
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ViraIntelligentDataMining/AriaBERT")
model = AutoModelForMaskedLM.from_pretrained("ViraIntelligentDataMining/AriaBERT")
```

- Notebooks
  - Google Colab
  - Kaggle
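The Transformers snippet only constructs the pipeline. Below is a minimal sketch of querying it. The helper names (`load_fill_mask`, `mask_sentence`, `top_tokens`) and the Persian example sentence are illustrative assumptions, not part of the model card. The code relies only on two things: the fill-mask pipeline returns a list of dicts with `score` and `token_str` keys, and the mask token is read from `pipe.tokenizer.mask_token` instead of being hard-coded.

```python
from typing import Any

MODEL_ID = "ViraIntelligentDataMining/AriaBERT"


def load_fill_mask():
    """Build the fill-mask pipeline; weights are downloaded from the Hub on first use."""
    from transformers import pipeline  # lazy import: the helpers below work without transformers

    return pipeline("fill-mask", model=MODEL_ID)


def mask_sentence(sentence: str, target: str, mask_token: str) -> str:
    """Replace the first occurrence of `target` with the tokenizer's mask token."""
    if target not in sentence:
        raise ValueError(f"{target!r} not found in {sentence!r}")
    return sentence.replace(target, mask_token, 1)


def top_tokens(predictions: list[dict[str, Any]], k: int = 3) -> list[tuple[str, float]]:
    """Return (token_str, score) pairs for the k highest-scoring candidates.

    Decoded BPE tokens can carry a leading space, so token_str is stripped.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), p["score"]) for p in ranked[:k]]


# Example (requires network access and the transformers library):
# pipe = load_fill_mask()
# text = mask_sentence("تهران پایتخت ایران است.", "پایتخت", pipe.tokenizer.mask_token)
# print(top_tokens(pipe(text)))
```

The sort in `top_tokens` is defensive: the pipeline already returns candidates best-first, but sorting keeps the helper correct for predictions gathered from other sources.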
Commit 8d33702 · Parent(s): e9175c6

Update README.md
README.md CHANGED

```diff
@@ -13,6 +13,8 @@ tags:
 ## Introduction
 AriaBERT represents a breakthrough in natural language processing (NLP) for the Persian language. Developed to address the critical gap in efficient pretrained language models for Persian, AriaBERT is tailored to elevate the standards of Persian language tasks.
 
+## Paper: https://www.researchsquare.com/article/rs-3558473/v1
+
 ## Key Features
 - **Diverse Training Data:** AriaBERT has been trained on over 32 gigabytes of varied Persian textual data, spanning conversational, formal, and hybrid texts. This includes a rich mix of tweets, news articles, poems, medical and encyclopedia texts, user opinions, and more.
 - **RoBERTa Architecture:** Leveraging the robustness of the RoBERTa architecture and the precision of Byte-Pair Encoding tokenizer, AriaBERT stands apart from traditional BERT-based models.
```