## Model Training

#### Pretraining

Fanar-1-9B was continually pretrained on 1T tokens, with a balanced focus on Arabic and English: ~515B English tokens from a carefully curated subset of the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset, 410B Arabic tokens that we collected, parsed, and filtered from a variety of sources, and 102B code tokens curated from [The Stack](https://github.com/bigcode-project/the-stack-v2) dataset. Our codebase used the [LitGPT](https://github.com/Lightning-AI/litgpt) framework.
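As a quick sanity check, the per-source counts above can be tallied against the stated ~1T total (the counts are the approximate figures from the text, not exact dataset sizes):

```python
# Approximate pretraining token budget described above, in billions of tokens.
# These are the rounded figures from the README, used only for illustration.
token_budget = {
    "english_dolma": 515,
    "arabic": 410,
    "code_the_stack": 102,
}

total = sum(token_budget.values())
print(f"Total: ~{total}B tokens (~{total / 1000:.2f}T)")
for source, billions in token_budget.items():
    print(f"{source}: {billions}B ({billions / total:.1%})")
# → Total: ~1027B tokens (~1.03T), i.e. roughly 50% English, 40% Arabic, 10% code.
```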
## Getting Started

Fanar-1-9B is compatible with the Hugging Face `transformers` library (≥ v4.40.0). Here's how to load and use the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Note: the repository id below is an assumption for illustration;
# check the model card for the exact Hugging Face id.
model_id = "QCRI/Fanar-1-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tokenize a prompt, generate a completion, and decode it back to text.
inputs = tokenizer("What is the capital of Qatar?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```