**Fanar-1-9B-Instruct** is a powerful Arabic-English LLM developed by [Qatar Computing Research Institute (QCRI)](https://www.hbku.edu.qa/en/qcri) and [Hamad Bin Khalifa University (HBKU)](https://www.hbku.edu.qa/). It is the instruction-tuned version of [Fanar-1-9B](), built by continually pretraining the `google/gemma-2-9b` model on 1T Arabic and English tokens. Fanar pays particular attention to the richness of the Arabic language, supporting a diverse set of Arabic varieties including Modern Standard Arabic (MSA) and the Levantine and Egyptian dialects. Through meticulous curation of the pretraining and instruction-tuning data, Fanar is aligned with Arab cultural values.

**Fanar-1-9B-Instruct** is a core component of the [Fanar GenAI platform](https://chat.fanar.qa/), which offers a suite of capabilities including image generation, video and image understanding, deep thinking, advanced text-to-speech (TTS) and automatic speech recognition (ASR), attribution and fact-checking, and Islamic RAG, among several other features.

We have published a comprehensive [report](https://arxiv.org/pdf/2501.13944) with all the details regarding Fanar. We also provide an API to the model and our GenAI platform (request access [here](https://api.fanar.qa/request/en)).

---

| Attribute                 | Value                              |
|---------------------------|------------------------------------|
| Developed by              | [QCRI](https://www.hbku.edu.qa/en/qcri) and [HBKU](https://www.hbku.edu.qa/) |
| Sponsored by              | [MCIT](https://www.mcit.gov.qa/en/) |
| Model Type                | Autoregressive Transformer         |
| Parameter Count           | 8.7 Billion                        |
| Context Length            | 4096 Tokens                        |
| DPO Preference Pairs      | 250K                               |
| Languages                 | Arabic, English                    |
| License                   | Apache 2.0                         |

<!-- | Precision | bfloat16 | -->
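Since the context length is 4096 tokens, long multi-turn conversations eventually exceed the window and must be truncated. Below is a minimal newest-first truncation sketch; `count_tokens` is a crude hypothetical stand-in (roughly four characters per token), and in practice you would measure length with the model's tokenizer instead.

```python
CONTEXT_LENGTH = 4096  # from the model card above

def count_tokens(message: dict) -> int:
    # Hypothetical stand-in: assume roughly 4 characters per token.
    return max(1, len(message["content"]) // 4)

def truncate_to_context(messages: list, budget: int = CONTEXT_LENGTH) -> list:
    """Keep as many of the most recent messages as fit in the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

With a budget of only a few stand-in tokens, just the most recent message survives; with the full 4096-token budget, typical short chats pass through unchanged.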

---

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "QCRI/Fanar-1-9B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# message content may be in Arabic or English
messages = [
    {"role": "user", "content": "ما هي عاصمة قطر؟"},  # "What is the capital of Qatar?"
]
```
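The snippet above stops after building `messages`. The sketch below completes the generation step using the standard `transformers` chat-template API; it repeats the model loading so it runs on its own, and `max_new_tokens=256` is illustrative rather than a recommended setting.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "QCRI/Fanar-1-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "user", "content": "ما هي عاصمة قطر؟"},  # "What is the capital of Qatar?"
]

# Render the conversation with the model's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```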