**Fanar-1-9B-Instruct** is a powerful Arabic-English LLM developed by [Qatar Computing Research Institute (QCRI)](https://www.hbku.edu.qa/en/qcri) and [Hamad Bin Khalifa University (HBKU)](https://www.hbku.edu.qa/). It is the instruction-tuned version of [Fanar-1-9B](), built by continually pretraining the `google/gemma-2-9b` model on 1T Arabic and English tokens. Fanar pays particular attention to the richness of the Arabic language, supporting a diverse set of Arabic varieties including Modern Standard Arabic (MSA) and the Levantine and Egyptian dialects. Through meticulous curation of the pretraining and instruction-tuning data, Fanar is aligned with Arab cultural values.

**Fanar-1-9B-Instruct** is a core component of the [Fanar GenAI platform](https://chat.fanar.qa/), which offers a suite of capabilities including image generation, video and image understanding, deep thinking, advanced text-to-speech (TTS) and automatic speech recognition (ASR), attribution and fact-checking, and Islamic RAG, among several other features.

We have published a comprehensive [report](https://arxiv.org/pdf/2501.13944) with all the details regarding Fanar. We also provide an API to the model and our GenAI platform (request access [here](https://api.fanar.qa/request/en)).

---

| Attribute                 | Value                              |
|---------------------------|------------------------------------|
| Developed by              | [QCRI](https://www.hbku.edu.qa/en/qcri) and [HBKU](https://www.hbku.edu.qa/) |
| Sponsored by              | [MCIT](https://www.mcit.gov.qa/en/) |
| Model Type                | Autoregressive Transformer         |
| Parameter Count           | 8.7 Billion                        |
| Context Length            | 4096 Tokens                        |
| DPO Preference Pairs      | 250K                               |
| Languages                 | Arabic, English                    |
| License                   | Apache 2.0                         |

<!-- | Precision | bfloat16 | -->
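Since the context length is 4096 tokens, long multi-turn conversations eventually exceed the window and must be truncated. Below is a minimal newest-first truncation sketch; `count_tokens` is a crude hypothetical stand-in (roughly four characters per token), and in practice you would measure length with the model's tokenizer instead.

```python
CONTEXT_LENGTH = 4096  # from the model card above

def count_tokens(message: dict) -> int:
    # Hypothetical stand-in: assume roughly 4 characters per token.
    return max(1, len(message["content"]) // 4)

def truncate_to_context(messages: list, budget: int = CONTEXT_LENGTH) -> list:
    """Keep as many of the most recent messages as fit in the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

With a budget of only a few stand-in tokens, just the most recent message survives; with the full 4096-token budget, typical short chats pass through unchanged.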

---

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "QCRI/Fanar-1-9B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# message content may be in Arabic or English
messages = [
    {"role": "user", "content": "ما هي عاصمة قطر؟"},  # "What is the capital of Qatar?"
]
```
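The snippet above stops after building `messages`. The sketch below completes the generation step using the standard `transformers` chat-template API; it repeats the model loading so it runs on its own, and `max_new_tokens=256` is illustrative rather than a recommended setting.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "QCRI/Fanar-1-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "user", "content": "ما هي عاصمة قطر؟"},  # "What is the capital of Qatar?"
]

# Render the conversation with the model's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```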