base_model:
- QCRI/Fanar-1-9B
---


# Fanar-1-9B-Instruct

**Fanar-1-9B-Instruct** is a powerful Arabic-English LLM developed by [Qatar Computing Research Institute (QCRI)](https://www.hbku.edu.qa/en/qcri) at [Hamad Bin Khalifa University (HBKU)](https://www.hbku.edu.qa/), a member of Qatar Foundation for Education, Science, and Community Development. It is the instruction-tuned version of [Fanar-1-9B](https://huggingface.co/QCRI/Fanar-1-9B). We continually pretrain the `google/gemma-2-9b` model on 1T Arabic and English tokens. We pay particular attention to the richness of the Arabic language by supporting Modern Standard Arabic (MSA) and a diverse set of Arabic dialects, including Gulf, Levantine, and Egyptian. Fanar models, through meticulous curation of the pretraining and instruction-tuning data, are aligned with Islamic values and Arab cultures.

**Fanar-1-9B-Instruct** is a core component of the [Fanar GenAI platform](https://fanar.qa/), which offers a suite of capabilities including image generation, video and image understanding, deep thinking, advanced text-to-speech (TTS), automatic speech recognition (ASR), attribution and fact-checking, and Islamic RAG, among several other features.

We have published a comprehensive [report](https://arxiv.org/pdf/2501.13944) with all the details regarding our Fanar GenAI platform. We also provide an API to our models and the GenAI platform (request access [here](https://api.fanar.qa/request/en)).
| Attribute                 | Value                              |
|---------------------------|------------------------------------|
| Developed by              | [QCRI](https://www.hbku.edu.qa/en/qcri) at [HBKU](https://www.hbku.edu.qa/) |
| Sponsored by              | [Ministry of Communications and Information Technology, State of Qatar](https://www.mcit.gov.qa/en/) |
| Model Type                | Autoregressive Transformer         |
| Parameter Count           | 8.7 Billion                        |
| Context Length            | 4096 Tokens                        |
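The model can be run with the standard Hugging Face `transformers` chat-template API. The snippet below is a minimal sketch, not an official quick-start from this card: the generation settings and the example prompt are illustrative choices.

```python
# Minimal inference sketch for Fanar-1-9B-Instruct with Hugging Face
# `transformers`. Generation settings and the prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "QCRI/Fanar-1-9B-Instruct"

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a single user turn in the chat-message format expected by
    `tokenizer.apply_chat_template`."""
    return [{"role": "user", "content": user_prompt}]

def main() -> None:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = build_messages("ما هي عاصمة قطر؟")  # "What is the capital of Qatar?"
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Keep prompt + completion inside the model's 4096-token context window.
    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

`device_map="auto"` assumes the `accelerate` package is installed; on CPU-only machines, drop it and expect slow generation for a 9B model.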
## Model Training

#### Pretraining

Fanar-1-9B-Instruct was continually pretrained on 1T tokens, with a balanced focus on Arabic and English: ~515B English tokens from a carefully curated subset of the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset, 410B Arabic tokens that we collected, parsed, and filtered from a variety of sources, and 102B code tokens curated from [The Stack](https://github.com/bigcode-project/the-stack-v2) dataset. Our codebase used the [LitGPT](https://github.com/Lightning-AI/litgpt) framework.
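The stated token counts work out to roughly a 50/40/10 English/Arabic/code split; a quick sanity check of the arithmetic:

```python
# Approximate pretraining mixture from the token counts stated above,
# in billions of tokens ("1T" is the rounded total).
mixture_b = {"english": 515, "arabic": 410, "code": 102}

total_b = sum(mixture_b.values())  # 1027B, i.e. roughly the stated 1T
shares = {name: count / total_b for name, count in mixture_b.items()}

print(total_b)
print({name: round(share, 2) for name, share in shares.items()})
# → {'english': 0.5, 'arabic': 0.4, 'code': 0.1}
```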
#### Post-training

Fanar-1-9B-Instruct underwent a two-phase post-training pipeline:
- Conversational agents (Arabic only or bilingual)
- Cultural and dialectal question answering in Arabic
- Educational, governmental, and civic NLP applications focused on the Arab world or Arabic-speaking audiences
- Research on Arabic natural language generation and understanding

Fanar-1-9B-Instruct can be deployed as part of a broader AI system. Developers are encouraged to implement proper safeguards to ensure culturally respectful, accurate, and safe deployment. It should not be used to generate or spread **harmful, illegal, or misleading content.**

A version of this model can be accessed through [Fanar Chat](https://chat.fanar.qa). We are continuously improving Fanar's models and capabilities, so its answers may differ from those of Fanar-1-9B-Instruct.
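One lightweight pattern for the safeguards encouraged above is to screen requests before they ever reach the model. The sketch below is purely hypothetical: the blocklist and the `generate_fn` hook are illustrative stand-ins, not part of Fanar or its API, and a production deployment would use a proper moderation service instead of substring matching.

```python
# Hypothetical pre-generation safeguard: screen a user request against a
# blocklist before calling the model. The terms and the generate_fn hook
# are illustrative stand-ins, not part of Fanar or its API.
from typing import Callable

BLOCKED_TERMS = {"make a weapon", "credit card dump"}  # stand-in policy

REFUSAL = "This request cannot be processed under the content policy."

def moderated_generate(prompt: str, generate_fn: Callable[[str], str]) -> str:
    """Call generate_fn only if the prompt passes the blocklist screen."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return REFUSAL
    return generate_fn(prompt)
```

The same wrapper shape can be applied on the output side, passing the model's response through a classifier before returning it to the user.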
## Ethical Considerations & Limitations

Fanar-1-9B-Instruct is capable of generating fluent and contextually appropriate responses. However, as with any generative model, its outputs carry inherent uncertainty: the model may produce **biased, offensive, or incorrect outputs**, and it is **not suitable for high-stakes decision-making** (e.g., legal, medical, or financial advice). Although we have extensively tested Fanar-1-9B-Instruct and attempted to mitigate these issues, we cannot address every possible scenario. We therefore advise developers to implement safety checks and perform domain-specific fine-tuning for sensitive use cases. Kindly refer to our [Terms of Service](https://chat.fanar.qa/terms-of-service) and [Privacy Policy](https://chat.fanar.qa/privacy-policy).

The output generated by this model is not considered a statement of QCRI, HBKU, Qatar Foundation, MCIT or any other organization or individual.
## Acknowledgements

This project is from [Qatar Computing Research Institute (QCRI)](https://qcri.org) at [Hamad Bin Khalifa University (HBKU)](https://hbku.edu.qa), a member of Qatar Foundation. We thank our engineers, researchers, and support team for their efforts in advancing Arabic-centric large language models.

Special thanks to the [Ministry of Communications and Information Technology, State of Qatar](https://www.mcit.gov.qa/en/) for their continued support by providing the compute infrastructure through the Google Cloud Platform.

---