We have published a comprehensive [report](https://arxiv.org/pdf/2501.13944).

## Model Training

#### Pretraining

Fanar-1-9B-Instruct was continually pretrained on 1T tokens, with a balanced focus on Arabic and English: ~515B English tokens from a carefully curated subset of the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset; 410B Arabic tokens that we collected, parsed, and filtered from a variety of sources; and 102B code tokens curated from [The Stack](https://github.com/bigcode-project/the-stack-v2) dataset. Our codebase used the [LitGPT](https://github.com/Lightning-AI/litgpt) framework.
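As a quick sanity check, the stated mix adds up to roughly the 1T-token budget. The labels below are shorthand for the sources named above:

```python
# Pretraining data mix as stated above, in billions of tokens
mix = {
    "English (Dolma subset)": 515,  # approximate (~515B)
    "Arabic (in-house)": 410,
    "Code (The Stack)": 102,
}

total = sum(mix.values())
for source, tokens in mix.items():
    print(f"{source}: {tokens}B ({tokens / total:.0%})")
print(f"Total: {total}B tokens (~1T)")
```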

#### Post-training

Fanar-1-9B-Instruct underwent a two-phase post-training pipeline:

| Phase | Size |
|-------|------|

## Getting Started

Fanar-1-9B-Instruct is compatible with the Hugging Face `transformers` library (≥ v4.40.0). Here's how to load and use the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "QCRI/Fanar-1-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# message content may be in Arabic or English
messages = [
    {"role": "user", "content": "ما هي عاصمة قطر؟"},
]
inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model.generate(**tokenizer(inputs, return_tensors="pt", return_token_type_ids=False).to(model.device), max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Inference using vLLM is also supported:

```python
from vllm import LLM, SamplingParams

model_name = "QCRI/Fanar-1-9B-Instruct"

llm = LLM(model=model_name)
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# message content may be in Arabic or English
messages = [
    {"role": "user", "content": "ما هي عاصمة قطر؟"},
]

outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```

---

## Intended Use