---
license: apache-2.0
language:
- ar
- en
pipeline_tag: image-text-to-text
tags:
- pytorch
- vision-language
- multimodal
- cultural-understanding
library_name: transformers
base_model: Qwen/Qwen2.5-VL-7B-Instruct
---

<p align="center">
  <img src="./assets/fanar_logo.jpg" width="200"/>
</p>

# Fanar-2-Oryx-IVU (Image & Video Understanding)

**Fanar-2-Oryx-IVU** is an Arabic-first vision-language model for culturally-aware image and video understanding, developed by [Qatar Computing Research Institute (QCRI)](https://www.hbku.edu.qa/en/qcri) at [Hamad Bin Khalifa University (HBKU)](https://www.hbku.edu.qa/), a member of Qatar Foundation for Education, Science, and Community Development. It is part of the **Fanar 2.0 release**, a comprehensive Arabic-centric multimodal generative AI platform that also includes [text generation](https://huggingface.co/QCRI/Fanar-2-27B-Instruct), [image generation](https://huggingface.co/QCRI/Fanar-2-Oryx-IG), and [poetry generation](https://huggingface.co/QCRI/Fanar-2-Diwan).

Fanar-2-Oryx-IVU specializes in understanding images and videos with strong Arabic language support, cultural awareness, and Arabic calligraphy recognition. Trained on **62M bilingual examples** (50/50 Arabic/English), the model outperforms its base model (Qwen2.5-VL-7B) on culturally-relevant content while achieving **70% user satisfaction** and significantly reducing code-switching in Arabic responses.

We have published a [report](https://arxiv.org/abs/2603.16397) with full details on the Fanar 2.0 GenAI platform. We also provide a [chat interface](https://chat.fanar.qa), mobile apps for [iOS](https://apps.apple.com/jo/app/fanar-فنار/id6741857943) and [Android](https://play.google.com/store/apps/details?id=com.fanarmobile), and [API access](https://api.fanar.qa/docs) to our models and the GenAI platform (request access [here](https://api.fanar.qa/request/en)).

---

## Model Details

| Attribute | Value |
|---------------------------|------------------------------------|
| Developed by | [QCRI](https://www.hbku.edu.qa/en/qcri) at [HBKU](https://www.hbku.edu.qa/) |
| Sponsored by | [Ministry of Communications and Information Technology, State of Qatar](https://www.mcit.gov.qa/en/) |
| Model Type | Vision-Language Model (VLM) |
| Base Model | Qwen2.5-VL-Instruct (7B) |
| Parameter Count | 7 Billion |
| Architecture | Dynamic-resolution ViT + LLM |
| Fine-tuning Method | LoRA (rank 128) on attention layers |
| Vision Encoder | Frozen during training |
| Input Modalities | Images, Videos, Text |
| Output | Text (Arabic/English) |
| Training Framework | LLaMAFactory |
| Training Data | 62M multimodal examples |
| Languages | Arabic, English |
| License | Apache 2.0 |

---

## Model Training

### Training Data (62M Examples)

Fanar-2-Oryx-IVU was trained on a comprehensive multimodal dataset with **balanced Arabic-English representation** (approximately 50/50):

#### 1. Cultural Content (24M VQA pairs)

- **240K internally collected images** from taxonomy-driven crawling
- Coverage: 22 Arab countries across cultural categories
- **Dense supervision**: up to 63 QA pairs per image
- **Bilingual VQA synthesis**: English + Modern Standard Arabic
- **Null-field supervision**: explicit "absence" questions to reduce hallucinations
- Generated via Gemini 2.5 Flash with structured metadata

#### 2. Arabic Fonts & Calligraphy (54K pairs)

- **20K calligraphy images** featuring Qur'anic verses
- **5 major Arabic scripts**: Thuluth (الثلث), Naskh (النسخ), Ruq'ah (الرقعة), Kufi (الكوفي), Diwani (الديواني)
- Dual objectives:
  - Content identification (transcribing Arabic text)
  - Script classification (recognizing calligraphic style)
- All prompts and responses are in Arabic

#### 3. Object Detection & Localization (1.6M pairs)

- Based on AllenAI public datasets
- Enhanced with instance-level bounding boxes
- **WordNet-style taxonomic expansion** for robust semantic coverage
- **Point-based grounding**: (x, y) coordinate lists for spatial reasoning
- **Bilingual**: 800K English + 800K Arabic (translated)
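
Point-based grounding surfaces in responses as bracketed coordinate lists such as `[(0.65, 0.42), (0.71, 0.45)]`. As a purely illustrative post-processing sketch (the helper names and the assumption that coordinates are normalized to [0, 1] are ours, not guaranteed by this card), such lists can be extracted and mapped back onto the image:

```python
import re

# Hypothetical helper: regex, names, and the normalized-coordinate
# assumption are illustrative only.
POINT_RE = re.compile(r"\(\s*([0-9]*\.?[0-9]+)\s*,\s*([0-9]*\.?[0-9]+)\s*\)")

def parse_points(response):
    """Extract (x, y) pairs like "[(0.65, 0.42), (0.71, 0.45)]" from a response."""
    return [(float(x), float(y)) for x, y in POINT_RE.findall(response)]

def to_pixels(points, width, height):
    """Scale normalized points to pixel coordinates for a width x height image."""
    return [(round(x * width), round(y * height)) for x, y in points]
```

For example, `to_pixels(parse_points(response), image.width, image.height)` would yield pixel locations suitable for drawing markers on the original image.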

#### 4. General Image Captioning (34M pairs)

- **566K source images** from the Pixmo dataset
- Detailed audio-transcribed captions (high quality)
- **27 paraphrased templates per language** to increase diversity
- **17M English + 17M Arabic** caption pairs

#### 5. Text-only Instruction (1.9M pairs)

- UltraChat in English + Arabic translation
- Maintains dialogue capability independent of visual input
- Supports mixed text-visual interactions in realistic deployments

### Training Methodology

- **Parameter-efficient fine-tuning**: LoRA (rank 128) on attention layers
- **Vision encoder frozen**: preserves pretrained visual representations
- **Multi-run training with TIES merging**: combines complementary strengths
- **Training scale**: 16 nodes, approximately 2 weeks
- **Language balance**: strict 50/50 Arabic-English distribution maintained
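
The TIES merging step mentioned above can be illustrated in isolation. The following is only a schematic NumPy sketch of the standard trim / elect-sign / disjoint-merge procedure applied to task vectors (per-run weight deltas); the function, its `density` default, and the flat-array representation are assumptions for illustration, not the actual training code:

```python
import numpy as np

def ties_merge(deltas, density=0.2):
    """Merge task vectors (fine-tuned minus base weights) TIES-style:
    1) Trim: keep only the top-`density` fraction of entries by magnitude.
    2) Elect sign: per coordinate, pick the sign with the larger total mass.
    3) Disjoint merge: average only entries agreeing with the elected sign.
    """
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]  # k-th largest magnitude
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    sign = np.sign(np.sum(stacked, axis=0))  # elected sign per coordinate
    sign[sign == 0] = 1.0
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return (stacked * agree).sum(axis=0) / counts
```

The merged delta would then be added back onto the base weights; in practice this is done per parameter tensor across the runs being merged.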

### Key Innovations

- **Taxonomy-guided cultural crawling** across 22 Arab countries
- **WordNet-style augmentation** for synonyms, hypernyms, and long-tail concepts
- **Faithfulness by design**: null-field supervision for hallucination reduction
- **Point-based grounding**: (x, y) coordinates for spatial understanding
- **Native Arabic calligraphy recognition**: 5 major script styles

---
<!--
## Examples

Below are real examples demonstrating Fanar-2-Oryx-IVU's capabilities across different tasks:

### Example 1: Calligraphy Recognition

**Image**: *[Arabic calligraphy in Thuluth script]*

**Question (Arabic)**: ما نوع الخط المستخدم في هذه الصورة؟

**Fanar-2-Oryx-IVU Response**:
> الخط المستخدم هو خط الثلث، وهو من أجمل وأشهر الخطوط العربية. يتميز بحروفه الممدودة وأشكاله المتناسقة، وغالباً ما يُستخدم في كتابة الآيات القرآنية والزخارف الإسلامية.

*Translation: "The script used is Thuluth, one of the most beautiful and famous Arabic scripts. It is characterized by elongated letters and harmonious shapes, and is often used in writing Qur'anic verses and Islamic decorations."*

---

### Example 2: Cultural Understanding

**Image**: *[Traditional Gulf wedding ceremony with guests in cultural attire]*

**Question (English)**: What ceremony is shown in this image?

**Fanar-2-Oryx-IVU Response**:
> This appears to be a traditional Gulf wedding ceremony. The guests are wearing traditional attire - men in white thobes and ghutras, women in elegant abayas and hijabs. The setting shows the characteristic majlis-style seating arrangement and decorative elements typical of Qatari and Gulf wedding celebrations. The formal yet celebratory atmosphere is evident from the gathering and the traditional dress code being observed.

---

### Example 3: Object Detection & Counting

**Image**: *[Market scene with multiple people and objects]*

**Question (Arabic)**: كم عدد الأشخاص في الصورة؟ وأين يقفون؟

**Fanar-2-Oryx-IVU Response**:
> في الصورة خمسة أشخاص. ثلاثة منهم يقفون بالقرب من العربة على اليمين [(0.65, 0.42), (0.71, 0.45), (0.68, 0.52)]، وشخصان آخران على اليسار بالقرب من المحل [(0.23, 0.38), (0.28, 0.41)]. يبدو أنهم في سوق تقليدي أو سوق شعبي.

*Translation: "There are five people in the image. Three of them are standing near the cart on the right [coordinates], and two others on the left near the shop [coordinates]. They appear to be in a traditional or popular market."*

---

### Example 4: Landmark Recognition

**Image**: *[Museum of Islamic Art, Doha]*

**Question**: What is this building?

**Fanar-2-Oryx-IVU Response**:
> This is the Museum of Islamic Art (MIA) in Doha, Qatar. Designed by renowned architect I.M. Pei and opened in 2008, it is one of the world's most comprehensive collections of Islamic art. The distinctive modernist architecture features geometric Islamic patterns and sits on an artificial peninsula overlooking the Doha Corniche. The museum houses artifacts spanning 1,400 years from three continents and is a major cultural landmark in Qatar.

---

### Example 5: Dialect Understanding

**Image**: *[Street sign in Arabic]*

**Question (Egyptian dialect)**: إيه المكتوب في اللافتة دي؟

**Fanar-2-Oryx-IVU Response**:
> اللافتة مكتوب عليها "شارع الاستقلال" بالعربي. دي لافتة شارع واضحة بخط كبير، ومكتوبة بالعربية الفصحى.

*Translation: "The sign says 'Independence Street' in Arabic. It's a clear street sign in large font, written in Modern Standard Arabic."*

---

### Example 6: General Captioning

**Image**: *[Traditional Qatari family gathering]*

**Question (Arabic)**: اوصف ما تراه في هذه الصورة

**Fanar-2-Oryx-IVU Response**:
> الصورة تُظهر عائلة قطرية مجتمعة في مجلس تقليدي. الرجال يرتدون الثوب الأبيض والغترة، بينما النساء يرتدين العباءات والحجاب. الجلسة مُرتبة بأسلوب المجلس العربي التقليدي مع الوسائد والسجاد. في الوسط يوجد دلة القهوة العربية وأكواب صغيرة، مما يُشير إلى جلسة ضيافة تقليدية. الأجواء العائلية الدافئة واضحة من خلال التجمع والترتيب التقليدي للمكان.

*Translation: "The image shows a Qatari family gathered in a traditional majlis. Men wear white thobes and ghutras, while women wear abayas and hijabs. The seating is arranged in traditional Arab majlis style with cushions and carpets. In the center is an Arabic coffee dallah and small cups, indicating a traditional hospitality session. The warm family atmosphere is evident through the gathering and traditional arrangement of the space."*

---
-->
## Custom Evaluation Benchmarks

Fanar-2-Oryx-IVU was evaluated on multiple custom benchmarks designed specifically for Arabic cultural and linguistic assessment:

### 1. Oryx-Almieyar (12K questions)

- **200 images** (10 per country, 20 Arab countries)
- **30 dialect experts** for manual annotation
- **Three language variants**: English, MSA, country-specific dialects
- **Country-level diagnostic analysis** for geographic coverage

### 2. Oryx-BloomBench (7,747 pairs)

- **Bilingual** (English/Arabic)
- **6 Bloom's taxonomy levels**:
  - Remember (2,948)
  - Understand (1,592)
  - Analyze (1,431)
  - Create (685)
  - Evaluate (592)
  - Apply (499)
- Tests reasoning depth beyond surface perception

### 3. TaskGalaxy Subset (12K samples)

- Broad regression test for general capabilities
- **19,227 hierarchical vision task types**
- Bilingual Arabic/English
- Prevents capability degradation during Arabic optimization

---

## Getting Started

Oryx-IVU is compatible with the Hugging Face `transformers` library. Here's how to load and use the model:

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model_name = "QCRI/Fanar-2-Oryx-IVU"

# Load model and processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

# Load image
image = Image.open("path/to/image.jpg")

# Prepare conversation (supports Arabic or English)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "ما الذي تراه في هذه الصورة؟"}
        ]
    }
]

# Process and generate
text_prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    text=[text_prompt],
    images=[image],
    padding=True,
    return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)

# Trim the prompt tokens so only the newly generated response is decoded
trimmed_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
generated_text = processor.batch_decode(
    trimmed_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True
)

print(generated_text[0])
```

### Multi-turn Conversation

```python
# First turn
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is this landmark?"}
        ]
    }
]

# ... generate response as above (note: the image must be passed to the
# processor again on every turn) ...

# Second turn (building on context)
messages.append({"role": "assistant", "content": generated_text[0]})
messages.append({
    "role": "user",
    "content": [{"type": "text", "text": "Tell me more about its history"}]
})

# ... generate response ...
```
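
The quick-start above covers images; for video inputs, callers typically sample a fixed number of frames before handing them to the processor. This card does not specify the model's frame-sampling scheme, so the helper below is only a common uniform-sampling sketch (the function name and the default of 8 samples are assumptions):

```python
def sample_frame_indices(num_frames, num_samples=8):
    """Pick `num_samples` frame indices spread uniformly across a clip of
    `num_frames` frames; clips shorter than `num_samples` are used in full."""
    if num_frames <= num_samples:
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the middle frame of each of the `num_samples` equal segments
    return [int(step * i + step / 2) for i in range(num_samples)]
```

The selected frames can then be decoded (e.g., with PIL or OpenCV) and passed to the processor in place of a single image.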

---

## Evaluation

### Multiple-Choice Benchmarks (Arabic)

| Model | Arabic Culture | CamelBench | BloomBench | TaskGalaxy |
|-------|----------------|------------|------------|------------|
| **Fanar-2-Oryx-IVU** | **48.0%** | **45.0%** | **58.0%** | **74.0%** |
| Qwen2.5-VL (base) | **48.0%** | **45.0%** | **58.0%** | **74.0%** |
| Gemma-3-12B | 40.0% | 50.0% | 48.0% | 20.0% |
| Qwen2-VL-7B | 30.0% | 41.0% | 37.0% | 51.0% |
| AIN-7B | 33.0% | 45.0% | 45.0% | 61.0% |

*Note: Oryx-IVU matches the base model on MCQ accuracy but excels in generation quality and Arabic coherence.*

### Generative Evaluation (LLM-as-a-Judge, 1-5 scale)

Evaluated on **3,300 real user queries** with Gemini 2.5 Flash as judge:

| Model | Average Score | Comments |
|-------|---------------|----------|
| GPT-4o | 4.51 | Strongest overall |
| **Fanar-2-Oryx-IVU** | **3.03** | **Best among similar-sized models** |
| Qwen3-VL | 2.96 | Newer but lower quality |
| Qwen2.5-VL (base) | 2.76 | Our base model |
| AIN-7B | 2.23 | Similar-size competitor |
| Qwen2-VL | 2.21 | Older version |

**Key Achievements:**

- Outperforms the base model by **+0.27 points** (~10% relative improvement)
- Outperforms the newer Qwen3-VL despite being based on the older Qwen2.5
- Best among all tested 7B-class models

### Language Consistency Improvements

| Metric | Base Model (Qwen2.5-VL) | Fanar-2-Oryx-IVU | Improvement |
|--------|-------------------------|------------------|-------------|
| Arabic-English code-switching | 11% | 6% | **45% relative reduction** |
| Arabic-Chinese mixing | 3% | 1.5% | **50% relative reduction** |
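
The card does not define the exact code-switching metric, but a rough script-based proxy conveys the idea: count responses that mix Arabic-script and Latin-script text. Everything below (names, character ranges, the response-level heuristic) is an illustrative assumption, not the evaluation code used:

```python
import re

# Basic Arabic block and ASCII letters; illustrative only.
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")
LATIN_CHAR = re.compile(r"[A-Za-z]")

def is_code_switched(response):
    """True if the response contains both Arabic-script and Latin-script text."""
    return bool(ARABIC_CHAR.search(response)) and bool(LATIN_CHAR.search(response))

def code_switch_rate(responses):
    """Fraction of responses that mix scripts."""
    if not responses:
        return 0.0
    return sum(map(is_code_switched, responses)) / len(responses)
```

Note that a production metric would need to whitelist legitimate Latin content (URLs, product names, requested translations), which this sketch does not attempt.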

### User Satisfaction (3,300 queries)

| Rating | Percentage |
|--------|------------|
| Like | **70%** |
| Dislike | 25% |
| No Reaction | 5% |

### Cultural Domain Excellence

Fanar-2-Oryx-IVU achieves leading performance in culturally-sensitive categories:

- **Food & Drink**: top performer
- **Islamic Culture**: top performer
- **Landmarks**: top performer
- **Country-specific content**: best for Algeria, Jordan, Palestine, Qatar, Sudan

---

## Intended Use, Limitations & Ethical Considerations

Fanar-2-Oryx-IVU is built for:

- **Cultural heritage documentation** and preservation
- **Educational applications** teaching Arabic culture and history
- **Accessibility tools** for Arabic-speaking visually impaired users
- **Content moderation** for Arabic social media platforms
- **E-commerce** product description generation in Arabic
- **Museum and tourism** applications with multilingual support
- **Calligraphy and document analysis** for historical texts
- **Research** on Arabic vision-language understanding

**Limitations:**

- May produce hallucinations despite mitigation strategies
- Arabic text recognition in images remains challenging
- Performance varies across Arabic dialects
- May reflect biases present in the training data
- Cannot capture every cultural nuance

**Recommendations:**

- Verify critical information in generated responses
- Use human review for sensitive applications
- Provide user feedback mechanisms
- Monitor for cultural appropriateness, hallucinations, and errors
- Consider fine-tuning for domain-specific needs
- Implement fallback mechanisms for uncertain responses

**Not Suitable For:**

- Medical diagnosis or legal advice
- High-stakes decision-making
- Situations requiring perfect accuracy
- Replacing human judgment in cultural matters
- Surveillance applications

Please refer to our [Terms of Service](https://chat.fanar.qa/terms-of-service) and [Privacy Policy](https://chat.fanar.qa/privacy-policy).

Output generated by this model does not constitute a statement of QCRI, HBKU, Qatar Foundation, MCIT, or any other organization or individual.

---

## Fanar Platform

While Fanar-2-Oryx-IVU is a powerful standalone model, it is part of the broader **Fanar Platform**, an integrated Arabic-centric multimodal AI ecosystem that provides enhanced capabilities and continuous updates. The platform includes:

**Core Capabilities:**

- **Text Generation**: Multiple conversational models optimized for different tasks
- **Speech (Aura)**: Speech-to-text (short-form and long-form) and text-to-speech synthesis with Arabic dialect support and bilingual Arabic-English capabilities
- **Image Understanding (Oryx-IVU)**: Vision-language model for culturally-grounded image and video understanding, including Arabic calligraphy recognition
- **Image Generation (Oryx-IG)**: Culturally-aligned text-to-image generation trained on taxonomy-driven data across 23,000+ cultural search terms
- **Machine Translation (FanarShaheen)**: High-quality bilingual Arabic↔English translation across diverse domains (e.g., news, STEM, and medical)
- **Poetry Generation (Diwan)**: Classical Arabic poetry generation respecting prosodic meters (Buhur) and maintaining diacritization accuracy

**Specialized Systems:**

- **Fanar-Sadiq**: Multi-agent Islamic question-answering system with 9 specialized tools (Fiqh reasoning, Quran/Hadith retrieval, zakat/inheritance calculation, prayer times, and Hijri calendar). Deployed in production on the [IslamWeb](https://islamweb.net) and [IslamOnline](https://islamonline.net) platforms.
- **Safety & Moderation**: Fanar-Guard and culturally-informed content filtering trained on 468K annotated Arabic-English safety examples

**Access Points:**

- **[Fanar Chat](https://chat.fanar.qa)**: Web conversational interface integrating all modalities
- **[iOS](https://apps.apple.com/jo/app/fanar-فنار/id6741857943) and [Android](https://play.google.com/store/apps/details?id=com.fanarmobile) apps**: Mobile apps for on-the-go access to the Fanar Platform
- **[Fanar API](https://api.fanar.qa)**: Programmatic access to models and specialized capabilities

The Fanar Platform continuously evolves with model updates, new capabilities, and improved safety mechanisms. For production deployments requiring the latest features, multimodal integration, cross-model orchestration, and ongoing support, we recommend using the [Fanar Platform](https://fanar.qa) rather than the standalone models published here.

---

## Citation

If you use Fanar-2-Oryx-IVU or the Fanar 2.0 GenAI platform in your research or applications, please cite:

```bibtex
@misc{fanarteam2026fanar20arabicgenerative,
      title={Fanar 2.0: Arabic Generative AI Stack},
      author={FANAR TEAM and Ummar Abbas and Mohammad Shahmeer Ahmad and Minhaj Ahmad and Abdulaziz Al-Homaid and Anas Al-Nuaimi and Enes Altinisik and Ehsaneddin Asgari and Sanjay Chawla and Shammur Chowdhury and Fahim Dalvi and Kareem Darwish and Nadir Durrani and Mohamed Elfeky and Ahmed Elmagarmid and Mohamed Eltabakh and Asim Ersoy and Masoomali Fatehkia and Mohammed Qusay Hashim and Majd Hawasly and Mohamed Hefeeda and Mus'ab Husaini and Keivin Isufaj and Soon-Gyo Jung and Houssam Lachemat and Ji Kim Lucas and Abubakr Mohamed and Tasnim Mohiuddin and Basel Mousi and Hamdy Mubarak and Ahmad Musleh and Mourad Ouzzani and Amin Sadeghi and Husrev Taha Sencar and Mohammed Shinoy and Omar Sinan and Yifan Zhang},
      year={2026},
      eprint={2603.16397},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.16397},
}
```

---

## Acknowledgements

This project is from [Qatar Computing Research Institute (QCRI)](https://www.hbku.edu.qa/en/qcri) at [Hamad Bin Khalifa University (HBKU)](https://hbku.edu.qa), a member of Qatar Foundation. We thank our engineers, researchers, and support team for their efforts in advancing Arabic-centric large language models.

Special thanks to the [Ministry of Communications and Information Technology, State of Qatar](https://www.mcit.gov.qa/en/) for their continued support and for providing, through the Google Cloud Platform, the compute infrastructure needed to develop and serve the platform.

---

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).