---
license: apache-2.0
language:
- ar
- en
base_model:
- Qwen/Qwen2.5-32B-Instruct
tags:
- text-generation-inference
---

## Model Overview
This model is an extended version of **Qwen2.5-32B-Instruct**, specifically adapted to enhance its performance in Arabic. While Qwen2.5 provides strong general instruction-following capabilities across multiple languages, this extended version focuses on improving fluency, comprehension, and reasoning in Arabic, with particular emphasis on low-resource domains where information is often sparse or underrepresented. The model was further tuned to handle diverse Arabic styles and content, improve factual grounding in regional knowledge, and provide more accurate responses in contexts where existing multilingual models may fall short.

---

## Training Strategy
- **Instruction Fine-Tuning (IFT):**
  - Fine-tuned on a mix of Arabic and English instruction–response datasets.
  - Covered both high-resource and low-resource domains.
  - Included different writing styles to improve adaptability.
- **Human Alignment:**
  - Collected human preference data on Arabic and bilingual outputs.
  - Applied Direct Preference Optimization (**DPO**).
  - Focused on factual accuracy, safety, and cultural sensitivity.
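
The DPO objective behind the human-alignment stage can be illustrated with a toy computation of its loss. The numbers below are made up for illustration; real training operates on sequence log-probabilities from the policy and a frozen reference model.

```python
import math

# Toy sketch of the DPO loss:
#   loss = -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))
# where logp_w / logp_l are the policy's log-probabilities of the chosen
# and rejected responses, and ref_* are the reference model's.
def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# The loss shrinks as the policy prefers the chosen answer more strongly
# than the reference model does (all log-probs here are invented values).
low = dpo_loss(-10.0, -12.0, -11.0, -11.0)   # policy favors the chosen answer
high = dpo_loss(-12.0, -10.0, -11.0, -11.0)  # policy favors the rejected answer
```

With no preference signal (all four log-probabilities equal) the loss sits at log 2, and it decreases monotonically as the policy's preference margin for the chosen response grows.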

---

## Usage
### How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Applied-Innovation-Center/AIC-1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "ما هي عاصمة مصر"  # "What is the capital of Egypt?"
messages = [
    {"role": "system", "content": "You are an AI assistant. Always answer user questions with factual, evidence-based information. If you are unsure or the information is unavailable, clearly state that you do not know instead of guessing. Do not invent details. Keep responses concise, clear, and accurate. Avoid speculation, opinions, or creative storytelling unless explicitly asked for."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Slice off the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
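
For reference, `apply_chat_template` in the snippet above renders the messages into a ChatML-style prompt string, which is the layout Qwen2.5-family tokenizers use. A minimal standalone sketch of that rendering (illustrative only; the tokenizer's own template is authoritative):

```python
# Hand-rolled approximation of the ChatML layout produced by
# tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True)
# for Qwen2.5-style models. For real use, always call the tokenizer's template.
def render_chatml(messages, add_generation_prompt=True):
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        text += "<|im_start|>assistant\n"
    return text

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Egypt?"},
]
print(render_chatml(messages))
```

Seeing the rendered string makes it clear why `add_generation_prompt=True` matters: it opens the assistant turn so generation starts with the model's reply rather than a continuation of the user message.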