---
language:
- ar
license: apache-2.0
tags:
- qwen
- llama-factory
- lora
- arabic
- question-answering
- instruction-tuning
- kaggle
- transformers
- fine-tuned
model_name: QWEN_Arabic_Q&A
base_model: Qwen/Qwen2.5-1.5B
pipeline_tag: text-generation
library_name: transformers
datasets:
- custom
---

# Qwen2.5-1.5B - LoRA Fine-Tuned on Arabic Q&A

This model is a LoRA fine-tuned version of **[Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)** designed for Arabic Question Answering tasks. It was trained using the **LLaMA-Factory** framework on a custom curated dataset of Arabic Q&A pairs.

## Training Configuration

- **Base Model**: `Qwen/Qwen2.5-1.5B`
- **Method**: Supervised Fine-Tuning (SFT) with [LoRA](https://arxiv.org/abs/2106.09685)
- **Framework**: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- **Batch Size**: 1 (gradient accumulation = 16)
- **Epochs**: 3
- **Cutoff Length**: 2048 tokens
- **Learning Rate**: 1e-4
- **Scheduler**: Cosine with warmup ratio 0.1
- **Precision**: bf16
- **LoRA Rank**: 64
- **LoRA Target**: all layers
- **Eval Strategy**: every 200 steps
- **Eval Set Size**: 3020 examples
- **WandB Tracking**: enabled ([run link](https://wandb.ai/youssefhassan437972-kafr-el-sheikh-university/llamafactory/runs/rdrftts8))
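
For reference, the settings above map onto a LLaMA-Factory LoRA SFT config (run with `llamafactory-cli train config.yaml`) roughly as follows. This is an illustrative sketch, not the exact file used for this run: the dataset name, chat template, output directory, logging interval, and the use of `val_size` to carve out the 3,020-example eval set are assumptions.

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-1.5B

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 64
lora_target: all

### dataset (name and template are placeholders; the custom Q&A dataset is not published)
dataset: arabic_qa_custom
template: qwen
cutoff_len: 2048
val_size: 3020           # held-out eval set of 3,020 examples (assumed split method)

### training
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true

### evaluation / logging (output path and logging interval are assumptions)
eval_strategy: steps
eval_steps: 200
logging_steps: 10
report_to: wandb
output_dir: saves/qwen2.5-1.5b-lora-arabic-qa
```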

## Evaluation (Epoch ~1.77)

- **Eval Loss**: 0.4321
- **Samples/sec**: 1.389
- **Steps/sec**: 0.695

## Usage

You can use the model via `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("Youssef/QWEN_Arabic_Q&A")
tokenizer = AutoTokenizer.from_pretrained("Youssef/QWEN_Arabic_Q&A")

# Example question: "Who is the founder of algebra?"
prompt = "من هو مؤسس علم الجبر؟"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

**Example interaction** (the user asks for the pillars of Islam together with the hadith that lists them; the model answers with the five pillars and cites the hadith of Ibn Umar as narrated by al-Bukhari and Muslim):

<|user|>
ما هي أركان الإسلام مع ذكر الحديث الذي يذكرها؟

<|assistant|>
أركان الإسلام خمسة، كما جاء في الحديث الصحيح:

عن عبد الله بن عمر رضي الله عنهما قال: قال رسول الله ﷺ: "بني الإسلام على خمس: شهادة أن لا إله إلا الله، وأن محمدًا رسول الله، وإقام الصلاة، وإيتاء الزكاة، وصوم رمضان، وحج البيت لمن استطاع إليه سبيلًا" (رواه البخاري ومسلم).
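
The Qwen2.5 base tokenizer ships a chat template, so the interaction above can also be reproduced programmatically. This is a minimal sketch that assumes the fine-tuned tokenizer inherited that template; if it did not, use the plain-string prompt from the snippet above instead:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Youssef/QWEN_Arabic_Q&A")
tokenizer = AutoTokenizer.from_pretrained("Youssef/QWEN_Arabic_Q&A")

# "What are the pillars of Islam, with the hadith that mentions them?"
messages = [{"role": "user", "content": "ما هي أركان الإسلام مع ذكر الحديث الذي يذكرها؟"}]

# Format the conversation with the tokenizer's chat template (assumed inherited from Qwen2.5)
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```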

## Training Loss Over Epochs

| Epoch | Learning Rate | Loss   |
|-------|---------------|--------|
| 0.16  | 5.39e-05      | 0.6304 |
| 0.18  | 5.88e-05      | 0.6179 |
| 0.19  | 6.37e-05      | 0.6042 |
| 0.21  | 6.86e-05      | 0.6138 |
| 0.22  | 7.35e-05      | 0.5940 |
| 0.24  | 7.84e-05      | 0.5838 |
| 0.25  | 8.33e-05      | 0.5842 |
| 0.26  | 8.82e-05      | 0.5786 |
| 0.28  | 9.31e-05      | 0.5713 |
| 0.65  | 9.60e-05      | 0.6122 |
| 0.71  | 9.45e-05      | 0.5809 |
| 0.77  | 9.29e-05      | 0.5446 |
| 0.82  | 9.10e-05      | 0.5339 |
| 0.88  | 8.90e-05      | 0.5296 |
| 0.94  | 8.67e-05      | 0.5176 |
| 1.00  | 8.43e-05      | 0.5104 |
| 1.06  | 8.17e-05      | 0.4685 |
| 1.12  | 7.90e-05      | 0.4730 |
| 1.18  | 7.62e-05      | 0.4679 |
| 1.24  | 7.32e-05      | 0.4541 |
| 1.30  | 7.01e-05      | 0.4576 |
| 1.35  | 6.69e-05      | 0.4472 |
| 1.41  | 6.36e-05      | 0.4427 |
| 1.47  | 6.03e-05      | 0.4395 |
| 1.53  | 5.69e-05      | 0.4305 |
| 1.59  | 5.35e-05      | 0.4280 |
| 1.65  | 5.01e-05      | 0.4251 |
| 1.71  | 4.67e-05      | 0.4188 |
| 1.77  | 4.33e-05      | 0.4177 |
| 1.83  | 3.99e-05      | 0.4128 |

**Evaluation Losses:**

- Epoch 1.18 → `0.4845`
- Epoch 1.77 → `0.4321`
|