# Qwen3.5-4B Saudi Dialect
This model is a Saudi dialect conversational fine-tune of unsloth/Qwen3.5-4B, trained from the notebook qwen3-5-4b-saudi-dialect-sft-modal.ipynb and pushed to Hugging Face as a merged standalone model:
- Model: https://huggingface.co/AyoubChLin/Qwen3.5-4B-saudi-dialect
- LoRA adapters: https://huggingface.co/AyoubChLin/Qwen3.5-4B-saudi-dialect-lora
- Dataset: https://huggingface.co/datasets/HeshamHaroon/saudi-dialect-conversations
- Base model: https://huggingface.co/unsloth/Qwen3.5-4B
The training setup uses Unsloth + TRL SFTTrainer with LoRA adapters and then merges the adapters back into the base model for easier deployment.
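The train-then-merge flow described above can be sketched as follows. This is a minimal sketch assuming Unsloth's `push_to_hub_merged` API and eliding the training step; the notebook's exact cells may differ.

```python
# Hedged sketch of the fine-tune-then-merge workflow (assumes Unsloth's
# push_to_hub_merged API; not the notebook's verbatim code).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-4B",
    max_seq_length=4096,
    load_in_4bit=False,
)
# ... attach LoRA adapters and train with TRL's SFTTrainer ...

# Merge the LoRA weights into the base model and push a standalone checkpoint.
model.push_to_hub_merged(
    "AyoubChLin/Qwen3.5-4B-saudi-dialect",
    tokenizer,
    save_method="merged_16bit",
)
# The adapters alone are pushed to a separate repository.
model.push_to_hub("AyoubChLin/Qwen3.5-4B-saudi-dialect-lora")
```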
## Model Details

- Base model: unsloth/Qwen3.5-4B
- Fine-tuning method: LoRA SFT
- Language: Arabic, focused on Saudi dialect conversations
- Training modality in this run: text-only conversational SFT
- Dataset split: 3545 total examples → 3366 train / 179 eval
- System prompt used in training: أنت مساعد مفيد يتحدث باللهجة السعودية العامية. ("You are a helpful assistant who speaks colloquial Saudi dialect.")
- Tracking: Weights & Biases
- W&B run: https://wandb.ai/cherguelainea/qwen-saudi-dialect/runs/6udmlaan
## Training Arguments

| Argument | Value |
|---|---|
| `max_seq_length` | 4096 |
| `load_in_4bit` | False |
| `load_in_8bit` | False |
| `lora_r` | 16 |
| `lora_alpha` | 16 |
| `lora_dropout` | 0 |
| `target_modules` | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| `use_gradient_checkpointing` | `"unsloth"` |
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 16 |
| `gradient_accumulation_steps` | 4 |
| Effective global batch size | 64 |
| `warmup_steps` | 5 |
| `num_train_epochs` | 4 |
| `learning_rate` | 4e-4 |
| `lr_scheduler_type` | linear |
| `optim` | adamw_8bit |
| `weight_decay` | 0.01 |
| `dataset_text_field` | messages |
| `packing` | True in config, but Unsloth reported "Sample packing skipped (vision-language model detected)" |
| `remove_unused_columns` | False |
| `save_strategy` | steps |
| `save_steps` | 100 |
| `eval_strategy` | steps |
| `eval_steps` | 50 |
| `seed` | 3407 |
| `report_to` | wandb |
| Precision used in this run | bf16 |
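The batch-size bookkeeping can be cross-checked with the numbers above: the effective global batch size and the total optimizer steps follow from the per-device batch size, gradient accumulation, epoch count, and the train split of 3366 examples.

```python
import math

# Values from the training-arguments table and the dataset split.
train_examples = 3366
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_gpus = 1
num_train_epochs = 4

# Effective global batch size = per-device batch x grad accumulation x GPUs.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # 64

# Optimizer steps per epoch (a final partial batch still counts as a step).
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
print(total_steps)  # 212, matching train/global_step in the results
```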
## Training Results

### Loss and Metrics

| Metric | Value |
|---|---|
| `eval/loss` | 1.49976 |
| `train/loss` (final W&B summary) | 1.18529 |
| `training_loss` (trainer_stats) | 1.4871071903210766 |
| `train_runtime` | 2490.3044 s (41.51 min) |
| `train_samples_per_second` | 5.407 |
| `train_steps_per_second` | 0.085 |
| `eval/runtime` | 9.6061 s |
| `eval/samples_per_second` | 18.53 |
| `eval/steps_per_second` | 1.249 |
| `train/global_step` | 212 |
| `train/epoch` | 4 |
| `train/grad_norm` | 0.69472 |
| `total_flos` | 7.760619536796672e+16 |
### Trainable Parameters
| Item | Value |
|---|---|
| Total parameters | 4,560,499,200 |
| Trainable LoRA parameters | 21,233,664 |
| Trainable ratio | 0.4656% |
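The trainable ratio follows directly from the two parameter counts in the table:

```python
# Parameter counts from the table above.
total_params = 4_560_499_200
trainable_params = 21_233_664

ratio_pct = trainable_params / total_params * 100
print(f"{ratio_pct:.4f}%")  # 0.4656%
```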
## Hardware
| Item | Value |
|---|---|
| GPU | NVIDIA A100-SXM4-40GB |
| Number of GPUs | 1 |
| CUDA toolkit | 12.9 |
| Torch | 2.8.0+cu129 |
| Transformers | 5.3.0 |
| Unsloth | 2026.3.6 |
| GPU total memory | 39.494 GB |
| GPU memory reserved before training | 8.547 GB |
| Peak reserved GPU memory | 38.455 GB |
| Peak reserved GPU memory for LoRA training | 29.908 GB |
| Peak GPU memory usage | 97.37% of available GPU memory |
| System RAM | Not logged in the notebook outputs |
The memory figures above are GPU (VRAM) measurements taken during the training run; host system RAM was not logged by the notebook.
## Data Preparation
The dataset examples are conversation turns stored under messages. During preprocessing, a Saudi Arabic system prompt is prepended to each conversation before fine-tuning. The training notebook keeps only valid conversations and then performs a 5% evaluation split with seed 3407.
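The preprocessing above can be illustrated in pure Python. This is a sketch with hypothetical helper names; the notebook itself operates on a Hugging Face `datasets` object and uses its built-in split, so only the logic, the system prompt, the 5% fraction, and the seed 3407 are taken from the run.

```python
import random

SYSTEM_PROMPT = "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."

def prepend_system(messages):
    """Prepend the Saudi-dialect system prompt unless the conversation already has one."""
    if messages and messages[0].get("role") == "system":
        return messages
    return [{"role": "system", "content": SYSTEM_PROMPT}] + messages

def split_train_eval(examples, eval_fraction=0.05, seed=3407):
    """Deterministically shuffle and carve off an eval split (5%, seed 3407 in this run)."""
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    n_eval = round(len(examples) * eval_fraction)
    eval_idx = set(indices[:n_eval])
    train = [ex for i, ex in enumerate(examples) if i not in eval_idx]
    evals = [ex for i, ex in enumerate(examples) if i in eval_idx]
    return train, evals
```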
## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AyoubChLin/Qwen3.5-4B-saudi-dialect"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."},
    {"role": "user", "content": "كيف حالك اليوم؟"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
### Unsloth

#### Install

```python
%%capture
import re, torch

v = re.match(r"[\d]{1,}\.[\d]{1,}", str(torch.__version__)).group(0)
xformers = "xformers==" + {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}.get(v, "0.0.34")

!pip install sentencepiece protobuf "datasets>=2.18.0" "huggingface_hub>=0.34.0" hf_transfer wandb
!pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install -q "transformers>=5.0.0"
!pip install -q --no-deps "trl>=0.15.0"
```
#### Run

```python
from unsloth import FastLanguageModel

repo_id = "AyoubChLin/Qwen3.5-4B-saudi-dialect"
max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,  # this repo was pushed as merged_16bit
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "كيف حالك اليوم؟"}
        ],
    },
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=200,
    use_cache=True,
    temperature=0.7,
    top_p=0.9,
)

response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```
## Notes

- This repository contains the merged full model, pushed with `save_method="merged_16bit"`.
- A separate LoRA adapter repository is also available: [AyoubChLin/Qwen3.5-4B-saudi-dialect-lora](https://huggingface.co/AyoubChLin/Qwen3.5-4B-saudi-dialect-lora).
- The base checkpoint is multimodal-capable, but this fine-tune was trained on text-only dialogue data.
- The training data is conversational and dialect-specific, so outputs may reflect biases or stylistic patterns present in the source dataset.