This model belongs to the **Saudi Dialect Fine-Tuned Models** collection: Saudi Arabic chat models fine-tuned on the same HeshamHaroon/saudi-dialect-conversations dataset.
This model is a Saudi dialect conversational fine-tune of `unsloth/Qwen3.5-2B`.
The training setup uses Unsloth + TRL SFTTrainer with LoRA adapters and then merges the adapters back into the base model for easier deployment.
- Base model: `unsloth/Qwen3.5-2B`
- Dataset split: 3545 total examples → 3366 train / 179 eval
- System prompt: "أنت مساعد مفيد يتحدث باللهجة السعودية العامية." ("You are a helpful assistant who speaks colloquial Saudi Arabic.")

| Argument | Value |
|---|---|
| `max_seq_length` | 4096 |
| `load_in_4bit` | False |
| `load_in_8bit` | False |
| `lora_r` | 16 |
| `lora_alpha` | 16 |
| `lora_dropout` | 0 |
| `target_modules` | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| `use_gradient_checkpointing` | "unsloth" |
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 16 |
| `gradient_accumulation_steps` | 4 |
| Effective global batch size | 64 |
| `warmup_steps` | 5 |
| `num_train_epochs` | 4 |
| `learning_rate` | 4e-4 |
| `lr_scheduler_type` | linear |
| `optim` | adamw_8bit |
| `weight_decay` | 0.01 |
| `dataset_text_field` | messages |
| `packing` | True in config, but Unsloth reported "Sample packing skipped (vision-language model detected)" |
| `remove_unused_columns` | False |
| `save_strategy` | steps |
| `save_steps` | 100 |
| `eval_strategy` | steps |
| `eval_steps` | 50 |
| `seed` | 3407 |
| `report_to` | wandb |
| Precision used in this run | bf16 |
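The effective global batch size row is just the product of the per-device batch size, the gradient accumulation steps, and the number of GPUs used in this run:

```python
# Effective global batch size = per-device batch × grad-accum steps × #GPUs
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_gpus = 1  # single A100 in this run

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 64
```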
| Metric | Value |
|---|---|
| `eval/loss` | 1.69532 |
| `train/loss` (final W&B summary) | 1.321431 |
| `training_loss` (trainer_stats) | 1.4274427658981748 |
| `train/epoch` | 8 |
| `train/grad_norm` | 0.94683 |
| `total_flos` | 3.5740866072896256e+16 |
| Item | Value |
|---|---|
| Total parameters | 2,224,153,408 |
| Trainable LoRA parameters | 10,911,744 |
| Trainable ratio | 0.4956% |
| Item | Value |
|---|---|
| GPU | NVIDIA A100-SXM4-40GB |
| Number of GPUs | 1 |
| CUDA toolkit | 12.9 |
| Torch | 2.8.0+cu129 |
| Transformers | 5.3.0 |
| Unsloth | 2026.3.6 |
| GPU total memory | 39.494 GB |
| GPU memory reserved before training | 8.547 GB |
| Peak reserved GPU memory | 38.455 GB |
| Peak reserved GPU memory for LoRA training | 29.908 GB |
| Peak GPU memory usage | 97.37% of available GPU memory |
| System RAM | Not logged in the notebook outputs |
Recorded memory numbers above are GPU memory / VRAM measurements taken from the training run. The notebook did not record host system RAM.
The dataset examples are conversation turns stored under a `messages` field. During preprocessing, a Saudi Arabic system prompt is prepended to each conversation before fine-tuning. The training notebook keeps only valid conversations and then performs a 5% evaluation split with seed 3407.
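The preprocessing step can be sketched roughly as follows. This is a minimal stand-in, not the notebook's actual code: the helper names are illustrative, and the real notebook splits with the Hugging Face `datasets` library rather than plain Python.

```python
import random

SYSTEM_PROMPT = "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."

def prepend_system_prompt(conversation):
    """Add the Saudi Arabic system prompt unless one is already present."""
    if conversation and conversation[0]["role"] == "system":
        return conversation
    return [{"role": "system", "content": SYSTEM_PROMPT}] + conversation

def train_eval_split(examples, eval_ratio=0.05, seed=3407):
    """Shuffle deterministically with the given seed and hold out ~5% for eval."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_eval = max(1, round(len(examples) * eval_ratio))
    return examples[n_eval:], examples[:n_eval]

# Toy example with two conversations (eval_ratio=0.5 so both splits are non-empty)
data = [
    [{"role": "user", "content": "كيف حالك؟"}],
    [{"role": "user", "content": "وش الأخبار؟"}],
]
data = [prepend_system_prompt(c) for c in data]
train, eval_set = train_eval_split(data, eval_ratio=0.5)
```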
Use with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AyoubChLin/Qwen3.5-2B-saudi-dialect"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."},
    {"role": "user", "content": "كيف حالك اليوم؟"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
Install
```python
%%capture
import re, torch

# Pin xformers to the wheel that matches the installed torch minor version.
v = re.match(r"\d+\.\d+", str(torch.__version__)).group(0)
xformers = "xformers==" + {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}.get(v, "0.0.34")

!pip install sentencepiece protobuf "datasets>=2.18.0" "huggingface_hub>=0.34.0" hf_transfer wandb
!pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install -q "transformers>=5.0.0"
!pip install -q --no-deps "trl>=0.15.0"
```
Run
```python
from unsloth import FastLanguageModel

repo_id = "AyoubChLin/Qwen3.5-2B-saudi-dialect"
max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,  # this repo was pushed as merged_16bit
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "كيف حالك اليوم؟"}
        ],
    },
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=200,
    use_cache=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```
The merged weights in this repo were exported with `save_method="merged_16bit"`. A LoRA-adapter-only variant is available at AyoubChLin/Qwen3.5-2B-saudi-dialect-lora.
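For reference, the export step in Unsloth looks roughly like this. This is a sketch, not the notebook's exact code: the output directory names are illustrative, and it assumes a trained `model`/`tokenizer` pair is already in scope.

```python
# Merge the LoRA adapters into the base weights and save in 16-bit,
# which is the format this repo was pushed in.
model.save_pretrained_merged(
    "qwen3.5-2b-saudi-dialect-merged",  # local output directory (illustrative)
    tokenizer,
    save_method="merged_16bit",
)

# The adapters can also be saved on their own, producing a small
# LoRA-only checkpoint like the *-lora repo variant.
model.save_pretrained("qwen3.5-2b-saudi-dialect-lora")
tokenizer.save_pretrained("qwen3.5-2b-saudi-dialect-lora")
```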