Qwen3.5-9B Saudi Dialect
This repository contains merged full weights for Saudi-dialect chat generation, not just LoRA adapters. The model was fine-tuned from unsloth/Qwen3.5-9B with Unsloth LoRA SFT on Saudi Arabic conversations, then merged into a standalone merged_16bit checkpoint for direct use with plain transformers or Unsloth.
Model details
- Base model: unsloth/Qwen3.5-9B
- Training style: Unsloth LoRA SFT
- System prompt: أنت مساعد مفيد يتحدث باللهجة السعودية العامية. ("You are a helpful assistant who speaks colloquial Saudi dialect.")
- Max sequence length: 4096
Training data and setup
- Dataset: HeshamHaroon/saudi-dialect-conversations
- Raw dataset size: 3545
- Post-filter split: 3366 train / 179 eval
- Eval split: 5%
- Seed: 3407
- LoRA config: r=16, alpha=16, dropout=0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Batch size: 16
- Gradient accumulation: 4
- Effective batch size: 64
- Epochs: 4
- Learning rate: 4e-4
- Warmup steps: 5
- Optimizer: adamw_8bit
- Packing: enabled
- Hardware/runtime: NVIDIA A100-SXM4-80GB, bf16, 2608.5 s (43.48 min)
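The figures above are internally consistent; a quick sanity check (all numbers are taken from the list, the step estimate assumes one optimizer step per effective batch and ignores packing, which merges short conversations and lowers the real step count):

```python
import math

# Figures from the model card
train_n, eval_n = 3366, 179
raw_size = 3545
per_device_bs, grad_accum, epochs = 16, 4, 4

# The train/eval split adds back up to the raw size,
# and the eval share is roughly the stated 5%.
assert train_n + eval_n == raw_size
eval_share = eval_n / raw_size                       # ~0.0505

# Effective batch size = per-device batch * gradient accumulation
effective_bs = per_device_bs * grad_accum            # 64

# Upper bound on optimizer steps (packing lowers the real count)
steps_per_epoch = math.ceil(train_n / effective_bs)  # 53
total_steps = steps_per_epoch * epochs               # 212

print(effective_bs, round(eval_share, 4), total_steps)
```

The rough 212-step upper bound is in line with the loss curve ending near step 200.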
Results
- Final eval/loss: 1.41955
- Final logged train/loss: 1.10579
- Trainer aggregate training_loss: 1.3807367114525921 (the run-level average reported by the trainer, not the last logged training step)
- Peak reserved memory: 57.799 GB
- LoRA-attributed reserved memory: 40.145 GB
- Peak memory share: 72.93%
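A back-of-envelope check on the memory figures (this assumes the peak memory share is peak reserved memory divided by the device's total memory as seen by torch, which is Unsloth's usual reporting convention):

```python
# Figures from the model card
peak_reserved = 57.799   # GB
lora_reserved = 40.145   # GB
peak_share = 0.7293      # 72.93%

# Implied total device memory — consistent with the ~79 GB that
# torch.cuda reports for an A100-SXM4-80GB.
implied_total = peak_reserved / peak_share  # ~79.25 GB

# Fraction of the peak that the LoRA training itself accounted for.
lora_fraction = lora_reserved / peak_reserved  # ~0.695

print(round(implied_total, 2), round(lora_fraction, 3))
```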
From the provided loss screenshot, eval loss drops from about 1.47 early in training to about 1.40 around the middle of the run, then rises slightly to about 1.42 near step 200. Train loss falls from about 3.1 at the start to about 1.1 by the end.
The published repository is the final merged checkpoint from the run, not an explicitly selected best-eval checkpoint.
Usage
Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AyoubChLin/Qwen3.5-9B-saudi-dialect"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."},
    {"role": "user", "content": "كيف حالك اليوم؟"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=200,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
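The final slice, `outputs[0][input_ids.shape[-1]:]`, is there because `generate` returns the prompt tokens followed by the completion. A toy illustration of the pattern with hypothetical token-id lists:

```python
# Hypothetical token ids: generate() echoes the prompt, then appends new tokens.
prompt_ids = [101, 42, 7, 9]
generated = prompt_ids + [55, 18, 2]   # what generate() returns for one sequence

# Keep only the newly generated part, exactly as in the snippet above.
new_tokens = generated[len(prompt_ids):]
print(new_tokens)   # [55, 18, 2]
```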
Install Unsloth
This is a Jupyter/Colab notebook cell; `%%capture` and the `!pip` lines are IPython magics and will not run as a plain Python script:

```python
%%capture
import re, torch

# Pick an xformers build matching the installed torch version.
v = re.match(r"[\d]{1,}\.[\d]{1,}", str(torch.__version__)).group(0)
xformers = "xformers==" + {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}.get(v, "0.0.34")

!pip install sentencepiece protobuf "datasets>=2.18.0" "huggingface_hub>=0.34.0" hf_transfer wandb
!pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install -q "transformers>=5.0.0"
!pip install -q --no-deps "trl>=0.15.0"
```
Unsloth
This repo was pushed as merged_16bit, so load it with load_in_4bit=False.
```python
from unsloth import FastLanguageModel

repo_id = "AyoubChLin/Qwen3.5-9B-saudi-dialect"
max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,  # this repo was pushed as merged_16bit
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "كيف حالك اليوم؟"}
        ],
    },
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=200,
    use_cache=True,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```
Related artifacts
- LoRA adapters: AyoubChLin/Qwen3.5-9B-saudi-dialect-lora
Limitations
- No external benchmark evaluation is included beyond train/eval loss on the source dataset.
- Saudi dialect coverage is likely uneven across regions, phrasing styles, and topics.
- The model can still hallucinate, over-generalize, or drift toward more formal Arabic depending on the prompt.