Qwen3.5-9B-Instruct-Turca-TurkishLLM 🇹🇷

A supervised fine-tune of Qwen/Qwen3.5-9B designed for Turkish instruction following, reasoning, and natural language generation. Trained using LoRA on a large-scale Turkish instruction dataset to improve fluency, instruction adherence, and conversational quality in Turkish while preserving strong English capabilities.


Model Information

| Field | Value |
|---|---|
| Developer | Muhammed Köse |
| LinkedIn | muhammedksee |
| GitHub | MuhammedKsee |
| Base Model | Qwen/Qwen3.5-9B |
| Fine-tuning Method | SFT (Supervised Fine-Tuning) via LoRA |
| Libraries | PEFT 0.18.1, Transformers, TRL |
| Languages | Turkish (primary), English |
| License | Apache-2.0 |
| Training Hardware | NVIDIA H100 80GB |

Key Features

  • Strong Turkish instruction-following capability
  • Natural and fluent conversational responses
  • Improved Turkish grammar and semantic understanding
  • Preserves strong English reasoning ability — no catastrophic forgetting observed
  • Long-form text generation support
  • Chat-optimized behavior for assistant use cases

Training Dataset

| Field | Value |
|---|---|
| Dataset | InstrucTurca |
| Samples used | 500,000 |
| Task coverage | Instruction following, QA, summarization, translation, reasoning |
| Data format | ChatML multi-turn conversation format |
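The ChatML format mentioned above wraps each conversation turn in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch of how a multi-turn sample is rendered (for illustration only; in practice the tokenizer's `apply_chat_template` produces the exact template):

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts in ChatML, the multi-turn
    format used for the training data. Illustrative sketch only: the
    real template comes from tokenizer.apply_chat_template."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

example = to_chatml([
    {"role": "system", "content": "Sen yardımcı bir asistansın."},
    {"role": "user", "content": "Merhaba!"},
])
```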

Training Configuration

| Parameter | Value |
|---|---|
| Method | SFT + LoRA |
| LoRA Rank (r) | 32 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Max Sequence Length | 4096 |
| Epochs | 1 |
| Learning Rate | 1.5e-4 (cosine scheduler) |
| Effective Batch Size | 32 (4 per device × 8 gradient accumulation) |
| Optimizer | adamw_torch_fused |
| Precision | BF16 / TF32 |
| Attention | FlashAttention-2 |
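The hyperparameters above map onto `peft` / `trl` configuration roughly as follows. This is a hypothetical reconstruction, not the author's actual training script; the `output_dir` is a placeholder, and the sequence-length argument name has changed between TRL versions:

```python
# Hypothetical reconstruction of the table above using peft + trl.
# NOT the author's training script; argument values come from the card.
from peft import LoraConfig
from trl import SFTConfig

peft_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="qwen3.5-9b-turca",      # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,      # 4 × 8 = effective batch size 32
    learning_rate=1.5e-4,
    lr_scheduler_type="cosine",
    max_length=4096,                    # called max_seq_length in older TRL
    optim="adamw_torch_fused",
    bf16=True,
    tf32=True,
)
```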

Benchmark Results

Evaluated with the EleutherAI lm-evaluation-harness (v0.4.2) on an H100 GPU, using `pretrained=MuhammedKsee/Qwen3.5-9B-Instruct-Turca-TurkishLLM` and `batch_size=auto` (resolved to 64).
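A command along these lines reproduces the harness setup just described. The exact task list is an assumption; task names follow lm-evaluation-harness v0.4.x conventions:

```shell
pip install "lm_eval==0.4.2"

lm_eval --model hf \
  --model_args pretrained=MuhammedKsee/Qwen3.5-9B-Instruct-Turca-TurkishLLM,dtype=bfloat16 \
  --tasks mmlu,hellaswag,arc_challenge,gsm8k \
  --batch_size auto
```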

English Reasoning & Knowledge (0-shot)

| Task | Metric | Score |
|---|---|---|
| MMLU (Overall) | acc | 0.7787 |
| — Humanities | acc | 0.6922 |
| — Social Sciences | acc | 0.8694 |
| — STEM | acc | 0.7758 |
| — Other | acc | 0.8230 |
| HellaSwag | acc_norm | 0.7834 |
| ARC-Challenge | acc_norm | 0.5375 |

Reasoning & Code (few-shot)

| Task | n-shot | Filter | Metric | Score | Stderr |
|---|---|---|---|---|---|
| GSM8K | 5 | flexible-extract | exact_match | 0.8438 | ±0.0100 |
| GSM8K | 5 | strict-match | exact_match | 0.8491 | ±0.0099 |
| HumanEval (instruct) | 0 | create_test | pass@1 | 0.2622 | ±0.0345 |
| TinyTruthfulQA | 0 | none | acc | 0.4724 | |

Turkish NLP Benchmarks (0-shot)

| Task | Metric | Score |
|---|---|---|
| Belebele (TR) | acc | 0.8144 |
| Turkish MMLU (avg) | acc | 0.6555 |
| XCOPA (TR) | acc | 0.6780 |

No catastrophic forgetting was observed in these evaluations; English reasoning remains strong after Turkish fine-tuning.


Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MuhammedKsee/Qwen3.5-9B-Instruct-Turca-TurkishLLM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "Sen yardımcı, dürüst ve zararsız bir Türkçe yapay zeka asistanısın."},
    {"role": "user",   "content": "Türkiye'nin en büyük şehri hangisidir?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1
)
# Strip the prompt tokens so only the newly generated reply is decoded.
generated_ids = [o[inputs.input_ids.shape[1]:] for o in generated_ids]
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The system prompt is kept in Turkish to match the fine-tuning distribution. English system prompts also work but have not been formally evaluated.


Example Prompts

General chat

Bana motivasyon veren kısa bir konuşma yap. ("Give me a short motivational talk.")

Explanation

Bir öğrenciye yapay zekayı basit şekilde anlat. ("Explain artificial intelligence to a student in simple terms.")

Content generation

Türkçe öğrenen biri için kısa bir hikaye yaz. ("Write a short story for someone learning Turkish.")

Translation

"Machine learning is transforming the world" cümlesini Türkçeye çevir. ("Translate the sentence 'Machine learning is transforming the world' into Turkish.")

Intended Use

| Use Case | Verdict |
|---|---|
| Turkish conversational assistant | ✅ Primary use case |
| Turkish RAG / document QA | ✅ Recommended |
| Summarization & translation | ✅ Intended use |
| Educational & tutoring applications | ✅ Good fit |
| Local inference (quantized) | ✅ See GGUF repo |
| Production enterprise (high-stakes) | ⚠️ Evaluate on your specific workload |
| Advanced mathematical proofs | ⚠️ Not specialized for this |
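For local quantized inference, the GGUF repo can be run with llama.cpp. A sketch of the workflow; the model filename is a placeholder, so list the repo to see the actual quantizations available:

```shell
# Download the quantized weights from the GGUF repo.
huggingface-cli download MuhammedKsee/Qwen3.5-9B-Instruct-Turca-TurkishLLM-GGUF \
  --local-dir ./gguf

# Start an interactive chat with llama.cpp (replace <model-file> with a
# real filename from the downloaded repo).
llama-cli -m ./gguf/<model-file>.gguf -cnv \
  -p "Sen yardımcı bir Türkçe asistansın."
```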

Limitations

  • SFT-only model; no DPO or RLHF alignment stage was applied
  • May occasionally produce suboptimal answers in safety-critical scenarios
  • Trained for only a single epoch over 500K samples
  • Turkish-specific benchmarks cover only a subset of available evaluations; community results are welcome

Related Resources

| Resource | Link |
|---|---|
| GGUF (local inference) | MuhammedKsee/Qwen3.5-9B-Instruct-Turca-TurkishLLM-GGUF |
| Training dataset | turkish-nlp-suite/InstrucTurca |
| Base model | Qwen/Qwen3.5-9B |

Citation

```bibtex
@misc{kose2026qwen35turca,
  author    = {Muhammed Köse},
  title     = {Qwen3.5-9B-Instruct-Turca-TurkishLLM},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/MuhammedKsee/Qwen3.5-9B-Instruct-Turca-TurkishLLM}
}
```