Turkish LLM Family
Open-source Turkish LLM family (1.5B-32B): models, GGUF quantizations, datasets, and demos.
The largest open-source Turkish-enhanced language model. Fine-tuned from Qwen2.5-32B-Instruct with QLoRA on 242K Turkish instruction examples.
Part of the Turkish LLM Family, a suite of Turkish language models spanning 7B to 32B.
| Benchmark | Base (Qwen2.5-32B-Instruct) | Ours | Delta (pp) |
|---|---|---|---|
| MMLU-TR (57 categories) | 0.6518 | 0.6564 | +0.46 |
| XNLI-TR (NLI) | 0.4578 | 0.4610 | +0.32 |
| XCOPA-TR (Causal) | 0.6800 | 0.6740 | -0.60 |
Largest per-category gains on MMLU-TR:

| Category | Base | Ours | Delta (pp) |
|---|---|---|---|
| College Computer Science | 0.545 | 0.616 | +7.1 |
| Logical Fallacies | 0.640 | 0.696 | +5.6 |
| College Mathematics | 0.530 | 0.580 | +5.0 |
| Formal Logic | 0.508 | 0.556 | +4.8 |
| High School Mathematics | 0.507 | 0.548 | +4.1 |
32 of the 57 MMLU-TR categories improved over the base model.
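The deltas in the tables above are percentage points, i.e. the raw accuracy difference scaled by 100. A minimal sketch using scores taken from the tables:

```python
# Percentage-point delta between base and fine-tuned accuracy.
def delta_pp(base: float, ours: float) -> float:
    return round((ours - base) * 100, 2)

# Scores from the tables above
print(delta_pp(0.6518, 0.6564))  # MMLU-TR overall -> 0.46
print(delta_pp(0.545, 0.616))    # College Computer Science -> 7.1
print(delta_pp(0.6800, 0.6740))  # XCOPA-TR -> -0.6
```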
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ogulcanaydogan/Turkish-LLM-32B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ogulcanaydogan/Turkish-LLM-32B-Instruct")

messages = [
    # "You are a helpful Turkish assistant."
    {"role": "system", "content": "Sen yardimci bir Turkce asistansin."},
    # "Explain the applications of artificial intelligence in the healthcare sector."
    {"role": "user", "content": "Yapay zekanin saglik sektorundeki uygulamalarini acikla."},
]

# Build the chat prompt, generate, and decode the response
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
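Since the model is fine-tuned from Qwen2.5-32B-Instruct, `apply_chat_template` produces a ChatML-style prompt. A minimal sketch reproducing it by hand, assuming the chat template is inherited unchanged from the Qwen2.5 base (useful when calling the GGUF build from a runtime without tokenizer support):

```python
# Hand-build a ChatML prompt, assuming the chat template is inherited
# unchanged from the Qwen2.5 base model.
def build_chatml(messages):
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"  # generation prompt

messages = [
    {"role": "system", "content": "Sen yardimci bir Turkce asistansin."},
    {"role": "user", "content": "Merhaba!"},  # "Hello!"
]
print(build_chatml(messages))
```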
Run the Q4_K_M GGUF quantization with Ollama:

```bash
ollama run hf.co/ogulcanaydogan/Turkish-LLM-32B-Instruct-GGUF:Q4_K_M
```
Serve with vLLM:

```bash
vllm serve ogulcanaydogan/Turkish-LLM-32B-Instruct --dtype auto --max-model-len 4096
```
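`vllm serve` exposes an OpenAI-compatible API (on port 8000 by default). A minimal client sketch; the prompt is illustrative, and the request line is left commented so it only fires against a running server:

```python
import json
import urllib.request

# Chat-completions payload for the OpenAI-compatible vLLM endpoint
payload = {
    "model": "ogulcanaydogan/Turkish-LLM-32B-Instruct",
    "messages": [
        # "What is the capital of Turkey?"
        {"role": "user", "content": "Turkiye'nin baskenti neresidir?"}
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server from the command above is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```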
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-32B-Instruct |
| Method | QLoRA (4-bit NF4 + double quantization) |
| LoRA rank / alpha | 32 / 64 |
| Learning rate | 1e-5 (cosine schedule) |
| Epochs | 1 |
| Effective batch size | 16 |
| Max sequence length | 2048 |
| Training time | ~3 days on NVIDIA A100 80GB |
| Dataset | 242K Turkish instruction examples |
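The setup in the table maps roughly onto a bitsandbytes + PEFT configuration like the following. This is a hedged sketch, not the author's actual training script; the target modules, dropout, and compute dtype are assumptions not stated in the table:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 with double quantization, as listed in the table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: compute dtype not stated
)

# LoRA rank 32, alpha 64, as listed in the table
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,  # assumption: not stated
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```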
| Model | Size | MMLU-TR | Download |
|---|---|---|---|
| Turkish-LLM-7B-Instruct | 7B | - | GGUF |
| Turkish-LLM-14B-Instruct | 14B | 0.5977 | GGUF |
| Turkish-LLM-32B-Instruct | 32B | 0.6564 | GGUF |
```bibtex
@misc{aydogan2026turkishllm,
  title={Turkish LLM Family: Open-Source Turkish Language Models},
  author={Ogulcan Aydogan},
  year={2026},
  url={https://huggingface.co/collections/ogulcanaydogan/turkish-llm-family-69b303b4ef1c36caffca4e94}
}
```