---
language:
- tr
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-32B-Instruct
tags:
- turkish
- qwen2
- instruction-tuned
- sft
- qlora
- tr
- reasoning
- conversational
- low-resource
- turkish-nlp
datasets:
- ogulcanaydogan/Turkish-LLM-v10-Training
pipeline_tag: text-generation
model-index:
- name: Turkish-LLM-32B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-TR
      type: custom
    metrics:
    - name: accuracy
      type: acc
      value: 0.6789
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: XCOPA-TR
      type: xcopa
    metrics:
    - name: accuracy
      type: acc
      value: 0.69
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: XNLI-TR
      type: xnli
    metrics:
    - name: accuracy
      type: acc
      value: 0.4514
---

# Turkish-LLM-32B-Instruct
The largest open-source Turkish-enhanced language model, fine-tuned from Qwen2.5-32B-Instruct with QLoRA on a carefully curated 173K-example Turkish instruction dataset.

Part of the Turkish LLM Family, a complete suite of Turkish language models from 7B to 32B parameters.
## Highlights
- 32B parameters - largest openly available Turkish fine-tuned model
- Outperforms the base model on MMLU-TR (+2.71 pts) and XCOPA-TR (+1.00 pts)
- 67.89% MMLU-TR accuracy - a significant improvement from iterative dataset engineering
- GGUF available - Q4/Q5/Q8 quantizations for local inference
## Benchmark Results

| Benchmark | Base (Qwen2.5-32B) | v1 (Mar 21) | v2 (Current) | Delta vs Base (pts) |
|---|---|---|---|---|
| MMLU-TR (57 categories) | 0.6518 | 0.6564 | 0.6789 | +2.71 |
| XCOPA-TR (causal reasoning) | 0.6800 | 0.6740 | 0.6900 | +1.00 |
| XNLI-TR (natural language inference) | 0.4578 | 0.4610 | 0.4514 | -0.64 |
### Iterative Improvement
This model is the result of systematic dataset engineering across multiple iterations:
- v1 (Mar 21): Initial fine-tune with 242K examples. Improved MMLU-TR and XNLI-TR but regressed on XCOPA-TR.
- v2 (Mar 29): Rebalanced dataset (173K examples) with XCOPA augmentation and evaluation-aligned NLI formatting. Achieved improvements on both MMLU-TR and XCOPA-TR.
Key insight: reducing the dataset from 242K to 173K examples while improving data quality led to better results. Quality beat quantity.
### MMLU-TR: Strongest Category Improvements (v2)

| Category | Base | Ours | Delta (pts) |
|---|---|---|---|
| College Computer Science | 0.545 | 0.616 | +7.1 |
| Logical Fallacies | 0.640 | 0.696 | +5.6 |
| College Mathematics | 0.530 | 0.580 | +5.0 |
| Formal Logic | 0.508 | 0.556 | +4.8 |
| High School Mathematics | 0.507 | 0.548 | +4.1 |
## Quick Start

### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ogulcanaydogan/Turkish-LLM-32B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ogulcanaydogan/Turkish-LLM-32B-Instruct")

messages = [
    {"role": "system", "content": "Sen yardımcı bir Türkçe asistansın."},
    {"role": "user", "content": "Yapay zekanın sağlık sektöründeki uygulamalarını açıkla."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With Ollama (GGUF)

```shell
ollama run hf.co/ogulcanaydogan/Turkish-LLM-32B-Instruct-GGUF:Q4_K_M
```
### With vLLM

```shell
vllm serve ogulcanaydogan/Turkish-LLM-32B-Instruct --dtype auto --max-model-len 4096
```
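Once served, vLLM exposes an OpenAI-compatible REST API. The sketch below builds a chat-completions request payload for the model; the endpoint URL assumes vLLM's default port (8000) and path, which are not stated in this card.

```python
import json

# Assumption: vLLM's default OpenAI-compatible endpoint.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(user_prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload for the served model."""
    return {
        "model": "ogulcanaydogan/Turkish-LLM-32B-Instruct",
        "messages": [
            {"role": "system", "content": "Sen yardımcı bir Türkçe asistansın."},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

payload = json.dumps(build_chat_request("Merhaba!"))
# POST `payload` to API_URL with Content-Type: application/json,
# e.g. requests.post(API_URL, data=payload, headers={...}).
```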
## Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-32B-Instruct |
| Method | QLoRA (4-bit NF4 + double quantization) |
| LoRA rank / alpha | 32 / 64 |
| Learning rate | 1e-5 (cosine schedule) |
| Epochs | 1 |
| Effective batch size | 16 |
| Max sequence length | 2048 |
| Training time | ~55 hours on NVIDIA A100 80GB |
| Dataset | 173K Turkish instruction examples (v7.1) |
## Dataset Composition (v7.1)
| Source | Examples | Percentage |
|---|---|---|
| Turkish Math | 100,000 | 57.9% |
| Turkish Exam Instructions | 41,297 | 23.9% |
| XNLI Augmented (MC format) | 10,000 | 5.8% |
| GSM8K Turkish | 8,760 | 5.1% |
| Alignment Data | 7,245 | 4.2% |
| XCOPA Augmented | 5,000 | 2.9% |
| GPQA Turkish | 545 | 0.3% |
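The percentages above follow directly from the raw counts; a quick stdlib check reproduces them and the ~173K total:

```python
# Raw example counts from the composition table.
composition = {
    "Turkish Math": 100_000,
    "Turkish Exam Instructions": 41_297,
    "XNLI Augmented (MC format)": 10_000,
    "GSM8K Turkish": 8_760,
    "Alignment Data": 7_245,
    "XCOPA Augmented": 5_000,
    "GPQA Turkish": 545,
}

total = sum(composition.values())
shares = {name: round(100 * n / total, 1) for name, n in composition.items()}
print(total)                    # 172847, i.e. the ~173K quoted above
print(shares["Turkish Math"])   # 57.9
```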
## Turkish LLM Family
| Model | Size | MMLU-TR | Download |
|---|---|---|---|
| Turkish-LLM-7B-Instruct | 7B | - | GGUF |
| Turkish-LLM-14B-Instruct | 14B | 0.5977 | GGUF |
| Turkish-LLM-32B-Instruct | 32B | 0.6789 | GGUF |
## Limitations
- Slight regression on XNLI-TR natural language inference (-0.64 points)
- Inherits base model limitations for very long contexts
- Best suited for Turkish STEM, reasoning, and general knowledge tasks
## Citation

```bibtex
@misc{aydogan2026turkishllm,
  title={Turkish LLM Family: Open-Source Turkish Language Models},
  author={Ogulcan Aydogan},
  year={2026},
  url={https://huggingface.co/collections/ogulcanaydogan/turkish-llm-family-69b303b4ef1c36caffca4e94}
}
```