Avaria Pygmalion-TR

Avaria Pygmalion-TR is a QLoRA adapter trained on top of Trendyol/Trendyol-LLM-8B-T1 for Turkish-language grade-school math reasoning (GSM8K-style).

Important. This repository contains only the LoRA adapter weights (~167 MB). It is not a standalone model. To use it, load Trendyol/Trendyol-LLM-8B-T1 as the base model and attach this adapter on top via PEFT.

What it is

  • Adapter type: QLoRA (4-bit NF4 base + bf16 LoRA, rank 16, double-quant)
  • Base model: Trendyol/Trendyol-LLM-8B-T1
  • Training data: bezir/gsm8k-tr (7,884 train rows, 90/10 split, seed 42; a split sketch follows this list)
  • Training: 1 epoch, 493 optimizer steps, paged AdamW 8-bit, max_seq_length=512, effective batch 16, lr 1e-4 cosine, warmup_ratio 0.03, gradient checkpointing
  • Hardware used: single NVIDIA RTX 5070 Ti, 16 GB VRAM
  • Trainable params: 43,646,976 (~0.92% of base)
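
A minimal sketch of reproducing the seed-42 90/10 split with the datasets library (the split name and row handling are assumptions; the actual preprocessing script is not part of this repo):

from datasets import load_dataset

# Assumption: bezir/gsm8k-tr exposes a "train" split; the 90/10 division below
# mirrors the reported split (seed 42), but the exact preprocessing may differ.
ds = load_dataset("bezir/gsm8k-tr", split="train")
split = ds.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
print(len(train_ds), len(eval_ds))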

Headline benchmark — malhajar/gsm8k_tr-v0.2, n=500 (test split)

This is the primary release benchmark: the first 500 examples of the malhajar/gsm8k_tr-v0.2 test split, evaluated with greedy (deterministic) decoding, 4-bit NF4 inference, and batch_size=4.

metric                        base          Avaria Pygmalion-TR
exact-match correct           122 / 500     305 / 500
accuracy                      24.4%         61.0%

comparison                    value
absolute improvement          +36.6 pp
relative improvement          +150.0%
both correct                  99
only base correct             23
only LoRA correct             206
both wrong                    172
base extraction failures      1
LoRA extraction failures      0
batch size used               4
OOM fallbacks                 0
total wall time               6,719.1 s (~1h 52m)
avg base gen time / sample    5.50 s
avg LoRA gen time / sample    7.89 s
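
Correctness is exact match on the final number after the #### marker; an output with no parsable #### answer counts as an extraction failure. A minimal sketch of what that scoring implies (illustrative only; the exact parsing rules behind the reported numbers live in the evaluation script, which is not part of this repo):

import re

def extract_answer(text):
    """Pull the final '#### <number>' answer out of a completion.

    Illustrative sketch; the original evaluation script may parse slightly
    differently. Returning None corresponds to an extraction failure.
    """
    matches = re.findall(r"####\s*(-?[\d.,]+)", text)
    if not matches:
        return None
    return matches[-1].replace(",", "").rstrip(".")

def exact_match(prediction, reference):
    pred, gold = extract_answer(prediction), extract_answer(reference)
    return pred is not None and pred == gold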

Figures

Accuracy, outcome breakdown, fixed vs regressed, and generation time plots are included in assets/.

Caveat — not a leaderboard claim. malhajar/gsm8k_tr-v0.2 is a Turkish translation of the same English GSM8K problems that bezir/gsm8k-tr (our train data) derives from. Treat the +36.6 pp lift as an upper bound on in-distribution reasoning gain, not as a generalization measurement. A truly held-out Turkish reasoning benchmark (e.g. lm-eval-harness Turkish tasks, Cetvel) is the next step and has not been run yet.

Catastrophic-forgetting / over-specialization check

30 deterministic prompts across 7 categories (general knowledge, writing/editing, coding, non-math logic, casual/emotional, instruction-following, math-control). A heuristic flags math-format leakage (####, Adım N, Cevap: N) into answers to non-math prompts.

metric                              value
total prompts                       30
OK                                  28 / 30
over_specialized_math_format        2 (in coding & abstract-logic categories)
degraded                            0
empty / refusal                     0
non-math prompts containing ####    2 / 26 = 7.69%
verdict                             PASS (with minor caveat)

The two flagged cases were short coding/logic prompts where the adapter still emitted a math-format answer scaffold. General behavior, writing, and casual conversation remained intact. Use a clear system prompt for non-math tasks if you want to suppress math formatting entirely.
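
A rough sketch of that leakage heuristic (the actual check behind the 28/30 result lives in the release scripts and may differ in detail):

import re

# Math-format scaffolding that should not appear in answers to non-math
# prompts: '####' final-answer markers, 'Adım N' step headers, 'Cevap: N'.
MATH_FORMAT_PATTERNS = [r"####", r"\bAdım\s*\d+", r"\bCevap:\s*-?\d"]

def looks_over_specialized(answer):
    """Flag GSM8K-style formatting leaking into a non-math answer."""
    return any(re.search(p, answer) for p in MATH_FORMAT_PATTERNS)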

How to use

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE = "Trendyol/Trendyol-LLM-8B-T1"
ADAPTER = "pancodurden/Avaria-Pygmalion-TR"  # this repo

# Same 4-bit NF4 quantization settings used for training and evaluation
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Load the quantized base model, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb, device_map={"": 0}, trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

# System prompt (Turkish): "You are a Turkish-speaking math teacher. Solve the
# problem step by step and always give the final answer in the format: #### <number>"
system = (
    "Sen Türkçe konuşan bir matematik öğretmenisin. Soruyu adım adım çöz "
    "ve cevabı en sonda mutlaka şu formatta ver: #### <sayı>"
)
# Example question: "Ali has 5 apples; if he gets 2 more, how many does he have?"
user = "Ali'nin 5 elması var, 2 tane daha alırsa kaç elması olur?"
prompt = tok.apply_chat_template(
    [{"role": "system", "content": system},
     {"role": "user", "content": user}],
    tokenize=False, add_generation_prompt=True,
)
enc = tok(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    # Greedy, deterministic decoding (same settings as the benchmark run)
    out = model.generate(
        **enc, max_new_tokens=256, do_sample=False, num_beams=1,
        use_cache=True, pad_token_id=tok.eos_token_id,
    )
print(tok.decode(out[0], skip_special_tokens=True))
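
If you want a standalone checkpoint instead of base + adapter, the LoRA weights can be merged into a full-precision copy of the base model. A sketch continuing from the snippet above (merging directly into a 4-bit-quantized base is not recommended, so the base is reloaded in bf16; this needs enough memory and disk for the full 8B model):

# Optional: merge the adapter into a bf16 copy of the base and save it
base_fp = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="cpu", trust_remote_code=True,
)
merged = PeftModel.from_pretrained(base_fp, ADAPTER).merge_and_unload()
merged.save_pretrained("avaria-pygmalion-tr-merged")
tok.save_pretrained("avaria-pygmalion-tr-merged")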

Training summary

method                          QLoRA (4-bit NF4 base + bf16 LoRA)
compute dtype                   bfloat16
LoRA rank / alpha / dropout     16 / 32 / 0.05
target modules                  q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
max_seq_length                  512
per_device_train_batch_size     1
gradient_accumulation_steps     16
effective batch size            16
lr / scheduler / warmup_ratio   1e-4 / cosine / 0.03
optimizer                       paged_adamw_8bit
epochs / optimizer steps        1 / 493
gradient checkpointing          true
seed                            42
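
For reference, a minimal sketch of the configuration this table describes (the actual training script is not part of this repo; prompt formatting and trainer wiring are omitted, and max_seq_length=512 would be applied at the SFT-trainer/tokenization level):

import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="avaria-pygmalion-tr",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # effective batch size 16
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,
    seed=42,
)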

Files in this repo

adapter_config.json          # PEFT LoRA config
adapter_model.safetensors    # ~167 MB, the adapter weights
chat_template.jinja          # base model's chat template, copied
tokenizer.json               # base tokenizer
tokenizer_config.json
metrics.json                 # all benchmark numbers in machine-readable form
limitations.md               # known limitations / failure modes
release_checklist.md         # validation status before push
README.md                    # this file
assets/                      # figures + card image

Not included: base model weights (load Trendyol/Trendyol-LLM-8B-T1 separately), optimizer state, RNG state, training_args.bin, trainer_state.json. These are training-time artifacts and are not needed for inference.

License

Apache-2.0 for the adapter weights. The base model carries its own license; see Trendyol/Trendyol-LLM-8B-T1 for terms.

Citation

If you use this adapter, please cite it along with the base model (Trendyol/Trendyol-LLM-8B-T1) and the training dataset (bezir/gsm8k-tr):

@misc{avaria-pygmalion-tr-2026,
  title  = {Avaria Pygmalion-TR: Turkish GSM8K-style QLoRA adapter for Trendyol-LLM-8B-T1},
  year   = {2026},
  note   = {QLoRA adapter, 4-bit NF4 base, rank 16, trained on bezir/gsm8k-tr}
}