---
license: apache-2.0
datasets:
- RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
language:
- tr
- en
base_model:
- ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
library_name: transformers
tags:
- unsloth
- llama
- trl
- reasoning
- bilingual
- stem
---
## Model Overview
**RefinedNeuro/RN_TR_R2** is a Turkish-language reasoning model fine-tuned from
`ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` using the GRPO method on Q&A data distilled from the Qwen3 model. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.
* **🌐 Available on:** [https://ollama.com/RefinedNeuro/RN\_TR\_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)
---
## Evaluation

Benchmark: [RN\_TR\_R2\_Benchmark\_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results)
(Open-ended Turkish culture & reasoning questions)
| Model Name | Score (%) | Description |
| --------------------------------------- | --------- | ----------------------------------------------------------- |
| **RefinedNeuro/RN_TR_R2** | **82.4** | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
| Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
| RefinedNeuro/RN\_TR\_R1 | 64.8 | Baseline Turkish-fine-tuned model |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct | 63.2 | Meta’s Llama-3.1-8B with instruct tuning |
RN_TR_R2 achieves a **17.6 pp** improvement over the RN_TR_R1 baseline.
---
## Intended Use
* **Primary use cases**: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
* **Not recommended**: Generating creative fiction or tasks requiring memorized facts outside its training scope.
* **Known limitations**: No systematic failure modes have been identified, but like all LLMs, RN_TR_R2 may hallucinate.
---
## Training Data
* **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
* **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset)
* **Domains**: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
* **Grade levels**: 6th–12th grade
* **Prompt variations**: e.g., long/detailed vs. short/easy phrasings; some prompts require the answer to be a single number
* **Size**: 13,000 examples (\~9 million tokens)
* **License**: Creative Commons
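
The prompt variations described above can be sketched as a simple template expansion. This is a hypothetical illustration (the actual dataset-construction code is not published here), with made-up Turkish templates matching the variation types listed:

```python
# Hypothetical sketch of the prompt variations described above:
# each base question is emitted in several phrasings, one of which
# constrains the answer format to a single number.
TEMPLATES = [
    "{q}",                                               # short/easy form
    "Adım adım düşünerek ayrıntılı çözüm yap: {q}",      # long/detailed form
    "{q} Cevabın yalnızca tek bir sayı olsun.",          # single-number answer
]

def expand(question: str) -> list[str]:
    """Produce one prompt per template for a base question."""
    return [t.format(q=question) for t in TEMPLATES]

for prompt in expand("6 ve 2'nin çarpımı nedir?"):
    print(prompt)
```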
---
## Training Configuration
* **Method**: GRPO (Group Relative Policy Optimization) via `unsloth`
* **Hardware**: 2× RTX 3090 GPUs for 1 week (\~336 GPU-hours)
* **Batch size**: 1
* **Context length**: 4,096 tokens during training (`max_seq_length=8192` at inference)
* **Optimizer & scheduler**: Cosine LR schedule, peak LR=5×10⁻⁶
* **Library**: `unsloth==2025.3.18`
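
GRPO's core idea can be sketched in plain Python: for each prompt, a group of completions is sampled, and each completion's advantage is its reward normalized against the group's mean and standard deviation. This is an illustrative sketch of that normalization step, not the `unsloth` training code:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages for one sampling group:
    advantage_i = (r_i - mean(rewards)) / std(rewards)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # all completions scored the same: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four completions of one prompt, scored by a binary correctness reward
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions get a positive advantage and incorrect ones a negative advantage, relative to the group rather than to a learned value function.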
---
## Usage Example
```bash
pip install unsloth==2025.3.18
```
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RefinedNeuro/RN_TR_R2",
    max_seq_length=8192,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

SYSTEM_PROMPT = """
Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
Önce <think></think> tagları arasında adım adım düşün.
Sonra düşünme adımlarını özetle.
En son \\boxed{} içerisinde sorunun cevabını ver.
"""

def answer_question(question: str):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    streamer = TextStreamer(tokenizer)
    print("-" * 60)
    _ = model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )

# Examples
answer_question("Türkiye'nin başkenti neresidir?")
answer_question("6 ve 2'nin çarpımı nedir?")
```
---
## Sample Q\&A
```
> **Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.
>
> **A:** <think>
1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun türevini bulmak için, önce f(x)'nin türevini bulmamız gerekmektedir.
2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.
f(x) = 3x⁴ - 2x³ + 5x - 7
3. Adım: Türevi bulmak için her terimi ayrı ayrı türevini buluyoruz.
- 3x⁴: Türevi 12x³'dür.
- -2x³: Türevi -6x²'dir.
- 5x: Türevi 5'dir.
- -7: Türevi 0'dır (çünkü sabit bir sayıdır).
4. Adım: Türevi bulmak için elde edilen türevleri birleştiriyoruz.
f'(x) = 12x³ - 6x² + 5
5. Adım: Şimdi x = 1 yerine koyarak f'(x)'nin türevini hesaplayacağız.
f'(1) = 12(1)³ - 6(1)² + 5
f'(1) = 12(1) - 6(1) + 5
f'(1) = 12 - 6 + 5
f'(1) = 11
6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.
\boxed{11}
</think>
\boxed{11}
```
---
## Citation
If you use this model, please cite:
```bibtex
@misc{farsak2025rn_tr_r2,
title = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
author = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
year = {2025},
howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
license = {Apache-2.0}
}
```
---
## License
Apache 2.0, a permissive open-source license.
---
## Authors
* Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro