---
license: apache-2.0
datasets:
- RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
language:
- tr
- en
base_model:
- ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
library_name: transformers
tags:
- unsloth
- llama
- trl
- reasoning
- bilingual
- stem
---

## Model Overview

**RefinedNeuro/RN_TR_R2** is a Turkish-language reasoning model fine-tuned from `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` with GRPO on Q&A data distilled from the Qwen3 model. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.

* **🌐 Available on:** [https://ollama.com/RefinedNeuro/RN\_TR\_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)

---

## Evaluation

![benchmark results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results/resolve/main/benchmark_results.png)

Benchmark: [RN\_TR\_R2\_Benchmark\_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results)
(Open-ended Turkish culture & reasoning questions)

| Model Name | Score (%) | Description |
| --------------------------------------- | --------- | ----------------------------------------------------------- |
| **RefinedNeuro/RN_TR_R2** | **82.4** | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
| Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
| RefinedNeuro/RN\_TR\_R1 | 64.8 | Baseline Turkish-fine-tuned model |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct | 63.2 | Meta’s Llama-3.1-8B with instruct tuning |

RN_TR_R2 achieves a **17.6 percentage-point** improvement over the RN_TR_R1 baseline.
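
As a quick sanity check, the headline gap follows directly from the table (scores copied from above):

```python
# Benchmark scores (%) copied from the table above.
scores = {
    "RefinedNeuro/RN_TR_R2": 82.4,
    "RefinedNeuro/RN_TR_R1": 64.8,
}

# Improvement over the baseline, in percentage points.
improvement = scores["RefinedNeuro/RN_TR_R2"] - scores["RefinedNeuro/RN_TR_R1"]
print(f"{improvement:.1f} pp")  # 17.6 pp
```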

---

## Intended Use

* **Primary use cases**: Question answering, step-by-step reasoning, and educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
* **Not recommended**: Generating creative fiction or tasks requiring memorized facts outside its training scope.
* **Known limitations**: Like all LLMs, RN_TR_R2 can hallucinate, although no systematic failure modes have been identified.

---

## Training Data

* **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
* **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset)
  * **Domains**: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
  * **Grade levels**: 6th–12th grade
  * **Prompt variations**: e.g., long/detailed vs. short/easy; answer must be a single number
  * **Size**: 13,000 examples (~9 million tokens)
  * **License**: Creative Commons

---

## Training Configuration

* **Method**: GRPO (Group Relative Policy Optimization) via `unsloth`
* **Hardware**: 2× RTX 3090 GPUs for 1 week (~336 GPU-hours)
* **Batch size**: 1
* **Context length**: 4,096 tokens during training (`max_seq_length=8192` at inference)
* **Optimizer & scheduler**: Cosine LR schedule, peak LR = 5×10⁻⁶
* **Library**: `unsloth==2025.3.18`
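
The cosine schedule above, assuming the standard decay from the peak to zero (warmup omitted; `cosine_lr` is an illustrative helper, not an `unsloth` API), can be sketched as:

```python
import math

PEAK_LR = 5e-6  # peak learning rate from the configuration above

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps."""
    return peak_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

print(cosine_lr(0, 1000))     # 5e-06 (peak at the start)
print(cosine_lr(1000, 1000))  # 0.0 (fully decayed)
```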

---

## Usage Example

```bash
pip install unsloth==2025.3.18
```

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer (4-bit quantized)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RefinedNeuro/RN_TR_R2",
    max_seq_length=8192,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

SYSTEM_PROMPT = """
Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
Önce <think></think> tagları arasında adım adım düşün.
Sonra düşünme adımlarını özetle.
En son \\boxed{} içerisinde sorunun cevabını ver.
"""

def answer_question(question: str):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")

    # Stream tokens to stdout as they are generated
    streamer = TextStreamer(tokenizer)
    print("-" * 60)
    _ = model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )

# Examples
answer_question("Türkiye'nin başkenti neresidir?")
answer_question("6 ve 2'nin çarpımı nedir?")
```

---

## Sample Q&A

```
Q: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.

A: <think>

1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun türevini bulmak için, önce f(x)'nin türevini bulmamız gerekmektedir.

2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.

f(x) = 3x⁴ - 2x³ + 5x - 7

3. Adım: Türevi bulmak için her terimi ayrı ayrı türevini buluyoruz.

- 3x⁴: Türevi 12x³'dür.
- -2x³: Türevi -6x²'dir.
- 5x: Türevi 5'dir.
- -7: Türevi 0'dır (çünkü sabit bir sayıdır).

4. Adım: Türevi bulmak için elde edilen türevleri birleştiriyoruz.

f'(x) = 12x³ - 6x² + 5

5. Adım: Şimdi x = 1 yerine koyarak f'(x)'nin türevini hesaplayacağız.

f'(1) = 12(1)³ - 6(1)² + 5

f'(1) = 12(1) - 6(1) + 5

f'(1) = 12 - 6 + 5

f'(1) = 11

6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.

\boxed{11}

</think>

\boxed{11}
```
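
Since the system prompt puts the final answer inside `\boxed{}`, it can be parsed out of generated text with a small helper (a sketch; `extract_boxed` is our illustrative name, not part of the model's API):

```python
import re

def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in the text, or None."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None

sample = r"... 6. Adım: ... \boxed{11} </think> \boxed{11}"
print(extract_boxed(sample))  # 11
```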

---

## Citation

If you use this model, please cite:

```bibtex
@misc{farsak2025rn_tr_r2,
  title        = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
  author       = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
  license      = {Apache-2.0}
}
```

---

## License

Apache 2.0, a permissive open-source license.

---

## Authors

Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro