---
license: apache-2.0
datasets:
- RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
language:
- tr
- en
base_model:
- ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
library_name: transformers
tags:
- unsloth
- llama
- trl
- reasoning
- bilingual
- stem
---

## Model Overview

**RefinedNeuro/RN_TR_R2** is a Turkish-language reasoning model fine-tuned from `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` with GRPO on Q&A data distilled from the Qwen3 model. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.

* **🌐 Available on:** [https://ollama.com/RefinedNeuro/RN\_TR\_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)

---

## Evaluation

![benchmark results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results/resolve/main/benchmark_results.png)

Benchmark: [RN\_TR\_R2\_Benchmark\_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results)
(Open-ended Turkish culture & reasoning questions)

| Model Name | Score (%) | Description |
| --------------------------------------- | --------- | ----------------------------------------------------------- |
| **RefinedNeuro/RN_TR_R2** | **82.4** | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
| Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
| RefinedNeuro/RN\_TR\_R1 | 64.8 | Baseline Turkish-fine-tuned model |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct | 63.2 | Meta’s Llama-3.1-8B with instruct tuning |

RN_TR_R2 achieves a **17.6 percentage-point** improvement over the RN_TR_R1 baseline.
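
As a quick sanity check, the headline gap follows directly from the table (scores copied from above):

```python
# Benchmark scores (%) copied from the table above.
scores = {
    "RefinedNeuro/RN_TR_R2": 82.4,
    "RefinedNeuro/RN_TR_R1": 64.8,
}

# Improvement over the baseline, in percentage points.
improvement = scores["RefinedNeuro/RN_TR_R2"] - scores["RefinedNeuro/RN_TR_R1"]
print(f"{improvement:.1f} pp")  # 17.6 pp
```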

---

## Intended Use

* **Primary use cases**: Question answering, step-by-step reasoning, and educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
* **Not recommended**: Generating creative fiction or tasks requiring memorized facts outside its training scope.
* **Known limitations**: Like all LLMs, RN_TR_R2 can hallucinate, although no systematic failure modes have been identified.

---

## Training Data

* **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
* **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset)
  * **Domains**: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
  * **Grade levels**: 6th–12th grade
  * **Prompt variations**: e.g., long/detailed vs. short/easy; answer must be a single number
  * **Size**: 13,000 examples (~9 million tokens)
  * **License**: Creative Commons

---

## Training Configuration

* **Method**: GRPO (Group Relative Policy Optimization) via `unsloth`
* **Hardware**: 2× RTX 3090 GPUs for 1 week (~336 GPU-hours)
* **Batch size**: 1
* **Context length**: 4,096 tokens during training (`max_seq_length=8192` at inference)
* **Optimizer & scheduler**: Cosine LR schedule, peak LR = 5×10⁻⁶
* **Library**: `unsloth==2025.3.18`
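
The cosine schedule above, assuming the standard decay from the peak to zero (warmup omitted; `cosine_lr` is an illustrative helper, not an `unsloth` API), can be sketched as:

```python
import math

PEAK_LR = 5e-6  # peak learning rate from the configuration above

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps."""
    return peak_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

print(cosine_lr(0, 1000))     # 5e-06 (peak at the start)
print(cosine_lr(1000, 1000))  # 0.0 (fully decayed)
```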

---

## Usage Example

```bash
pip install unsloth==2025.3.18
```

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer (4-bit quantized)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RefinedNeuro/RN_TR_R2",
    max_seq_length=8192,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

SYSTEM_PROMPT = """
Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
Önce <think></think> tagları arasında adım adım düşün.
Sonra düşünme adımlarını özetle.
En son \\boxed{} içerisinde sorunun cevabını ver.
"""

def answer_question(question: str):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")

    # Stream tokens to stdout as they are generated
    streamer = TextStreamer(tokenizer)
    print("-" * 60)
    _ = model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )

# Examples
answer_question("Türkiye'nin başkenti neresidir?")
answer_question("6 ve 2'nin çarpımı nedir?")
```

---

## Sample Q&A

```
Q: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.

A: <think>

1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun türevini bulmak için, önce f(x)'nin türevini bulmamız gerekmektedir.

2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.

f(x) = 3x⁴ - 2x³ + 5x - 7

3. Adım: Türevi bulmak için her terimi ayrı ayrı türevini buluyoruz.

- 3x⁴: Türevi 12x³'dür.
- -2x³: Türevi -6x²'dir.
- 5x: Türevi 5'dir.
- -7: Türevi 0'dır (çünkü sabit bir sayıdır).

4. Adım: Türevi bulmak için elde edilen türevleri birleştiriyoruz.

f'(x) = 12x³ - 6x² + 5

5. Adım: Şimdi x = 1 yerine koyarak f'(x)'nin türevini hesaplayacağız.

f'(1) = 12(1)³ - 6(1)² + 5

f'(1) = 12(1) - 6(1) + 5

f'(1) = 12 - 6 + 5

f'(1) = 11

6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.

\boxed{11}

</think>

\boxed{11}
```
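
Since the system prompt puts the final answer inside `\boxed{}`, it can be parsed out of generated text with a small helper (a sketch; `extract_boxed` is our illustrative name, not part of the model's API):

```python
import re

def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in the text, or None."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None

sample = r"... 6. Adım: ... \boxed{11} </think> \boxed{11}"
print(extract_boxed(sample))  # 11
```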

---

## Citation

If you use this model, please cite:

```bibtex
@misc{farsak2025rn_tr_r2,
  title        = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
  author       = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
  license      = {Apache-2.0}
}
```

---

## License

Apache 2.0, a permissive open-source license.

---

## Authors

Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro