---
library_name: transformers
license: apache-2.0
license_link: https://github.com/eth-lre/PedagogicalRL/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-7B-Instruct
tags:
- math-tutor
- grpo
---

# TutorRL-7B-think

## Overview

**TutorRL-7B-think** is a fine-tuned variant of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), trained to act as a math **tutor** rather than a solver. It is aligned to pedagogical principles with **reinforcement learning (GRPO)** in a synthetic multi-turn classroom setting, without requiring any human-labeled data.

This model was developed as part of the research project [*From Problem-Solving to Teaching Problem-Solving*](https://arxiv.org/abs/2505.15607), which proposes a scalable, annotation-free approach to training LLMs as **educational tutors**. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide students through Socratic questioning, and withhold final solutions when that benefits learning.

Repository: [https://github.com/eth-lre/PedagogicalRL](https://github.com/eth-lre/PedagogicalRL)

## Intended Use

This model is intended for use in:

* Interactive math tutoring
* Socratic dialogue generation
* Research on educational alignment of LLMs
* Safe and indirect teaching in problem-solving contexts

## Thinking

This model variant supports hidden thinking: it can reason privately before producing its visible reply. The thinking content is enclosed in `<think> ... </think>` tags and can be stripped before the response is shown to a student, as in the sketch below.
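
A minimal sketch of separating the hidden reasoning from the student-facing reply, assuming the output contains at most one `<think> ... </think>` block (the helper name `split_thinking` and the sample string are illustrative, not part of the model's API):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a generation into (hidden thinking, visible reply)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No thinking block: treat the whole text as the visible reply.
        return "", text.strip()
    thinking = match.group(1).strip()
    visible = (text[: match.start()] + text[match.end():]).strip()
    return thinking, visible

# Toy example of a model output with a hidden reasoning block.
sample = "<think>The student should isolate x first.</think>What could you do to both sides to get 3x alone?"
thinking, reply = split_thinking(sample)
print(reply)  # -> "What could you do to both sides to get 3x alone?"
```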

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "eth-nlped/TutorRL-7B-think"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
]

# add_generation_prompt=True appends the assistant turn marker so the
# model starts a fresh tutor reply instead of continuing the user turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```
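
Since the model is trained in a multi-turn classroom setting, the dialogue can be continued by appending the tutor's reply and the student's next message before generating again. A minimal sketch, reusing the names from the example above (the follow-up message is illustrative):

```python
# Continue the dialogue: append the tutor's reply and the student's next turn.
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "I subtracted 5 and got 3x = 15. Now what?"})

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```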

## Citation

If you use this model or build upon the training framework, please cite:

```bibtex
@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
  title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
  author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
  year={2025},
  eprint={2505.15607},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.15607}
}
```