---
library_name: transformers
license: apache-2.0
license_link: https://github.com/eth-lre/PedagogicalRL/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-7B-Instruct
tags:
- math-tutor
- grpo
---

# TutorRL-7B-think

## Overview

**TutorRL-7B-think** is a fine-tuned variant of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), trained to act as a math **tutor** rather than a solver. It is aligned to pedagogical principles with **reinforcement learning (GRPO)** in a synthetic multi-turn classroom setting, without requiring any human-labeled data.

This model was developed as part of the research project [*From Problem-Solving to Teaching Problem-Solving*](https://arxiv.org/abs/2505.15607), which proposes a scalable, annotation-free approach to training LLMs as **educational tutors**. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide students through Socratic questioning, and withhold final solutions when that benefits learning.

Repository: [https://github.com/eth-lre/PedagogicalRL](https://github.com/eth-lre/PedagogicalRL)

## Intended Use

This model is intended for use in:

* Interactive math tutoring
* Socratic dialogue generation
* Research on educational alignment of LLMs
* Safe and indirect teaching in problem-solving contexts

## Thinking

This model variant supports hidden thinking: it can reason privately before producing its visible reply. The thinking content is enclosed in `<think> ... </think>` tags and can be stripped before the response is shown to a student, as in the sketch below.
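
A minimal sketch of separating the hidden reasoning from the student-facing reply, assuming the output contains at most one `<think> ... </think>` block (the helper name `split_thinking` and the sample string are illustrative, not part of the model's API):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a generation into (hidden thinking, visible reply)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No thinking block: treat the whole text as the visible reply.
        return "", text.strip()
    thinking = match.group(1).strip()
    visible = (text[: match.start()] + text[match.end():]).strip()
    return thinking, visible

# Toy example of a model output with a hidden reasoning block.
sample = "<think>The student should isolate x first.</think>What could you do to both sides to get 3x alone?"
thinking, reply = split_thinking(sample)
print(reply)  # -> "What could you do to both sides to get 3x alone?"
```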

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "eth-nlped/TutorRL-7B-think"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
]

# add_generation_prompt=True appends the assistant turn marker so the
# model starts a fresh tutor reply instead of continuing the user turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```
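
Since the model is trained in a multi-turn classroom setting, the dialogue can be continued by appending the tutor's reply and the student's next message before generating again. A minimal sketch, reusing the names from the example above (the follow-up message is illustrative):

```python
# Continue the dialogue: append the tutor's reply and the student's next turn.
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "I subtracted 5 and got 3x = 15. Now what?"})

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```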

## Citation

If you use this model or build upon the training framework, please cite:

```bibtex
@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
  title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
  author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
  year={2025},
  eprint={2505.15607},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.15607}
}
```