Instructions for running eth-nlped/TutorRL-7B with libraries, inference servers, and local apps.

## Transformers

Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="eth-nlped/TutorRL-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

Or load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("eth-nlped/TutorRL-7B")
model = AutoModelForCausalLM.from_pretrained("eth-nlped/TutorRL-7B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
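For interactive tutoring sessions it can help to stream tokens as they are generated instead of waiting for the full reply. A minimal sketch using the `TextStreamer` utility from `transformers` (the prompt and generation settings here are illustrative, not part of the official model card):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_id = "eth-nlped/TutorRL-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

# Print tokens to stdout as they are produced; skip echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(**inputs, streamer=streamer, max_new_tokens=256)
```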
## vLLM

Install from pip and serve the model:

```bash
# Install vLLM:
pip install vllm

# Start the vLLM server:
vllm serve "eth-nlped/TutorRL-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "eth-nlped/TutorRL-7B",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
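Once the server is running, any OpenAI-compatible client can talk to it. A minimal sketch using the official `openai` Python package (the `base_url`, placeholder API key, and prompt are illustrative):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the key can be any placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="eth-nlped/TutorRL-7B",
    messages=[{"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}],
)
print(response.choices[0].message.content)
```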
## SGLang

Install from pip and serve the model:

```bash
# Install SGLang:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "eth-nlped/TutorRL-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "eth-nlped/TutorRL-7B",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Or use the official Docker image:

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "eth-nlped/TutorRL-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "eth-nlped/TutorRL-7B",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
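With the server up, you can also call it from Python. A minimal sketch using `requests` against the same OpenAI-compatible endpoint as the curl example above (the prompt is illustrative):

```python
import requests

# Same OpenAI-compatible endpoint the curl example targets.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "eth-nlped/TutorRL-7B",
        "messages": [
            {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
        ],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```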
## Docker Model Runner

Run the model directly from the Hugging Face registry:

```bash
docker model run hf.co/eth-nlped/TutorRL-7B
```
# TutorRL-7B

## Overview
TutorRL-7B is a fine-tuned variant of Qwen/Qwen2.5-7B-Instruct, trained to act as a math tutor rather than a solver. It is aligned to pedagogical principles using reinforcement learning (GRPO) in a synthetic multi-turn classroom setting, without requiring any human-labeled data.
This model was developed as part of the research project *From Problem-Solving to Teaching Problem-Solving*, which proposes a scalable, annotation-free approach to training LLMs as educational tutors. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide students through Socratic questioning, and withhold final solutions when that benefits learning.
Repository: https://github.com/eth-lre/PedagogicalRL
## Intended Use
This model is intended for use in:
- Interactive math tutoring
- Socratic dialogue generation
- Research on educational alignment of LLMs
- Safe and indirect teaching in problem-solving contexts
## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "eth-nlped/TutorRL-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
]

# add_generation_prompt=True appends the assistant turn marker so the
# model responds as the tutor instead of continuing the user message.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
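Since the model is trained in a multi-turn classroom setting, a tutoring session usually feeds the tutor's reply back in together with the student's next message. A minimal continuation sketch building on the variables above (the follow-up student message is hypothetical):

```python
# Keep only the newly generated tokens as the tutor's reply.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "I subtracted 5 and got 3x = 15. What now?"})

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```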
Note: This model does not generate `<think>` blocks. If you want planning-based reasoning, refer to the model variant TutorRL-7B-think.
## Citation
If you use this model or build upon the training framework, please cite:
```bibtex
@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
  title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
  author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
  year={2025},
  eprint={2505.15607},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.15607}
}
```