---
base_model: unsloth/Qwen2.5-0.5B-Instruct
library_name: peft
license: mit
datasets:
- HoangHa/pensez-grpo
language:
- en
pipeline_tag: text-generation
tags:
- math
- trl
- unsloth
- grpo
- transformers
---

# Model Card for Math-RL

## Model Details

This model is a fine-tuned version of Qwen2.5-0.5B-Instruct, trained with Group Relative Policy Optimization (GRPO) on a curated dataset of roughly 700 math problems.
The fine-tuning aims to strengthen the model's step-by-step reasoning in mathematical problem solving and improve its performance on structured reasoning tasks.

### Model Description


- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B-Instruct
- **Fine-tuning Method**: GRPO with LoRa
- **Domain**: Mathematics (problem-solving, reasoning)
- **Dataset Size**: ~700 examples
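
GRPO works by sampling a group of completions per prompt, scoring each with a reward function, and normalizing every reward against the group's mean and standard deviation. The following is a minimal, illustrative sketch of that normalization step (not the actual training code used for this model):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Core of GRPO: each sampled completion's advantage is its reward
    normalized by the group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for 4 completions sampled for the same math prompt
# (e.g., 1.0 = correct boxed answer, 0.5 = right format only, 0.0 = wrong)
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# The best completion gets a positive advantage, the worst a negative one
```

In training, these per-completion advantages weight the policy-gradient update, which lets GRPO dispense with the separate value model that PPO requires.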


## Uses

### Direct Use

The model is intended for:

- Educational purposes: assisting students with math problems
- Research on small-scale RLHF-style fine-tuning (GRPO)
- Experiments in reasoning with small instruction-tuned models
- Serving as a lightweight math reasoning assistant in constrained environments


## Bias, Risks, and Limitations

- Small dataset: fine-tuned on only ~700 math problems, so generalization is limited.
- Reasoning errors: may produce incorrect or hallucinated answers; always verify results.
- Not a math oracle: should not be used in high-stakes scenarios (e.g., exams, grading, critical calculations).
- Limited scope: performance is strongest on problems similar to the fine-tuning dataset and may degrade outside that domain.
- Language: while the base model supports multiple languages, the math fine-tuning data was primarily English.


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from peft import PeftModel

login(token="")  # paste your Hugging Face token here (only needed for gated/private repos)

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-0.5B-Instruct",
    device_map={"": 0},
)

# Load the GRPO-trained LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "khazarai/Math-RL")

# Raw string (r"""...""") so LaTeX backslashes such as \boxed are not treated as escapes
question = r"""
Translate the graph of the function $y=\sin 2x$ along the $x$-axis to the left by $\dfrac{\pi }{6}$ units, and stretch the ordinate to twice its original length (the abscissa remains unchanged) to obtain the graph of the function $y=f(x)$. If the minimum value of the function $y=f(x)+a$ on the interval $\left[ 0,\dfrac{\pi }{2} \right]$ is $\sqrt{3}$, then $a=\boxed{\_\_\_\_\_}$.
"""

system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=2048,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
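
Since the model is trained to wrap its output in `<reasoning>`/`<answer>` tags (per the system prompt above), a small helper (illustrative, not part of the released code) can pull out just the final answer from a generation:

```python
import re
from typing import Optional

def extract_answer(generation: str) -> Optional[str]:
    """Return the contents of the <answer>...</answer> block, or None if absent."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", generation, re.DOTALL)
    return match.group(1) if match else None

# Hypothetical model output in the trained format
sample = "<reasoning>Shift left by pi/6, stretch by 2 ...</reasoning>\n<answer>sqrt(3) + 1</answer>"
print(extract_answer(sample))  # → sqrt(3) + 1
```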

**For pipeline:** 

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "khazarai/Math-RL")

# Raw string so LaTeX backslashes (e.g. \boxed) survive intact
question = r"""
Translate the graph of the function $y=\sin 2x$ along the $x$-axis to the left by $\dfrac{\pi }{6}$ units, and stretch the ordinate to twice its original length (the abscissa remains unchanged) to obtain the graph of the function $y=f(x)$. If the minimum value of the function $y=f(x)+a$ on the interval $\left[ 0,\dfrac{\pi }{2} \right]$ is $\sqrt{3}$, then $a=\boxed{\_\_\_\_\_}$.
"""

system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question},
]
pipe(messages)
```

### Framework versions

- PEFT 0.15.2