---
base_model: unsloth/Qwen2.5-0.5B-Instruct
library_name: peft
license: mit
datasets:
- HoangHa/pensez-grpo
language:
- en
pipeline_tag: text-generation
tags:
- math
- trl
- unsloth
- grpo
- transformers
---
# Model Card for Math-RL
## Model Details
This model is a fine-tuned version of Qwen2.5-0.5B-Instruct, trained with Group Relative Policy Optimization (GRPO) on a curated dataset of about 700 math problems.
Fine-tuning aims to strengthen the model's step-by-step reasoning in mathematical problem solving.
### Model Description
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B-Instruct
- **Fine-tuning method:** GRPO with LoRA
- **Domain:** Mathematics (problem solving, reasoning)
- **Dataset size:** ~700 examples
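
GRPO scores each sampled completion relative to the other completions drawn for the same prompt instead of relying on a learned value function. The sketch below illustrates only that normalization step. The function name, the reward values, and the epsilon are illustrative assumptions and are not taken from this model's training code; implementations also differ on details such as population versus sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to a single math problem
# (1.0 = correct answer, 0.0 = wrong answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average get a positive advantage and are reinforced; those below it get a negative one. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.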
## Uses
### Direct Use
The model is intended for:
- Educational purposes: assisting students with math problems
- Research on small-scale reinforcement-learning fine-tuning with GRPO
- Experiments in reasoning with small instruction-tuned models
- Serving as a lightweight math reasoning assistant in constrained environments
## Bias, Risks, and Limitations
- **Small dataset:** Fine-tuned on only about 700 math problems, so generalization is limited.
- **Reasoning errors:** The model may produce incorrect or hallucinated answers; always verify results.
- **Not a math oracle:** It should not be used in high-stakes scenarios (e.g., exams, grading, critical calculations).
- **Limited scope:** Performance is strongest on problems similar to the fine-tuning dataset and may degrade on other domains.
- **Language:** Although the base model supports multiple languages, the math-specific fine-tuning data was primarily in English.
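
Because answers should always be verified, it can help to check a result independently of the model. The sketch below is one possible check for the sample question used in the getting-started code: it evaluates $f(x) = 2\sin\left(2\left(x + \frac{\pi}{6}\right)\right)$ on a dense grid over $\left[0, \frac{\pi}{2}\right]$ and solves for $a$. The grid size is an arbitrary assumption; the analytic answer is $a = 2\sqrt{3}$.

```python
import math


def f(x):
    # y = sin(2x) shifted left by pi/6, then stretched vertically by a factor of 2.
    return 2.0 * math.sin(2.0 * (x + math.pi / 6.0))


# Sample f on a dense grid over [0, pi/2] and take the smallest value.
n = 100_000
min_f = min(f(i * (math.pi / 2.0) / n) for i in range(n + 1))

# min(f + a) = sqrt(3)  =>  a = sqrt(3) - min(f)
a = math.sqrt(3.0) - min_f
print(a, 2.0 * math.sqrt(3.0))
```

The two printed values should agree to several decimal places. A model answer that differs from them is wrong, however plausible its reasoning looks.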
## How to Get Started with the Model
The code below loads the base model, applies the LoRA adapter, and streams the model's response to a sample problem.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel

# If a repo requires authentication, log in first (for example with
# `huggingface_hub.login()`); never hard-code a token in source files.

base_id = "unsloth/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map={"": 0},  # place the whole model on GPU 0
)
# Attach the LoRA adapter to the base model.
model = PeftModel.from_pretrained(base_model, "khazarai/Math-RL")

# Raw string: keeps LaTeX commands such as \right and \boxed from being
# interpreted as Python escape sequences (\r, \b).
question = r"""
Translate the graph of the function $y=\sin 2x$ along the $x$-axis to the left by $\dfrac{\pi }{6}$ units, and stretch the ordinate to twice its original length (the abscissa remains unchanged) to obtain the graph of the function $y=f(x)$. If the minimum value of the function $y=f(x)+a$ on the interval $\left[ 0,\dfrac{\pi }{2} \right]$ is $\sqrt{3}$, then $a=\boxed{\_\_\_\_\_}$.
"""

system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # open the assistant turn so the model answers
)

# Move the inputs to the same device as the model, then stream the output.
inputs = tokenizer(text, return_tensors="pt").to(model.device)
_ = model.generate(
    **inputs,
    max_new_tokens=2048,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
**Using a `pipeline`:**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id = "unsloth/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
# Attach the LoRA adapter to the base model.
model = PeftModel.from_pretrained(base_model, "khazarai/Math-RL")

# Raw string: keeps LaTeX commands such as \right and \boxed from being
# interpreted as Python escape sequences (\r, \b).
question = r"""
Translate the graph of the function $y=\sin 2x$ along the $x$-axis to the left by $\dfrac{\pi }{6}$ units, and stretch the ordinate to twice its original length (the abscissa remains unchanged) to obtain the graph of the function $y=f(x)$. If the minimum value of the function $y=f(x)+a$ on the interval $\left[ 0,\dfrac{\pi }{2} \right]$ is $\sqrt{3}$, then $a=\boxed{\_\_\_\_\_}$.
"""

system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question},
]

# With chat-style input, recent transformers versions return the whole
# conversation; the last message is the model's reply.
result = pipe(messages, max_new_tokens=2048)
print(result[0]["generated_text"][-1]["content"])
```
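
The system prompt above asks for `<reasoning>` and `<answer>` sections, so downstream code will usually want to separate them. The helper below is a minimal sketch of that step. `parse_response` and the sample text are hypothetical and are not part of this repository, and since the model may not always follow the format, a missing section is returned as `None`.

```python
import re


def parse_response(text):
    """Return (reasoning, answer) from a tagged response; a missing section is None."""

    def section(tag):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    return section("reasoning"), section("answer")


# Hand-written example text, not real model output.
sample = """<reasoning>
2 + 2 equals 4.
</reasoning>
<answer>
4
</answer>"""

reasoning, answer = parse_response(sample)
print(answer)
```

Checking for `None` before using the answer gives a simple way to detect responses that ignored the requested format.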
### Framework versions
- PEFT 0.15.2