Instructions to use MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
File size: 1,819 Bytes
d33f2b6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 | ---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
tags:
- lora
- sft
- grpo
- reinforcement-learning
- math
- tool-use
---
# Qwen3-4B-Instruct-2507 — Capstone MathRL
Fine-tuned from [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) using a two-stage SFT → GRPO pipeline for mathematical reasoning with calculator tool use.
**Author:** Mohammad Rafi
---
## Base Model
- **Model:** `Qwen/Qwen3-4B-Instruct-2507`
- **Parameters:** 4B
- **Context length:** 32k tokens
---
## SFT Adapter — `sft_adapter/`
| Parameter | Value |
|-----------|-------|
| Method | LoRA (Supervised Fine-Tuning) |
| LoRA rank | 32 |
| Epochs | 2 |
| Training samples | 500 |
| Task | Math reasoning (GSM8K + NuminaMath) |
| Size | 270.92 MB |
---
## GRPO Adapter — `grpo_adapter/`
| Parameter | Value |
|-----------|-------|
| Method | GRPO (Group Relative Policy Optimization) |
| Training samples | 400 |
| Group size | 8 |
| Learning rate | 3e-6 |
| Substeps | 1 |
| Curriculum | easy → intermediate → hard |
| Size | 270.92 MB |
> **Recommended:** Use `grpo_adapter/` — trained through the full SFT + GRPO pipeline.
---
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
# Load GRPO adapter (recommended)
model = PeftModel.from_pretrained(base, "MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL", subfolder="grpo_adapter")
model = model.merge_and_unload()
# Load SFT adapter only
# model = PeftModel.from_pretrained(base, "MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL", subfolder="sft_adapter")
# model = model.merge_and_unload()
```
|