Instructions to use MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
| base_model: Qwen/Qwen3-4B-Instruct-2507 | |
| library_name: peft | |
| tags: | |
| - lora | |
| - sft | |
| - grpo | |
| - reinforcement-learning | |
| - math | |
| - tool-use | |
| # Qwen3-4B-Instruct-2507 β Capstone MathRL | |
| Fine-tuned from [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) using a two-stage SFT β GRPO pipeline for mathematical reasoning with calculator tool use. | |
| **Author:** Mohammad Rafi | |
| --- | |
| ## Base Model | |
| - **Model:** `Qwen/Qwen3-4B-Instruct-2507` | |
| - **Parameters:** 4B | |
| - **Context length:** 32k tokens | |
| --- | |
| ## SFT Adapter β `sft_adapter/` | |
| | Parameter | Value | | |
| |-----------|-------| | |
| | Method | LoRA (Supervised Fine-Tuning) | | |
| | LoRA rank | 32 | | |
| | Epochs | 2 | | |
| | Training samples | 500 | | |
| | Task | Math reasoning (GSM8K + NuminaMath) | | |
| | Size | 270.92 MB | | |
| --- | |
| ## GRPO Adapter β `grpo_adapter/` | |
| | Parameter | Value | | |
| |-----------|-------| | |
| | Method | GRPO (Group Relative Policy Optimization) | | |
| | Training samples | 400 | | |
| | Group size | 8 | | |
| | Learning rate | 3e-6 | | |
| | Substeps | 1 | | |
| | Curriculum | easy β intermediate β hard | | |
| | Size | 270.92 MB | | |
| > **Recommended:** Use `grpo_adapter/` β trained through the full SFT + GRPO pipeline. | |
| --- | |
| ## Usage | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507") | |
| tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507") | |
| # Load GRPO adapter (recommended) | |
| model = PeftModel.from_pretrained(base, "MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL", subfolder="grpo_adapter") | |
| model = model.merge_and_unload() | |
| # Load SFT adapter only | |
| # model = PeftModel.from_pretrained(base, "MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL", subfolder="sft_adapter") | |
| # model = model.merge_and_unload() | |
| ``` | |