---
base_model: Qwen/Qwen3-4B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen3-4B
- sft
- grpo
- lora
- transformers
- trl
---

# Combined SFT + GRPO LoRA Adapter for Qwen3-4B

This adapter combines two LoRA training stages into a single adapter:

1. **SFT** (Supervised Fine-Tuning) on Qwen/Qwen3-4B
2. **GRPO** (Group Relative Policy Optimization) on the SFT model

The two rank-32 adapters were merged into a single **rank-64** adapter (lossless). Apply it directly to `Qwen/Qwen3-4B` — no intermediate merged model is needed.

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base_model, "abdul-hannan/qwen3-math-grpo")
tokenizer = AutoTokenizer.from_pretrained("abdul-hannan/qwen3-math-grpo")
```

## Training Details

- **Base model:** Qwen/Qwen3-4B
- **LoRA rank:** 64 (combined from two rank-32 adapters)
- **LoRA alpha:** 128
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **PEFT version:** 0.18.1

### Contact

Syed Abdul Hannan

### Framework versions

- PEFT 0.18.1
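As a sanity check on the "lossless" claim above: two LoRA updates `B1 @ A1` and `B2 @ A2` of rank 32 each can be stacked into a single rank-64 factor pair whose product equals their sum exactly, with no approximation. A minimal NumPy sketch (the matrix dimensions here are illustrative placeholders, not the actual Qwen3-4B layer shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 32  # illustrative sizes; real layers are much larger

# Two independent rank-32 LoRA updates: delta_W_i = B_i @ A_i
A1, B1 = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))
A2, B2 = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))

# Stack the A factors along the rank axis (rows) and the B factors
# along the rank axis (columns), giving one rank-64 adapter.
A_merged = np.concatenate([A1, A2], axis=0)  # shape (2r, d_in)
B_merged = np.concatenate([B1, B2], axis=1)  # shape (d_out, 2r)

# The rank-64 product reproduces the sum of the two rank-32 updates exactly.
delta_sum = B1 @ A1 + B2 @ A2
delta_merged = B_merged @ A_merged
assert np.allclose(delta_sum, delta_merged)
```

Because block matrix multiplication gives `B_merged @ A_merged = B1 @ A1 + B2 @ A2` identically, the concatenated adapter applies the same weight delta as the two stages combined, which is why no intermediate merged model is required.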