How to use HumorR1/policy-e2b-grpo-thinking with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-VL-2B-Thinking")
model = PeftModel.from_pretrained(base_model, "HumorR1/policy-e2b-grpo-thinking")
How to fix it?
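The error message is not shown, so the following is only a guess at the cause. Qwen/Qwen3-VL-2B-Thinking is a vision-language checkpoint, and AutoModelForCausalLM may not map to its config. A minimal sketch, assuming a recent transformers release that supports Qwen3-VL, loads the base model with AutoModelForImageTextToText instead and then attaches the adapter. The helper name load_policy is hypothetical, and the imports sit inside it so that defining the function does not itself need the libraries or a download.

```python
def load_policy(
    base_id="Qwen/Qwen3-VL-2B-Thinking",
    adapter_id="HumorR1/policy-e2b-grpo-thinking",
):
    """Hypothetical helper: load the base VL model, then attach the PEFT adapter."""
    # Imported lazily; both packages must be installed before calling this.
    from peft import PeftModel
    from transformers import AutoModelForImageTextToText, AutoProcessor

    # Assumption: the failure comes from the Auto class, so use the
    # image-text-to-text class rather than AutoModelForCausalLM.
    base_model = AutoModelForImageTextToText.from_pretrained(base_id)

    # Attach the adapter weights on top of the base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)

    # The processor handles both the tokenizer and image preprocessing.
    processor = AutoProcessor.from_pretrained(base_id)
    return model, processor
```

Calling model, processor = load_policy() downloads both repositories. If loading still fails, it may be worth checking that base_model_name_or_path in the adapter's adapter_config.json names the same base checkpoint used here.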