How to use HumorR1/policy-e2a-grpo-no-thinking with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Local path to the merged SFT base checkpoint; adjust to where the checkpoint lives on your machine.
base_model = AutoModelForCausalLM.from_pretrained("/home/ubuntu/code/humor-r1/checkpoints/qwen3vl-2b-sft-instruct-nothink-merged")

# Attach the GRPO adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, "HumorR1/policy-e2a-grpo-no-thinking")
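
Once the adapter is attached, the wrapped model can be used like the base model for inference. A minimal sketch, assuming the merged base checkpoint ships a compatible tokenizer and that a text-only prompt is sufficient (the prompt below is purely illustrative):

from transformers import AutoTokenizer

# Assumes the tokenizer is stored alongside the merged base checkpoint.
tokenizer = AutoTokenizer.from_pretrained("/home/ubuntu/code/humor-r1/checkpoints/qwen3vl-2b-sft-instruct-nothink-merged")

# Hypothetical prompt; replace with your own input.
inputs = tokenizer("Tell me a joke about compilers.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If a standalone checkpoint is needed, the adapter weights can be folded into the base model with model.merge_and_unload() and then saved with save_pretrained.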