How to use AlignmentResearch/pineapple-policy-annah_grpo with PEFT:
Task type is invalid.
How can I fix this?
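One possible way to narrow this down, assuming the message comes from the `task_type` field in the adapter's `adapter_config.json` being missing or unrecognized, is to inspect that field locally. The sketch below is not taken from the repository: the helper names `check_task_type` and `patch_task_type` are hypothetical, the set of valid values is assumed to match the members of `peft.TaskType`, and `CAUSAL_LM` is only a guess at the right value for a GRPO-trained policy.

```python
import json

# Assumed to mirror the members of peft.TaskType; verify against your installed PEFT version.
KNOWN_TASK_TYPES = {
    "SEQ_CLS",
    "SEQ_2_SEQ_LM",
    "CAUSAL_LM",
    "TOKEN_CLS",
    "QUESTION_ANS",
    "FEATURE_EXTRACTION",
}


def check_task_type(config: dict) -> str:
    """Return 'ok', 'missing', or 'unknown' for the config's task_type field."""
    task_type = config.get("task_type")
    if task_type is None:
        return "missing"
    if task_type not in KNOWN_TASK_TYPES:
        return "unknown"
    return "ok"


def patch_task_type(config: dict, task_type: str = "CAUSAL_LM") -> dict:
    """Return a copy of the config with task_type set (hypothetical helper).

    The CAUSAL_LM default is an assumption, not something confirmed by the repository.
    """
    if task_type not in KNOWN_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type}")
    patched = dict(config)
    patched["task_type"] = task_type
    return patched


if __name__ == "__main__":
    # Stand-in for the parsed contents of a downloaded adapter_config.json.
    example = json.loads('{"peft_type": "LORA", "task_type": null}')
    print(check_task_type(example))
    print(check_task_type(patch_task_type(example)))
```

If the check reports `missing` or `unknown`, setting the field in `adapter_config.json` to the value that matches the base model may resolve the message, though that depends on how the adapter was exported.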