GRPO outcome-RL LoRA adapters (one per rule × init condition), trained on top of a CPT initialization.
From the project Tell or Show: How Training-Data Format Shapes Implicit vs. Explicit Rule Knowledge.
Adapters are organized as <base-model>/<rule>__from_<init>/:
.
└── qwen3-4b-instruct-2507/              # base = Qwen/Qwen3-4B-Instruct-2507
    ├── max_plus_diff__from_fewshot/
    ├── max_plus_diff__from_explicit/
    ├── digit_product__from_fewshot/
    └── ... (11 rules × 2 inits = up to 22)
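The naming scheme above can be captured in a small helper. This is a minimal sketch, not part of the repository: the function names `adapter_subfolder` and `parse_adapter_subfolder` are hypothetical, and the only thing taken from the layout is the `<base-model>/<rule>__from_<init>` pattern.

```python
def adapter_subfolder(base_dir: str, rule: str, init: str) -> str:
    """Build the repo-relative path <base-model>/<rule>__from_<init>."""
    return f"{base_dir}/{rule}__from_{init}"


def parse_adapter_subfolder(subfolder: str) -> tuple[str, str, str]:
    """Split a subfolder path back into (base_dir, rule, init)."""
    base_dir, leaf = subfolder.strip("/").split("/", 1)
    # Split on the last "__from_" so rule names containing underscores survive.
    rule, sep, init = leaf.rpartition("__from_")
    if not sep:
        raise ValueError(f"not an adapter subfolder: {subfolder!r}")
    return base_dir, rule, init


if __name__ == "__main__":
    path = adapter_subfolder("qwen3-4b-instruct-2507", "max_plus_diff", "fewshot")
    print(path)
    print(parse_adapter_subfolder(path))
```

The resulting string is the kind of value that would be passed as `subfolder=` when loading an adapter.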
Each leaf subdir is a self-contained PEFT-loadable adapter:
- adapter_config.json
- adapter_model.safetensors
- README.md (per-variant details)
- trainer_state.json (training-time metrics)

Future base models (Qwen3-7B etc.) will appear as sibling base-model dirs.
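To look at the training-time metrics of one variant, the trainer_state.json listed above can be downloaded and read on its own. The sketch below assumes the file follows the usual transformers Trainer layout, with a `log_history` list of per-step dicts; which metric keys appear there is not documented here, so none are assumed. The helper names `last_logged_metrics` and `fetch_trainer_state` are hypothetical.

```python
import json
from pathlib import Path


def last_logged_metrics(state: dict) -> dict:
    """Return the final log_history entry, or {} if nothing was logged."""
    # Assumption: "log_history" is a list of dicts, as written by transformers' Trainer.
    history = state.get("log_history", [])
    return history[-1] if history else {}


def fetch_trainer_state(repo_id: str, subfolder: str) -> dict:
    """Download trainer_state.json for one adapter variant and parse it."""
    # Imported lazily so last_logged_metrics works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id=repo_id,
        filename="trainer_state.json",
        subfolder=subfolder,
    )
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    state = fetch_trainer_state(
        "ada-flo/monkey-grpo-arith_op",
        "qwen3-4b-instruct-2507/max_plus_diff__from_fewshot",
    )
    print(last_logged_metrics(state))
```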
# Load the base model, then attach one adapter variant via its subfolder.
from peft import PeftModel
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507", torch_dtype="bfloat16")
# subfolder follows <base-model>/<rule>__from_<init>
model = PeftModel.from_pretrained(base, "ada-flo/monkey-grpo-arith_op", subfolder="qwen3-4b-instruct-2507/max_plus_diff__from_fewshot")