monkey-grpo-arith_op

GRPO outcome-RL LoRA adapters (one per rule × init condition), trained on top of a CPT initialization.

From the project Tell or Show: How Training-Data Format Shapes Implicit vs. Explicit Rule Knowledge.

Layout

Adapters are organized as <base-model>/<rule>__from_<init>/:

.
└── qwen3-4b-instruct-2507/       # base = Qwen/Qwen3-4B-Instruct-2507
    ├── max_plus_diff__from_fewshot/
    ├── max_plus_diff__from_explicit/
    ├── digit_product__from_fewshot/
    └── ... (11 rules × 2 inits = up to 22)

Each leaf subdir is a self-contained PEFT-loadable adapter:

  • adapter_config.json
  • adapter_model.safetensors
  • README.md (per-variant details)
  • trainer_state.json (training-time metrics)
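The following sketch checks a locally downloaded leaf directory for the two files PEFT needs and reads the most recent value of one metric from trainer_state.json. It assumes trainer_state.json uses the standard Hugging Face Trainer layout, in which log_history is a list of per-step dicts. Which metric keys were actually logged (reward names, KL, and so on) is not documented here, so the key is left to the caller. The helper names missing_files and last_logged are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional

# The two files PEFT needs to load an adapter; README.md and trainer_state.json are extras.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_files(adapter_dir: str) -> List[str]:
    """Return the required adapter files that are absent from adapter_dir."""
    d = Path(adapter_dir)
    return [name for name in REQUIRED_FILES if not (d / name).is_file()]


def last_logged(adapter_dir: str, key: str) -> Optional[float]:
    """Return the last value of `key` in trainer_state.json's log_history, or None.

    Assumes the standard Trainer layout (log_history is a list of dicts);
    entries that did not log `key` are skipped.
    """
    state_file = Path(adapter_dir) / "trainer_state.json"
    if not state_file.is_file():
        return None
    state = json.loads(state_file.read_text())
    values = [entry[key] for entry in state.get("log_history", []) if key in entry]
    return values[-1] if values else None
```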

Future base models (Qwen3-7B etc.) will appear as sibling base-model dirs.

Loading

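from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen3-4B-Instruct-2507"
# Load the base model in bf16, then attach one adapter from its <rule>__from_<init> subfolder.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, "ada-flo/monkey-grpo-arith_op",
                                  subfolder="qwen3-4b-instruct-2507/max_plus_diff__from_fewshot")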
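Since the layout lists "up to 22" adapters, not every rule × init combination is guaranteed to be present. The following is a minimal sketch that discovers which subfolders are loadable from a flat list of repo file paths, assuming leaves sit exactly two levels below the repo root, as in the layout above. The function name available_subfolders is hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Files a leaf must contain for PeftModel.from_pretrained to load it.
REQUIRED = {"adapter_config.json", "adapter_model.safetensors"}


def available_subfolders(repo_files: Iterable[str], base_dir: str) -> List[str]:
    """Return sorted '<base_dir>/<leaf>' subfolders that contain both adapter files."""
    seen: Dict[str, Set[str]] = {}
    for path in repo_files:
        parts = path.split("/")
        # Only consider files exactly two levels down: <base_dir>/<leaf>/<file>.
        if len(parts) == 3 and parts[0] == base_dir:
            seen.setdefault(parts[1], set()).add(parts[2])
    return sorted(f"{base_dir}/{leaf}" for leaf, files in seen.items() if REQUIRED <= files)
```

One way to obtain repo_files is huggingface_hub.list_repo_files("ada-flo/monkey-grpo-arith_op"). Each returned string can then be passed as the subfolder argument to PeftModel.from_pretrained. Working from the file listing avoids downloading any weights just to find out which combinations exist.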