monkey-grpo-arith_op

LoRA adapters trained with GRPO (outcome-based RL), one per rule × init condition. Each adapter starts from a CPT initialization.

From the project "Tell or Show: How Training-Data Format Shapes Implicit vs. Explicit Rule Knowledge".

Layout

Adapters are organized as <base-model>/<rule>__from_<init>/:

.
└── qwen3-4b-instruct-2507/       # base = Qwen/Qwen3-4B-Instruct-2507
    ├── max_plus_diff__from_fewshot/
    ├── max_plus_diff__from_explicit/
    ├── digit_product__from_fewshot/
    └── ... (11 rules × 2 inits = up to 22)
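
The naming scheme above can be handled programmatically. The following is a minimal sketch, not part of this repo: `adapter_subfolder` and `parse_adapter_subfolder` are hypothetical helper names, and the parser assumes that init names never contain the `__from_` separator themselves.

```python
SEPARATOR = "__from_"  # joins <rule> and <init> in each leaf directory name


def adapter_subfolder(base_model: str, rule: str, init: str) -> str:
    """Build the repo-relative subfolder for one adapter variant."""
    return f"{base_model}/{rule}{SEPARATOR}{init}"


def parse_adapter_subfolder(subfolder: str) -> tuple[str, str, str]:
    """Split '<base-model>/<rule>__from_<init>' into its three parts."""
    base_model, leaf = subfolder.strip("/").split("/", 1)
    # Split on the last separator so the init is whatever follows it.
    rule, sep, init = leaf.rpartition(SEPARATOR)
    if not sep:
        raise ValueError(f"not an adapter subfolder: {subfolder!r}")
    return base_model, rule, init
```

The string returned by `adapter_subfolder` is the value to pass as `subfolder` when loading an adapter.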

Each leaf subdirectory is a self-contained adapter that PEFT can load directly. It contains:

adapter_config.json
adapter_model.safetensors
README.md (per-variant details)
trainer_state.json (training-time metrics)
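
To inspect the training-time metrics, the state file can be read with the standard library. This is a sketch under one assumption: that `trainer_state.json` follows the usual Transformers `Trainer` layout, with a top-level `log_history` list of per-step dicts. The helper names are hypothetical, and which metrics were actually logged is not documented here, so the code makes no assumption about their keys.

```python
import json
from pathlib import Path


def last_logged_metrics(state: dict) -> dict:
    """Return the most recent entry of `log_history`, or {} if none was logged."""
    history = state.get("log_history", [])
    return dict(history[-1]) if history else {}


def load_trainer_state(adapter_dir: str) -> dict:
    """Read trainer_state.json from a locally downloaded adapter directory."""
    path = Path(adapter_dir) / "trainer_state.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

For example, `last_logged_metrics(load_trainer_state(local_dir))` gives the final logged step for an adapter that has been downloaded to `local_dir`.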

Adapters for future base models (Qwen3-7B etc.) will appear as sibling directories next to qwen3-4b-instruct-2507/.
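
Because the set of base models and variants may grow, it can help to discover what is available instead of hard-coding paths. The sketch below groups a flat list of repo file paths by base-model directory. `group_adapters` is a hypothetical helper, and it assumes every adapter sits exactly two levels deep and contains an `adapter_config.json`, as in the layout above.

```python
from collections import defaultdict


def group_adapters(repo_files: list[str]) -> dict[str, list[str]]:
    """Map each base-model directory to the sorted adapter variants under it."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for path in repo_files:
        parts = path.split("/")
        # An adapter is identified by <base-model>/<variant>/adapter_config.json
        if len(parts) == 3 and parts[2] == "adapter_config.json":
            grouped[parts[0]].add(parts[1])
    return {base: sorted(variants) for base, variants in grouped.items()}
```

The file list can be obtained with `huggingface_hub.list_repo_files("ada-flo/monkey-grpo-arith_op")`.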

Loading

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in bfloat16, then attach one adapter from this repo.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507", torch_dtype="bfloat16")
# `subfolder` selects the variant: <base-model>/<rule>__from_<init>
model = PeftModel.from_pretrained(base, "ada-flo/monkey-grpo-arith_op", subfolder="qwen3-4b-instruct-2507/max_plus_diff__from_fewshot")
