# Shaer-adapters-grpo-vnext
This repo is the first patched rerun after Shaer-AI/Shaer-adapters-grpo was reclassified as reward-hacked.
## Place in the story
Project sequence:

- Shaer-AI/Shaer-adapters: clean SFT baseline
- Shaer-AI/Shaer-adapters-grpo: historically strong-looking but reward-hacked GRPO run
- Shaer-AI/Shaer-adapters-grpo-vnext: stricter anti-template and artifact-filtering GRPO rerun
- Shaer-AI/Shaer-adapters-grpo-friend-v1: first judge-centered rerun
- Shaer-AI/Shaer-adapters-grpo-friend-v1-easy: first easier judge-centered rerun
## What changed here
This run introduced the stricter reward patch that was designed to kill the old hacked behavior:
```
reward_total = meter * count_adherence * arabic_clean * repeat_penalty
```
with much stronger internals for:
- artifact-free Arabic filtering
- lexical plausibility
- near-duplicate detection
- opening diversity
- distinct-2 phrase diversity
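The multiplicative composition above can be sketched in a few lines of Python. This is a minimal illustration only: the helper implementations (`distinct_2`, `arabic_clean`, `repeat_penalty`) are assumptions chosen to show the shape of each term, not the repo's actual, stricter internals.

```python
def distinct_2(text: str) -> float:
    """Distinct-2 diversity: unique bigrams / total bigrams (illustrative)."""
    toks = text.split()
    bigrams = list(zip(toks, toks[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0

def arabic_clean(text: str) -> float:
    """Fraction of non-whitespace characters inside the basic Arabic
    Unicode block U+0600-U+06FF (a much simpler filter than the real one)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    arabic = sum(1 for c in chars if "\u0600" <= c <= "\u06FF")
    return arabic / len(chars)

def repeat_penalty(lines: list[str]) -> float:
    """Near-duplicate penalty as the share of unique lines (illustrative)."""
    if not lines:
        return 0.0
    return len(set(lines)) / len(lines)

def reward_total(meter: float, count_adherence: float, text: str) -> float:
    """Multiplicative composition matching the formula above: any weak
    component drags the whole reward toward zero."""
    lines = [l for l in text.splitlines() if l.strip()]
    return meter * count_adherence * arabic_clean(text) * repeat_penalty(lines)
```

Because the terms multiply rather than add, a generation cannot buy back a zeroed-out diversity or cleanliness term with a high meter score, which is what makes this shape harder to reward-hack.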
## Run snapshot
- starting adapter: Shaer-AI/Shaer-adapters
- train subset: dropped-trio curated subset, cap 3000 per surviving meter
- eval bank: full 13-meter eval, 104 rows
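The per-meter cap in the train subset can be sketched as a simple first-N filter. This is a hypothetical helper: the field name `meter` and the row structure are assumptions, not the repo's actual data schema.

```python
from collections import defaultdict

def cap_per_meter(rows: list[dict], cap: int = 3000) -> list[dict]:
    """Keep at most `cap` rows per meter, preserving input order
    (mirrors the 'cap 3000 per surviving meter' setting; illustrative)."""
    kept = []
    seen = defaultdict(int)
    for row in rows:
        if seen[row["meter"]] < cap:
            kept.append(row)
            seen[row["meter"]] += 1
    return kept
```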
Best tracked checkpoint:

- step: 500
- eval total: 0.1937
- eval meter: 0.5652
- eval count adherence: 0.9099
- eval judge diagnostic: 0.3774
- eval repeat penalty: 0.5577
- eval arabic clean: 0.8750
## What this run proved
This stage mattered because it showed the patched anti-template reward was much better at rejecting the old hacked outputs.
But it still was not the final answer:
- tracked reward was much lower than the old hacked run
- generation quality was still not strong enough
- semantic quality still needed to be modeled more directly
## Why we moved on
This repo motivated the next shift: bring in a focused Arabic semantic judge that scores whether the poem:
- has meaning
- is not garbage
- is relevant to the description
That next stage was published as Shaer-AI/Shaer-adapters-grpo-friend-v1.
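The three judge criteria above could be aggregated into a single score along these lines. This is a hedged sketch only: the equal weighting and the function name `judge_score` are assumptions, and the actual friend-v1 judge rubric may differ.

```python
def judge_score(has_meaning: bool, not_garbage: bool, relevant: bool) -> float:
    """Average the three judge criteria into a score in [0, 1].
    Equal weighting is an assumption, not the published rubric."""
    return (has_meaning + not_garbage + relevant) / 3.0
```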
## Recommended use
Use this repo as the first serious post-hack reward patch, not as the final recommended GRPO model.
## Model tree for Shaer-AI/Shaer-adapters-grpo-vnext

Base model: humain-ai/ALLaM-7B-Instruct-preview