Tags: Transformers, Safetensors, trl, grpo, arabic-poetry, classical-arabic, lora

Shaer-adapters-grpo-vnext

This repo is the first patched rerun after Shaer-AI/Shaer-adapters-grpo was reclassified as reward-hacked.

Place in the story

Project sequence:

  1. Shaer-AI/Shaer-adapters: clean SFT baseline
  2. Shaer-AI/Shaer-adapters-grpo: historically strong-looking but reward-hacked GRPO run
  3. Shaer-AI/Shaer-adapters-grpo-vnext: stricter anti-template and artifact-filtering GRPO rerun (this repo)
  4. Shaer-AI/Shaer-adapters-grpo-friend-v1: first judge-centered rerun
  5. Shaer-AI/Shaer-adapters-grpo-friend-v1-easyfirst: easier judge-centered rerun

What changed here

This run introduced the stricter reward patch designed to suppress the reward-hacked behavior of the previous run:

reward_total = meter * count_adherence * arabic_clean * repeat_penalty

with much stronger internals for:

  • artifact-free Arabic filtering
  • lexical plausibility
  • near-duplicate detection
  • opening diversity
  • distinct-2 phrase diversity
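Because the composition is multiplicative, any single component near zero collapses the total reward, which is what makes template-style hacks unprofitable. The sketch below illustrates that composition with toy versions of two of the internals, distinct-2 phrase diversity and a near-duplicate repeat penalty; the helper definitions and thresholds are illustrative assumptions, not the actual training code:

```python
def distinct_2(tokens):
    """Fraction of unique bigrams among all bigrams (toy distinct-2 diversity)."""
    if len(tokens) < 2:
        return 0.0
    bigrams = list(zip(tokens, tokens[1:]))
    return len(set(bigrams)) / len(bigrams)

def repeat_penalty(lines):
    """Toy near-duplicate penalty: share of unique lines in the poem."""
    if not lines:
        return 0.0
    return len(set(lines)) / len(lines)

def reward_total(meter, count_adherence, arabic_clean, rep_pen):
    """Multiplicative composition: one failing component zeroes the reward."""
    return meter * count_adherence * arabic_clean * rep_pen

# A poem that repeats one line scores low on the repeat penalty,
# dragging the whole reward down despite a strong meter score.
print(reward_total(0.9, 1.0, 1.0, repeat_penalty(["a", "a", "a", "b"])))  # 0.45
```

The real components are continuous scores in [0, 1]; the point of the sketch is only the multiplicative gating, which rewards outputs that pass every filter at once.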

Run snapshot

  • starting adapter: Shaer-AI/Shaer-adapters
  • train subset: dropped-trio curated subset, cap 3000 per surviving meter
  • eval bank: full 13-meter eval, 104 rows

Best tracked checkpoint:

  • step: 500
  • eval total: 0.1937
  • eval meter: 0.5652
  • eval count adherence: 0.9099
  • eval judge diagnostic: 0.3774
  • eval repeat penalty: 0.5577
  • eval arabic clean: 0.8750

What this run proved

This stage mattered because it showed the patched anti-template reward was far more reliable at rejecting the old hacked outputs.

But it was still not the final answer:

  • tracked reward was much lower than the old hacked run
  • generation quality was still not strong enough
  • semantic quality still needed to be modeled more directly

Why we moved on

This repo motivated the next shift: bring in a focused Arabic semantic judge that scores whether the poem:

  • has meaning
  • is not garbage
  • is relevant to the description

That next stage was published as Shaer-AI/Shaer-adapters-grpo-friend-v1.

Recommended use

Use this repo as the first serious post-hack reward patch, not as the final recommended GRPO model.
