Eval Request

#553
by MuXodious - opened

I'm still trying to get it right with gpt-oss 120b and would like to have this one tested to see where the progress stands. It is a somewhat experimental release, following the previous one, which was badly broken.

https://huggingface.co/MuXodious/gpt-oss-120b-tainted-heresy

I decided to experiment a little with noslopping in Heretic. I don't expect it to dramatically improve the model's creative writing capability; however, there should be some improvements in the sub-metrics.
It's based on Marcjoni/QuasiStarSynth-12B, which was previously evaluated on the UGI Leaderboard.

https://huggingface.co/MuXodious/QuasiStarSynth-12B-noslop

https://huggingface.co/MuXodious/QuasiStarSynth-12B-noslop-absolute-heresy

Well, after some discussion with people at the Heretic HQ, I re-hereticated GLM 4.7 Flash, the previous version of which ranked at the top among its peers in UGI scores. This one was done with a better-informed configuration and a more mature codebase. If possible, I would like to have it included in the UGI evaluation as well.

https://huggingface.co/MuXodious/GLM-4.7-Flash-absolute-heresy

This was very informative in terms of the effects of my current method. At the cost of, well, pretty much everything else, the model is made to bend to the user's will. While the latter is desired and was basically the main axis of my approach, we cannot afford the former: preserving model capabilities is essential, and losing them points toward lobotomisation. Thank you for the evaluation.

DontPlanToEnd changed discussion status to closed
