Feature Suggestion

#559
by darkc0de - opened

This space has become a community staple. You clearly put a lot of work into it. I'm especially impressed with the NatInt benchmark, which in my experience is more accurate than many mainstream benchmarks.

I'd like to propose a new category or ranking system. Currently xai/grok-4-0709 sits atop the leaderboard, but is it really the best uncensored model? Probably not.

I suggest this leaderboard to everyone, but people often tell me that it's not accurate.
I disagree.
I believe it's misunderstood.

Newer users are often confused about how to use the leaderboard and struggle with the complexity of the information.

So for the noobs, I suggest a simple ranking system that's easy to implement, is based on existing leaderboard data, and answers the noob's simple question: "What's the best uncensored LLM for me?"

My answer is: (UGI+NatInt)×(W/10)²

Check out my theory implemented here: darkc0de/UGI-Index

And unlike the Open LLM Leaderboard that abandoned us all,
I hope you really DontPlanToEnd

Yeah, I really do need to make it clearer what UGI contains. It is a mix of questions measuring models' knowledge of sensitive information (the categories Hazardous, Entertainment, and SocPol), and also willingness questions, which are solely focused on whether the LLM does what you say. Looking at the leaderboard columns doesn't immediately make clear that W/10 is a subset of UGI's questions.

It is very debatable how much to value knowledge of controversial information versus willingness in the UGI ranking. I could probably weight W/10 a bit more. I do think that a stupid but fully willing model should score lower than a moderately willing but smart model. I guess which model is more valuable to people depends on the use case.

In my experience, anything below a W/10 of 8 is somewhat useless for my use case.
It depends on the specific application.
It's difficult to strike the perfect balance of a model being actually uncensored, actually obedient, actually knowledgeable, and actually intelligent.
There are always trade-offs and sacrifices; no one model is perfect. For me personally, the (UGI+NatInt)×(W/10)² method helps me filter out the censored or dumb models.
Kinda like being able to sort all 3 columns at the same time instead of individually.
Dumb and censored go to the bottom of the list, smart and UNcensored go to the top.
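
For anyone who wants to try the combined sort on their own copy of the leaderboard data, here is a rough Python sketch of the (UGI+NatInt)×(W/10)² idea. The model names and scores are made up purely for illustration, and `ugi_index` is a hypothetical helper name, not something taken from darkc0de/UGI-Index. It also assumes the W/10 column holds a 0-10 value that gets divided by 10 before squaring; if the formula is read as squaring the raw column value instead, every score is just 100 times larger and the ordering stays the same.

```python
def ugi_index(ugi: float, natint: float, w10: float) -> float:
    """Combined score: (UGI + NatInt) * (W/10)^2.

    Assumption: w10 is the leaderboard's W/10 column (0 to 10),
    normalized here to the range 0..1 before squaring.
    """
    return (ugi + natint) * (w10 / 10) ** 2


# Hypothetical models with invented scores, for illustration only.
models = [
    {"name": "smart-but-censored", "ugi": 40.0, "natint": 50.0, "w10": 4.0},
    {"name": "willing-but-dumb", "ugi": 20.0, "natint": 10.0, "w10": 10.0},
    {"name": "smart-and-willing", "ugi": 45.0, "natint": 40.0, "w10": 9.0},
]

# Sort by the combined score, highest first.
ranked = sorted(
    models,
    key=lambda m: ugi_index(m["ugi"], m["natint"], m["w10"]),
    reverse=True,
)

for m in ranked:
    score = ugi_index(m["ugi"], m["natint"], m["w10"])
    print(f"{m['name']:<20} {score:6.2f}")
```

Squaring the willingness term means a low W/10 drags the score down quickly: in this made-up data the smart-but-censored model lands below the willing-but-dumb one even though its UGI+NatInt sum is three times larger.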

darkc0de changed discussion status to closed
