Spaces: DontPlanToEnd/UGI-Leaderboard

Eval requests

#539

by zelk12 - opened Feb 1

zelk12

Feb 1

zai-org/GLM-4.7-Flash

DontPlanToEnd

Owner Feb 1

There's another discussion open for this model. I'll give it another shot today to see if I still get an error when trying to run it.

DontPlanToEnd changed discussion status to closed Feb 1

zelk12

Feb 5

Yes, sorry, I didn't see that discussion.
The only thing I can say about this model is that, at least in Russian, without forced writing of , the model can glitch badly and fall into a loop.
At the same time, as is typical for models, it is rather weak at logic.

DontPlanToEnd

Owner Feb 5

Yeah, I added it to the leaderboard, and it took a while to figure out the best settings and prompt template for it. It's very sensitive. It's one of the longest-thinking models I've tested. I heard there are some finetunes that shorten how long it thinks. I've still got to test those.

zelk12

Feb 6

Probably one of the questions is not how to reduce the number of thinking tokens, but how to improve the logic at such a cost.
Take the same comparison by NatInt, for example: even smaller models without thinking get a higher score. Although there are some peculiarities there, too.
