DontPlanToEnd/UGI-Leaderboard

Anomalous testing results for Qwen3.5-27B-heretic (<think> prefill)

#623

by ComputeWisely - opened Mar 23

Discussion

ComputeWisely

Mar 23

The Hugging Face entry for Bobi099/Qwen3.5-27B-heretic (<think> prefill) (https://huggingface.co/Bobi099/Qwen3.5-27B-heretic) describes it as a "Duplicate from coder3101/Qwen3.5-27B-heretic Co-authored-by: Ashar coder3101@users.noreply.huggingface.co". So presumably these models are identical?

However, the original model, coder3101/Qwen3.5-27B-heretic (<think> prefill) (https://huggingface.co/coder3101/Qwen3.5-27B-heretic), apparently scored worse in testing than its duplicate. Could this be down to different seeds used during testing, or something similar? Or are these models really different?

DontPlanToEnd

Owner Mar 23

The testing process isn't always deterministic. I try to make it as deterministic as possible, but some models need randomness, especially reasoning models, which use it to think through ideas. I also use vLLM batching, which seems to be inherently non-deterministic. So there is definitely a margin of error in the leaderboard scores.
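A toy sketch of one commonly cited way batching can break determinism, assuming the batch composition changes how a kernel groups a floating-point sum. This is not vLLM's code; `token_b_logit`, `greedy_pick`, and the logit values are hypothetical. Floating-point addition is not associative, so the same three contributions summed in a different order can differ in the last bit, and that is enough to flip a near-tie under greedy decoding:

```python
def greedy_pick(logits):
    """Return the index of the largest logit (first index wins ties), as greedy decoding would."""
    return max(range(len(logits)), key=lambda i: logits[i])


def token_b_logit(order):
    """Hypothetical logit for token B: the same three partial contributions,
    summed in a different order depending on how the kernel grouped the work."""
    a, b, c = 0.1, 0.2, 0.3
    if order == "left":
        return (a + b) + c  # 0.6000000000000001
    return a + (b + c)      # 0.6


token_a_logit = 0.6  # competing token A, identical in both runs

run_1 = [token_a_logit, token_b_logit("left")]   # grouping used in one batch layout
run_2 = [token_a_logit, token_b_logit("right")]  # grouping used in another batch layout

print(greedy_pick(run_1))  # 1 (token B wins by one ulp)
print(greedy_pick(run_2))  # 0 (exact tie, so token A wins)
```

Once a single token differs, every later token is conditioned on a different prefix, so a last-bit numerical difference can grow into a visibly different answer and, across many questions, a different score.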

DontPlanToEnd changed discussion status to closed Mar 23