Hi! I'm evaluating this model on some inputs similar to those it was fine-tuned on (GPT-5.1-Codex-Max and GPT-5.1-HighReasoning)
#2 · opened by AxionLab-official
In this discussion, I'm comparing the model against similar models on dataset, fine-tune, and parameter count, and I'm releasing the results here.
First, Chat modality:
(There will be 10 Q&A prompts, like "Hello!")
Qwen2.5-0.5B-Instruct: 79%
GPT-5.1-Codex-Max-0.4B: 50%
(CHAT IS AN OUT-OF-SCOPE USE FOR GPT-5.1-0.4B)
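For context, a chat pass rate like the percentages above can be computed as the share of Q&A prompts whose replies a judge accepts. Here is a minimal sketch; `judge_response` is a hypothetical stand-in (a real evaluation would use a human rater or an LLM judge), and the example pairs are toy data, not the actual eval set.

```python
def judge_response(prompt: str, response: str) -> bool:
    """Hypothetical judge: accept a reply if it is non-empty and at least two words."""
    return bool(response.strip()) and len(response.split()) >= 2

def chat_pass_rate(qa_pairs: list[tuple[str, str]]) -> float:
    """Percentage of (prompt, response) pairs the judge accepts."""
    passed = sum(judge_response(p, r) for p, r in qa_pairs)
    return 100.0 * passed / len(qa_pairs)

# Toy example: one accepted reply out of two gives 50%.
pairs = [
    ("Hello!", "Hi there, how can I help?"),
    ("How are you?", ""),
]
print(chat_pass_rate(pairs))  # 50.0
```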
Now, Coding>>>
<<< MORE TOMORROW >>>
How is it going?
Evaluating the coding ability with 6 models; it's taking some time.
++++++ EVALUATION PAUSED ++++++