
Hi! I'm evaluating this model on inputs similar to the ones it was fine-tuned on (GPT-5.1-Codex-Max and GPT-5.1-HighReasoning)

#2
by AxionLab-official - opened

In this discussion, I'm evaluating the model against models with a similar dataset, fine-tune, and parameter count, and I'm posting the results here.

First, the chat modality:

(There will be 10 Q&A prompts like "Hello!")

Qwen2.5-0.5B-Instruct: 79%
GPT-5.1-Codex-Max-0.4B: 50%

(CHAT IS AN OUT-OF-SCOPE USE FOR GPT-5.1-Codex-Max-0.4B)

<<< MORE TOMORROW >>>

WithIn Us AI org

How is it going?

I'm evaluating coding ability across 6 models.
It's taking some time.

++++++ EVALUATION PAUSED ++++++
