Hi! I'm evaluating this model on some inputs similar to those it was fine-tuned on (GPT-5.1-Codex-Max and GPT-5.1-HighReasoning)
#2 · opened by AxionLab-official
In this discussion, I'm comparing the model against similar models on dataset, fine-tune, and parameter count, and I'm releasing the results here.
First, Chat modality:
(There will be 10 Q&A prompts, like "Hello!")
Qwen2.5-0.5B-Instruct: 79%
GPT-5.1-Codex-Max-0.4B: 50%
(CHAT IS AN OUT-OF-SCOPE USE FOR GPT-5.1-0.4B)
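For context, a chat pass rate like the percentages above can be computed as the share of Q&A prompts whose replies a judge accepts. Here is a minimal sketch; `judge_response` is a hypothetical stand-in (a real evaluation would use a human rater or an LLM judge), and the example pairs are toy data, not the actual eval set.

```python
def judge_response(prompt: str, response: str) -> bool:
    """Hypothetical judge: accept a reply if it is non-empty and at least two words."""
    return bool(response.strip()) and len(response.split()) >= 2

def chat_pass_rate(qa_pairs: list[tuple[str, str]]) -> float:
    """Percentage of (prompt, response) pairs the judge accepts."""
    passed = sum(judge_response(p, r) for p, r in qa_pairs)
    return 100.0 * passed / len(qa_pairs)

# Toy example: one accepted reply out of two gives 50%.
pairs = [
    ("Hello!", "Hi there, how can I help?"),
    ("How are you?", ""),
]
print(chat_pass_rate(pairs))  # 50.0
```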
Now, Coding>>>
<<< MORE TOMORROW >>>
How is it going?
Evaluating the coding ability with 6 models; it's taking some time.
++++++ EVALUATION PAUSED ++++++