Could more benchmarks be done on Instruction Following / Function Calling?

#2
by TomLucidor - opened

This request is motivated by benchmarks such as BFCL, IFBench, and ToolBench, and by the general question of how robustly models reason during tool use. Qwen3-Coder-Next itself also claims to be a reasoner.

I agree, testing on these benchmarks is important. Unfortunately, I don't have the bandwidth to do it at the moment, as these benchmarks seem non-trivial to run.

It would be ideal if the existing REAM code became FOSS for public experimentation: https://huggingface.co/lovedheart/Qwen3-Coder-Next-REAP-48B-A3B-GGUF/discussions/8#698dc9bc9dfe52a71e849068

Yes, we are considering releasing the code soon, since there is a lot of interest in the community. Stay tuned!
