Evaluation Reproduction

#2
by Dogacel - opened

Hello,

I am trying to reproduce the reported acceptance length using SGLang (latest version, 0.5.10). The result I get on MT-Bench is lower than the reported one: my acceptance length is 3.25 with medium thinking, a maximum of 2048 tokens, and 2 turns for 80 prompts.

What settings were used to produce the reported acceptance length? I use SpecForge to run the benchmarks.

Thanks.
