PPL test

#2
by bullerwins - opened

Hi!

How do you run the PPL test to measure quality? Have you found the PPL test to be a better test than regular benchmarks, for example?

Thanks :)

cyankiwi org

Thank you for your interest in my model!

My PPL test, derived from EleutherAI/lm-evaluation-harness, measures byte perplexity on the wikitext dataset.
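For readers unfamiliar with the metric, here is a minimal sketch of how byte perplexity can be computed once per-document log-likelihoods are available. It is not the harness's code: the function names and the `(log_likelihood, text)` input format are assumptions for illustration, and it assumes log-likelihoods are in nats and that bytes are counted on the UTF-8 encoding of the original text.

```python
import math


def byte_perplexity(docs):
    """Byte perplexity over (log_likelihood_nats, text) pairs.

    Hypothetical helper: each pair holds the model's total log-likelihood
    for one document (natural log) and the document's original text.
    """
    total_ll = 0.0
    total_bytes = 0
    for log_likelihood, text in docs:
        total_ll += log_likelihood
        total_bytes += len(text.encode("utf-8"))  # normalize by bytes, not tokens
    # exp of the negative average log-likelihood per byte
    return math.exp(-total_ll / total_bytes)


def bits_per_byte(docs):
    """Same quantity expressed in bits: log2 of the byte perplexity."""
    return math.log2(byte_perplexity(docs))


# Toy input: a 4-byte document with log-likelihood -4*ln(2).
toy_docs = [(-4 * math.log(2), "abcd")]
toy_ppl = byte_perplexity(toy_docs)
toy_bpb = bits_per_byte(toy_docs)
```

Because the normalization is by bytes rather than tokens, the number stays comparable across models with different tokenizers, which is useful when comparing a quantized model against its original.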

I use the PPL test to measure how much the quantized model differs from the original model on general text, i.e., the wikitext dataset. Domain-specific benchmarks, e.g., GPQA Diamond, AIME25, LiveCodeBench, etc., also work, but they take significantly more time, especially when following Artificial Analysis standards, i.e., evaluating GPQA Diamond 5 times, AIME25 10 times, etc.

In the future, the evaluations for my models will be more comprehensive, so stay tuned :)
