Add MMLU-200 (93.5%), speed/memory benchmarks, fix variants table sizes (47→56 GB), expand topic tags

#1
by dealignai - opened

MMLU-200 was measured with thinking ON and q_per_subject=20 across 10 subjects, for 200 questions total. Median speed is ~37 tok/s on an M4 Max with 128 GB, using MLX 0.31. The JANGTQ size is corrected from 47 GB to the actual 56 GB.
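For anyone checking the numbers, here is a minimal sketch of the arithmetic behind the subset size and the reported score. The function name `mmlu_subset_summary` is hypothetical and not part of any eval harness in this repo. It only assumes that the 93.5% from the title is plain accuracy over all 200 questions.

```python
def mmlu_subset_summary(q_per_subject: int, n_subjects: int, accuracy_pct: float):
    """Return (total questions, implied number of correct answers).

    Hypothetical helper for sanity-checking the PR numbers; assumes the
    reported percentage is plain accuracy over the whole subset.
    """
    total = q_per_subject * n_subjects
    # Round to the nearest whole question, since a count must be an integer.
    correct = round(total * accuracy_pct / 100)
    return total, correct


total, correct = mmlu_subset_summary(q_per_subject=20, n_subjects=10, accuracy_pct=93.5)
print(f"{total} questions, {correct} implied correct")
```

Under that assumption, 93.5% corresponds to a whole number of correct answers out of 200, so the score is consistent with the stated subset size.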

Ready to merge
This branch is ready to get merged automatically.
