docs: add MMLU-Pro best-per-category table (46.4%), 5-shot results cc4b6e4 Snider Virgil committed on 12 days ago
data: add Lemer(1,1) full 14-category results with transcripts 4afef24 Snider committed on 12 days ago
data: add benchmark result stock-e2b-bf16-math-think-temp1.json 835b8a2 verified lthn committed on 12 days ago
data: add benchmark result stock-e2b-bf16-math-temp0.json 86673fd verified lthn committed on 12 days ago
data: add benchmark result stock-e2b-bf16-math-nothink-temp1.json 4e3885c verified lthn committed on 12 days ago
data: add benchmark result stock-e2b-bf16-math-nothink-temp0.json 1a9b96a verified lthn committed on 12 days ago
data: add benchmark result stock-e2b-bf16-biology-temp0.json fa15fbe verified lthn committed on 12 days ago
data: add benchmark result lemer-bf16-math-think-temp1.json 8af8909 verified lthn committed on 12 days ago
data: add benchmark result lemer-bf16-math-nothink-temp1.json 29f9886 verified lthn committed on 12 days ago
data: add benchmark result lemer-bf16-math-nothink-temp0.json a740c0b verified lthn committed on 12 days ago
data: add benchmark result lemer-bf16-biology-temp0.json 54d7fb0 verified lthn committed on 12 days ago