eleusis-benchmark

dlouapre HF Staff committed on Feb 9

Commit

a2c0d8a

1 Parent(s): d67e69e

Title

Files changed (2)

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: 'Are LLMs any good at the Science Game?'
+title: 'Can LLMs Play the Game of Science?'
 short_desc: 'Evaluating scientific reasoning using the card game Eleusis'
 emoji: 📝
 colorFrom: blue

app/src/content/article.mdx CHANGED

@@ -1,5 +1,5 @@
 ---
-title: "Are LLMs any good at the Game of Science?"
+title: "Can LLMs Play the Game of Science?"
 subtitle: "Evaluating scientific reasoning and metacognition using the card game Eleusis reveals distinct scientist personalities in large language models"
 description: "A benchmark for evaluating LLM scientific reasoning using the card game Eleusis, testing iterative hypothesis formation, calibration, and strategic experimentation."
 authors: