Clémentine
committed on
Commit · fdd0940
1 Parent(s): ffdff5d
mini fix
app/src/content/chapters/automated-benchmarks/designing-your-automatic-evaluation.mdx
CHANGED
@@ -552,7 +552,7 @@ You can also compute these with prompt variations, by asking the same questions
 ### Cost and efficiency
 
 
-When designing and reporting evaluation results, we need to start collectively reporting results against model running costs! A reasoning model which requires 10 minutes of thinking and 10K tokens to answer 10 +
+When designing and reporting evaluation results, we need to start collectively reporting results against model running costs! A reasoning model which requires 10 minutes of thinking and 10K tokens to answer 10 + 1 (because it decides to make an entire segue on binary vs decimal arithmetics) is considerably less efficient than a smol model answering 30 in a handful of tokens.
 
 <div className="card" style="height: fit-content; max-width: 75%; margin: 40px auto;">
 <img src={envImage.src} alt="Environmental impact metrics for model evaluation" style="height: auto !important; object-fit: contain !important; display: block; margin: 0 auto;" />