Clémentine committed on
Commit fdd0940 · 1 Parent(s): ffdff5d
app/src/content/chapters/automated-benchmarks/designing-your-automatic-evaluation.mdx CHANGED
@@ -552,7 +552,7 @@ You can also compute these with prompt variations, by asking the same questions
  ### Cost and efficiency
 
 
- When designing and reporting evaluation results, we need to start collectively reporting results against model running costs! A reasoning model which requires 10 minutes of thinking and 10K tokens to answer 10 + 20 (because it decides to make an entire segue on binary vs decimal arithmetics) is considerably less efficient than a smol model answering 30 in a handful of tokens.
+ When designing and reporting evaluation results, we need to start collectively reporting results against model running costs! A reasoning model which requires 10 minutes of thinking and 10K tokens to answer 10 + 1 (because it decides to make an entire segue on binary vs decimal arithmetics) is considerably less efficient than a smol model answering 30 in a handful of tokens.
 
  <div className="card" style="height: fit-content; max-width: 75%; margin: 40px auto;">
  <img src={envImage.src} alt="Environmental impact metrics for model evaluation" style="height: auto !important; object-fit: contain !important; display: block; margin: 0 auto;" />