foo-barrr committed on
Commit 54cd96d · verified
1 Parent(s): 1ea9097

Update app.py

Files changed (1)
  1. app.py +1 -1
app.py CHANGED
@@ -48,7 +48,7 @@ with gr.Blocks(title="LLM Propensity Evaluation Leaderboard") as demo:
 
  ## Evaluation Details:
  - **Instruction Following Score**: Measures a model's tendency to follow instructions accurately. Measured using the IFEval dataset.
- - **Factual Hallucination Rate**: Evaluates how often a model hallucinates when questioned on facts. Measured using a subset of the SimpleQA dataset, which explicitly asks uncommon facts. We calculated the rate using this formula : (1 - (correct + not_attempted)), where correct = when the model answered a question correctly and not_attempted = when a model admits to not knowing the answer to a question.*
+ - **Uncommon Facts Hallucination Rate**: Evaluates how often a model hallucinates when questioned on facts. Measured using a subset of the SimpleQA dataset, which explicitly asks uncommon facts. We calculated the rate using this formula : (1 - (correct + not_attempted)), where correct = when the model answered a question correctly and not_attempted = when a model admits to not knowing the answer to a question.*
 
  ## How to Interpret the Scores:
  * Instruction Following Score: Higher scores indicate better adherence to instructions.
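
The formula quoted in the changed line, 1 - (correct + not_attempted), can be illustrated with a minimal Python sketch. It assumes that `correct` and `not_attempted` are fractions of the evaluated questions, which the description implies but does not state outright. The function name `hallucination_rate` and its count-based parameters are hypothetical and are not taken from app.py.

```python
def hallucination_rate(n_correct: int, n_not_attempted: int, n_total: int) -> float:
    """Sketch of the rate described above: 1 - (correct + not_attempted).

    Assumption: `correct` and `not_attempted` are fractions of all graded
    questions, so the raw counts are divided by `n_total` first.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_not_attempted < 0:
        raise ValueError("counts must be non-negative")
    if n_correct + n_not_attempted > n_total:
        raise ValueError("correct + not_attempted cannot exceed n_total")

    correct = n_correct / n_total              # answered correctly
    not_attempted = n_not_attempted / n_total  # model admitted not knowing
    # What remains is the share of questions answered incorrectly.
    return 1 - (correct + not_attempted)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(hallucination_rate(30, 50, 100), 3))
```

Under this reading, the rate equals the share of questions the model attempted and got wrong, so a model that declines to answer is not penalised as hallucinating.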