app.py
CHANGED
@@ -2209,9 +2209,6 @@ with block:
 - **Number of Models**: {NUM_MODELS}
 - **Mode of Evaluation**: Zero-Shot, Five-Shot
 
-### Possible Issues:
-- For base models, the output of base model is not truncated as no EOS detected. Evaluation could be affected, especially with length-aware metrics.
-
 ### The following table shows the performance of the models on the SeaEval benchmark.
 - For **Zero-shot** performance, it is the median value from 5 distinct prompts shown on the above leaderboard to mitigate the influence of random variations induced by prompts.
 - (-1) value indicates the results are ready yet.
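
The zero-shot aggregation described in the leaderboard text (median over 5 prompts) could look roughly like the sketch below. This is not code from `app.py`: the function name `zero_shot_score` and the constant `NOT_READY` are hypothetical, and the sketch assumes that -1 marks a result that is not available yet, so any missing per-prompt score makes the aggregate -1 as well.

```python
from statistics import median

# Assumption: -1 is the sentinel for a result that is not available yet.
NOT_READY = -1


def zero_shot_score(prompt_scores):
    """Aggregate per-prompt scores into one zero-shot score.

    Returns the median of the scores, or NOT_READY if the list is empty
    or any per-prompt score is still missing.
    """
    if not prompt_scores or any(s == NOT_READY for s in prompt_scores):
        return NOT_READY
    return median(prompt_scores)


# Hypothetical scores for one model on one task, one per prompt.
scores = [61.2, 58.9, 60.4, 59.7, 62.0]
print(zero_shot_score(scores))  # 60.4
```

Using the median rather than the mean keeps a single unusually good or bad prompt from shifting the reported score.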