license: mit
short_description: LLM Many-Model-As-Judge Benchmark
---

# AutoBench

This Space runs a benchmark to compare different language models using Hugging Face's Inference API.
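Under the hood, each model call goes through the Inference API. As a rough sketch (not the Space's exact code), a single query with `huggingface_hub` looks like this; the token and model ID below are placeholders:

```python
from huggingface_hub import InferenceClient

# Placeholder token; in the Space you paste yours into the UI instead.
client = InferenceClient(token="hf_...")

# Any chat model served by the Inference API works here.
response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain overfitting in two sentences."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```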

## Features

- Benchmark multiple models side by side (models evaluate models; see the sketch after this list)
- Test models across various topics and difficulty levels
- Evaluate question quality and answer quality
- Generate detailed performance reports
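To make the first feature concrete, here is a simplified sketch of a many-model-as-judge round. It is illustrative only: `ask_model` is a hypothetical helper (e.g. wrapping `InferenceClient.chat_completion`), and the real app's prompts and scoring will differ:

```python
# Simplified many-model-as-judge round (illustrative only).
# ask_model(model_id, prompt) is a hypothetical helper returning the
# model's text reply, e.g. via InferenceClient.chat_completion.

def benchmark_round(models, topic, ask_model):
    scores = {m: [] for m in models}
    for author in models:
        question = ask_model(author, f"Write a hard exam question about {topic}.")
        for answerer in models:
            answer = ask_model(answerer, question)
            # Every other model grades the answer on a 1-10 scale.
            for judge in models:
                if judge == answerer:
                    continue
                verdict = ask_model(
                    judge,
                    f"Question: {question}\nAnswer: {answer}\n"
                    "Rate the answer's quality from 1 to 10. Reply with a number only.",
                )
                try:
                    scores[answerer].append(float(verdict.strip()))
                except ValueError:
                    pass  # skip unparseable judgments
    # Average grade per model across all judges and questions.
    return {m: sum(s) / len(s) for m, s in scores.items() if s}
```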

## How to Use

1. Enter your Hugging Face API token (needed to access models)
2. Select the models you want to benchmark
3. Choose topics and number of iterations
4. Click "Start Benchmark"
5. View and download results when complete
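The UI steps above map onto a run configuration roughly like the following; every field name here is hypothetical, chosen only to mirror the steps, not the app's actual schema:

```python
# Hypothetical configuration mirroring the UI steps (illustrative only).
run_config = {
    "hf_token": "hf_...",  # step 1: your Hugging Face API token (placeholder)
    "models": [            # step 2: models to benchmark
        "meta-llama/Llama-3.1-8B-Instruct",
        "google/gemma-2-9b-it",
    ],
    "topics": ["math", "coding"],  # step 3: topics and...
    "iterations": 3,               # ...number of iterations
}
```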

## Models

The benchmark supports any model available through Hugging Face's Inference API, including:

- Meta Llama models
- Google Gemma models
- Mistral models
- And many more!
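If you want to sanity-check model IDs before a run, `huggingface_hub.model_info` can confirm a repo exists on the Hub; the IDs below are examples from the families above, not a list the Space ships with:

```python
from huggingface_hub import model_info
from huggingface_hub.utils import RepositoryNotFoundError

# Example repo IDs; substitute the models you plan to benchmark.
candidates = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "google/gemma-2-9b-it",
    "mistralai/Mistral-7B-Instruct-v0.3",
]

for repo_id in candidates:
    try:
        model_info(repo_id)
        print(f"{repo_id}: found on the Hub")
    except RepositoryNotFoundError:
        print(f"{repo_id}: not found")
```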

## Note

Running a full benchmark can take considerable time: each iteration involves question, answer, and judging calls for every model, so runtime grows with both the number of models and the number of iterations.