pinned: false
---

# [LLM Arena](https://llmarena.ru/): Benchmarking LLMs in Russian

**LLM Arena** ([https://llmarena.ru/](https://llmarena.ru/)) is an open, crowd-sourced platform for evaluating large language models (LLMs) in Russian. Our goal is to provide an objective, open, and up-to-date benchmark of LLMs tailored to the Russian language.

## Features

The platform offers three main features:

1. **Anonymous Chat ([https://llmarena.ru/?arena](https://llmarena.ru/?arena)):** Users engage in a blind test, posing questions to two anonymous models simultaneously. This enables unbiased comparisons and gathers the user feedback that builds our leaderboard.
2. **Named Chat ([https://llmarena.ru/?compare](https://llmarena.ru/?compare)):** Users select specific models to chat with side by side, allowing direct comparisons and a more controlled evaluation experience.
3. **Leaderboard ([https://llmarena.ru/?leaderboard](https://llmarena.ru/?leaderboard)):** A leaderboard based on the Elo rating system and the Bradley-Terry model, ranking LLMs by aggregated user evaluations.

In addition:

- **Arena Hard Benchmark:** An offline benchmark that uses a stronger judge model to evaluate other models on a curated set of 500 prompts.
- **Data Visualization:** The leaderboard section also features four plots visualizing model performance across categories.

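The Elo-based ranking mentioned above can be illustrated with a minimal sketch: each pairwise vote nudges the winner's rating up and the loser's down in proportion to how surprising the result was. The model names, battle log, and K-factor below are hypothetical, not the platform's actual implementation.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Update both ratings in place after one battle (vote)."""
    ra, rb = ratings[winner], ratings[loser]
    ea = expected_score(ra, rb)          # winner's expected score
    ratings[winner] = ra + k * (1.0 - ea)
    ratings[loser] = rb - k * (1.0 - ea)

# Hypothetical battle log: (winner, loser) pairs from user votes.
battles = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ratings = {m: 1000.0 for m in ("model-a", "model-b", "model-c")}
for w, l in battles:
    update_elo(ratings, w, l)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)  # model-a first, having won both of its battles
```

An upset (a low-rated model beating a high-rated one) moves ratings further than an expected result, which is what makes the ordering converge as votes accumulate.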
## How It Works

1. **User Interaction:** Users interact with LLMs in either the anonymous or named chat setting.
2. **User Evaluation:** Users rate which model provided the better response.
3. **Data Collection:** The platform collects these user evaluations.
4. **Leaderboard Generation:** The Elo rating system and Bradley-Terry model use the collected data to rank the LLMs and generate the leaderboard.

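The Bradley-Terry side of step 4 can be sketched as well: model strengths are fitted from aggregated win counts via the classic minorization-maximization (Zermelo) iteration. The win counts below are hypothetical, and this is a simplified illustration rather than the platform's actual pipeline (which also handles ties and confidence intervals).

```python
def fit_bradley_terry(wins: dict, models: list, iters: int = 200) -> dict:
    """Fit Bradley-Terry strengths p, where wins[(i, j)] counts i beating j."""
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            w_i = sum(wins.get((i, j), 0) for j in models if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                for j in models if j != i
            )
            new_p[i] = w_i / denom if denom > 0 else p[i]
        total = sum(new_p.values())
        p = {m: v * len(models) / total for m, v in new_p.items()}  # normalize scale
    return p

# Hypothetical aggregated vote counts between three models.
models = ["model-a", "model-b", "model-c"]
wins = {("model-a", "model-b"): 7, ("model-b", "model-a"): 3,
        ("model-a", "model-c"): 8, ("model-c", "model-a"): 2,
        ("model-b", "model-c"): 6, ("model-c", "model-b"): 4}

strengths = fit_bradley_terry(wins, models)
ranking = sorted(models, key=strengths.get, reverse=True)
```

Unlike the sequential Elo update, this fit uses all battles at once, so the result does not depend on the order in which votes arrived.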
