LLMArena committed · Commit 22f1a63 · verified · 1 Parent(s): d0b8b59

Update README.md

Files changed (1): README.md (+22 −3)
README.md CHANGED
# [LLM Arena](https://llmarena.ru/): Benchmarking LLMs in Russian

**LLM Arena** ([https://llmarena.ru/](https://llmarena.ru/)) is an open crowd-sourced platform for evaluating large language models (LLMs) in Russian. Our goal is to provide an objective, open, and up-to-date benchmark of LLMs for the Russian language.
## Features

The platform offers three main features:

1. **Anonymous Chat ([https://llmarena.ru/?arena](https://llmarena.ru/?arena)):** Users engage in a blind test, posing questions to two anonymous models simultaneously. This allows for unbiased comparisons and helps us gather the user feedback that builds our leaderboard.
2. **Named Chat ([https://llmarena.ru/?compare](https://llmarena.ru/?compare)):** Users select specific models to chat with side by side, allowing direct comparisons and a more controlled evaluation experience.
3. **Leaderboard ([https://llmarena.ru/?leaderboard](https://llmarena.ru/?leaderboard)):** A leaderboard based on the Elo rating system and the Bradley-Terry model, showcasing the rankings of different LLMs derived from aggregated user evaluations.

In addition, the platform provides:

- **Arena Hard Benchmark:** An offline benchmark in which a stronger model judges the outputs of the other models on a curated set of 500 prompts.
- **Data Visualization:** The leaderboard section also features four plots visualizing model performance across different categories.
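The Elo side of the leaderboard can be illustrated with a minimal sketch of a single rating update after one vote. The K-factor of 32 and the starting ratings are illustrative assumptions, not LLM Arena's actual parameters:

```python
def elo_update(rating_a, rating_b, score_a, k=32.0):
    """One Elo update after a pairwise vote.

    score_a is 1.0 if model A won, 0.0 if model B won, 0.5 for a tie.
    k (the K-factor) controls how fast ratings move; 32 is a common
    default, not necessarily what LLM Arena uses.
    """
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    # The update is zero-sum: A gains exactly what B loses.
    return rating_a + delta, rating_b - delta

# Two equally rated models; A wins, so A gains k/2 = 16 points.
print(elo_update(1000.0, 1000.0, 1.0))  # -> (1016.0, 984.0)
```

An upset (a lower-rated model beating a higher-rated one) moves the ratings more than an expected result, which is what lets the ranking converge from noisy crowd votes.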
## How It Works

1. **User Interaction:** Users interact with LLMs in either the anonymous or the named chat setting.
2. **User Evaluation:** Users rate which model provided the better response.
3. **Data Collection:** The platform collects these user evaluations.
4. **Leaderboard Generation:** The Elo rating system and the Bradley-Terry model use the collected data to rank the LLMs and generate the leaderboard.
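The vote-to-ranking step can be sketched as a small Bradley-Terry fit over pairwise win counts, here via the classic minorization-maximization (MM) iteration. The win counts and function shape are hypothetical, not the platform's actual pipeline:

```python
def bradley_terry(wins, iters=100):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[(a, b)] = number of votes where model a beat model b.
    Returns strengths normalized to sum to 1; higher means stronger.
    This simple version assumes every model has at least one win.
    """
    models = sorted({m for pair in wins for m in pair})
    p = {m: 1.0 / len(models) for m in models}
    for _ in range(iters):
        new_p = {}
        for m in models:
            # Total wins of m, and the MM denominator over all of
            # m's matchups (wins and losses alike).
            total_wins = sum(c for (a, _), c in wins.items() if a == m)
            denom = sum(
                c / (p[a] + p[b])
                for (a, b), c in wins.items()
                if m in (a, b)
            )
            new_p[m] = total_wins / denom
        norm = sum(new_p.values())
        p = {m: v / norm for m, v in new_p.items()}
    return p

# Hypothetical vote counts: A beats B 3 times, B beats A once.
strengths = bradley_terry({("A", "B"): 3, ("B", "A"): 1})
print(strengths)  # A's strength converges to 0.75, B's to 0.25
```

The fitted strengths induce the same kind of ordering an Elo score does; production systems typically also report confidence intervals, which this sketch omits.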
![Screenshot of the LLM Arena menu](https://storage.yandexcloud.net/llmarena/arena_screenshot.png)