tareknaser committed
Commit 06038cc · unverified · 1 Parent(s): b5db20b

docs: add a guide on how to add new results


Signed-off-by: Tarek <tareknaser360@gmail.com>

Files changed (2)
  1. CONTRIBUTING.md +56 -0
  2. README.md +2 -34
CONTRIBUTING.md ADDED
@@ -0,0 +1,56 @@
+ # Contributing
+
+ To run the leaderboard locally:
+
+ ```bash
+ python app.py
+ ```
+
+ ## Adding Experiments to the Leaderboard
+
+ Follow these steps to add new experiments to the leaderboard:
+
+ ### 1. Adding a New Dataset Variant
+
+ If your experiment uses a new dataset variant (not already in the leaderboard):
+
+ 1. Add an entry to [data/dataset_info.json](data/dataset_info.json) with the variant name and description:
+
+ ```json
+ "my-variant": {
+   "name": "My Variant",
+   "description": "Description of the variant"
+ }
+ ```
+
+ 2. Add the variant to [src/display/dataset_config.py](src/display/dataset_config.py) in the `DATASET_VARIANTS` dictionary.
+
+ ### 2. Adding a New Model
+
+ If your experiment uses a new model (not already in the leaderboard):
+
+ - Update [src/metrics/data_utils.py](src/metrics/data_utils.py):
+   - Add the model to the `MODEL_NAMES` dictionary (mapping the model ID to its display name)
+   - Add the display name to the `MODEL_ORDER` list (controls display order)
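
As a rough illustration of the two edits described above, the sketch below shows what registering a model might look like. The actual contents of `src/metrics/data_utils.py` are not shown in this commit, so the model ID `my-model-id`, the display name `My Model`, and the helper `check_model_registry` are hypothetical; only the names `MODEL_NAMES` and `MODEL_ORDER` and their roles come from the guide.

```python
# Hypothetical entries; the real contents of src/metrics/data_utils.py may differ.
MODEL_NAMES: dict[str, str] = {
    "my-model-id": "My Model",  # model ID -> display name
}

MODEL_ORDER: list[str] = [
    "My Model",  # display names, in the order they should appear
]


def check_model_registry(names: dict[str, str], order: list[str]) -> list[str]:
    """Return display names present in `names` but missing from `order`."""
    return [display for display in names.values() if display not in order]


# An empty list means every registered model also has a display position.
missing = check_model_registry(MODEL_NAMES, MODEL_ORDER)
print(missing)
```

A check like this could catch the case where a model is added to one structure but not the other; whether the leaderboard itself tolerates that mismatch is not stated in the guide.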
+
+ ### 3. Adding the Experiment Data
+
+ 1. Copy your LLM-processed results folder into a new folder under `data/experiments/`
+
+    - The folder should follow the format produced by the `mizan-cli` evaluation (after running the `process_experiments` script)
+    - Expected files: `results.json`, `processed_results.csv`, and `metadata.json`
+
+ 2. Add an entry to [data/experiments.json](data/experiments.json):
+    - Key: the model ID (matching the key in `MODEL_NAMES`)
+    - Value: an object mapping each dataset variant to its experiment folder name (relative to `data/experiments/`)
+
+ Example:
+
+ ```json
+ {
+   "my-model-id": {
+     "vanilla": "my_experiment_folder_name",
+     "neutral": "another_experiment_folder_name"
+   }
+ }
+ ```
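
The layout described in step 3 can be sanity-checked before opening a pull request. The following is a minimal sketch, not part of the repository: the function name `find_missing_files` is hypothetical, and it assumes only what the guide states, namely that `data/experiments.json` maps model IDs to `{variant: folder_name}` objects and that each folder under `data/experiments/` should contain the three listed files.

```python
import json
from pathlib import Path

# File names the guide lists as expected in each experiment folder.
EXPECTED_FILES = ("results.json", "processed_results.csv", "metadata.json")


def find_missing_files(repo_root: Path) -> list[str]:
    """Return one message per expected file that is missing on disk."""
    data_dir = repo_root / "data"
    mapping = json.loads((data_dir / "experiments.json").read_text(encoding="utf-8"))

    problems = []
    for model_id, variants in mapping.items():
        for variant, folder_name in variants.items():
            # Folder names are relative to data/experiments/.
            folder = data_dir / "experiments" / folder_name
            for file_name in EXPECTED_FILES:
                if not (folder / file_name).is_file():
                    problems.append(f"{model_id}/{variant}: missing {folder / file_name}")
    return problems
```

Calling `find_missing_files(Path("."))` from the repository root should return an empty list when every referenced folder contains all three files.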
README.md CHANGED
@@ -12,38 +12,6 @@ tags:
  - leaderboard
  ---
 
- # Start the configuration
+ ## Contributing
 
- Most of the variables to change for a default leaderboard are in `src/env.py` (replace the path for your leaderboard) and `src/about.py` (for tasks).
-
- Results files should have the following format and be stored as json files:
-
- ```json
- {
-   "config": {
-     "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
-     "model_name": "path of the model on the hub: org/model",
-     "model_sha": "revision on the hub",
-   },
-   "results": {
-     "task_name": {
-       "metric_name": score,
-     },
-     "task_name2": {
-       "metric_name": score,
-     }
-   }
- }
- ```
-
- Request files are created automatically by this tool.
-
- If you encounter a problem on the space, don't hesitate to restart it to remove the created eval-queue, eval-queue-bk, eval-results and eval-results-bk folders.
-
- # Code logic for more complex edits
-
- You'll find
-
- - the main table's column names and properties in `src/display/utils.py`
- - the logic to read all results and request files, then convert them into dataframe lines, in `src/leaderboard/read_evals.py` and `src/populate.py`
- - the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
+ To add new experiments to the leaderboard, see [CONTRIBUTING.md](CONTRIBUTING.md).