Commit e0c9db3 (parent: 9711067): Update README.md
README.md (CHANGED)
@@ -41,6 +41,7 @@ According to the leaderboard description, here are the benchmarks used for the e
 *Based on a [leaderboard clone](https://huggingface.co/spaces/gsaivinay/open_llm_leaderboard) with GPT-3.5 and GPT-4 included.
 
 ### Reproducing Evaluation Results
+*Instruction template taken from [Platypus 2 70B instruct](https://huggingface.co/garage-bAInd/Platypus2-70B-instruct).
 
 Install LM Evaluation Harness:
 ```
@@ -53,26 +54,25 @@ git checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463
 # install
 pip install -e .
 ```
-Each task was evaluated on a single A100 80GB GPU.
 
 ARC:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/G270B/arc_challenge_25shot.json --device cuda --num_fewshot 25
 ```
 
 HellaSwag:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/G270B/hellaswag_10shot.json --device cuda --num_fewshot 10
 ```
 
 MMLU:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/G270B/mmlu_5shot.json --device cuda --num_fewshot 5
 ```
 
 TruthfulQA:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/G270B/truthfulqa_0shot.json --device cuda
 ```
 
 ### Prompt Template
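The four evaluation commands added in this commit differ only in the harness task name, the few-shot count, and the output filename; the model, batch size, and other flags are identical. A minimal sketch that assembles them (the task names, shot counts, and `results/G270B/` paths are taken from the commands above; the helper function itself is illustrative and not part of the repository):

```python
# Sketch: rebuild the four LM Evaluation Harness invocations shown above.
# Only the benchmark-specific parts vary; everything else is shared.

MODEL = "MayaPH/GodziLLa2-70B"

# (harness task name, num_fewshot, output file) per benchmark, as in the README
BENCHMARKS = [
    ("arc_challenge", 25, "results/G270B/arc_challenge_25shot.json"),
    ("hellaswag", 10, "results/G270B/hellaswag_10shot.json"),
    ("hendrycksTest-*", 5, "results/G270B/mmlu_5shot.json"),
    ("truthfulqa_mc", 0, "results/G270B/truthfulqa_0shot.json"),
]

def build_command(task: str, fewshot: int, out_path: str, model: str = MODEL) -> str:
    """Assemble one evaluation invocation as a shell command string."""
    cmd = (
        f"python main.py --model hf-causal-experimental "
        f"--model_args pretrained={model} --tasks {task} "
        f"--batch_size 1 --no_cache --write_out "
        f"--output_path {out_path} --device cuda"
    )
    # TruthfulQA is run zero-shot above, with no --num_fewshot flag at all.
    if fewshot > 0:
        cmd += f" --num_fewshot {fewshot}"
    return cmd

if __name__ == "__main__":
    for task, shots, path in BENCHMARKS:
        print(build_command(task, shots, path))
```

Printing the generated strings and diffing them against the commands in the README is a quick way to check that a re-run uses exactly the leaderboard settings (25-shot ARC, 10-shot HellaSwag, 5-shot MMLU, 0-shot TruthfulQA).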