Xin Dong committed
Commit 5c73b29 · 1 Parent(s): 6abbf5e
add eval
README.md
CHANGED
@@ -106,6 +106,38 @@ print(f"Model response: {response}")
 ```
 
+## Evaluation
+
+We use [`LM Evaluation Harness`](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the model. The evaluation commands are as follows:
+
+```bash
+git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
+cd lm-evaluation-harness
+git fetch --all --tags
+git checkout tags/v0.4.4  # the squad_completion task is not compatible with the latest version
+pip install -e .
+
+lm_eval --model hf --model_args pretrained=nvidia/Hymba-1.5B-Base,dtype=bfloat16,trust_remote_code=True \
+  --tasks mmlu \
+  --num_fewshot 5 \
+  --batch_size 1 \
+  --output_path ./hymba_HF_base_lm-results \
+  --log_samples
+
+lm_eval --model hf --model_args pretrained=nvidia/Hymba-1.5B-Base,dtype=bfloat16,trust_remote_code=True \
+  --tasks arc_easy,arc_challenge,piqa,winogrande,hellaswag \
+  --num_fewshot 0 \
+  --batch_size 1 \
+  --output_path ./hymba_HF_base_lm-results \
+  --log_samples
+
+lm_eval --model hf --model_args pretrained=nvidia/Hymba-1.5B-Base,dtype=bfloat16,trust_remote_code=True \
+  --tasks squad_completion \
+  --num_fewshot 1 \
+  --batch_size 1 \
+  --output_path ./hymba_HF_base_lm-results \
+  --log_samples
+```
+
 
 ## Limitations
 
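Each `lm_eval` run in the added section writes its results as JSON under `--output_path`. As a rough illustration of how those files can be post-processed, here is a minimal sketch that averages task accuracies across result files. It assumes the harness's v0.4.x output layout (`results_*.json` files containing a top-level `"results"` dict mapping task names to metrics such as `"acc,none"`); the directory layout and metric key are assumptions for this sketch, not something stated in the commit.

```python
import json
from pathlib import Path


def mean_accuracy(results_dir: str) -> float:
    """Average the 'acc,none' metric across all tasks found in
    lm_eval result files (assumed layout: results_*.json files,
    each with a top-level 'results' dict of task -> metrics)."""
    accs = []
    for path in Path(results_dir).glob("**/results_*.json"):
        data = json.loads(path.read_text())
        for task, metrics in data.get("results", {}).items():
            if "acc,none" in metrics:
                accs.append(metrics["acc,none"])
    if not accs:
        raise ValueError(f"no results found under {results_dir}")
    return sum(accs) / len(accs)
```

For example, pointing `mean_accuracy` at `./hymba_HF_base_lm-results` after all three runs would give a single headline number across MMLU, the zero-shot commonsense tasks, and SQuAD completion, provided each task reports an accuracy metric.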