## Evaluation

The model was evaluated with [mistral-evals](https://github.com/neuralmagic/mistral-evals) for vision-related tasks and with [lm_evaluation_harness](https://github.com/neuralmagic/lm-evaluation-harness) for select text-based benchmarks, using the commands below:

<details>
<summary>Evaluation Commands</summary>

### Vision Tasks
- vqav2
- docvqa
- mathvista
- mmmu
- chartqa

```
vllm serve neuralmagic/pixtral-12b-quantized.w8a8 --tensor_parallel_size 1 --max_model_len 25000 --trust_remote_code --max_num_seqs 8 --gpu_memory_utilization 0.9 --dtype float16 --limit_mm_per_prompt image=7

python -m eval.run eval_vllm \
  --model_name neuralmagic/pixtral-12b-quantized.w8a8 \
  --url http://0.0.0.0:8000 \
  --output_dir ~/tmp \
  --eval_name <vision_task_name>
```
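The `eval.run eval_vllm` command above is invoked once per benchmark against the running server. The five vision tasks listed can be looped over in the shell; this is a dry-run sketch (the `echo` only prints each command — remove it to execute against a live server):

```shell
# Dry-run loop over the vision benchmarks listed above; remove `echo`
# to actually run each evaluation against the served model.
for task in vqav2 docvqa mathvista mmmu chartqa; do
  echo python -m eval.run eval_vllm \
    --model_name neuralmagic/pixtral-12b-quantized.w8a8 \
    --url http://0.0.0.0:8000 \
    --output_dir ~/tmp \
    --eval_name "$task"
done
```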

### Text-based Tasks
#### MMLU

```
lm_eval \
  --model vllm \
  --model_args pretrained="<model_name>",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=<n>,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size auto \
  --output_path output_dir
```

#### MGSM

```
lm_eval \
  --model vllm \
  --model_args pretrained="<model_name>",dtype=auto,max_model_len=4096,max_gen_toks=2048,max_num_seqs=128,tensor_parallel_size=<n>,gpu_memory_utilization=0.9 \
  --tasks mgsm_cot_native \
  --num_fewshot 0 \
  --batch_size auto \
  --output_path output_dir
```
</details>
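Both `lm_eval` invocations write a JSON results file under `--output_path`. A minimal sketch of pulling the headline accuracy out of such a payload — the `results` / `"acc,none"` key layout is the harness's usual schema and is assumed here, and the sample data is purely illustrative:

```python
import json

# Illustrative payload in the shape lm-evaluation-harness typically writes;
# the "acc,none" metric key is the harness's default naming (assumed here).
sample = {
    "results": {
        "mmlu": {"acc,none": 0.71, "acc_stderr,none": 0.004},
    }
}

def headline_metrics(results_json: dict) -> dict:
    """Return {task: accuracy} for every task that reports "acc,none"."""
    return {
        task: metrics["acc,none"]
        for task, metrics in results_json["results"].items()
        if "acc,none" in metrics
    }

print(headline_metrics(sample))  # → {'mmlu': 0.71}
```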

### Accuracy

<table>