shubhrapandit committed
Commit d618b87 · verified · 1 Parent(s): a70c24d

Update README.md

Files changed (1): README.md (+54 −1)
README.md CHANGED
@@ -532,7 +532,60 @@ outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```

 
## Evaluation

The model was evaluated with [mistral-evals](https://github.com/neuralmagic/mistral-evals) for vision tasks and [lm_evaluation_harness](https://github.com/neuralmagic/lm-evaluation-harness) for select text-based benchmarks, using the following commands:

<details>
<summary>Evaluation Commands</summary>

### Vision Tasks
- vqav2
- docvqa
- mathvista
- mmmu
- chartqa

```
vllm serve neuralmagic/pixtral-12b-quantized.w8a8 --tensor_parallel_size 1 --max_model_len 25000 --trust_remote_code --max_num_seqs 8 --gpu_memory_utilization 0.9 --dtype float16 --limit_mm_per_prompt image=7

python -m eval.run eval_vllm \
  --model_name neuralmagic/pixtral-12b-quantized.w8a8 \
  --url http://0.0.0.0:8000 \
  --output_dir ~/tmp \
  --eval_name <vision_task_name>
```
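The serve command above exposes an OpenAI-compatible `/v1/chat/completions` endpoint. As an illustration of the request shape an evaluation client sends to it, here is a minimal sketch of building one image-plus-text chat payload (the placeholder image bytes and payload layout are assumptions for the sketch, not code from mistral-evals):

```python
import base64
import json

def build_chat_payload(question: str, image_bytes: bytes) -> str:
    # Images go inline as base64-encoded data URLs in the standard
    # OpenAI-style multimodal message format.
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    payload = {
        "model": "neuralmagic/pixtral-12b-quantized.w8a8",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 256,
    }
    return json.dumps(payload)

# Placeholder bytes stand in for a real PNG.
body = build_chat_payload("What does the chart show?", b"\x89PNG placeholder")
print(json.loads(body)["messages"][0]["content"][0]["text"])
```

A client would POST this body to `http://0.0.0.0:8000/v1/chat/completions`; the `--limit_mm_per_prompt image=7` flag above caps how many such image entries one prompt may carry.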

### Text-based Tasks
#### MMLU

```
lm_eval \
  --model vllm \
  --model_args pretrained="<model_name>",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=<n>,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size auto \
  --output_path output_dir
```
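The `--model_args` flag packs vLLM keyword arguments into one comma-separated string. A simplified sketch of how such a string maps to key/value pairs (lm_eval's real parser also coerces types and handles nested values; this flat version is illustrative only):

```python
def parse_model_args(spec: str) -> dict:
    # Split "k1=v1,k2=v2,..." into a dict; assumes no commas inside values.
    out = {}
    for pair in spec.split(","):
        key, _, value = pair.partition("=")
        # Strip optional quoting, e.g. pretrained="<model_name>".
        out[key] = value.strip('"')
    return out

args = parse_model_args(
    'pretrained="neuralmagic/pixtral-12b-quantized.w8a8",dtype=auto,'
    "add_bos_token=True,max_model_len=4096,gpu_memory_utilization=0.8"
)
print(args["pretrained"], args["max_model_len"])
```

In the commands above, `<model_name>` and `<n>` are placeholders to fill in with the model identifier and the tensor-parallel degree for your hardware.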

#### MGSM

```
lm_eval \
  --model vllm \
  --model_args pretrained="<model_name>",dtype=auto,max_model_len=4096,max_gen_toks=2048,max_num_seqs=128,tensor_parallel_size=<n>,gpu_memory_utilization=0.9 \
  --tasks mgsm_cot_native \
  --num_fewshot 0 \
  --batch_size auto \
  --output_path output_dir
```
</details>
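Accuracy tables for quantized models are commonly summarized as recovery: the quantized score as a percentage of the unquantized baseline's score. A minimal sketch of that calculation (the scores shown are hypothetical placeholders, not results for this model):

```python
def recovery(baseline: float, quantized: float) -> float:
    # Recovery = quantized score / baseline score, as a percentage,
    # rounded to two decimal places.
    return round(100.0 * quantized / baseline, 2)

# Hypothetical example: baseline 80.0, quantized 79.2.
print(recovery(80.0, 79.2))  # → 99.0
```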

### Accuracy
<table>
<thead>
<tr>