cowWhySo/Phi-3-mini-4k-instruct-Friendly

@@ -94,6 +94,87 @@ resize_token_embeddings_to_32x: true
 GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf
+## Benchmarks
+|                                              Model                                               |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+|--------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[Phi-3-mini-4k-instruct-Friendly](https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly)|     41|  67.56|     46.36|    39.3|  48.56|
+### AGIEval
+|             Task             |Version| Metric |Value|   |Stderr|
+|------------------------------|------:|--------|----:|---|-----:|
+|agieval_aqua_rat              |      0|acc     |22.05|±  |  2.61|
+|                              |       |acc_norm|22.05|±  |  2.61|
+|agieval_logiqa_en             |      0|acc     |41.01|±  |  1.93|
+|                              |       |acc_norm|41.32|±  |  1.93|
+|agieval_lsat_ar               |      0|acc     |22.17|±  |  2.75|
+|                              |       |acc_norm|22.17|±  |  2.75|
+|agieval_lsat_lr               |      0|acc     |45.69|±  |  2.21|
+|                              |       |acc_norm|45.88|±  |  2.21|
+|agieval_lsat_rc               |      0|acc     |59.48|±  |  3.00|
+|                              |       |acc_norm|56.51|±  |  3.03|
+|agieval_sat_en                |      0|acc     |75.24|±  |  3.01|
+|                              |       |acc_norm|70.39|±  |  3.19|
+|agieval_sat_en_without_passage|      0|acc     |39.81|±  |  3.42|
+|                              |       |acc_norm|37.86|±  |  3.39|
+|agieval_sat_math              |      0|acc     |33.64|±  |  3.19|
+|                              |       |acc_norm|31.82|±  |  3.15|
+Average: 41.0%
+### GPT4All
+|    Task     |Version| Metric |Value|   |Stderr|
+|-------------|------:|--------|----:|---|-----:|
+|arc_challenge|      0|acc     |49.74|±  |  1.46|
+|             |       |acc_norm|50.43|±  |  1.46|
+|arc_easy     |      0|acc     |76.68|±  |  0.87|
+|             |       |acc_norm|73.23|±  |  0.91|
+|boolq        |      1|acc     |79.27|±  |  0.71|
+|hellaswag    |      0|acc     |57.91|±  |  0.49|
+|             |       |acc_norm|77.13|±  |  0.42|
+|openbookqa   |      0|acc     |35.00|±  |  2.14|
+|             |       |acc_norm|43.80|±  |  2.22|
+|piqa         |      0|acc     |77.86|±  |  0.97|
+|             |       |acc_norm|79.54|±  |  0.94|
+|winogrande   |      0|acc     |69.53|±  |  1.29|
+Average: 67.56%
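
The card does not state how each suite average is aggregated. A minimal sketch, assuming the average takes `acc_norm` for each task and falls back to `acc` where no `acc_norm` row exists (boolq, winogrande); this assumption reproduces the GPT4All figure reported above. The dictionary layout and the `suite_average` helper are hypothetical names used only for illustration.

```python
# Scores copied from the GPT4All table above as (acc, acc_norm or None).
gpt4all = {
    "arc_challenge": (49.74, 50.43),
    "arc_easy": (76.68, 73.23),
    "boolq": (79.27, None),       # no acc_norm row in the table
    "hellaswag": (57.91, 77.13),
    "openbookqa": (35.00, 43.80),
    "piqa": (77.86, 79.54),
    "winogrande": (69.53, None),  # no acc_norm row in the table
}


def suite_average(scores):
    """Mean over tasks of acc_norm, using acc when acc_norm is absent (assumed rule)."""
    picked = [norm if norm is not None else acc for acc, norm in scores.values()]
    return sum(picked) / len(picked)


print(f"{suite_average(gpt4all):.2f}")  # prints 67.56
```

The same rule applied to the `acc_norm` rows of the AGIEval table gives 41.0, and the mean of the `multiple_choice_grade` rows of the Bigbench table gives 39.3 after rounding.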
+### TruthfulQA
+|    Task     |Version|Metric|Value|   |Stderr|
+|-------------|------:|------|----:|---|-----:|
+|truthfulqa_mc|      1|mc1   |31.21|±  |  1.62|
+|             |       |mc2   |46.36|±  |  1.55|
+Average: 46.36%
+### Bigbench
+|                      Task                      |Version|       Metric        |Value|   |Stderr|
+|------------------------------------------------|------:|---------------------|----:|---|-----:|
+|bigbench_causal_judgement                       |      0|multiple_choice_grade|54.74|±  |  3.62|
+|bigbench_date_understanding                     |      0|multiple_choice_grade|66.67|±  |  2.46|
+|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|29.46|±  |  2.84|
+|bigbench_geometric_shapes                       |      0|multiple_choice_grade|11.98|±  |  1.72|
+|                                                |       |exact_str_match      | 0.00|±  |  0.00|
+|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|28.00|±  |  2.01|
+|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|17.14|±  |  1.43|
+|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|45.67|±  |  2.88|
+|bigbench_movie_recommendation                   |      0|multiple_choice_grade|24.40|±  |  1.92|
+|bigbench_navigate                               |      0|multiple_choice_grade|53.70|±  |  1.58|
+|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|68.10|±  |  1.04|
+|bigbench_ruin_names                             |      0|multiple_choice_grade|31.03|±  |  2.19|
+|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|15.93|±  |  1.16|
+|bigbench_snarks                                 |      0|multiple_choice_grade|77.35|±  |  3.12|
+|bigbench_sports_understanding                   |      0|multiple_choice_grade|52.64|±  |  1.59|
+|bigbench_temporal_sequences                     |      0|multiple_choice_grade|51.50|±  |  1.58|
+|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|19.52|±  |  1.12|
+|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|13.89|±  |  0.83|
+|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|45.67|±  |  2.88|
+Average: 39.3%
+Average score: 48.56%
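
The overall score appears to be the unweighted mean of the four suite averages. A minimal sketch under that assumption, with the `suite_averages` mapping as a hypothetical name for illustration:

```python
# Suite averages as reported in the Benchmarks summary table.
suite_averages = {
    "AGIEval": 41.0,
    "GPT4All": 67.56,
    "TruthfulQA": 46.36,  # the mc2 value
    "Bigbench": 39.3,
}

# Unweighted mean across suites (assumed aggregation, not stated in the card).
overall = sum(suite_averages.values()) / len(suite_averages)
print(overall)
```

The unrounded mean is approximately 48.555, consistent with the 48.56% reported above. Each suite carries equal weight regardless of how many tasks it contains, so TruthfulQA's single task counts as much as Bigbench's eighteen.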
 ## Training Summary
 ```json