steampunque committed on
Commit 75ff7c6 · 1 Parent(s): 2eb1fab

Add GLM-4.7-Flash Q4_P_H Math results

Files changed (1)
  1. README.md +97 -97
README.md CHANGED
@@ -557,59 +557,59 @@ CODE MODELS:
557
 
558
  MATH MODELS:
559
 
560
- MODEL | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Qwen-1.5B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-32B | Deepseek-R1-Distill-Qwen-32B | GLM-4.7-Flash | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-32B-0414 | Qwen2.5-Math-1.5B-Instruct | Qwen2.5-Math-7B-Instruct | Qwen3-32B | QwQ-32B | QwQ-32B |
561
- ---------------------------------------------|------------------------------|------------------------------|-------------------------------|-----------------------------|-----------------------------|------------------------------|------------------------------|------------------------------|------------------------------|---------------|----------------|----------------|----------------|-----------------|----------------------------|--------------------------|-----------|---------|---------|
562
- params | 8.03B | 8.03B | 1.78B | 7.62B | 7.62B | 14.77B | 14.77B | 32.76B | 32.76B | 29.94B | 9.40B | 9.40B | 9.40B | 32.57B | 1.54B | 7.62B | 32.8B | 32.76B | 32.76B |
563
- quant | Q6_K | Q6_K_H | Q8_0 | IQ4_XS | Q6_K_H | IQ4_XS | Q4_K_H | IQ4_XS | Q4_K_H | Q4_K_H | Q4_K_H | Q4_P_H | Q6_K_H | Q4_K_H | IQ4_XS | Q6_K | Q4_K_H | IQ4_XS | Q4_K_H |
564
- engine | llama.cpp version: 4707 | llama.cpp version: 5898 | llama.cpp version: 4763 | llama.cpp version: 4644 | llama.cpp version: 7699 | llama.cpp version: 4657 | llama.cpp version: 7710 | llama.cpp version: 4559 | llama.cpp version: 7719 | llama.cpp version: 7885 | llama.cpp version: 7230 | llama.cpp version: 7268 | llama.cpp version: 5935 | llama.cpp version: 7607 | llama.cpp version: 4406 | llama.cpp version: 4394 | llama.cpp version: 5633 | llama.cpp version: 4820 | llama.cpp version: 6026 |
565
- **TEST** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** |
566
- GSM8K | - | _0.888_ | - | - | _0.888_ | - | _0.944_ | - | _0.956_ | _0.992_ | _0.968_ | _0.968_ | _0.964_ | _0.988_ | - | - | - | - | _0.964_ |
567
- APPLE | - | 0.870 | - | - | 0.810 | - | 0.790 | - | 0.810 | 0.960 | 0.880 | 0.920 | 0.880 | 0.910 | - | - | - | - | 0.880 |
568
- GPQA_diamond | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.444 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
569
- GPQA | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.444 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
570
- MATH1_algebra | 0.933 | 0.977 | 0.918 | 0.962 | 1.000 | 0.925 | 0.970 | 0.962 | 0.985 | 1.000 | 0.985 | 0.992 | 0.992 | 0.992 | 0.859 | 0.955 | 0.992 | 0.992 | 1.000 |
571
- MATH1_counting_and_probability | 0.820 | 0.948 | 0.794 | 0.948 | 1.000 | 0.923 | 0.948 | 0.948 | 0.948 | 1.000 | - | 0.974 | 1.000 | 1.000 | 0.897 | 0.974 | 1.000 | 0.974 | 1.000 |
572
- MATH1_geometry | 0.842 | 0.868 | 0.710 | 0.736 | 0.921 | 0.868 | 0.921 | 0.921 | 0.894 | 1.000 | - | - | 0.947 | 0.947 | 0.710 | 0.842 | 0.842 | 0.921 | 0.921 |
573
- MATH1_intermediate_algebra | 0.923 | 0.980 | 0.730 | 0.903 | 0.961 | 0.865 | 0.980 | 0.961 | 0.942 | 1.000 | - | - | 0.961 | 0.980 | 0.730 | 0.711 | 0.923 | 1.000 | 0.980 |
574
- MATH1_number_theory | 0.700 | 0.900 | 0.866 | 0.800 | 1.000 | 0.700 | 0.966 | 0.933 | 0.966 | 1.000 | - | - | 0.900 | 1.000 | 0.766 | 1.000 | 0.666 | 0.800 | 0.833 |
575
- MATH1_prealgebra | 0.813 | 0.941 | 0.883 | 0.965 | 0.976 | 0.883 | 0.988 | 0.953 | 0.988 | 1.000 | - | - | 0.988 | 1.000 | 0.837 | 0.883 | 0.930 | 0.953 | 0.976 |
576
- MATH1_precalculus | 0.684 | 1.000 | 0.596 | 0.859 | 1.000 | 0.842 | 0.947 | 1.000 | 1.000 | 1.000 | - | - | 0.947 | 0.929 | 0.631 | 0.789 | 0.947 | 0.982 | 0.929 |
577
- MATH1 | 0.842 | 0.956 | 0.814 | 0.910 | 0.983 | 0.878 | 0.965 | 0.958 | 0.970 | 1.000 | - | - | 0.972 | 0.981 | 0.794 | 0.885 | 0.931 | 0.963 | 0.965 |
578
- MATH2_algebra | 0.845 | 0.975 | 0.825 | 0.930 | 0.985 | 0.900 | 0.965 | 0.995 | 0.985 | 1.000 | - | - | 0.980 | 0.995 | 0.910 | 0.860 | 0.970 | 0.975 | 0.990 |
579
- MATH2_counting_and_probability | 0.831 | 0.930 | 0.782 | 0.851 | 0.891 | 0.841 | 0.881 | 0.950 | 0.900 | 1.000 | - | - | 0.980 | 0.980 | 0.683 | 0.861 | 0.970 | 0.990 | 0.970 |
580
- MATH2_geometry | 0.841 | 0.963 | 0.743 | 0.914 | 0.951 | 0.792 | 0.939 | 0.914 | 0.987 | 1.000 | - | - | 0.987 | 0.939 | 0.621 | 0.743 | 0.792 | 0.963 | 0.963 |
581
- MATH2_intermediate_algebra | 0.859 | 0.953 | 0.664 | 0.875 | 0.992 | 0.835 | 0.960 | 0.968 | 0.976 | 1.000 | - | - | 0.968 | 0.984 | 0.671 | 0.710 | 0.953 | 0.960 | 0.984 |
582
- MATH2_number_theory | 0.826 | 0.913 | 0.782 | 0.826 | 0.967 | 0.891 | 0.934 | 0.934 | 0.913 | 0.989 | - | - | 0.956 | 0.989 | 0.695 | 0.880 | 0.891 | 0.945 | 0.967 |
583
- MATH2_prealgebra | 0.898 | 0.966 | 0.887 | 0.909 | 0.977 | 0.875 | 0.949 | 0.971 | 0.971 | 1.000 | - | - | 0.988 | 0.988 | 0.836 | 0.881 | 0.932 | 0.971 | 0.960 |
584
- MATH2_precalculus | 0.787 | 0.964 | 0.663 | 0.902 | 1.000 | 0.805 | 0.964 | 0.955 | 0.991 | 1.000 | - | - | 0.955 | 0.823 | 0.557 | 0.725 | 0.858 | 0.964 | 0.964 |
585
- MATH2 | 0.846 | 0.956 | 0.777 | 0.893 | 0.970 | 0.856 | 0.946 | 0.963 | 0.965 | 0.998 | - | - | 0.975 | 0.963 | 0.742 | 0.817 | 0.921 | 0.968 | 0.973 |
586
- MATH3_algebra | 0.873 | 0.938 | 0.854 | 0.934 | 0.984 | 0.911 | 0.961 | 0.992 | 0.980 | 0.988 | - | - | 0.980 | 0.988 | 0.881 | 0.850 | 0.969 | 0.996 | 0.984 |
587
- MATH3_counting_and_probability | 0.800 | 0.890 | 0.730 | 0.770 | 0.920 | 0.830 | 0.850 | 0.930 | 0.940 | 1.000 | - | - | 0.970 | 0.980 | 0.710 | 0.880 | 0.950 | 1.000 | 1.000 |
588
- MATH3_geometry | 0.794 | 0.901 | 0.627 | 0.911 | 0.960 | 0.794 | 0.911 | 0.901 | 0.941 | 0.970 | - | - | 0.970 | 0.950 | 0.696 | 0.764 | 0.833 | 0.970 | 0.931 |
589
- MATH3_intermediate_algebra | 0.825 | 0.969 | 0.635 | 0.902 | 0.964 | 0.882 | 0.953 | 0.964 | 0.984 | 0.989 | - | - | 0.969 | 0.938 | 0.574 | 0.738 | 0.933 | 0.969 | 0.938 |
590
- MATH3_number_theory | 0.819 | 0.934 | 0.696 | 0.754 | 0.942 | 0.770 | 0.844 | 0.926 | 0.909 | 0.991 | - | - | 0.926 | 0.942 | 0.655 | 0.819 | 0.811 | 0.942 | 0.918 |
591
- MATH3_prealgebra | 0.875 | 0.950 | 0.763 | 0.883 | 0.959 | 0.892 | 0.941 | 0.946 | 0.950 | 0.995 | - | - | 0.986 | 0.986 | 0.816 | 0.883 | 0.946 | 0.982 | 0.977 |
592
- MATH3_precalculus | 0.661 | 0.929 | 0.582 | 0.874 | 0.952 | 0.818 | 0.960 | 0.968 | 0.929 | 1.000 | - | - | 0.968 | 0.850 | 0.480 | 0.685 | 0.858 | 0.897 | 0.905 |
593
- MATH3 | 0.822 | 0.937 | 0.719 | 0.876 | 0.960 | 0.859 | 0.929 | 0.954 | 0.954 | 0.991 | - | - | 0.970 | 0.954 | 0.714 | 0.810 | 0.915 | 0.969 | 0.955 |
594
- MATH4_algebra | 0.848 | 0.950 | 0.805 | 0.897 | 0.982 | 0.922 | 0.964 | 0.957 | 0.985 | 0.992 | - | - | 0.989 | 0.978 | 0.851 | 0.865 | 0.968 | 0.992 | 0.985 |
595
- MATH4_counting_and_probability | 0.729 | 0.882 | 0.639 | 0.738 | 0.900 | 0.711 | 0.864 | 0.945 | 0.945 | 0.990 | - | - | 0.963 | 0.945 | 0.558 | 0.783 | 0.945 | 0.981 | 0.981 |
596
- MATH4_geometry | 0.792 | 0.896 | 0.576 | 0.776 | 0.896 | 0.768 | 0.832 | 0.832 | 0.856 | 0.976 | - | - | 0.920 | 0.888 | 0.432 | 0.616 | 0.712 | 0.872 | 0.840 |
597
- MATH4_intermediate_algebra | 0.778 | 0.947 | 0.588 | 0.858 | 0.987 | 0.850 | 0.931 | 0.935 | 0.915 | 0.987 | - | - | 0.939 | 0.895 | 0.512 | 0.649 | 0.911 | 0.947 | 0.907 |
598
- MATH4_number_theory | 0.795 | 0.950 | 0.697 | 0.809 | 0.915 | 0.725 | 0.859 | 0.894 | 0.929 | 0.992 | - | - | 0.929 | 0.964 | 0.619 | 0.823 | 0.823 | 0.943 | 0.936 |
599
- MATH4_prealgebra | 0.806 | 0.931 | 0.785 | 0.874 | 0.958 | 0.827 | 0.942 | 0.921 | 0.931 | 0.989 | - | - | 0.958 | 0.963 | 0.748 | 0.801 | 0.879 | 0.926 | 0.942 |
600
- MATH4_precalculus | 0.719 | 0.956 | 0.570 | 0.868 | 0.956 | 0.728 | 0.921 | 0.947 | 0.912 | 0.991 | - | - | 0.947 | 0.807 | 0.333 | 0.578 | 0.859 | 0.973 | 0.868 |
601
- MATH4 | 0.792 | 0.935 | 0.684 | 0.845 | 0.953 | 0.816 | 0.915 | 0.925 | 0.932 | 0.989 | - | - | 0.953 | 0.929 | 0.620 | 0.746 | 0.887 | 0.952 | 0.930 |
602
- MATH5_algebra | 0.768 | 0.947 | 0.752 | 0.899 | 0.970 | 0.853 | 0.934 | 0.970 | 0.967 | 0.986 | - | - | 0.960 | 0.967 | 0.674 | 0.762 | 0.964 | 0.964 | 0.977 |
603
- MATH5_counting_and_probability | 0.699 | 0.910 | 0.569 | 0.756 | 0.829 | 0.699 | 0.788 | 0.910 | 0.861 | 0.959 | - | - | 0.934 | 0.967 | 0.495 | 0.642 | 0.910 | 0.934 | 0.902 |
604
- MATH5_geometry | 0.712 | 0.886 | 0.545 | 0.810 | 0.840 | 0.727 | 0.818 | 0.840 | 0.810 | 0.946 | - | - | 0.878 | 0.810 | 0.348 | 0.507 | 0.734 | 0.833 | 0.742 |
605
- MATH5_intermediate_algebra | 0.682 | 0.900 | 0.453 | 0.821 | 0.875 | 0.778 | 0.810 | 0.810 | 0.846 | 0.932 | - | - | 0.889 | 0.832 | 0.253 | 0.389 | 0.807 | 0.860 | 0.800 |
606
- MATH5_number_theory | 0.811 | 0.909 | 0.707 | 0.727 | 0.935 | 0.792 | 0.915 | 0.935 | 0.935 | 0.967 | - | - | 0.961 | 0.954 | 0.525 | 0.753 | 0.870 | 0.941 | 0.935 |
607
- MATH5_prealgebra | 0.777 | 0.849 | 0.720 | 0.808 | 0.896 | 0.782 | 0.823 | 0.875 | 0.891 | 0.958 | - | - | 0.953 | 0.937 | 0.580 | 0.797 | 0.911 | 0.927 | 0.948 |
608
- MATH5_precalculus | 0.562 | 0.903 | 0.437 | 0.851 | 0.851 | 0.792 | 0.800 | 0.814 | 0.851 | 0.918 | - | - | 0.888 | 0.666 | 0.259 | 0.429 | 0.777 | 0.851 | 0.770 |
609
- MATH5 | 0.723 | 0.904 | 0.609 | 0.822 | 0.897 | 0.787 | 0.851 | 0.884 | 0.889 | 0.955 | - | - | 0.926 | 0.886 | 0.462 | 0.617 | 0.865 | 0.907 | 0.879 |
610
- MATHCOT | 0.795 | _0.930_ | 0.700 | 0.860 | _0.940_ | 0.831 | _0.910_ | 0.930 | _0.934_ | _0.983_ | - | - | _0.954_ | _0.936_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.933_ |
611
  COMPOSITE AVERAGE
612
- AVG | 0.795 | _0.907_ | 0.700 | 0.860 | _0.918_ | 0.831 | _0.895_ | 0.930 | _0.918_ | _0.964_ | - | - | _0.936_ | _0.923_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.920_ |
613
 
614
  VISION MODELS:
615
 
@@ -730,49 +730,49 @@ AUDIO MODELS:
730
 
731
  MT MODELS:
732
 
733
- MODEL | HY-MT1.5-7B | madlad400-10b-mt | madlad400-10b-mt | plamo-2-translate | translategemma-4b-it | translategemma-12b-it | translategemma-12b-it | translategemma-27b-it |
734
- ---------------------------------------------|-------------|------------------|------------------|-------------------|----------------------|-----------------------|-----------------------|-----------------------|
735
- params | 7.50B | 10.71B | 10.71B | 9.53B | 3.88B | 11.77B | 11.77B | 27.01B |
736
- quant | Q6_K_H | Q4_K_H | Q6_K_H | Q6_K_H | Q6_K_H | Q4_K_H | Q6_K_H | Q4_K_H |
737
- engine | llama.cpp version: 7789 | llama.cpp version: 7830 | llama.cpp version: 7772 | llama.cpp version: 7762 | llama.cpp version: 7760 | llama.cpp version: 7779 | llama.cpp version: 7779 | llama.cpp version: 7789 |
738
- **TEST** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** |
739
- FLORES200_de_en | 32.2 | 42.3 | 42.3 | 35.5 | 37.8 | 40.3 | 40.1 | 42.2 |
740
- FLORES200_en_de | 3.0 | 36.6 | 36.9 | 24.1 | 31.0 | 34.5 | 34.2 | 35.3 |
741
- FLORES200_en_es | 25.5 | 26.5 | 26.5 | 21.3 | 25.1 | 27.0 | 27.3 | 27.2 |
742
- FLORES200_en_fr | 37.2 | 49.7 | 49.8 | 36.5 | 40.9 | 44.7 | 44.2 | 45.2 |
743
- FLORES200_en_ja | 31.0 | 22.2 | 22.3 | 28.2 | 27.7 | 30.7 | 30.8 | 31.0 |
744
- FLORES200_en_ru | 22.6 | 28.7 | 28.8 | 18.2 | 25.4 | 27.5 | 28.1 | 28.2 |
745
- FLORES200_en_zh | 37.0 | 37.2 | 37.3 | 33.6 | 36.6 | 39.7 | 39.5 | 41.9 |
746
- FLORES200_es_en | 23.9 | 29.1 | 29.2 | 24.7 | 28.0 | 28.7 | 29.0 | 29.9 |
747
- FLORES200_fr_en | 32.6 | 44.1 | 44.3 | 36.4 | 38.8 | 41.2 | 41.7 | 43.4 |
748
- FLORES200_ja_en | 20.5 | 25.9 | 25.9 | 23.4 | 21.5 | 24.1 | 23.9 | 26.6 |
749
- FLORES200_ru_en | 26.7 | 34.7 | 34.9 | 29.0 | 30.7 | 33.0 | 32.7 | 34.8 |
750
- FLORES200_zh_en | 22.9 | 27.3 | 27.3 | 23.4 | 23.5 | 25.7 | 25.6 | 27.8 |
751
- FLORES200 | 26.3 | 33.7 | 33.8 | 27.9 | 30.6 | 33.1 | 33.1 | 34.4 |
752
- OPUS_de_en | 25.9 | 35.6 | 27.1 | 21.3 | 28.2 | 29.8 | 29.2 | 30.5 |
753
- OPUS_en_de | 11.0 | 32.9 | 30.7 | 21.1 | 25.5 | 26.5 | 25.5 | 27.1 |
754
- OPUS_en_es | 29.0 | 37.1 | 37.1 | 27.9 | 31.1 | 32.0 | 31.3 | 32.4 |
755
- OPUS_en_fr | 24.7 | 34.2 | 34.3 | 26.7 | 27.0 | 28.8 | 28.8 | 29.0 |
756
- OPUS_en_ja | 10.2 | 16.0 | 15.9 | 11.9 | 9.8 | 10.5 | 10.5 | 11.7 |
757
- OPUS_en_ru | 22.0 | 31.7 | 31.6 | 19.7 | 23.1 | 24.9 | 24.4 | 25.5 |
758
- OPUS_en_zh | 28.8 | 41.1 | 41.3 | 26.7 | 26.4 | 28.9 | 28.8 | 29.9 |
759
- OPUS_es_en | 28.4 | 40.2 | 40.3 | 26.7 | 33.2 | 35.0 | 34.5 | 36.4 |
760
- OPUS_fr_en | 25.8 | 35.8 | 35.8 | 29.6 | 29.2 | 30.9 | 30.6 | 32.3 |
761
- OPUS_ja_en | 14.8 | 18.9 | 18.8 | 16.0 | 15.1 | 17.0 | 16.2 | 16.8 |
762
- OPUS_ru_en | 24.5 | 34.7 | 34.8 | 27.1 | 27.1 | 29.0 | 28.3 | 30.2 |
763
- OPUS_zh_en | 26.1 | 38.5 | 38.8 | 23.8 | 22.1 | 25.5 | 24.9 | 27.6 |
764
- OPUS | 22.6 | 33.1 | 32.2 | 23.2 | 24.8 | 26.6 | 26.1 | 27.4 |
765
- DE_EN | 28.0 | 37.8 | 32.1 | 26.0 | 31.4 | 33.3 | 32.8 | 34.4 |
766
- EN_DE | 8.3 | 34.1 | 32.7 | 22.1 | 27.3 | 29.2 | 28.3 | 29.8 |
767
- ES_EN | 26.8 | 36.4 | 36.5 | 26.0 | 31.4 | 32.8 | 32.6 | 34.2 |
768
- EN_ES | 27.8 | 33.5 | 33.5 | 25.6 | 29.0 | 30.2 | 29.9 | 30.6 |
769
- FR_EN | 28.0 | 38.6 | 38.6 | 31.9 | 32.4 | 34.3 | 34.3 | 36.0 |
770
- EN_FR | 28.9 | 39.4 | 39.5 | 30.0 | 31.6 | 34.1 | 33.9 | 34.4 |
771
- RU_EN | 25.2 | 34.7 | 34.8 | 27.7 | 28.2 | 30.3 | 29.8 | 31.7 |
772
- EN_RU | 22.2 | 30.7 | 30.6 | 19.1 | 23.8 | 25.7 | 25.6 | 26.3 |
773
- JA_EN | 16.6 | 21.2 | 21.2 | 18.4 | 17.2 | 19.3 | 18.8 | 20.1 |
774
- EN_JA | 17.2 | 18.0 | 18.0 | 17.3 | 15.8 | 17.2 | 17.3 | 18.1 |
775
- ZH_EN | 25.0 | 34.7 | 34.9 | 23.6 | 22.5 | 25.5 | 25.1 | 27.6 |
776
- EN_ZH | 31.5 | 39.8 | 39.9 | 29.0 | 29.8 | 32.4 | 32.3 | 33.9 |
777
  COMPOSITE AVERAGE
778
- AVG | 23.8 | 33.2 | 32.7 | 24.7 | 26.7 | 28.7 | 28.4 | 29.8 |
 
557
 
558
  MATH MODELS:
559
 
560
+ MODEL | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Qwen-1.5B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-32B | Deepseek-R1-Distill-Qwen-32B | GLM-4.7-Flash | GLM-4.7-Flash | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-32B-0414 | Qwen2.5-Math-1.5B-Instruct | Qwen2.5-Math-7B-Instruct | Qwen3-32B | QwQ-32B | QwQ-32B |
561
+ ---------------------------------------------|------------------------------|------------------------------|-------------------------------|-----------------------------|-----------------------------|------------------------------|------------------------------|------------------------------|------------------------------|---------------|---------------|----------------|----------------|----------------|-----------------|----------------------------|--------------------------|-----------|---------|---------|
562
+ params | 8.03B | 8.03B | 1.78B | 7.62B | 7.62B | 14.77B | 14.77B | 32.76B | 32.76B | 29.94B | 29.94B | 9.40B | 9.40B | 9.40B | 32.57B | 1.54B | 7.62B | 32.8B | 32.76B | 32.76B |
563
+ quant | Q6_K | Q6_K_H | Q8_0 | IQ4_XS | Q6_K_H | IQ4_XS | Q4_K_H | IQ4_XS | Q4_K_H | Q4_K_H | Q6_K_H | Q4_K_H | Q4_P_H | Q6_K_H | Q4_K_H | IQ4_XS | Q6_K | Q4_K_H | IQ4_XS | Q4_K_H |
564
+ engine | llama.cpp version: 4707 | llama.cpp version: 5898 | llama.cpp version: 4763 | llama.cpp version: 4644 | llama.cpp version: 7699 | llama.cpp version: 4657 | llama.cpp version: 7710 | llama.cpp version: 4559 | llama.cpp version: 7719 | llama.cpp version: 7885 | llama.cpp version: 7845 | llama.cpp version: 7230 | llama.cpp version: 7268 | llama.cpp version: 5935 | llama.cpp version: 7607 | llama.cpp version: 4406 | llama.cpp version: 4394 | llama.cpp version: 5633 | llama.cpp version: 4820 | llama.cpp version: 6026 |
565
+ **TEST** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** |
566
+ GSM8K | - | _0.888_ | - | - | _0.888_ | - | _0.944_ | - | _0.956_ | _0.992_ | _0.984_ | _0.968_ | _0.968_ | _0.964_ | _0.988_ | - | - | - | - | _0.964_ |
567
+ APPLE | - | 0.870 | - | - | 0.810 | - | 0.790 | - | 0.810 | 0.960 | 1.000 | 0.880 | 0.920 | 0.880 | 0.910 | - | - | - | - | 0.880 |
568
+ GPQA_diamond | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.444 | 0.444 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
569
+ GPQA | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.444 | 0.444 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
570
+ MATH1_algebra | 0.933 | 0.977 | 0.918 | 0.962 | 1.000 | 0.925 | 0.970 | 0.962 | 0.985 | 1.000 | 1.000 | 0.985 | 0.992 | 0.992 | 0.992 | 0.859 | 0.955 | 0.992 | 0.992 | 1.000 |
571
+ MATH1_counting_and_probability | 0.820 | 0.948 | 0.794 | 0.948 | 1.000 | 0.923 | 0.948 | 0.948 | 0.948 | 1.000 | 1.000 | - | 0.974 | 1.000 | 1.000 | 0.897 | 0.974 | 1.000 | 0.974 | 1.000 |
572
+ MATH1_geometry | 0.842 | 0.868 | 0.710 | 0.736 | 0.921 | 0.868 | 0.921 | 0.921 | 0.894 | 1.000 | 0.973 | - | - | 0.947 | 0.947 | 0.710 | 0.842 | 0.842 | 0.921 | 0.921 |
573
+ MATH1_intermediate_algebra | 0.923 | 0.980 | 0.730 | 0.903 | 0.961 | 0.865 | 0.980 | 0.961 | 0.942 | 1.000 | 1.000 | - | - | 0.961 | 0.980 | 0.730 | 0.711 | 0.923 | 1.000 | 0.980 |
574
+ MATH1_number_theory | 0.700 | 0.900 | 0.866 | 0.800 | 1.000 | 0.700 | 0.966 | 0.933 | 0.966 | 1.000 | 1.000 | - | - | 0.900 | 1.000 | 0.766 | 1.000 | 0.666 | 0.800 | 0.833 |
575
+ MATH1_prealgebra | 0.813 | 0.941 | 0.883 | 0.965 | 0.976 | 0.883 | 0.988 | 0.953 | 0.988 | 1.000 | 1.000 | - | - | 0.988 | 1.000 | 0.837 | 0.883 | 0.930 | 0.953 | 0.976 |
576
+ MATH1_precalculus | 0.684 | 1.000 | 0.596 | 0.859 | 1.000 | 0.842 | 0.947 | 1.000 | 1.000 | 1.000 | 1.000 | - | - | 0.947 | 0.929 | 0.631 | 0.789 | 0.947 | 0.982 | 0.929 |
577
+ MATH1 | 0.842 | 0.956 | 0.814 | 0.910 | 0.983 | 0.878 | 0.965 | 0.958 | 0.970 | 1.000 | 0.997 | - | - | 0.972 | 0.981 | 0.794 | 0.885 | 0.931 | 0.963 | 0.965 |
578
+ MATH2_algebra | 0.845 | 0.975 | 0.825 | 0.930 | 0.985 | 0.900 | 0.965 | 0.995 | 0.985 | 1.000 | 1.000 | - | - | 0.980 | 0.995 | 0.910 | 0.860 | 0.970 | 0.975 | 0.990 |
579
+ MATH2_counting_and_probability | 0.831 | 0.930 | 0.782 | 0.851 | 0.891 | 0.841 | 0.881 | 0.950 | 0.900 | 1.000 | 1.000 | - | - | 0.980 | 0.980 | 0.683 | 0.861 | 0.970 | 0.990 | 0.970 |
580
+ MATH2_geometry | 0.841 | 0.963 | 0.743 | 0.914 | 0.951 | 0.792 | 0.939 | 0.914 | 0.987 | 1.000 | 1.000 | - | - | 0.987 | 0.939 | 0.621 | 0.743 | 0.792 | 0.963 | 0.963 |
581
+ MATH2_intermediate_algebra | 0.859 | 0.953 | 0.664 | 0.875 | 0.992 | 0.835 | 0.960 | 0.968 | 0.976 | 1.000 | 1.000 | - | - | 0.968 | 0.984 | 0.671 | 0.710 | 0.953 | 0.960 | 0.984 |
582
+ MATH2_number_theory | 0.826 | 0.913 | 0.782 | 0.826 | 0.967 | 0.891 | 0.934 | 0.934 | 0.913 | 0.989 | 0.989 | - | - | 0.956 | 0.989 | 0.695 | 0.880 | 0.891 | 0.945 | 0.967 |
583
+ MATH2_prealgebra | 0.898 | 0.966 | 0.887 | 0.909 | 0.977 | 0.875 | 0.949 | 0.971 | 0.971 | 1.000 | 1.000 | - | - | 0.988 | 0.988 | 0.836 | 0.881 | 0.932 | 0.971 | 0.960 |
584
+ MATH2_precalculus | 0.787 | 0.964 | 0.663 | 0.902 | 1.000 | 0.805 | 0.964 | 0.955 | 0.991 | 1.000 | 1.000 | - | - | 0.955 | 0.823 | 0.557 | 0.725 | 0.858 | 0.964 | 0.964 |
585
+ MATH2 | 0.846 | 0.956 | 0.777 | 0.893 | 0.970 | 0.856 | 0.946 | 0.963 | 0.965 | 0.998 | 0.998 | - | - | 0.975 | 0.963 | 0.742 | 0.817 | 0.921 | 0.968 | 0.973 |
586
+ MATH3_algebra | 0.873 | 0.938 | 0.854 | 0.934 | 0.984 | 0.911 | 0.961 | 0.992 | 0.980 | 0.988 | 1.000 | - | - | 0.980 | 0.988 | 0.881 | 0.850 | 0.969 | 0.996 | 0.984 |
587
+ MATH3_counting_and_probability | 0.800 | 0.890 | 0.730 | 0.770 | 0.920 | 0.830 | 0.850 | 0.930 | 0.940 | 1.000 | 1.000 | - | - | 0.970 | 0.980 | 0.710 | 0.880 | 0.950 | 1.000 | 1.000 |
588
+ MATH3_geometry | 0.794 | 0.901 | 0.627 | 0.911 | 0.960 | 0.794 | 0.911 | 0.901 | 0.941 | 0.970 | 0.990 | - | - | 0.970 | 0.950 | 0.696 | 0.764 | 0.833 | 0.970 | 0.931 |
589
+ MATH3_intermediate_algebra | 0.825 | 0.969 | 0.635 | 0.902 | 0.964 | 0.882 | 0.953 | 0.964 | 0.984 | 0.989 | 1.000 | - | - | 0.969 | 0.938 | 0.574 | 0.738 | 0.933 | 0.969 | 0.938 |
590
+ MATH3_number_theory | 0.819 | 0.934 | 0.696 | 0.754 | 0.942 | 0.770 | 0.844 | 0.926 | 0.909 | 0.991 | 1.000 | - | - | 0.926 | 0.942 | 0.655 | 0.819 | 0.811 | 0.942 | 0.918 |
591
+ MATH3_prealgebra | 0.875 | 0.950 | 0.763 | 0.883 | 0.959 | 0.892 | 0.941 | 0.946 | 0.950 | 0.995 | 1.000 | - | - | 0.986 | 0.986 | 0.816 | 0.883 | 0.946 | 0.982 | 0.977 |
592
+ MATH3_precalculus | 0.661 | 0.929 | 0.582 | 0.874 | 0.952 | 0.818 | 0.960 | 0.968 | 0.929 | 1.000 | 1.000 | - | - | 0.968 | 0.850 | 0.480 | 0.685 | 0.858 | 0.897 | 0.905 |
593
+ MATH3 | 0.822 | 0.937 | 0.719 | 0.876 | 0.960 | 0.859 | 0.929 | 0.954 | 0.954 | 0.991 | 0.999 | - | - | 0.970 | 0.954 | 0.714 | 0.810 | 0.915 | 0.969 | 0.955 |
594
+ MATH4_algebra | 0.848 | 0.950 | 0.805 | 0.897 | 0.982 | 0.922 | 0.964 | 0.957 | 0.985 | 0.992 | 1.000 | - | - | 0.989 | 0.978 | 0.851 | 0.865 | 0.968 | 0.992 | 0.985 |
595
+ MATH4_counting_and_probability | 0.729 | 0.882 | 0.639 | 0.738 | 0.900 | 0.711 | 0.864 | 0.945 | 0.945 | 0.990 | 0.990 | - | - | 0.963 | 0.945 | 0.558 | 0.783 | 0.945 | 0.981 | 0.981 |
596
+ MATH4_geometry | 0.792 | 0.896 | 0.576 | 0.776 | 0.896 | 0.768 | 0.832 | 0.832 | 0.856 | 0.976 | 0.976 | - | - | 0.920 | 0.888 | 0.432 | 0.616 | 0.712 | 0.872 | 0.840 |
597
+ MATH4_intermediate_algebra | 0.778 | 0.947 | 0.588 | 0.858 | 0.987 | 0.850 | 0.931 | 0.935 | 0.915 | 0.987 | 0.995 | - | - | 0.939 | 0.895 | 0.512 | 0.649 | 0.911 | 0.947 | 0.907 |
598
+ MATH4_number_theory | 0.795 | 0.950 | 0.697 | 0.809 | 0.915 | 0.725 | 0.859 | 0.894 | 0.929 | 0.992 | 0.992 | - | - | 0.929 | 0.964 | 0.619 | 0.823 | 0.823 | 0.943 | 0.936 |
599
+ MATH4_prealgebra | 0.806 | 0.931 | 0.785 | 0.874 | 0.958 | 0.827 | 0.942 | 0.921 | 0.931 | 0.989 | 0.989 | - | - | 0.958 | 0.963 | 0.748 | 0.801 | 0.879 | 0.926 | 0.942 |
600
+ MATH4_precalculus | 0.719 | 0.956 | 0.570 | 0.868 | 0.956 | 0.728 | 0.921 | 0.947 | 0.912 | 0.991 | 0.991 | - | - | 0.947 | 0.807 | 0.333 | 0.578 | 0.859 | 0.973 | 0.868 |
601
+ MATH4 | 0.792 | 0.935 | 0.684 | 0.845 | 0.953 | 0.816 | 0.915 | 0.925 | 0.932 | 0.989 | 0.992 | - | - | 0.953 | 0.929 | 0.620 | 0.746 | 0.887 | 0.952 | 0.930 |
602
+ MATH5_algebra | 0.768 | 0.947 | 0.752 | 0.899 | 0.970 | 0.853 | 0.934 | 0.970 | 0.967 | 0.986 | 0.996 | - | - | 0.960 | 0.967 | 0.674 | 0.762 | 0.964 | 0.964 | 0.977 |
603
+ MATH5_counting_and_probability | 0.699 | 0.910 | 0.569 | 0.756 | 0.829 | 0.699 | 0.788 | 0.910 | 0.861 | 0.959 | 0.983 | - | - | 0.934 | 0.967 | 0.495 | 0.642 | 0.910 | 0.934 | 0.902 |
604
+ MATH5_geometry | 0.712 | 0.886 | 0.545 | 0.810 | 0.840 | 0.727 | 0.818 | 0.840 | 0.810 | 0.946 | 0.984 | - | - | 0.878 | 0.810 | 0.348 | 0.507 | 0.734 | 0.833 | 0.742 |
605
+ MATH5_intermediate_algebra | 0.682 | 0.900 | 0.453 | 0.821 | 0.875 | 0.778 | 0.810 | 0.810 | 0.846 | 0.932 | 0.964 | - | - | 0.889 | 0.832 | 0.253 | 0.389 | 0.807 | 0.860 | 0.800 |
606
+ MATH5_number_theory | 0.811 | 0.909 | 0.707 | 0.727 | 0.935 | 0.792 | 0.915 | 0.935 | 0.935 | 0.967 | 0.993 | - | - | 0.961 | 0.954 | 0.525 | 0.753 | 0.870 | 0.941 | 0.935 |
607
+ MATH5_prealgebra | 0.777 | 0.849 | 0.720 | 0.808 | 0.896 | 0.782 | 0.823 | 0.875 | 0.891 | 0.958 | 0.984 | - | - | 0.953 | 0.937 | 0.580 | 0.797 | 0.911 | 0.927 | 0.948 |
608
+ MATH5_precalculus | 0.562 | 0.903 | 0.437 | 0.851 | 0.851 | 0.792 | 0.800 | 0.814 | 0.851 | 0.918 | 0.955 | - | - | 0.888 | 0.666 | 0.259 | 0.429 | 0.777 | 0.851 | 0.770 |
609
+ MATH5 | 0.723 | 0.904 | 0.609 | 0.822 | 0.897 | 0.787 | 0.851 | 0.884 | 0.889 | 0.955 | 0.981 | - | - | 0.926 | 0.886 | 0.462 | 0.617 | 0.865 | 0.907 | 0.879 |
610
+ MATHCOT | 0.795 | _0.930_ | 0.700 | 0.860 | _0.940_ | 0.831 | _0.910_ | 0.930 | _0.934_ | _0.983_ | _0.992_ | - | - | _0.954_ | _0.936_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.933_ |
611
  COMPOSITE AVERAGE
612
+ AVG | 0.795 | _0.907_ | 0.700 | 0.860 | _0.918_ | 0.831 | _0.895_ | 0.930 | _0.918_ | _0.964_ | _0.972_ | - | - | _0.936_ | _0.923_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.920_ |
613
 
614
  VISION MODELS:
615
 
 
730
 
731
  MT MODELS:
732
 
733
+ MODEL | HY-MT1.5-7B | madlad400-7b-mt | madlad400-10b-mt | madlad400-10b-mt | plamo-2-translate | translategemma-4b-it | translategemma-12b-it | translategemma-12b-it | translategemma-27b-it |
734
+ ---------------------------------------------|-------------|-----------------|------------------|------------------|-------------------|----------------------|-----------------------|-----------------------|-----------------------|
735
+ params | 7.50B | 8.30B | 10.71B | 10.71B | 9.53B | 3.88B | 11.77B | 11.77B | 27.01B |
736
+ quant | Q6_K_H | Q4_K_H | Q4_K_H | Q6_K_H | Q6_K_H | Q6_K_H | Q4_K_H | Q6_K_H | Q4_K_H |
737
+ engine | llama.cpp version: 7789 | llama.cpp version: 7885 | llama.cpp version: 7830 | llama.cpp version: 7772 | llama.cpp version: 7762 | llama.cpp version: 7760 | llama.cpp version: 7779 | llama.cpp version: 7779 | llama.cpp version: 7789 |
738
+ **TEST** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** |
739
+ FLORES200_de_en | 32.2 | 42.8 | 42.3 | 42.3 | 35.5 | 37.8 | 40.3 | 40.1 | 42.2 |
740
+ FLORES200_en_de | 3.0 | 37.8 | 36.6 | 36.9 | 24.1 | 31.0 | 34.5 | 34.2 | 35.3 |
741
+ FLORES200_en_es | 25.5 | 26.8 | 26.5 | 26.5 | 21.3 | 25.1 | 27.0 | 27.3 | 27.2 |
742
+ FLORES200_en_fr | 37.2 | 50.0 | 49.7 | 49.8 | 36.5 | 40.9 | 44.7 | 44.2 | 45.2 |
743
+ FLORES200_en_ja | 31.0 | 22.8 | 22.2 | 22.3 | 28.2 | 27.7 | 30.7 | 30.8 | 31.0 |
744
+ FLORES200_en_ru | 22.6 | 29.5 | 28.7 | 28.8 | 18.2 | 25.4 | 27.5 | 28.1 | 28.2 |
745
+ FLORES200_en_zh | 37.0 | 37.2 | 37.2 | 37.3 | 33.6 | 36.6 | 39.7 | 39.5 | 41.9 |
746
+ FLORES200_es_en | 23.9 | 29.4 | 29.1 | 29.2 | 24.7 | 28.0 | 28.7 | 29.0 | 29.9 |
747
+ FLORES200_fr_en | 32.6 | 44.3 | 44.1 | 44.3 | 36.4 | 38.8 | 41.2 | 41.7 | 43.4 |
748
+ FLORES200_ja_en | 20.5 | 26.8 | 25.9 | 25.9 | 23.4 | 21.5 | 24.1 | 23.9 | 26.6 |
749
+ FLORES200_ru_en | 26.7 | 34.9 | 34.7 | 34.9 | 29.0 | 30.7 | 33.0 | 32.7 | 34.8 |
750
+ FLORES200_zh_en | 22.9 | 28.3 | 27.3 | 27.3 | 23.4 | 23.5 | 25.7 | 25.6 | 27.8 |
751
+ FLORES200 | 26.3 | 34.2 | 33.7 | 33.8 | 27.9 | 30.6 | 33.1 | 33.1 | 34.4 |
752
+ OPUS_de_en | 25.9 | 35.6 | 35.6 | 27.1 | 21.3 | 28.2 | 29.8 | 29.2 | 30.5 |
753
+ OPUS_en_de | 11.0 | 33.2 | 32.9 | 30.7 | 21.1 | 25.5 | 26.5 | 25.5 | 27.1 |
754
+ OPUS_en_es | 29.0 | 37.7 | 37.1 | 37.1 | 27.9 | 31.1 | 32.0 | 31.3 | 32.4 |
755
+ OPUS_en_fr | 24.7 | 34.3 | 34.2 | 34.3 | 26.7 | 27.0 | 28.8 | 28.8 | 29.0 |
756
+ OPUS_en_ja | 10.2 | 16.0 | 16.0 | 15.9 | 11.9 | 9.8 | 10.5 | 10.5 | 11.7 |
757
+ OPUS_en_ru | 22.0 | 31.4 | 31.7 | 31.6 | 19.7 | 23.1 | 24.9 | 24.4 | 25.5 |
758
+ OPUS_en_zh | 28.8 | 41.2 | 41.1 | 41.3 | 26.7 | 26.4 | 28.9 | 28.8 | 29.9 |
759
+ OPUS_es_en | 28.4 | 40.6 | 40.2 | 40.3 | 26.7 | 33.2 | 35.0 | 34.5 | 36.4 |
760
+ OPUS_fr_en | 25.8 | 36.4 | 35.8 | 35.8 | 29.6 | 29.2 | 30.9 | 30.6 | 32.3 |
761
+ OPUS_ja_en | 14.8 | 19.4 | 18.9 | 18.8 | 16.0 | 15.1 | 17.0 | 16.2 | 16.8 |
762
+ OPUS_ru_en | 24.5 | 35.2 | 34.7 | 34.8 | 27.1 | 27.1 | 29.0 | 28.3 | 30.2 |
763
+ OPUS_zh_en | 26.1 | 39.4 | 38.5 | 38.8 | 23.8 | 22.1 | 25.5 | 24.9 | 27.6 |
764
+ OPUS | 22.6 | 33.4 | 33.1 | 32.2 | 23.2 | 24.8 | 26.6 | 26.1 | 27.4 |
765
+ DE_EN | 28.0 | 38.0 | 37.8 | 32.1 | 26.0 | 31.4 | 33.3 | 32.8 | 34.4 |
766
+ EN_DE | 8.3 | 34.7 | 34.1 | 32.7 | 22.1 | 27.3 | 29.2 | 28.3 | 29.8 |
767
+ ES_EN | 26.8 | 36.8 | 36.4 | 36.5 | 26.0 | 31.4 | 32.8 | 32.6 | 34.2 |
768
+ EN_ES | 27.8 | 34.0 | 33.5 | 33.5 | 25.6 | 29.0 | 30.2 | 29.9 | 30.6 |
769
+ FR_EN | 28.0 | 39.0 | 38.6 | 38.6 | 31.9 | 32.4 | 34.3 | 34.3 | 36.0 |
770
+ EN_FR | 28.9 | 39.5 | 39.4 | 39.5 | 30.0 | 31.6 | 34.1 | 33.9 | 34.4 |
771
+ RU_EN | 25.2 | 35.1 | 34.7 | 34.8 | 27.7 | 28.2 | 30.3 | 29.8 | 31.7 |
772
+ EN_RU | 22.2 | 30.7 | 30.7 | 30.6 | 19.1 | 23.8 | 25.7 | 25.6 | 26.3 |
773
+ JA_EN | 16.6 | 21.8 | 21.2 | 21.2 | 18.4 | 17.2 | 19.3 | 18.8 | 20.1 |
774
+ EN_JA | 17.2 | 18.2 | 18.0 | 18.0 | 17.3 | 15.8 | 17.2 | 17.3 | 18.1 |
775
+ ZH_EN | 25.0 | 35.6 | 34.7 | 34.9 | 23.6 | 22.5 | 25.5 | 25.1 | 27.6 |
776
+ EN_ZH | 31.5 | 39.8 | 39.8 | 39.9 | 29.0 | 29.8 | 32.4 | 32.3 | 33.9 |
777
  COMPOSITE AVERAGE
778
+ AVG | 23.8 | 33.6 | 33.2 | 32.7 | 24.7 | 26.7 | 28.7 | 28.4 | 29.8 |