steampunque committed on
Commit 2eb1fab · 1 Parent(s): e48dee5

Add GLM-4.7-Flash Q4_P_H Math results

  1. README.md +52 -52
README.md CHANGED
@@ -557,59 +557,59 @@ CODE MODELS:
557
 
558
  MATH MODELS:
559
 
560
- MODEL | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Qwen-1.5B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-32B | Deepseek-R1-Distill-Qwen-32B | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-32B-0414 | Qwen2.5-Math-1.5B-Instruct | Qwen2.5-Math-7B-Instruct | Qwen3-32B | QwQ-32B | QwQ-32B |
561
- ---------------------------------------------|------------------------------|------------------------------|-------------------------------|-----------------------------|-----------------------------|------------------------------|------------------------------|------------------------------|------------------------------|----------------|----------------|----------------|-----------------|----------------------------|--------------------------|-----------|---------|---------|
562
- params | 8.03B | 8.03B | 1.78B | 7.62B | 7.62B | 14.77B | 14.77B | 32.76B | 32.76B | 9.40B | 9.40B | 9.40B | 32.57B | 1.54B | 7.62B | 32.8B | 32.76B | 32.76B |
563
- quant | Q6_K | Q6_K_H | Q8_0 | IQ4_XS | Q6_K_H | IQ4_XS | Q4_K_H | IQ4_XS | Q4_K_H | Q4_K_H | Q4_P_H | Q6_K_H | Q4_K_H | IQ4_XS | Q6_K | Q4_K_H | IQ4_XS | Q4_K_H |
564
- engine | llama.cpp version: 4707 | llama.cpp version: 5898 | llama.cpp version: 4763 | llama.cpp version: 4644 | llama.cpp version: 7699 | llama.cpp version: 4657 | llama.cpp version: 7710 | llama.cpp version: 4559 | llama.cpp version: 7719 | llama.cpp version: 7230 | llama.cpp version: 7268 | llama.cpp version: 5935 | llama.cpp version: 7607 | llama.cpp version: 4406 | llama.cpp version: 4394 | llama.cpp version: 5633 | llama.cpp version: 4820 | llama.cpp version: 6026 |
565
- **TEST** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** |
566
- GSM8K | - | _0.888_ | - | - | _0.888_ | - | _0.944_ | - | _0.956_ | _0.968_ | _0.968_ | _0.964_ | _0.988_ | - | - | - | - | _0.964_ |
567
- APPLE | - | 0.870 | - | - | 0.810 | - | 0.790 | - | 0.810 | 0.880 | 0.920 | 0.880 | 0.910 | - | - | - | - | 0.880 |
568
- GPQA_diamond | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
569
- GPQA | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
570
- MATH1_algebra | 0.933 | 0.977 | 0.918 | 0.962 | 1.000 | 0.925 | 0.970 | 0.962 | 0.985 | 0.985 | 0.992 | 0.992 | 0.992 | 0.859 | 0.955 | 0.992 | 0.992 | 1.000 |
571
- MATH1_counting_and_probability | 0.820 | 0.948 | 0.794 | 0.948 | 1.000 | 0.923 | 0.948 | 0.948 | 0.948 | - | 0.974 | 1.000 | 1.000 | 0.897 | 0.974 | 1.000 | 0.974 | 1.000 |
572
- MATH1_geometry | 0.842 | 0.868 | 0.710 | 0.736 | 0.921 | 0.868 | 0.921 | 0.921 | 0.894 | - | - | 0.947 | 0.947 | 0.710 | 0.842 | 0.842 | 0.921 | 0.921 |
573
- MATH1_intermediate_algebra | 0.923 | 0.980 | 0.730 | 0.903 | 0.961 | 0.865 | 0.980 | 0.961 | 0.942 | - | - | 0.961 | 0.980 | 0.730 | 0.711 | 0.923 | 1.000 | 0.980 |
574
- MATH1_number_theory | 0.700 | 0.900 | 0.866 | 0.800 | 1.000 | 0.700 | 0.966 | 0.933 | 0.966 | - | - | 0.900 | 1.000 | 0.766 | 1.000 | 0.666 | 0.800 | 0.833 |
575
- MATH1_prealgebra | 0.813 | 0.941 | 0.883 | 0.965 | 0.976 | 0.883 | 0.988 | 0.953 | 0.988 | - | - | 0.988 | 1.000 | 0.837 | 0.883 | 0.930 | 0.953 | 0.976 |
576
- MATH1_precalculus | 0.684 | 1.000 | 0.596 | 0.859 | 1.000 | 0.842 | 0.947 | 1.000 | 1.000 | - | - | 0.947 | 0.929 | 0.631 | 0.789 | 0.947 | 0.982 | 0.929 |
577
- MATH1 | 0.842 | 0.956 | 0.814 | 0.910 | 0.983 | 0.878 | 0.965 | 0.958 | 0.970 | - | - | 0.972 | 0.981 | 0.794 | 0.885 | 0.931 | 0.963 | 0.965 |
578
- MATH2_algebra | 0.845 | 0.975 | 0.825 | 0.930 | 0.985 | 0.900 | 0.965 | 0.995 | 0.985 | - | - | 0.980 | 0.995 | 0.910 | 0.860 | 0.970 | 0.975 | 0.990 |
579
- MATH2_counting_and_probability | 0.831 | 0.930 | 0.782 | 0.851 | 0.891 | 0.841 | 0.881 | 0.950 | 0.900 | - | - | 0.980 | 0.980 | 0.683 | 0.861 | 0.970 | 0.990 | 0.970 |
580
- MATH2_geometry | 0.841 | 0.963 | 0.743 | 0.914 | 0.951 | 0.792 | 0.939 | 0.914 | 0.987 | - | - | 0.987 | 0.939 | 0.621 | 0.743 | 0.792 | 0.963 | 0.963 |
581
- MATH2_intermediate_algebra | 0.859 | 0.953 | 0.664 | 0.875 | 0.992 | 0.835 | 0.960 | 0.968 | 0.976 | - | - | 0.968 | 0.984 | 0.671 | 0.710 | 0.953 | 0.960 | 0.984 |
582
- MATH2_number_theory | 0.826 | 0.913 | 0.782 | 0.826 | 0.967 | 0.891 | 0.934 | 0.934 | 0.913 | - | - | 0.956 | 0.989 | 0.695 | 0.880 | 0.891 | 0.945 | 0.967 |
583
- MATH2_prealgebra | 0.898 | 0.966 | 0.887 | 0.909 | 0.977 | 0.875 | 0.949 | 0.971 | 0.971 | - | - | 0.988 | 0.988 | 0.836 | 0.881 | 0.932 | 0.971 | 0.960 |
584
- MATH2_precalculus | 0.787 | 0.964 | 0.663 | 0.902 | 1.000 | 0.805 | 0.964 | 0.955 | 0.991 | - | - | 0.955 | 0.823 | 0.557 | 0.725 | 0.858 | 0.964 | 0.964 |
585
- MATH2 | 0.846 | 0.956 | 0.777 | 0.893 | 0.970 | 0.856 | 0.946 | 0.963 | 0.965 | - | - | 0.975 | 0.963 | 0.742 | 0.817 | 0.921 | 0.968 | 0.973 |
586
- MATH3_algebra | 0.873 | 0.938 | 0.854 | 0.934 | 0.984 | 0.911 | 0.961 | 0.992 | 0.980 | - | - | 0.980 | 0.988 | 0.881 | 0.850 | 0.969 | 0.996 | 0.984 |
587
- MATH3_counting_and_probability | 0.800 | 0.890 | 0.730 | 0.770 | 0.920 | 0.830 | 0.850 | 0.930 | 0.940 | - | - | 0.970 | 0.980 | 0.710 | 0.880 | 0.950 | 1.000 | 1.000 |
588
- MATH3_geometry | 0.794 | 0.901 | 0.627 | 0.911 | 0.960 | 0.794 | 0.911 | 0.901 | 0.941 | - | - | 0.970 | 0.950 | 0.696 | 0.764 | 0.833 | 0.970 | 0.931 |
589
- MATH3_intermediate_algebra | 0.825 | 0.969 | 0.635 | 0.902 | 0.964 | 0.882 | 0.953 | 0.964 | 0.984 | - | - | 0.969 | 0.938 | 0.574 | 0.738 | 0.933 | 0.969 | 0.938 |
590
- MATH3_number_theory | 0.819 | 0.934 | 0.696 | 0.754 | 0.942 | 0.770 | 0.844 | 0.926 | 0.909 | - | - | 0.926 | 0.942 | 0.655 | 0.819 | 0.811 | 0.942 | 0.918 |
591
- MATH3_prealgebra | 0.875 | 0.950 | 0.763 | 0.883 | 0.959 | 0.892 | 0.941 | 0.946 | 0.950 | - | - | 0.986 | 0.986 | 0.816 | 0.883 | 0.946 | 0.982 | 0.977 |
592
- MATH3_precalculus | 0.661 | 0.929 | 0.582 | 0.874 | 0.952 | 0.818 | 0.960 | 0.968 | 0.929 | - | - | 0.968 | 0.850 | 0.480 | 0.685 | 0.858 | 0.897 | 0.905 |
593
- MATH3 | 0.822 | 0.937 | 0.719 | 0.876 | 0.960 | 0.859 | 0.929 | 0.954 | 0.954 | - | - | 0.970 | 0.954 | 0.714 | 0.810 | 0.915 | 0.969 | 0.955 |
594
- MATH4_algebra | 0.848 | 0.950 | 0.805 | 0.897 | 0.982 | 0.922 | 0.964 | 0.957 | 0.985 | - | - | 0.989 | 0.978 | 0.851 | 0.865 | 0.968 | 0.992 | 0.985 |
595
- MATH4_counting_and_probability | 0.729 | 0.882 | 0.639 | 0.738 | 0.900 | 0.711 | 0.864 | 0.945 | 0.945 | - | - | 0.963 | 0.945 | 0.558 | 0.783 | 0.945 | 0.981 | 0.981 |
596
- MATH4_geometry | 0.792 | 0.896 | 0.576 | 0.776 | 0.896 | 0.768 | 0.832 | 0.832 | 0.856 | - | - | 0.920 | 0.888 | 0.432 | 0.616 | 0.712 | 0.872 | 0.840 |
597
- MATH4_intermediate_algebra | 0.778 | 0.947 | 0.588 | 0.858 | 0.987 | 0.850 | 0.931 | 0.935 | 0.915 | - | - | 0.939 | 0.895 | 0.512 | 0.649 | 0.911 | 0.947 | 0.907 |
598
- MATH4_number_theory | 0.795 | 0.950 | 0.697 | 0.809 | 0.915 | 0.725 | 0.859 | 0.894 | 0.929 | - | - | 0.929 | 0.964 | 0.619 | 0.823 | 0.823 | 0.943 | 0.936 |
599
- MATH4_prealgebra | 0.806 | 0.931 | 0.785 | 0.874 | 0.958 | 0.827 | 0.942 | 0.921 | 0.931 | - | - | 0.958 | 0.963 | 0.748 | 0.801 | 0.879 | 0.926 | 0.942 |
600
- MATH4_precalculus | 0.719 | 0.956 | 0.570 | 0.868 | 0.956 | 0.728 | 0.921 | 0.947 | 0.912 | - | - | 0.947 | 0.807 | 0.333 | 0.578 | 0.859 | 0.973 | 0.868 |
601
- MATH4 | 0.792 | 0.935 | 0.684 | 0.845 | 0.953 | 0.816 | 0.915 | 0.925 | 0.932 | - | - | 0.953 | 0.929 | 0.620 | 0.746 | 0.887 | 0.952 | 0.930 |
602
- MATH5_algebra | 0.768 | 0.947 | 0.752 | 0.899 | 0.970 | 0.853 | 0.934 | 0.970 | 0.967 | - | - | 0.960 | 0.967 | 0.674 | 0.762 | 0.964 | 0.964 | 0.977 |
603
- MATH5_counting_and_probability | 0.699 | 0.910 | 0.569 | 0.756 | 0.829 | 0.699 | 0.788 | 0.910 | 0.861 | - | - | 0.934 | 0.967 | 0.495 | 0.642 | 0.910 | 0.934 | 0.902 |
604
- MATH5_geometry | 0.712 | 0.886 | 0.545 | 0.810 | 0.840 | 0.727 | 0.818 | 0.840 | 0.810 | - | - | 0.878 | 0.810 | 0.348 | 0.507 | 0.734 | 0.833 | 0.742 |
605
- MATH5_intermediate_algebra | 0.682 | 0.900 | 0.453 | 0.821 | 0.875 | 0.778 | 0.810 | 0.810 | 0.846 | - | - | 0.889 | 0.832 | 0.253 | 0.389 | 0.807 | 0.860 | 0.800 |
606
- MATH5_number_theory | 0.811 | 0.909 | 0.707 | 0.727 | 0.935 | 0.792 | 0.915 | 0.935 | 0.935 | - | - | 0.961 | 0.954 | 0.525 | 0.753 | 0.870 | 0.941 | 0.935 |
607
- MATH5_prealgebra | 0.777 | 0.849 | 0.720 | 0.808 | 0.896 | 0.782 | 0.823 | 0.875 | 0.891 | - | - | 0.953 | 0.937 | 0.580 | 0.797 | 0.911 | 0.927 | 0.948 |
608
- MATH5_precalculus | 0.562 | 0.903 | 0.437 | 0.851 | 0.851 | 0.792 | 0.800 | 0.814 | 0.851 | - | - | 0.888 | 0.666 | 0.259 | 0.429 | 0.777 | 0.851 | 0.770 |
609
- MATH5 | 0.723 | 0.904 | 0.609 | 0.822 | 0.897 | 0.787 | 0.851 | 0.884 | 0.889 | - | - | 0.926 | 0.886 | 0.462 | 0.617 | 0.865 | 0.907 | 0.879 |
610
- MATHCOT | 0.795 | _0.930_ | 0.700 | 0.860 | _0.940_ | 0.831 | _0.910_ | 0.930 | _0.934_ | - | - | _0.954_ | _0.936_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.933_ |
611
  COMPOSITE AVERAGE
612
- AVG | 0.795 | _0.907_ | 0.700 | 0.860 | _0.918_ | 0.831 | _0.895_ | 0.930 | _0.918_ | - | - | _0.936_ | _0.923_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.920_ |
560
+ MODEL | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Llama-8B | Deepseek-R1-Distill-Qwen-1.5B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-7B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-14B | Deepseek-R1-Distill-Qwen-32B | Deepseek-R1-Distill-Qwen-32B | GLM-4.7-Flash | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-9B-0414 | GLM-Z1-32B-0414 | Qwen2.5-Math-1.5B-Instruct | Qwen2.5-Math-7B-Instruct | Qwen3-32B | QwQ-32B | QwQ-32B |
561
+ ---------------------------------------------|------------------------------|------------------------------|-------------------------------|-----------------------------|-----------------------------|------------------------------|------------------------------|------------------------------|------------------------------|---------------|----------------|----------------|----------------|-----------------|----------------------------|--------------------------|-----------|---------|---------|
562
+ params | 8.03B | 8.03B | 1.78B | 7.62B | 7.62B | 14.77B | 14.77B | 32.76B | 32.76B | 29.94B | 9.40B | 9.40B | 9.40B | 32.57B | 1.54B | 7.62B | 32.8B | 32.76B | 32.76B |
563
+ quant | Q6_K | Q6_K_H | Q8_0 | IQ4_XS | Q6_K_H | IQ4_XS | Q4_K_H | IQ4_XS | Q4_K_H | Q4_K_H | Q4_K_H | Q4_P_H | Q6_K_H | Q4_K_H | IQ4_XS | Q6_K | Q4_K_H | IQ4_XS | Q4_K_H |
564
+ engine | llama.cpp version: 4707 | llama.cpp version: 5898 | llama.cpp version: 4763 | llama.cpp version: 4644 | llama.cpp version: 7699 | llama.cpp version: 4657 | llama.cpp version: 7710 | llama.cpp version: 4559 | llama.cpp version: 7719 | llama.cpp version: 7885 | llama.cpp version: 7230 | llama.cpp version: 7268 | llama.cpp version: 5935 | llama.cpp version: 7607 | llama.cpp version: 4406 | llama.cpp version: 4394 | llama.cpp version: 5633 | llama.cpp version: 4820 | llama.cpp version: 6026 |
565
+ **TEST** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** | **acc** |
566
+ GSM8K | - | _0.888_ | - | - | _0.888_ | - | _0.944_ | - | _0.956_ | _0.992_ | _0.968_ | _0.968_ | _0.964_ | _0.988_ | - | - | - | - | _0.964_ |
567
+ APPLE | - | 0.870 | - | - | 0.810 | - | 0.790 | - | 0.810 | 0.960 | 0.880 | 0.920 | 0.880 | 0.910 | - | - | - | - | 0.880 |
568
+ GPQA_diamond | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.444 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
569
+ GPQA | - | 0.308 | - | - | 0.323 | - | 0.489 | - | 0.494 | 0.444 | 0.555 | 0.540 | 0.434 | 0.585 | - | - | - | - | 0.555 |
570
+ MATH1_algebra | 0.933 | 0.977 | 0.918 | 0.962 | 1.000 | 0.925 | 0.970 | 0.962 | 0.985 | 1.000 | 0.985 | 0.992 | 0.992 | 0.992 | 0.859 | 0.955 | 0.992 | 0.992 | 1.000 |
571
+ MATH1_counting_and_probability | 0.820 | 0.948 | 0.794 | 0.948 | 1.000 | 0.923 | 0.948 | 0.948 | 0.948 | 1.000 | - | 0.974 | 1.000 | 1.000 | 0.897 | 0.974 | 1.000 | 0.974 | 1.000 |
572
+ MATH1_geometry | 0.842 | 0.868 | 0.710 | 0.736 | 0.921 | 0.868 | 0.921 | 0.921 | 0.894 | 1.000 | - | - | 0.947 | 0.947 | 0.710 | 0.842 | 0.842 | 0.921 | 0.921 |
573
+ MATH1_intermediate_algebra | 0.923 | 0.980 | 0.730 | 0.903 | 0.961 | 0.865 | 0.980 | 0.961 | 0.942 | 1.000 | - | - | 0.961 | 0.980 | 0.730 | 0.711 | 0.923 | 1.000 | 0.980 |
574
+ MATH1_number_theory | 0.700 | 0.900 | 0.866 | 0.800 | 1.000 | 0.700 | 0.966 | 0.933 | 0.966 | 1.000 | - | - | 0.900 | 1.000 | 0.766 | 1.000 | 0.666 | 0.800 | 0.833 |
575
+ MATH1_prealgebra | 0.813 | 0.941 | 0.883 | 0.965 | 0.976 | 0.883 | 0.988 | 0.953 | 0.988 | 1.000 | - | - | 0.988 | 1.000 | 0.837 | 0.883 | 0.930 | 0.953 | 0.976 |
576
+ MATH1_precalculus | 0.684 | 1.000 | 0.596 | 0.859 | 1.000 | 0.842 | 0.947 | 1.000 | 1.000 | 1.000 | - | - | 0.947 | 0.929 | 0.631 | 0.789 | 0.947 | 0.982 | 0.929 |
577
+ MATH1 | 0.842 | 0.956 | 0.814 | 0.910 | 0.983 | 0.878 | 0.965 | 0.958 | 0.970 | 1.000 | - | - | 0.972 | 0.981 | 0.794 | 0.885 | 0.931 | 0.963 | 0.965 |
578
+ MATH2_algebra | 0.845 | 0.975 | 0.825 | 0.930 | 0.985 | 0.900 | 0.965 | 0.995 | 0.985 | 1.000 | - | - | 0.980 | 0.995 | 0.910 | 0.860 | 0.970 | 0.975 | 0.990 |
579
+ MATH2_counting_and_probability | 0.831 | 0.930 | 0.782 | 0.851 | 0.891 | 0.841 | 0.881 | 0.950 | 0.900 | 1.000 | - | - | 0.980 | 0.980 | 0.683 | 0.861 | 0.970 | 0.990 | 0.970 |
580
+ MATH2_geometry | 0.841 | 0.963 | 0.743 | 0.914 | 0.951 | 0.792 | 0.939 | 0.914 | 0.987 | 1.000 | - | - | 0.987 | 0.939 | 0.621 | 0.743 | 0.792 | 0.963 | 0.963 |
581
+ MATH2_intermediate_algebra | 0.859 | 0.953 | 0.664 | 0.875 | 0.992 | 0.835 | 0.960 | 0.968 | 0.976 | 1.000 | - | - | 0.968 | 0.984 | 0.671 | 0.710 | 0.953 | 0.960 | 0.984 |
582
+ MATH2_number_theory | 0.826 | 0.913 | 0.782 | 0.826 | 0.967 | 0.891 | 0.934 | 0.934 | 0.913 | 0.989 | - | - | 0.956 | 0.989 | 0.695 | 0.880 | 0.891 | 0.945 | 0.967 |
583
+ MATH2_prealgebra | 0.898 | 0.966 | 0.887 | 0.909 | 0.977 | 0.875 | 0.949 | 0.971 | 0.971 | 1.000 | - | - | 0.988 | 0.988 | 0.836 | 0.881 | 0.932 | 0.971 | 0.960 |
584
+ MATH2_precalculus | 0.787 | 0.964 | 0.663 | 0.902 | 1.000 | 0.805 | 0.964 | 0.955 | 0.991 | 1.000 | - | - | 0.955 | 0.823 | 0.557 | 0.725 | 0.858 | 0.964 | 0.964 |
585
+ MATH2 | 0.846 | 0.956 | 0.777 | 0.893 | 0.970 | 0.856 | 0.946 | 0.963 | 0.965 | 0.998 | - | - | 0.975 | 0.963 | 0.742 | 0.817 | 0.921 | 0.968 | 0.973 |
586
+ MATH3_algebra | 0.873 | 0.938 | 0.854 | 0.934 | 0.984 | 0.911 | 0.961 | 0.992 | 0.980 | 0.988 | - | - | 0.980 | 0.988 | 0.881 | 0.850 | 0.969 | 0.996 | 0.984 |
587
+ MATH3_counting_and_probability | 0.800 | 0.890 | 0.730 | 0.770 | 0.920 | 0.830 | 0.850 | 0.930 | 0.940 | 1.000 | - | - | 0.970 | 0.980 | 0.710 | 0.880 | 0.950 | 1.000 | 1.000 |
588
+ MATH3_geometry | 0.794 | 0.901 | 0.627 | 0.911 | 0.960 | 0.794 | 0.911 | 0.901 | 0.941 | 0.970 | - | - | 0.970 | 0.950 | 0.696 | 0.764 | 0.833 | 0.970 | 0.931 |
589
+ MATH3_intermediate_algebra | 0.825 | 0.969 | 0.635 | 0.902 | 0.964 | 0.882 | 0.953 | 0.964 | 0.984 | 0.989 | - | - | 0.969 | 0.938 | 0.574 | 0.738 | 0.933 | 0.969 | 0.938 |
590
+ MATH3_number_theory | 0.819 | 0.934 | 0.696 | 0.754 | 0.942 | 0.770 | 0.844 | 0.926 | 0.909 | 0.991 | - | - | 0.926 | 0.942 | 0.655 | 0.819 | 0.811 | 0.942 | 0.918 |
591
+ MATH3_prealgebra | 0.875 | 0.950 | 0.763 | 0.883 | 0.959 | 0.892 | 0.941 | 0.946 | 0.950 | 0.995 | - | - | 0.986 | 0.986 | 0.816 | 0.883 | 0.946 | 0.982 | 0.977 |
592
+ MATH3_precalculus | 0.661 | 0.929 | 0.582 | 0.874 | 0.952 | 0.818 | 0.960 | 0.968 | 0.929 | 1.000 | - | - | 0.968 | 0.850 | 0.480 | 0.685 | 0.858 | 0.897 | 0.905 |
593
+ MATH3 | 0.822 | 0.937 | 0.719 | 0.876 | 0.960 | 0.859 | 0.929 | 0.954 | 0.954 | 0.991 | - | - | 0.970 | 0.954 | 0.714 | 0.810 | 0.915 | 0.969 | 0.955 |
594
+ MATH4_algebra | 0.848 | 0.950 | 0.805 | 0.897 | 0.982 | 0.922 | 0.964 | 0.957 | 0.985 | 0.992 | - | - | 0.989 | 0.978 | 0.851 | 0.865 | 0.968 | 0.992 | 0.985 |
595
+ MATH4_counting_and_probability | 0.729 | 0.882 | 0.639 | 0.738 | 0.900 | 0.711 | 0.864 | 0.945 | 0.945 | 0.990 | - | - | 0.963 | 0.945 | 0.558 | 0.783 | 0.945 | 0.981 | 0.981 |
596
+ MATH4_geometry | 0.792 | 0.896 | 0.576 | 0.776 | 0.896 | 0.768 | 0.832 | 0.832 | 0.856 | 0.976 | - | - | 0.920 | 0.888 | 0.432 | 0.616 | 0.712 | 0.872 | 0.840 |
597
+ MATH4_intermediate_algebra | 0.778 | 0.947 | 0.588 | 0.858 | 0.987 | 0.850 | 0.931 | 0.935 | 0.915 | 0.987 | - | - | 0.939 | 0.895 | 0.512 | 0.649 | 0.911 | 0.947 | 0.907 |
598
+ MATH4_number_theory | 0.795 | 0.950 | 0.697 | 0.809 | 0.915 | 0.725 | 0.859 | 0.894 | 0.929 | 0.992 | - | - | 0.929 | 0.964 | 0.619 | 0.823 | 0.823 | 0.943 | 0.936 |
599
+ MATH4_prealgebra | 0.806 | 0.931 | 0.785 | 0.874 | 0.958 | 0.827 | 0.942 | 0.921 | 0.931 | 0.989 | - | - | 0.958 | 0.963 | 0.748 | 0.801 | 0.879 | 0.926 | 0.942 |
600
+ MATH4_precalculus | 0.719 | 0.956 | 0.570 | 0.868 | 0.956 | 0.728 | 0.921 | 0.947 | 0.912 | 0.991 | - | - | 0.947 | 0.807 | 0.333 | 0.578 | 0.859 | 0.973 | 0.868 |
601
+ MATH4 | 0.792 | 0.935 | 0.684 | 0.845 | 0.953 | 0.816 | 0.915 | 0.925 | 0.932 | 0.989 | - | - | 0.953 | 0.929 | 0.620 | 0.746 | 0.887 | 0.952 | 0.930 |
602
+ MATH5_algebra | 0.768 | 0.947 | 0.752 | 0.899 | 0.970 | 0.853 | 0.934 | 0.970 | 0.967 | 0.986 | - | - | 0.960 | 0.967 | 0.674 | 0.762 | 0.964 | 0.964 | 0.977 |
603
+ MATH5_counting_and_probability | 0.699 | 0.910 | 0.569 | 0.756 | 0.829 | 0.699 | 0.788 | 0.910 | 0.861 | 0.959 | - | - | 0.934 | 0.967 | 0.495 | 0.642 | 0.910 | 0.934 | 0.902 |
604
+ MATH5_geometry | 0.712 | 0.886 | 0.545 | 0.810 | 0.840 | 0.727 | 0.818 | 0.840 | 0.810 | 0.946 | - | - | 0.878 | 0.810 | 0.348 | 0.507 | 0.734 | 0.833 | 0.742 |
605
+ MATH5_intermediate_algebra | 0.682 | 0.900 | 0.453 | 0.821 | 0.875 | 0.778 | 0.810 | 0.810 | 0.846 | 0.932 | - | - | 0.889 | 0.832 | 0.253 | 0.389 | 0.807 | 0.860 | 0.800 |
606
+ MATH5_number_theory | 0.811 | 0.909 | 0.707 | 0.727 | 0.935 | 0.792 | 0.915 | 0.935 | 0.935 | 0.967 | - | - | 0.961 | 0.954 | 0.525 | 0.753 | 0.870 | 0.941 | 0.935 |
607
+ MATH5_prealgebra | 0.777 | 0.849 | 0.720 | 0.808 | 0.896 | 0.782 | 0.823 | 0.875 | 0.891 | 0.958 | - | - | 0.953 | 0.937 | 0.580 | 0.797 | 0.911 | 0.927 | 0.948 |
608
+ MATH5_precalculus | 0.562 | 0.903 | 0.437 | 0.851 | 0.851 | 0.792 | 0.800 | 0.814 | 0.851 | 0.918 | - | - | 0.888 | 0.666 | 0.259 | 0.429 | 0.777 | 0.851 | 0.770 |
609
+ MATH5 | 0.723 | 0.904 | 0.609 | 0.822 | 0.897 | 0.787 | 0.851 | 0.884 | 0.889 | 0.955 | - | - | 0.926 | 0.886 | 0.462 | 0.617 | 0.865 | 0.907 | 0.879 |
610
+ MATHCOT | 0.795 | _0.930_ | 0.700 | 0.860 | _0.940_ | 0.831 | _0.910_ | 0.930 | _0.934_ | _0.983_ | - | - | _0.954_ | _0.936_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.933_ |
611
  COMPOSITE AVERAGE
612
+ AVG | 0.795 | _0.907_ | 0.700 | 0.860 | _0.918_ | 0.831 | _0.895_ | 0.930 | _0.918_ | _0.964_ | - | - | _0.936_ | _0.923_ | 0.637 | 0.751 | 0.897 | 0.948 | _0.920_ |
613
 
614
  VISION MODELS:
615