diff --git "a/results.html" "b/results.html" new file mode 100644--- /dev/null +++ "b/results.html" @@ -0,0 +1,2164 @@ +<\!DOCTYPE html> + + + + + FLaME Results + + + + + + + +
FLaME: Financial Language Model Evaluation Results
This page presents the results of the FLaME evaluation across various financial NLP tasks. Each section below shows performance metrics for one task category.
Overall Performance Across All Tasks
Columns are grouped by task category: Information Retrieval* (FiNER, FR = FinRed, RD = ReFiND, FNXL, FE = FinEntity), Sentiment Analysis (FiQA, SQA = SubjECTive-QA, FPB = Financial Phrase Bank), Causal Analysis (CD = Causal Detection, CC = Causal Classification), Text Classification (B77 = Banking77, FB = FinBench, FOMC, NC = NumClaim, HL = Headlines), Question Answering (CFQA = ConvFinQA, FinQA, TQA = TATQA), and Summarization (ECTSum, EDTSum). Metric used: F1 score for FiNER through FE and for SQA through NC; MSE for FiQA; accuracy for HL through TQA; BERTScore F1 for ECTSum and EDTSum.

| Model | FiNER | FR | RD | FNXL | FE | FiQA | SQA | FPB | CD | CC | B77 | FB | FOMC | NC | HL | CFQA | FinQA | TQA | ECTSum | EDTSum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | .701 | .332 | .883 | .020 | .469 | .123 | .535 | .902 | .142 | .192 | .645 | .309 | .652 | .386 | .811 | .709 | .809 | .772 | .754 | .817 |
| Llama 3 8B Instruct | .565 | .289 | .705 | .003 | .350 | .161 | .600 | .698 | .049 | .234 | .512 | .659 | .497 | .511 | .763 | .268 | .767 | .706 | .757 | .811 |
| DBRX Instruct | .489 | .304 | .778 | .009 | .006 | .160 | .436 | .499 | .087 | .231 | .574 | .483 | .193 | .319 | .746 | .252 | .738 | .633 | .729 | .806 |
| DeepSeek LLM (67B) | .745 | .334 | .879 | .007 | .416 | .118 | .462 | .811 | .025 | .193 | .578 | .492 | .407 | .151 | .778 | .174 | .742 | .355 | .681 | .807 |
| Gemma 2 27B | .761 | .356 | .902 | .006 | .298 | .100 | .515 | .884 | .133 | .242 | .621 | .538 | .620 | .408 | .808 | .268 | .768 | .734 | .723 | .814 |
| Gemma 2 9B | .651 | .331 | .892 | .005 | .367 | .189 | .491 | .940 | .105 | .207 | .609 | .541 | .519 | .365 | .856 | .292 | .779 | .750 | .585 | .817 |
| Mistral (7B) Instruct v0.3 | .526 | .276 | .771 | .004 | .368 | .135 | .522 | .841 | .052 | .227 | .528 | .503 | .542 | .412 | .779 | .199 | .655 | .553 | .750 | .811 |
| Mixtral-8x22B Instruct | .635 | .367 | .811 | .009 | .435 | .221 | .510 | .776 | .125 | .308 | .602 | .221 | .465 | .513 | .835 | .285 | .766 | .666 | .758 | .815 |
| Mixtral-8x7B Instruct | .598 | .282 | .845 | .009 | .267 | .208 | .498 | .893 | .055 | .229 | .547 | .396 | .603 | .583 | .805 | .315 | .611 | .501 | .747 | .810 |
| Qwen 2 Instruct (72B) | .748 | .348 | .854 | .012 | .483 | .205 | .576 | .901 | .190 | .184 | .627 | .495 | .605 | .639 | .830 | .269 | .819 | .715 | .752 | .811 |
| WizardLM-2 8x22B | .744 | .355 | .852 | .008 | .226 | .129 | .566 | .779 | .114 | .201 | .648 | .500 | .505 | .272 | .797 | .247 | .796 | .725 | .735 | .808 |
| DeepSeek-V3 | .790 | .437 | .934 | .045 | .549 | .150 | .583 | .814 | .198 | .170 | .714 | .487 | .578 | .675 | .729 | .261 | .840 | .779 | .750 | .815 |
| DeepSeek R1 | .807 | .393 | .952 | .057 | .587 | .110 | .499 | .902 | .337 | .202 | .763 | .419 | .670 | .688 | .769 | .853 | .836 | .858 | .759 | .804 |
| QwQ-32B-Preview | .685 | .270 | .656 | .001 | .005 | .141 | .550 | .815 | .131 | .220 | .613 | .784 | .555 | .020 | .744 | .282 | .793 | .796 | .696 | .817 |
| Jamba 1.5 Mini | .552 | .284 | .844 | .005 | .132 | .119 | .418 | .765 | .043 | .270 | .508 | .898 | .499 | .151 | .682 | .218 | .666 | .586 | .741 | .816 |
| Jamba 1.5 Large | .693 | .341 | .862 | .005 | .397 | .183 | .582 | .798 | .074 | .176 | .628 | .618 | .550 | .541 | .782 | .225 | .790 | .660 | .734 | .818 |
| Claude 3.5 Sonnet | .799 | .439 | .891 | .047 | .655 | .101 | .553 | .944 | .196 | .197 | .668 | .634 | .674 | .692 | .827 | .402 | .844 | .700 | .767 | .813 |
| Claude 3 Haiku | .711 | .285 | .883 | .015 | .494 | .167 | .463 | .908 | .081 | .200 | .622 | .022 | .631 | .558 | .781 | .421 | .803 | .733 | .646 | .808 |
| Cohere Command R 7B | .748 | .194 | .845 | .018 | .441 | .164 | .532 | .840 | .057 | .255 | .516 | .762 | .459 | .068 | .770 | .212 | .709 | .716 | .750 | .815 |
| Cohere Command R + | .756 | .333 | .922 | .021 | .452 | .106 | .533 | .699 | .080 | .238 | .651 | .684 | .393 | .118 | .812 | .259 | .776 | .698 | .751 | .810 |
| Google Gemini 1.5 Pro | .712 | .374 | .944 | .019 | .393 | .144 | .593 | .885 | .196 | .217 | .418 | .336 | .579 | .525 | .837 | .280 | .829 | .763 | .777 | .817 |
| OpenAI gpt-4o | .766 | .399 | .942 | .037 | .523 | .184 | .541 | .928 | .130 | .222 | .710 | .524 | .664 | .750 | .824 | .749 | .836 | .754 | .773 | .816 |
| OpenAI o1-mini | .761 | .403 | .876 | .010 | .662 | .120 | .542 | .917 | .289 | .209 | .670 | .612 | .635 | .720 | .769 | .840 | .799 | .698 | .763 | .816 |
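For readers who want to recompute the per-dataset rankings behind the page's highlighting, here is a minimal sketch in Python. It assumes this table has been exported to a hypothetical `overall_results.csv` with the header above; the only column where lower is better is FiQA (MSE).

```python
import pandas as pd

# Hypothetical export of the overall table; first column holds the model name.
df = pd.read_csv("overall_results.csv", index_col="Model")

for dataset in df.columns:
    # FiQA is scored with MSE, so lower is better; every other column is a
    # higher-is-better score (F1, accuracy, or BERTScore F1).
    ascending = dataset == "FiQA"
    ranked = df[dataset].sort_values(ascending=ascending)
    # Top two models per dataset, mirroring the page's Best/Strong bands.
    print(f"{dataset}: best={ranked.index[0]}, runner-up={ranked.index[1]}")
```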
Causal Analysis Results
CD = Causal Detection, CC = Causal Classification; the per-subtask column order follows the original page.

| Model | CD Accuracy | CD Precision | CD Recall | CD F1 | CC Precision | CC Recall | CC F1 | CC Accuracy |
|---|---|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | 0.148 | 0.429 | 0.148 | 0.142 | 0.241 | 0.329 | 0.192 | 0.198 |
| Llama 3 8B Instruct | 0.097 | 0.341 | 0.097 | 0.049 | 0.232 | 0.241 | 0.234 | 0.380 |
| DBRX Instruct | 0.078 | 0.521 | 0.078 | 0.087 | 0.276 | 0.313 | 0.231 | 0.235 |
| DeepSeek LLM (67B) | 0.026 | 0.214 | 0.026 | 0.025 | 0.141 | 0.328 | 0.193 | 0.221 |
| Gemma 2 27B | 0.115 | 0.510 | 0.115 | 0.133 | 0.309 | 0.310 | 0.242 | 0.262 |
| Gemma 2 9B | 0.115 | 0.394 | 0.115 | 0.105 | 0.275 | 0.294 | 0.207 | 0.258 |
| Mistral (7B) Instruct v0.3 | 0.078 | 0.455 | 0.078 | 0.052 | 0.339 | 0.361 | 0.227 | 0.258 |
| Mixtral-8x22B Instruct | 0.131 | 0.486 | 0.131 | 0.125 | 0.344 | 0.310 | 0.308 | 0.318 |
| Mixtral-8x7B Instruct | 0.088 | 0.510 | 0.088 | 0.055 | 0.308 | 0.314 | 0.229 | 0.273 |
| Qwen 2 Instruct (72B) | 0.139 | 0.489 | 0.139 | 0.190 | 0.208 | 0.330 | 0.184 | 0.188 |
| WizardLM-2 8x22B | 0.076 | 0.453 | 0.076 | 0.114 | 0.263 | 0.347 | 0.201 | 0.237 |
| DeepSeek-V3 | 0.164 | 0.528 | 0.164 | 0.198 | 0.194 | 0.327 | 0.170 | 0.248 |
| DeepSeek R1 | 0.245 | 0.643 | 0.245 | 0.337 | 0.385 | 0.318 | 0.202 | 0.221 |
| QwQ-32B-Preview | 0.110 | 0.473 | 0.110 | 0.131 | 0.193 | 0.262 | 0.220 | 0.465 |
| Jamba 1.5 Mini | 0.050 | 0.280 | 0.050 | 0.043 | 0.323 | 0.283 | 0.270 | 0.295 |
| Jamba 1.5 Large | 0.076 | 0.517 | 0.076 | 0.074 | 0.268 | 0.248 | 0.176 | 0.200 |
| Claude 3.5 Sonnet | 0.154 | 0.564 | 0.154 | 0.196 | 0.259 | 0.336 | 0.197 | 0.235 |
| Claude 3 Haiku | 0.082 | 0.388 | 0.082 | 0.081 | 0.369 | 0.347 | 0.200 | 0.203 |
| Cohere Command R 7B | 0.089 | 0.363 | 0.089 | 0.057 | 0.379 | 0.356 | 0.255 | 0.275 |
| Cohere Command R + | 0.090 | 0.453 | 0.090 | 0.080 | 0.353 | 0.336 | 0.238 | 0.265 |
| Google Gemini 1.5 Pro | 0.165 | 0.514 | 0.165 | 0.196 | 0.265 | 0.357 | 0.217 | 0.258 |
| OpenAI gpt-4o | 0.082 | 0.576 | 0.082 | 0.130 | 0.254 | 0.327 | 0.222 | 0.235 |
| OpenAI o1-mini | 0.206 | 0.648 | 0.206 | 0.289 | 0.325 | 0.316 | 0.209 | 0.233 |
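For reference, the accuracy/precision/recall/F1 columns reported for each classification subtask can be derived from gold and predicted labels. A minimal sketch with scikit-learn; the macro averaging and the toy labels are assumptions, since the page does not state the averaging scheme:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy gold/predicted causal-classification labels; a real run would use
# the dataset's label set and the model's parsed outputs.
y_true = ["cause", "effect", "none", "cause", "none"]
y_pred = ["cause", "none", "none", "effect", "none"]

accuracy = accuracy_score(y_true, y_pred)
# Macro averaging is an assumption; the page only names the metrics.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"acc={accuracy:.3f} P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```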
Note: Color highlighting indicates performance ranking: Best, Strong.
Information Retrieval Task Results
Each dataset reports four metrics (P = precision, R = recall, F1, Acc = accuracy); the per-dataset column order follows the original page.

| Model | FiNER P | FiNER R | FiNER F1 | FiNER Acc | FinRed Acc | FinRed P | FinRed R | FinRed F1 | ReFiND Acc | ReFiND P | ReFiND R | ReFiND F1 | FNXL P | FNXL R | FNXL F1 | FNXL Acc | FinEntity P | FinEntity R | FinEntity Acc | FinEntity F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | 0.715 | 0.693 | 0.701 | 0.911 | 0.314 | 0.454 | 0.314 | 0.332 | 0.879 | 0.904 | 0.879 | 0.883 | 0.015 | 0.030 | 0.020 | 0.010 | 0.474 | 0.485 | 0.485 | 0.469 |
| Llama 3 8B Instruct | 0.581 | 0.558 | 0.565 | 0.854 | 0.296 | 0.357 | 0.296 | 0.289 | 0.723 | 0.755 | 0.723 | 0.705 | 0.003 | 0.004 | 0.003 | 0.002 | 0.301 | 0.478 | 0.478 | 0.350 |
| DBRX Instruct | 0.516 | 0.476 | 0.489 | 0.802 | 0.329 | 0.371 | 0.329 | 0.304 | 0.766 | 0.825 | 0.766 | 0.778 | 0.008 | 0.011 | 0.009 | 0.005 | 0.004 | 0.014 | 0.014 | 0.006 |
| DeepSeek LLM (67B) | 0.752 | 0.742 | 0.745 | 0.917 | 0.344 | 0.403 | 0.344 | 0.334 | 0.874 | 0.890 | 0.874 | 0.879 | 0.005 | 0.009 | 0.007 | 0.003 | 0.456 | 0.405 | 0.405 | 0.416 |
| Gemma 2 27B | 0.772 | 0.754 | 0.761 | 0.923 | 0.352 | 0.437 | 0.352 | 0.356 | 0.897 | 0.914 | 0.897 | 0.902 | 0.005 | 0.008 | 0.006 | 0.003 | 0.320 | 0.295 | 0.295 | 0.298 |
| Gemma 2 9B | 0.665 | 0.643 | 0.651 | 0.886 | 0.336 | 0.373 | 0.336 | 0.331 | 0.885 | 0.902 | 0.885 | 0.892 | 0.004 | 0.008 | 0.005 | 0.003 | 0.348 | 0.419 | 0.419 | 0.367 |
| Mistral (7B) Instruct v0.3 | 0.540 | 0.522 | 0.526 | 0.806 | 0.278 | 0.383 | 0.278 | 0.276 | 0.767 | 0.817 | 0.767 | 0.771 | 0.004 | 0.006 | 0.004 | 0.002 | 0.337 | 0.477 | 0.477 | 0.368 |
| Mixtral-8x22B Instruct | 0.653 | 0.625 | 0.635 | 0.870 | 0.381 | 0.414 | 0.381 | 0.367 | 0.807 | 0.847 | 0.807 | 0.811 | 0.010 | 0.008 | 0.009 | 0.005 | 0.428 | 0.481 | 0.481 | 0.435 |
| Mixtral-8x7B Instruct | 0.613 | 0.591 | 0.598 | 0.875 | 0.291 | 0.376 | 0.291 | 0.282 | 0.840 | 0.863 | 0.840 | 0.845 | 0.007 | 0.012 | 0.009 | 0.005 | 0.251 | 0.324 | 0.324 | 0.267 |
| Qwen 2 Instruct (72B) | 0.766 | 0.742 | 0.748 | 0.899 | 0.365 | 0.407 | 0.365 | 0.348 | 0.850 | 0.881 | 0.850 | 0.854 | 0.010 | 0.016 | 0.012 | 0.006 | 0.468 | 0.530 | 0.530 | 0.483 |
| WizardLM-2 8x22B | 0.755 | 0.741 | 0.744 | 0.920 | 0.362 | 0.397 | 0.362 | 0.355 | 0.846 | 0.874 | 0.846 | 0.852 | 0.008 | 0.009 | 0.008 | 0.004 | 0.222 | 0.247 | 0.247 | 0.226 |
| DeepSeek-V3 | 0.798 | 0.787 | 0.790 | 0.945 | 0.450 | 0.463 | 0.450 | 0.437 | 0.927 | 0.943 | 0.927 | 0.934 | 0.034 | 0.067 | 0.045 | 0.023 | 0.563 | 0.544 | 0.544 | 0.549 |
| DeepSeek R1 | 0.813 | 0.805 | 0.807 | 0.944 | 0.412 | 0.424 | 0.412 | 0.393 | 0.946 | 0.960 | 0.946 | 0.952 | 0.044 | 0.082 | 0.057 | 0.029 | 0.600 | 0.586 | 0.586 | 0.587 |
| QwQ-32B-Preview | 0.695 | 0.681 | 0.685 | 0.907 | 0.278 | 0.396 | 0.278 | 0.270 | 0.680 | 0.770 | 0.680 | 0.656 | 0.001 | 0.001 | 0.001 | 0.000 | 0.005 | 0.005 | 0.005 | 0.005 |
| Jamba 1.5 Mini | 0.564 | 0.556 | 0.552 | 0.818 | 0.308 | 0.450 | 0.308 | 0.284 | 0.830 | 0.864 | 0.830 | 0.844 | 0.004 | 0.006 | 0.005 | 0.003 | 0.119 | 0.182 | 0.182 | 0.132 |
| Jamba 1.5 Large | 0.707 | 0.687 | 0.693 | 0.883 | 0.341 | 0.452 | 0.341 | 0.341 | 0.856 | 0.890 | 0.856 | 0.862 | 0.004 | 0.005 | 0.005 | 0.002 | 0.403 | 0.414 | 0.414 | 0.397 |
| Claude 3.5 Sonnet | 0.811 | 0.794 | 0.799 | 0.922 | 0.455 | 0.465 | 0.455 | 0.439 | 0.873 | 0.927 | 0.873 | 0.891 | 0.034 | 0.080 | 0.047 | 0.024 | 0.658 | 0.668 | 0.668 | 0.655 |
| Claude 3 Haiku | 0.732 | 0.700 | 0.711 | 0.895 | 0.294 | 0.330 | 0.294 | 0.285 | 0.879 | 0.917 | 0.879 | 0.883 | 0.011 | 0.022 | 0.015 | 0.008 | 0.498 | 0.517 | 0.517 | 0.494 |
| Cohere Command R + | 0.769 | 0.750 | 0.756 | 0.902 | 0.353 | 0.405 | 0.353 | 0.333 | 0.917 | 0.930 | 0.917 | 0.922 | 0.016 | 0.032 | 0.021 | 0.011 | 0.462 | 0.459 | 0.459 | 0.452 |
| Google Gemini 1.5 Pro | 0.728 | 0.705 | 0.712 | 0.891 | 0.373 | 0.436 | 0.373 | 0.374 | 0.934 | 0.955 | 0.934 | 0.944 | 0.014 | 0.028 | 0.019 | 0.010 | 0.399 | 0.400 | 0.400 | 0.393 |
| OpenAI gpt-4o | 0.778 | 0.760 | 0.766 | 0.911 | 0.402 | 0.445 | 0.402 | 0.399 | 0.931 | 0.955 | 0.931 | 0.942 | 0.027 | 0.056 | 0.037 | 0.019 | 0.537 | 0.517 | 0.517 | 0.523 |
| OpenAI o1-mini | 0.772 | 0.755 | 0.761 | 0.922 | 0.407 | 0.444 | 0.407 | 0.403 | 0.867 | 0.900 | 0.867 | 0.876 | 0.007 | 0.015 | 0.010 | 0.005 | 0.661 | 0.681 | 0.681 | 0.662 |
Note: Color highlighting indicates performance ranking: Best, Strong.
Question Answering Task Results
All three datasets report accuracy.

| Model | FinQA | ConvFinQA | TATQA |
|---|---|---|---|
| Llama 3 70B Instruct | 0.809 | 0.709 | 0.772 |
| Llama 3 8B Instruct | 0.767 | 0.268 | 0.706 |
| DBRX Instruct | 0.738 | 0.252 | 0.633 |
| DeepSeek LLM (67B) | 0.742 | 0.174 | 0.355 |
| Gemma 2 27B | 0.768 | 0.268 | 0.734 |
| Gemma 2 9B | 0.779 | 0.292 | 0.750 |
| Mistral (7B) Instruct v0.3 | 0.655 | 0.199 | 0.553 |
| Mixtral-8x22B Instruct | 0.766 | 0.285 | 0.666 |
| Mixtral-8x7B Instruct | 0.611 | 0.315 | 0.501 |
| Qwen 2 Instruct (72B) | 0.819 | 0.269 | 0.715 |
| WizardLM-2 8x22B | 0.796 | 0.247 | 0.725 |
| DeepSeek-V3 | 0.840 | 0.261 | 0.779 |
| DeepSeek R1 | 0.836 | 0.853 | 0.858 |
| QwQ-32B-Preview | 0.793 | 0.282 | 0.796 |
| Jamba 1.5 Mini | 0.666 | 0.218 | 0.586 |
| Jamba 1.5 Large | 0.790 | 0.225 | 0.660 |
| Claude 3.5 Sonnet | 0.844 | 0.402 | 0.700 |
| Claude 3 Haiku | 0.803 | 0.421 | 0.733 |
| Cohere Command R 7B | 0.709 | 0.212 | 0.716 |
| Cohere Command R + | 0.776 | 0.259 | 0.698 |
| Google Gemini 1.5 Pro | 0.829 | 0.280 | 0.763 |
| OpenAI gpt-4o | 0.836 | 0.749 | 0.754 |
| OpenAI o1-mini | 0.799 | 0.840 | 0.698 |
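FinQA, ConvFinQA, and TATQA answers are largely numeric, so the accuracy column reduces to an answer-matching check per question. A minimal sketch; the normalization and relative tolerance below are illustrative assumptions, not FLaME's documented procedure:

```python
def numeric_match(pred: str, gold: str, rel_tol: float = 1e-3) -> bool:
    """Compare predicted and gold answers, numerically when both parse."""
    def parse(s: str) -> float:
        # Illustrative normalization: strip thousands separators and % signs.
        return float(s.strip().replace(",", "").rstrip("%"))
    try:
        p, g = parse(pred), parse(gold)
        return abs(p - g) <= rel_tol * max(1.0, abs(g))
    except ValueError:
        # Fall back to case-insensitive string match for yes/no-style answers.
        return pred.strip().lower() == gold.strip().lower()

preds = ["21.5", "1,000", "no"]
golds = ["21.50", "1000", "No"]
accuracy = sum(numeric_match(p, g) for p, g in zip(preds, golds)) / len(preds)
print(f"accuracy={accuracy:.3f}")  # 1.000 on this toy example
```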
Note: Color highlighting indicates performance ranking: Best, Strong, Good.
Sentiment Analysis Task Results
FiQA Task 1 is scored as regression (MSE, MAE, r² score); FPB and SubjECTive-QA (SQA) are classification. Column order follows the original page.

| Model | FiQA MSE | FiQA MAE | FiQA r² | FPB Acc | FPB P | FPB R | FPB F1 | SQA P | SQA R | SQA F1 | SQA Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | 0.123 | 0.290 | 0.272 | 0.901 | 0.904 | 0.901 | 0.902 | 0.652 | 0.573 | 0.535 | 0.573 |
| Llama 3 8B Instruct | 0.161 | 0.344 | 0.045 | 0.738 | 0.801 | 0.738 | 0.698 | 0.635 | 0.625 | 0.600 | 0.625 |
| DBRX Instruct | 0.160 | 0.321 | 0.052 | 0.524 | 0.727 | 0.524 | 0.499 | 0.654 | 0.541 | 0.436 | 0.541 |
| DeepSeek LLM (67B) | 0.118 | 0.278 | 0.302 | 0.815 | 0.867 | 0.815 | 0.811 | 0.676 | 0.544 | 0.462 | 0.544 |
| Gemma 2 27B | 0.100 | 0.266 | 0.406 | 0.890 | 0.896 | 0.890 | 0.884 | 0.562 | 0.524 | 0.515 | 0.524 |
| Gemma 2 9B | 0.189 | 0.352 | -0.120 | 0.940 | 0.941 | 0.940 | 0.940 | 0.570 | 0.499 | 0.491 | 0.499 |
| Mistral (7B) Instruct v0.3 | 0.135 | 0.278 | 0.200 | 0.847 | 0.854 | 0.847 | 0.841 | 0.607 | 0.542 | 0.522 | 0.542 |
| Mixtral-8x22B Instruct | 0.221 | 0.364 | -0.310 | 0.768 | 0.845 | 0.768 | 0.776 | 0.614 | 0.538 | 0.510 | 0.538 |
| Mixtral-8x7B Instruct | 0.208 | 0.307 | -0.229 | 0.896 | 0.898 | 0.896 | 0.893 | 0.611 | 0.518 | 0.498 | 0.518 |
| Qwen 2 Instruct (72B) | 0.205 | 0.409 | -0.212 | 0.904 | 0.908 | 0.904 | 0.901 | 0.644 | 0.601 | 0.576 | 0.601 |
| WizardLM-2 8x22B | 0.129 | 0.283 | 0.239 | 0.765 | 0.853 | 0.765 | 0.779 | 0.611 | 0.570 | 0.566 | 0.570 |
| DeepSeek-V3 | 0.150 | 0.311 | 0.111 | 0.828 | 0.851 | 0.828 | 0.814 | 0.640 | 0.572 | 0.583 | 0.572 |
| DeepSeek R1 | 0.110 | 0.289 | 0.348 | 0.904 | 0.907 | 0.904 | 0.902 | 0.644 | 0.489 | 0.499 | 0.489 |
| QwQ-32B-Preview | 0.141 | 0.290 | 0.165 | 0.812 | 0.827 | 0.812 | 0.815 | 0.629 | 0.534 | 0.550 | 0.534 |
| Jamba 1.5 Mini | 0.119 | 0.282 | 0.293 | 0.784 | 0.814 | 0.784 | 0.765 | 0.380 | 0.525 | 0.418 | 0.525 |
| Jamba 1.5 Large | 0.183 | 0.363 | -0.085 | 0.824 | 0.850 | 0.824 | 0.798 | 0.635 | 0.573 | 0.582 | 0.573 |
| Claude 3.5 Sonnet | 0.101 | 0.268 | 0.402 | 0.944 | 0.945 | 0.944 | 0.944 | 0.634 | 0.585 | 0.553 | 0.585 |
| Claude 3 Haiku | 0.167 | 0.349 | 0.008 | 0.907 | 0.913 | 0.907 | 0.908 | 0.619 | 0.538 | 0.463 | 0.538 |
| Cohere Command R 7B | 0.164 | 0.319 | 0.028 | 0.835 | 0.861 | 0.835 | 0.840 | 0.609 | 0.547 | 0.532 | 0.547 |
| Cohere Command R + | 0.106 | 0.274 | 0.373 | 0.741 | 0.806 | 0.741 | 0.699 | 0.608 | 0.547 | 0.533 | 0.547 |
| Google Gemini 1.5 Pro | 0.144 | 0.329 | 0.149 | 0.890 | 0.895 | 0.890 | 0.885 | 0.642 | 0.587 | 0.593 | 0.587 |
| OpenAI gpt-4o | 0.184 | 0.317 | -0.089 | 0.929 | 0.931 | 0.929 | 0.928 | 0.639 | 0.515 | 0.541 | 0.515 |
| OpenAI o1-mini | 0.120 | 0.295 | 0.289 | 0.918 | 0.917 | 0.918 | 0.917 | 0.660 | 0.515 | 0.542 | 0.515 |
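Because FiQA Task 1 is scored as regression over continuous sentiment scores, the r² column can go negative when a model's predictions fit worse than simply predicting the gold mean. A minimal sketch of the three regression metrics with scikit-learn, on toy scores:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy continuous sentiment scores; FiQA Task 1 targets lie roughly in [-1, 1].
y_true = [0.37, -0.62, 0.85, -0.15]
y_pred = [0.30, -0.40, 0.90, 0.10]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)  # negative when worse than predicting the mean
print(f"MSE={mse:.3f} MAE={mae:.3f} r2={r2:.3f}")
```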
Note: Color highlighting indicates performance ranking: Best, Strong, Good.
Text Classification Task Results
B77 = Banking77; Acc = accuracy, P = precision, R = recall. Headlines reports accuracy only.

| Model | B77 Acc | B77 P | B77 R | B77 F1 | FinBench Acc | FinBench P | FinBench R | FinBench F1 | FOMC Acc | FOMC P | FOMC R | FOMC F1 | NumClaim Acc | NumClaim P | NumClaim R | NumClaim F1 | Headlines Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | 0.660 | 0.748 | 0.660 | 0.645 | 0.222 | 0.826 | 0.222 | 0.309 | 0.661 | 0.662 | 0.661 | 0.652 | 0.430 | 0.240 | 0.980 | 0.386 | 0.811 |
| Llama 3 8B Instruct | 0.534 | 0.672 | 0.534 | 0.512 | 0.543 | 0.857 | 0.543 | 0.659 | 0.565 | 0.618 | 0.565 | 0.497 | 0.801 | 0.463 | 0.571 | 0.511 | 0.763 |
| DBRX Instruct | 0.578 | 0.706 | 0.578 | 0.574 | 0.359 | 0.851 | 0.359 | 0.483 | 0.285 | 0.572 | 0.285 | 0.193 | 0.222 | 0.190 | 1.000 | 0.319 | 0.746 |
| DeepSeek LLM (67B) | 0.596 | 0.711 | 0.596 | 0.578 | 0.369 | 0.856 | 0.369 | 0.492 | 0.532 | 0.678 | 0.532 | 0.407 | 0.832 | 1.000 | 0.082 | 0.151 | 0.778 |
| Gemma 2 27B | 0.639 | 0.730 | 0.639 | 0.621 | 0.410 | 0.849 | 0.410 | 0.538 | 0.651 | 0.704 | 0.651 | 0.620 | 0.471 | 0.257 | 1.000 | 0.408 | 0.808 |
| Gemma 2 9B | 0.630 | 0.710 | 0.630 | 0.609 | 0.412 | 0.848 | 0.412 | 0.541 | 0.595 | 0.694 | 0.595 | 0.519 | 0.371 | 0.224 | 0.990 | 0.365 | 0.856 |
| Mistral (7B) Instruct v0.3 | 0.547 | 0.677 | 0.547 | 0.528 | 0.375 | 0.839 | 0.375 | 0.503 | 0.587 | 0.598 | 0.587 | 0.542 | 0.521 | 0.266 | 0.918 | 0.412 | 0.779 |
| Mixtral-8x22B Instruct | 0.622 | 0.718 | 0.622 | 0.602 | 0.166 | 0.811 | 0.166 | 0.221 | 0.562 | 0.709 | 0.562 | 0.465 | 0.732 | 0.384 | 0.775 | 0.513 | 0.835 |
| Mixtral-8x7B Instruct | 0.567 | 0.693 | 0.567 | 0.547 | 0.285 | 0.838 | 0.285 | 0.396 | 0.623 | 0.636 | 0.623 | 0.603 | 0.765 | 0.431 | 0.898 | 0.583 | 0.805 |
| Qwen 2 Instruct (72B) | 0.644 | 0.730 | 0.644 | 0.627 | 0.370 | 0.848 | 0.370 | 0.495 | 0.623 | 0.639 | 0.623 | 0.605 | 0.821 | 0.506 | 0.867 | 0.639 | 0.830 |
| WizardLM-2 8x22B | 0.664 | 0.737 | 0.664 | 0.648 | 0.373 | 0.842 | 0.373 | 0.500 | 0.583 | 0.710 | 0.583 | 0.505 | 0.831 | 0.630 | 0.173 | 0.272 | 0.797 |
| DeepSeek-V3 | 0.722 | 0.774 | 0.722 | 0.714 | 0.362 | 0.845 | 0.362 | 0.487 | 0.625 | 0.712 | 0.625 | 0.578 | 0.860 | 0.586 | 0.796 | 0.675 | 0.729 |
| DeepSeek R1 | 0.772 | 0.789 | 0.772 | 0.763 | 0.306 | 0.846 | 0.306 | 0.419 | 0.679 | 0.682 | 0.679 | 0.670 | 0.851 | 0.557 | 0.898 | 0.688 | 0.769 |
| QwQ-32B-Preview | 0.577 | 0.747 | 0.577 | 0.613 | 0.716 | 0.871 | 0.716 | 0.784 | 0.591 | 0.630 | 0.591 | 0.555 | 0.819 | 1.000 | 0.010 | 0.020 | 0.744 |
| Jamba 1.5 Mini | 0.528 | 0.630 | 0.528 | 0.508 | 0.913 | 0.883 | 0.913 | 0.898 | 0.572 | 0.678 | 0.572 | 0.499 | 0.812 | 0.429 | 0.092 | 0.151 | 0.682 |
| Jamba 1.5 Large | 0.642 | 0.746 | 0.642 | 0.628 | 0.494 | 0.851 | 0.494 | 0.618 | 0.597 | 0.650 | 0.597 | 0.550 | 0.855 | 0.639 | 0.469 | 0.541 | 0.782 |
| Claude 3.5 Sonnet | 0.682 | 0.755 | 0.682 | 0.668 | 0.513 | 0.854 | 0.513 | 0.634 | 0.675 | 0.677 | 0.675 | 0.674 | 0.879 | 0.646 | 0.745 | 0.692 | 0.827 |
| Claude 3 Haiku | 0.639 | 0.735 | 0.639 | 0.622 | 0.067 | 0.674 | 0.067 | 0.022 | 0.633 | 0.634 | 0.633 | 0.631 | 0.838 | 0.556 | 0.561 | 0.558 | 0.781 |
| Cohere Command R 7B | 0.530 | 0.650 | 0.530 | 0.516 | 0.682 | 0.868 | 0.682 | 0.762 | 0.536 | 0.505 | 0.536 | 0.459 | 0.797 | 0.210 | 0.041 | 0.068 | 0.770 |
| Cohere Command R + | 0.660 | 0.747 | 0.660 | 0.651 | 0.575 | 0.859 | 0.575 | 0.684 | 0.526 | 0.655 | 0.526 | 0.393 | 0.804 | 0.333 | 0.071 | 0.118 | 0.812 |
| Google Gemini 1.5 Pro | 0.483 | 0.487 | 0.483 | 0.418 | 0.240 | 0.823 | 0.240 | 0.336 | 0.619 | 0.667 | 0.619 | 0.579 | 0.700 | 0.369 | 0.908 | 0.525 | 0.837 |
| OpenAI gpt-4o | 0.704 | 0.792 | 0.704 | 0.710 | 0.396 | 0.846 | 0.396 | 0.524 | 0.681 | 0.719 | 0.681 | 0.664 | 0.896 | 0.667 | 0.857 | 0.750 | 0.824 |
| OpenAI o1-mini | 0.681 | 0.760 | 0.681 | 0.670 | 0.487 | 0.851 | 0.487 | 0.612 | 0.651 | 0.670 | 0.651 | 0.635 | 0.888 | 0.664 | 0.786 | 0.720 | 0.769 |
Note: Color highlighting indicates performance ranking: Best, Strong, Good.
Text Summarization Task Results
Both datasets report BERTScore precision (P), recall (R), and F1.

| Model | ECTSum P | ECTSum R | ECTSum F1 | EDTSum P | EDTSum R | EDTSum F1 |
|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | 0.715 | 0.801 | 0.754 | 0.793 | 0.844 | 0.817 |
| Llama 3 8B Instruct | 0.724 | 0.796 | 0.757 | 0.785 | 0.841 | 0.811 |
| DBRX Instruct | 0.680 | 0.786 | 0.729 | 0.774 | 0.843 | 0.806 |
| DeepSeek LLM (67B) | 0.692 | 0.678 | 0.681 | 0.779 | 0.840 | 0.807 |
| Gemma 2 27B | 0.680 | 0.777 | 0.723 | 0.801 | 0.829 | 0.814 |
| Gemma 2 9B | 0.651 | 0.531 | 0.585 | 0.803 | 0.833 | 0.817 |
| Mistral (7B) Instruct v0.3 | 0.702 | 0.806 | 0.750 | 0.783 | 0.842 | 0.811 |
| Mixtral-8x22B Instruct | 0.713 | 0.812 | 0.758 | 0.790 | 0.843 | 0.815 |
| Mixtral-8x7B Instruct | 0.727 | 0.773 | 0.747 | 0.785 | 0.839 | 0.810 |
| Qwen 2 Instruct (72B) | 0.709 | 0.804 | 0.752 | 0.781 | 0.846 | 0.811 |
| WizardLM-2 8x22B | 0.677 | 0.806 | 0.735 | 0.774 | 0.847 | 0.808 |
| DeepSeek-V3 | 0.703 | 0.806 | 0.750 | 0.791 | 0.842 | 0.815 |
| DeepSeek R1 | 0.724 | 0.800 | 0.759 | 0.770 | 0.843 | 0.804 |
| QwQ-32B-Preview | 0.653 | 0.751 | 0.696 | 0.797 | 0.841 | 0.817 |
| Jamba 1.5 Mini | 0.692 | 0.798 | 0.741 | 0.798 | 0.838 | 0.816 |
| Jamba 1.5 Large | 0.679 | 0.800 | 0.734 | 0.799 | 0.841 | 0.818 |
| Claude 3.5 Sonnet | 0.737 | 0.802 | 0.767 | 0.786 | 0.843 | 0.813 |
| Claude 3 Haiku | 0.683 | 0.617 | 0.646 | 0.778 | 0.844 | 0.808 |
| Cohere Command R 7B | 0.724 | 0.781 | 0.750 | 0.790 | 0.844 | 0.815 |
| Cohere Command R + | 0.724 | 0.782 | 0.751 | 0.789 | 0.834 | 0.810 |
| Google Gemini 1.5 Pro | 0.757 | 0.800 | 0.777 | 0.800 | 0.836 | 0.817 |
| OpenAI gpt-4o | 0.755 | 0.793 | 0.773 | 0.795 | 0.840 | 0.816 |
| OpenAI o1-mini | 0.731 | 0.801 | 0.763 | 0.795 | 0.840 | 0.816 |
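The BERTScore columns can be reproduced with the `bert-score` package, which returns per-example precision, recall, and F1 tensors. A minimal sketch; the package's default English scoring model is an assumption and need not match FLaME's configuration:

```python
from bert_score import score

# Toy candidate/reference pair; real evaluation pairs each model summary
# with the gold ECTSum or EDTSum reference summary.
candidates = ["Revenue grew 12% year over year, driven by cloud demand."]
references = ["The company reported 12% annual revenue growth on cloud strength."]

# Returns per-example precision, recall, and F1 tensors; we average them.
P, R, F1 = score(candidates, references, lang="en")
print(f"P={P.mean().item():.3f} R={R.mean().item():.3f} F1={F1.mean().item():.3f}")
```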
Note: Color highlighting indicates performance ranking: Best, Strong, Good.