Rank,Model,Factual hallucination score,Faithfulness hallucination score,Final score
1,GPT 5 (thinking mode),72,100,86
2,GPT 5 (auto mode),68,100,84
3,Claude 4 Opus (thinking mode),73,92,83
4,Claude 4 Opus,64,96,80
5,Grok 4,71,80,76
6,GPT-o3,49,100,75
7,Doubao 1.5 Pro,57,88,73
8,Doubao 1.5 Pro (thinking mode),60,84,72
9,Gemini 2.5 Pro,57,84,71
10,GPT-o4 mini,44,96,70
11,GPT-4.1,59,80,69
12,GPT-4o,53,80,67
12,Gemini 2.5 Flash,49,84,67
14,ERNIE X1-Turbo,47,84,65
14,Qwen3 (thinking mode),55,76,65
14,DeepSeek-V3,49,80,65
14,Hunyuan-T1,49,80,65
18,Kimi,47,80,63
18,Qwen3,51,76,63
20,DeepSeek-R1,52,68,60
20,Grok 3,36,84,60
20,Hunyuan-TurboS,44,76,60
23,SenseNova V6 Pro,41,76,59
24,GLM-4-plus,35,80,57
25,MiniMax-01,31,80,55
25,360 Zhinao2-o1,49,60,55
27,Yi-Lightning,28,80,54
28,Grok 3 (thinking mode),29,76,53
29,Kimi-k1.5,36,68,52
30,ERNIE 4.5-Turbo,31,72,51
30,SenseNova V6 Reasoner,37,64,51
32,Step 2,32,68,50
33,Step R1-V-Mini,36,60,48
34,Baichuan4-Turbo,33,60,47
35,GLM-Z1-Air,32,60,46
36,Llama 3.3 70B,33,56,45
37,Spark 4.0 Ultra,19,64,41
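The table does not say how the final score or the ranks are computed. The following is a minimal Python sketch that checks two assumptions against the published numbers: (1) the final score is the unweighted mean of the two sub-scores, rounded, and (2) tied final scores share a rank and the next rank is skipped (standard "1224" competition ranking). Both assumptions are ours, not the source's. The helper names `load_rows`, `mean_gap`, and `competition_ranks` are hypothetical, and the model names are given in English.

```python
import csv
import io

# The leaderboard above, embedded so the sketch is self-contained.
LEADERBOARD_CSV = """\
Rank,Model,Factual hallucination score,Faithfulness hallucination score,Final score
1,GPT 5 (thinking mode),72,100,86
2,GPT 5 (auto mode),68,100,84
3,Claude 4 Opus (thinking mode),73,92,83
4,Claude 4 Opus,64,96,80
5,Grok 4,71,80,76
6,GPT-o3,49,100,75
7,Doubao 1.5 Pro,57,88,73
8,Doubao 1.5 Pro (thinking mode),60,84,72
9,Gemini 2.5 Pro,57,84,71
10,GPT-o4 mini,44,96,70
11,GPT-4.1,59,80,69
12,GPT-4o,53,80,67
12,Gemini 2.5 Flash,49,84,67
14,ERNIE X1-Turbo,47,84,65
14,Qwen3 (thinking mode),55,76,65
14,DeepSeek-V3,49,80,65
14,Hunyuan-T1,49,80,65
18,Kimi,47,80,63
18,Qwen3,51,76,63
20,DeepSeek-R1,52,68,60
20,Grok 3,36,84,60
20,Hunyuan-TurboS,44,76,60
23,SenseNova V6 Pro,41,76,59
24,GLM-4-plus,35,80,57
25,MiniMax-01,31,80,55
25,360 Zhinao2-o1,49,60,55
27,Yi-Lightning,28,80,54
28,Grok 3 (thinking mode),29,76,53
29,Kimi-k1.5,36,68,52
30,ERNIE 4.5-Turbo,31,72,51
30,SenseNova V6 Reasoner,37,64,51
32,Step 2,32,68,50
33,Step R1-V-Mini,36,60,48
34,Baichuan4-Turbo,33,60,47
35,GLM-Z1-Air,32,60,46
36,Llama 3.3 70B,33,56,45
37,Spark 4.0 Ultra,19,64,41
"""


def load_rows(text):
    """Parse the CSV text into dicts with integer rank and score fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "rank": int(rec["Rank"]),
            "model": rec["Model"],
            "factual": int(rec["Factual hallucination score"]),
            "faithful": int(rec["Faithfulness hallucination score"]),
            "final": int(rec["Final score"]),
        })
    return rows


def mean_gap(row):
    """Absolute difference between the published final score and the plain mean of the sub-scores."""
    return abs((row["factual"] + row["faithful"]) / 2 - row["final"])


def competition_ranks(finals):
    """Competition ranking: rank = 1 + number of entries with a strictly higher score."""
    return [1 + sum(other > f for other in finals) for f in finals]


rows = load_rows(LEADERBOARD_CSV)
max_gap = max(mean_gap(r) for r in rows)
recomputed = competition_ranks([r["final"] for r in rows])
ranks_match = recomputed == [r["rank"] for r in rows]

print(f"rows: {len(rows)}, max |mean - final|: {max_gap}, ranks match: {ranks_match}")
```

Running it prints `rows: 37, max |mean - final|: 0.5, ranks match: True`. The gap reaches 0.5 but never exceeds it, and exact half-values round both up (57 and 88 average 72.5, published as 73) and down (47 and 84 average 65.5, published as 65). This suggests, though the source does not confirm it, that the final score was averaged from unrounded sub-scores before rounding.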