| Rank | Model | Factuality hallucination | Faithfulness hallucination | Final score |
|---|---|---|---|---|
| 1 | GPT 5 (thinking mode) | 72 | 100 | 86 |
| 2 | GPT 5 (auto mode) | 68 | 100 | 84 |
| 3 | Claude 4 Opus (thinking mode) | 73 | 92 | 83 |
| 4 | Claude 4 Opus | 64 | 96 | 80 |
| 5 | Grok 4 | 71 | 80 | 76 |
| 6 | GPT-o3 | 49 | 100 | 75 |
| 7 | Doubao 1.5 Pro | 57 | 88 | 73 |
| 8 | Doubao 1.5 Pro (thinking mode) | 60 | 84 | 72 |
| 9 | Gemini 2.5 Pro | 57 | 84 | 71 |
| 10 | GPT-o4 mini | 44 | 96 | 70 |
| 11 | GPT-4.1 | 59 | 80 | 69 |
| 12 | GPT-4o | 53 | 80 | 67 |
| 12 | Gemini 2.5 Flash | 49 | 84 | 67 |
| 14 | ERNIE Bot X1-Turbo | 47 | 84 | 65 |
| 14 | Qwen3 (thinking mode) | 55 | 76 | 65 |
| 14 | DeepSeek-V3 | 49 | 80 | 65 |
| 14 | Hunyuan-T1 | 49 | 80 | 65 |
| 18 | Kimi | 47 | 80 | 63 |
| 18 | Qwen3 | 51 | 76 | 63 |
| 20 | DeepSeek-R1 | 52 | 68 | 60 |
| 20 | Grok 3 | 36 | 84 | 60 |
| 20 | Hunyuan-TurboS | 44 | 76 | 60 |
| 23 | SenseNova V6 Pro | 41 | 76 | 59 |
| 24 | GLM-4-plus | 35 | 80 | 57 |
| 25 | MiniMax-01 | 31 | 80 | 55 |
| 25 | 360 Zhinao 2-o1 | 49 | 60 | 55 |
| 27 | Yi-Lightning | 28 | 80 | 54 |
| 28 | Grok 3 (thinking mode) | 29 | 76 | 53 |
| 29 | Kimi-k1.5 | 36 | 68 | 52 |
| 30 | ERNIE Bot 4.5-Turbo | 31 | 72 | 51 |
| 30 | SenseNova V6 Reasoning | 37 | 64 | 51 |
| 32 | Step 2 | 32 | 68 | 50 |
| 33 | Step R1-V-Mini | 36 | 60 | 48 |
| 34 | Baichuan4-Turbo | 33 | 60 | 47 |
| 35 | GLM-Z1-Air | 32 | 60 | 46 |
| 36 | Llama 3.3 70B | 33 | 56 | 45 |
| 37 | Spark 4.0 Ultra | 19 | 64 | 41 |