Update README.md
README.md
CHANGED
@@ -96,6 +96,15 @@ Safety alignment is another aspect we particularly emphasize. The Web3 investmen
 | | DMind Benchmark | - | - | - | - | - |
 
 
+
+| Model | MMLU-Pro (EM) | GPQA-Diamond (Pass@1) | SimpleQA (Correct) | AIME 2024 (Pass@1) | AIME 2025 (Pass@1) | CNMO 2024 (Pass@1) | BFCL_v3 |
+|-------|---------------|-----------------------|--------------------|--------------------|--------------------|--------------------|---------|
+| **DeepSeek-R1-0528-Qwen3-8B** | More information needed* | **61.1** | More information needed* | **86.0** | **76.3** | More information needed* | More information needed* |
+| **gpt-oss-20b** | approx. 85.3%** | **approx. 81.4** | **approx. 6.7*** | **approx. 86.2** | **approx. 68.7** | No data | More information needed* |
+| **Qwen3-32B** | More information needed* | **65.6** | More information needed* | **81.4** | **72.9** | No data | **70.3** |
+| **Qwen3-4B (Thinking)** | **74.0** | **65.8** | More information needed* | More information needed* | **81.3** | More information needed* | **71.2** |
+
+
 ## Application Scenarios
 
 ### 🎯 Edge-Side Web3 Investment Decision Support