Update README.md
README.md
CHANGED
@@ -41,21 +41,22 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper

 ### **Evaluation Comparison**

-| **Category**                    | **EvolLLM-Linh**  | **GPT-OSS-20B**   | **Llama** | **Qwen-2507** |
-| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
-| SINGLE TURN – SINGLE FUNCTION   | 0.800 | 0.800 | 0.63 | 0.69 |
-| SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 |
-| MULTI TURN – USER ADJUST        | 0.500 | 0.500 | 0.40 | 0.48 |
-| MULTI TURN – USER SWITCH        | 0.620 | 0.620 | 0.40 | 0.56 |
-| SIMILAR API CALLS               | 0.760 | 0.740 | 0.64 | 0.68 |
-| USER PREFERENCE HANDLING        | 0.600 | 0.640 | 0.62 | 0.64 |
-| ATOMIC TASK – BOOLEAN           | 0.880 | 0.960 | 0.70 | 0.68 |
-| ATOMIC TASK – ENUM              | 0.940 | 0.940 | 0.94 | 0.86 |
-| ATOMIC TASK – NUMBER            | 0.940 | 0.960 | 0.90 | 0.82 |
-| ATOMIC TASK – LIST              | 0.920 | 0.900 | 0.84 | 0.78 |
-| ATOMIC TASK – OBJECT (DEEP)     | 0.580 | 0.520 | 0.32 | 0.36 |
-| ATOMIC TASK – OBJECT (SHORT)    | 0.800 | 0.960 | 0.70 | 0.56 |
-| **Overall Accuracy**            | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.61** | **0.64** |
+| **Category**                    | **EvolLLM-Linh**  | **GPT-OSS-20B**   | **Llama** | **Qwen-2507** | **MinCoder-4B-Expert** |
+| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: |
+| SINGLE TURN – SINGLE FUNCTION   | 0.800 | 0.800 | 0.63 | 0.69 | 0.81 |
+| SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 | 0.66 |
+| MULTI TURN – USER ADJUST        | 0.500 | 0.500 | 0.40 | 0.48 | 0.50 |
+| MULTI TURN – USER SWITCH        | 0.620 | 0.620 | 0.40 | 0.56 | 0.64 |
+| SIMILAR API CALLS               | 0.760 | 0.740 | 0.64 | 0.68 | 0.76 |
+| USER PREFERENCE HANDLING        | 0.600 | 0.640 | 0.62 | 0.64 | 0.60 |
+| ATOMIC TASK – BOOLEAN           | 0.880 | 0.960 | 0.70 | 0.68 | 0.88 |
+| ATOMIC TASK – ENUM              | 0.940 | 0.940 | 0.94 | 0.86 | 0.96 |
+| ATOMIC TASK – NUMBER            | 0.940 | 0.960 | 0.90 | 0.82 | 0.94 |
+| ATOMIC TASK – LIST              | 0.920 | 0.900 | 0.84 | 0.78 | 0.94 |
+| ATOMIC TASK – OBJECT (DEEP)     | 0.580 | 0.520 | 0.32 | 0.36 | 0.62 |
+| ATOMIC TASK – OBJECT (SHORT)    | 0.800 | 0.960 | 0.70 | 0.56 | 0.82 |
+| **Overall Accuracy**            | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.61** | **0.64** | **0.761** |
+

 ---

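Reviewer note: a quick sanity check of the **Overall Accuracy** value in the MinCoder-4B-Expert column added by this change. The sketch below assumes the overall figure is an unweighted mean over the 12 categories. The README does not state the aggregation method, so this is an assumption, and figures weighted by per-category sample counts could differ for other columns. The names `mincoder_scores` and `macro_average` are hypothetical.

```python
# Per-category scores copied from the MinCoder-4B-Expert column of the table.
mincoder_scores = {
    "single turn - single function": 0.81,
    "single turn - parallel function": 0.66,
    "multi turn - user adjust": 0.50,
    "multi turn - user switch": 0.64,
    "similar api calls": 0.76,
    "user preference handling": 0.60,
    "atomic task - boolean": 0.88,
    "atomic task - enum": 0.96,
    "atomic task - number": 0.94,
    "atomic task - list": 0.94,
    "atomic task - object (deep)": 0.62,
    "atomic task - object (short)": 0.82,
}


def macro_average(scores):
    """Unweighted mean over categories (assumed aggregation, not confirmed by the README)."""
    return sum(scores.values()) / len(scores)


overall = macro_average(mincoder_scores)
print(f"{overall:.3f}")  # prints 0.761
```

Under that assumption the plain mean reproduces the reported 0.761 for this column.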