beyoru committed · Commit 561865e · verified · 1 Parent(s): c78b8a4

Update README.md

Files changed (1)
  1. README.md +16 -15
README.md CHANGED
@@ -41,21 +41,22 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
 
 ### **Evaluation Comparison**
 
-| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** |
-| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
-| SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 |
-| SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 |
-| MULTI TURN – USER ADJUST | 0.500 | 0.500 | 0.40 | 0.48 |
-| MULTI TURN – USER SWITCH | 0.620 | 0.620 | 0.40 | 0.56 |
-| SIMILAR API CALLS | 0.760 | 0.740 | 0.64 | 0.68 |
-| USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.62 | 0.64 |
-| ATOMIC TASK – BOOLEAN | 0.880 | 0.960 | 0.70 | 0.68 |
-| ATOMIC TASK – ENUM | 0.940 | 0.940 | 0.94 | 0.86 |
-| ATOMIC TASK – NUMBER | 0.940 | 0.960 | 0.90 | 0.82 |
-| ATOMIC TASK – LIST | 0.920 | 0.900 | 0.84 | 0.78 |
-| ATOMIC TASK – OBJECT (DEEP) | 0.580 | 0.520 | 0.32 | 0.36 |
-| ATOMIC TASK – OBJECT (SHORT) | 0.800 | 0.960 | 0.70 | 0.56 |
-| **Overall Accuracy** | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.61** | **0.64** |
+| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** | **MinCoder-4B-Expert** |
+| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: |
+| SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 | 0.81 |
+| SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 | 0.66 |
+| MULTI TURN – USER ADJUST | 0.500 | 0.500 | 0.40 | 0.48 | 0.50 |
+| MULTI TURN – USER SWITCH | 0.620 | 0.620 | 0.40 | 0.56 | 0.64 |
+| SIMILAR API CALLS | 0.760 | 0.740 | 0.64 | 0.68 | 0.76 |
+| USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.62 | 0.64 | 0.60 |
+| ATOMIC TASK – BOOLEAN | 0.880 | 0.960 | 0.70 | 0.68 | 0.88 |
+| ATOMIC TASK – ENUM | 0.940 | 0.940 | 0.94 | 0.86 | 0.96 |
+| ATOMIC TASK – NUMBER | 0.940 | 0.960 | 0.90 | 0.82 | 0.94 |
+| ATOMIC TASK – LIST | 0.920 | 0.900 | 0.84 | 0.78 | 0.94 |
+| ATOMIC TASK – OBJECT (DEEP) | 0.580 | 0.520 | 0.32 | 0.36 | 0.62 |
+| ATOMIC TASK – OBJECT (SHORT) | 0.800 | 0.960 | 0.70 | 0.56 | 0.82 |
+| **Overall Accuracy** | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.61** | **0.64** | **0.761** |
+
 
 ---
 
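Review note: the README does not say how the **Overall Accuracy** row is aggregated. The sketch below assumes a plain unweighted mean over the twelve categories and applies it to the newly added MinCoder-4B-Expert column. The helper name `unweighted_overall` and the dictionary are illustrative only and are not part of the repository.

```python
from statistics import mean

# Per-category scores copied from the MinCoder-4B-Expert column added in this commit.
mincoder_scores = {
    "SINGLE TURN – SINGLE FUNCTION": 0.81,
    "SINGLE TURN – PARALLEL FUNCTION": 0.66,
    "MULTI TURN – USER ADJUST": 0.50,
    "MULTI TURN – USER SWITCH": 0.64,
    "SIMILAR API CALLS": 0.76,
    "USER PREFERENCE HANDLING": 0.60,
    "ATOMIC TASK – BOOLEAN": 0.88,
    "ATOMIC TASK – ENUM": 0.96,
    "ATOMIC TASK – NUMBER": 0.94,
    "ATOMIC TASK – LIST": 0.94,
    "ATOMIC TASK – OBJECT (DEEP)": 0.62,
    "ATOMIC TASK – OBJECT (SHORT)": 0.82,
}


def unweighted_overall(scores):
    """Average the category scores, assuming every category counts equally."""
    return round(mean(scores.values()), 3)


print(unweighted_overall(mincoder_scores))  # prints 0.761
```

Under this assumption the result agrees with the 0.761 reported in the new column. If the benchmark instead weights categories by their number of test cases, the aggregate could differ from a plain mean.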