Update README.md
README.md
CHANGED
|
@@ -38,12 +38,8 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
|
|
| 38 |
<img src="hyacine-hsr.gif" width="150">
|
| 39 |
</p>
|
| 40 |
|
| 41 |
-
---
|
| 42 |
-
|
| 43 |
### **Evaluation Comparison**
|
| 44 |
|
| 45 |
-
---
|
| 46 |
-
|
| 47 |
| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** |
|
| 48 |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
|
| 49 |
| SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 |
|
|
@@ -66,10 +62,6 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
|
|
| 66 |
> **We evaluate all models with the same configuration.**
|
| 67 |
> If you find any incorrect or inconsistent result, please report it for verification.
|
| 68 |
> This ensures transparency and reproducibility across benchmarks.
|
| 69 |
-
<p align="center">
|
| 70 |
-
<img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/XB1XEInyfE3dyUNAGb5zF.webp" width="300">
|
| 71 |
-
</p>
|
| 72 |
-
---
|
| 73 |
|
| 74 |
### **Leaderboard Reference**
|
| 75 |
All models are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, which assesses **function calling**, **compositional reasoning**, and **multi-turn interaction**.
|
|
|
|
| 38 |
<img src="hyacine-hsr.gif" width="150">
|
| 39 |
</p>
|
| 40 |
|
| 41 |
### **Evaluation Comparison**
|
| 42 |
|
| 43 |
| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** |
|
| 44 |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
|
| 45 |
| SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 |
|
|
|
|
| 62 |
> **We evaluate all models with the same configuration.**
|
| 63 |
> If you find any incorrect or inconsistent result, please report it for verification.
|
| 64 |
> This ensures transparency and reproducibility across benchmarks.
|
|
| 65 |
|
| 66 |
### **Leaderboard Reference**
|
| 67 |
All models are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, which assesses **function calling**, **compositional reasoning**, and **multi-turn interaction**.
|