---
library_name: transformers
tags:
- tool
- function-calling
- agent
- merge
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- beyoru/Qwen3-4B-I-1209
- Qwen/Qwen3-4B-Thinking-2507
datasets:
- Salesforce/xlam-function-calling-60k
---

# 🧠 **Model Card — EvolLLM-Linh**

### **Model Overview**
EvolLLM-Linh aims to enhance the **robustness, accuracy, and dialogue coherence** of LLMs.
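The metadata above lists three Qwen3-4B checkpoints under `base_model` together with a `merge` tag, which suggests the model was produced by weight merging. The card does not publish the recipe; purely as an illustration, a TIES merge of those checkpoints with mergekit could be configured as below. The merge method, weights, and densities here are assumptions, not the actual recipe.

```yaml
# Hypothetical recipe only: the real merge method and weights for
# EvolLLM-Linh are not documented in this card.
models:
  - model: Qwen/Qwen3-4B-Instruct-2507
    parameters:
      weight: 0.4
      density: 0.6
  - model: beyoru/Qwen3-4B-I-1209
    parameters:
      weight: 0.3
      density: 0.6
  - model: Qwen/Qwen3-4B-Thinking-2507
    parameters:
      weight: 0.3
      density: 0.6
merge_method: ties
base_model: Qwen/Qwen3-4B-Instruct-2507
parameters:
  normalize: true
dtype: bfloat16
```

A config of this shape would be run with `mergekit-yaml config.yml ./merged-model`.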

- Robust multi-turn dialogue consistency
- Adaptive understanding of user preferences and intent shifts

<p align="center">
  <img src="hyacine-hsr.gif" width="150">
</p>

---

### **Evaluation Comparison**

| **Category**                    |  **EvolLLM-Linh** |  **GPT-OSS-20B**  | **Llama** | **Qwen-2507** |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
| SINGLE TURN — SINGLE FUNCTION   |       0.800       |       0.800       |    0.63   |      0.69     |
| SINGLE TURN — PARALLEL FUNCTION |       0.660       |       0.620       |    0.16   |      0.51     |
| ATOMIC TASK — LIST              |       0.920       |       0.900       |    0.84   |      0.78     |
| ATOMIC TASK — OBJECT (DEEP)     |       0.580       |       0.520       |    0.32   |      0.36     |
| ATOMIC TASK — OBJECT (SHORT)    |       0.800       |       0.960       |    0.70   |      0.56     |
| **Overall Accuracy**            | **0.750 (75.0%)** | **0.760 (76.0%)** |  **0.61** |    **0.64**   |

---
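The overall accuracy row aggregates across the full ACEBench category set, so it need not equal a mean over only the rows listed here. As a quick sanity check, an unweighted mean over the visible EvolLLM-Linh rows can be computed as follows (the category keys are our own labels for the table rows):

```python
# EvolLLM-Linh per-category accuracies, taken from the table rows above.
evolllm_scores = {
    "single_turn_single_function": 0.800,
    "single_turn_parallel_function": 0.660,
    "atomic_task_list": 0.920,
    "atomic_task_object_deep": 0.580,
    "atomic_task_object_short": 0.800,
}

def mean_accuracy(scores: dict) -> float:
    """Unweighted mean accuracy over the given categories."""
    return sum(scores.values()) / len(scores)

print(f"{mean_accuracy(evolllm_scores):.3f}")  # 0.752 over these five rows
```

That this simple mean (0.752) lands close to the reported 0.750 suggests the overall figure is roughly an unweighted average over categories.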

> **Note:**
> **We evaluate all models with the same configuration.**
> If you find any incorrect or inconsistent result, please report it for verification.
> This ensures transparency and reproducibility across benchmarks.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/XB1XEInyfE3dyUNAGb5zF.webp" width="300">
</p>

---

### **Leaderboard Reference**
All models are benchmarked with **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
Results are **internal benchmarks** aligned with ACEBench task categories.

---
</a>
</p>

### **Notes**
**We evaluate all models with the same configuration.** If you find any incorrect result, please report it.

### **License**
**MIT License** — free for research and non-commercial use with attribution.
© 2025 beyoru.