beyoru committed
Commit d4ed45e · verified · 1 parent: 68a89e8

Update README.md

Files changed (1)
  1. README.md +26 -17
README.md CHANGED
@@ -1,28 +1,19 @@
----
-library_name: transformers
-tags:
-- tool
-- function-calling
-- agent
-- merge
-base_model:
-- Qwen/Qwen3-4B-Instruct-2507
-- beyoru/Qwen3-4B-I-1209
-- Qwen/Qwen3-4B-Thinking-2507
-datasets:
-- Salesforce/xlam-function-calling-60k
----
+---
 library_name: transformers
 tags:
 - tool
 - function-calling
 - agent
+- merge
 base_model:
 - Qwen/Qwen3-4B-Instruct-2507
+- beyoru/Qwen3-4B-I-1209
+- Qwen/Qwen3-4B-Thinking-2507
 datasets:
 - Salesforce/xlam-function-calling-60k
 ---
 
+
 # 🧠 **Model Card — EvolLLM-Linh**
 
 ### **Model Overview**
@@ -42,11 +33,18 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
 - Robust multi-turn dialogue consistency
 - Adaptive understanding of user preferences and intent shifts
 
+
+<p align="center">
+  <img src="hyacine-hsr.gif" width="150">
+</p>
+
 ---
 
 ### **Evaluation Comparison**
 
-| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **xLAM-2-8b-fc-r** | **Qwen3-2507** |
+---
+
+| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** |
 | ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
 | SINGLE TURN – SINGLE FUNCTION   | 0.800 | 0.800 | 0.63 | 0.69 |
 | SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 |
@@ -60,12 +58,21 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
 | ATOMIC TASK – LIST              | 0.920 | 0.900 | 0.84 | 0.78 |
 | ATOMIC TASK – OBJECT (DEEP)     | 0.580 | 0.520 | 0.32 | 0.36 |
 | ATOMIC TASK – OBJECT (SHORT)    | 0.800 | 0.960 | 0.70 | 0.56 |
-| **Overall Accuracy** | **0.750** | **0.760** | **0.61** | **0.64** |
+| **Overall Accuracy** | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.61** | **0.64** |
+
+---
 
+> **Note:**
+> **We evaluate all models with the same configuration.**
+> If you find any incorrect or inconsistent result, please report it for verification.
+> This ensures transparency and reproducibility across benchmarks.
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/XB1XEInyfE3dyUNAGb5zF.webp" width="300">
+</p>
 ---
 
 ### **Leaderboard Reference**
-Both **EvolLLM-Linh** and **GPT-OSS-20B** are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** — assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
+All models are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** — assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
 Results are **internal benchmarks** aligned with ACEBench task categories.
 
 ---
@@ -83,6 +90,8 @@ Results are **internal benchmarks** aligned with ACEBench task categories.
   </a>
 </p>
 
+### **Notes**
+**We evaluate all models with the same configuration.** If any result looks incorrect or inconsistent, please report it.
 ### **License**
 **MIT License** — free for research and non-commercial use with attribution.
 © 2025 beyoru.
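Since the card above benchmarks function-calling accuracy, a minimal sketch of how such outputs can be checked may help reviewers. It assumes the Qwen-style chat template, where each call is wrapped in `<tool_call>...</tool_call>` tags around a JSON object with `name` and `arguments` keys; the `get_weather` function is a hypothetical example, not part of this card.

```python
import json
import re

# Qwen-style chat templates emit each function call as a <tool_call> block
# containing a JSON object with "name" and "arguments" keys (assumed format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(completion: str) -> list[dict]:
    """Return all parsed tool calls found in a model completion."""
    calls = []
    for match in TOOL_CALL_RE.finditer(completion):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole parse
    return calls

# Example completion for a single-turn, single-function case.
completion = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Hanoi"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(completion))
# [{'name': 'get_weather', 'arguments': {'city': 'Hanoi'}}]
```

A scorer built on this can compare the parsed `name`/`arguments` against a gold call, which is roughly how single-function accuracy in tables like the one above is computed.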