---
library_name: transformers
tags:
- tool
- function-calling
- agent
- merge
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- beyoru/Qwen3-4B-I-1209
- Qwen/Qwen3-4B-Thinking-2507
datasets:
- Salesforce/xlam-function-calling-60k
---

# 🧠 **Model Card — EvolLLM-Linh**

### **Model Overview**
EvolLLM-Linh aims to enhance the **robustness, accuracy, and dialogue coherence** of LLMs.
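The metadata above lists three Qwen3-4B checkpoints under `base_model` together with a `merge` tag, which suggests the model was produced by weight merging. The card does not publish the recipe; purely as an illustration, a TIES merge of those checkpoints with mergekit could be configured as below. The merge method, weights, and densities here are assumptions, not the actual recipe.

```yaml
# Hypothetical recipe only: the real merge method and weights for
# EvolLLM-Linh are not documented in this card.
models:
  - model: Qwen/Qwen3-4B-Instruct-2507
    parameters:
      weight: 0.4
      density: 0.6
  - model: beyoru/Qwen3-4B-I-1209
    parameters:
      weight: 0.3
      density: 0.6
  - model: Qwen/Qwen3-4B-Thinking-2507
    parameters:
      weight: 0.3
      density: 0.6
merge_method: ties
base_model: Qwen/Qwen3-4B-Instruct-2507
parameters:
  normalize: true
dtype: bfloat16
```

A config of this shape would be run with `mergekit-yaml config.yml ./merged-model`.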

- Robust multi-turn dialogue consistency
- Adaptive understanding of user preferences and intent shifts

<p align="center">
  <img src="hyacine-hsr.gif" width="150">
</p>

---

### **Evaluation Comparison**

| **Category**                    |  **EvolLLM-Linh** |  **GPT-OSS-20B**  | **Llama** | **Qwen-2507** |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
| SINGLE TURN — SINGLE FUNCTION   |       0.800       |       0.800       |    0.63   |      0.69     |
| SINGLE TURN — PARALLEL FUNCTION |       0.660       |       0.620       |    0.16   |      0.51     |
| ATOMIC TASK — LIST              |       0.920       |       0.900       |    0.84   |      0.78     |
| ATOMIC TASK — OBJECT (DEEP)     |       0.580       |       0.520       |    0.32   |      0.36     |
| ATOMIC TASK — OBJECT (SHORT)    |       0.800       |       0.960       |    0.70   |      0.56     |
| **Overall Accuracy**            | **0.750 (75.0%)** | **0.760 (76.0%)** |  **0.61** |    **0.64**   |

---
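The overall accuracy row aggregates across the full ACEBench category set, so it need not equal a mean over only the rows listed here. As a quick sanity check, an unweighted mean over the visible EvolLLM-Linh rows can be computed as follows (the category keys are our own labels for the table rows):

```python
# EvolLLM-Linh per-category accuracies, taken from the table rows above.
evolllm_scores = {
    "single_turn_single_function": 0.800,
    "single_turn_parallel_function": 0.660,
    "atomic_task_list": 0.920,
    "atomic_task_object_deep": 0.580,
    "atomic_task_object_short": 0.800,
}

def mean_accuracy(scores: dict) -> float:
    """Unweighted mean accuracy over the given categories."""
    return sum(scores.values()) / len(scores)

print(f"{mean_accuracy(evolllm_scores):.3f}")  # 0.752 over these five rows
```

That this simple mean (0.752) lands close to the reported 0.750 suggests the overall figure is roughly an unweighted average over categories.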

> **Note:**
> **We evaluate all models with the same configuration.**
> If you find any incorrect or inconsistent result, please report it for verification.
> This ensures transparency and reproducibility across benchmarks.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/XB1XEInyfE3dyUNAGb5zF.webp" width="300">
</p>

---

### **Leaderboard Reference**
All models are benchmarked with **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
Results are **internal benchmarks** aligned with ACEBench task categories.

---
</a>
</p>

### **Notes**
**We evaluate all models with the same configuration.** If you find any incorrect result, please report it.

### **License**
**MIT License** — free for research and non-commercial use with attribution.
© 2025 beyoru.