---
library_name: transformers
tags:
- tool
- function-calling
- agent
- merge
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- beyoru/Qwen3-4B-I-1209
- Qwen/Qwen3-4B-Thinking-2507
datasets:
- Salesforce/xlam-function-calling-60k
- beyoru/xlam-instruct-grpo
---


# 🧠 **Model Card — EvolLLM-Linh**

### **Model Overview**

**Name:** EvolLLM-Linh  
**Version:** v1.0  
**Release Date:** October 23, 2025  
**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)  
**Library:** 🤗 *Transformers*  

<p align="center">
  <img src="hyacine-hsr.gif" width="150">
</p>

**Purpose:**  
EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.  
It aims to improve **robustness, accuracy, and dialogue coherence** when operating in **API-driven or tool-using environments**.

**Key Capabilities:**
- Precise and context-aware API invocation  
- Robust multi-turn dialogue consistency  
- Adaptive understanding of user preferences and intent shifts  
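
As an illustration of the tool-calling loop these capabilities target, the sketch below defines a tool schema and parses parallel calls out of raw model text. It assumes that EvolLLM-Linh keeps the Qwen3 chat template of its base model, which emits each call as a JSON object wrapped in `<tool_call>` tags; this card does not confirm that format, and the `get_weather` tool, `parse_tool_calls`, `validate_call`, and the sample output are hypothetical.

```python
import json
import re
from typing import Any

# Hypothetical tool, written in the JSON-schema "function" format that
# `tokenizer.apply_chat_template(messages, tools=[...])` accepts in Transformers.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Assumption: one JSON object per <tool_call>...</tool_call> block (Qwen3-style output).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Extract every {"name": ..., "arguments": ...} object from raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of crashing the agent loop
        if isinstance(call, dict) and "name" in call:
            call.setdefault("arguments", {})
            calls.append(call)
    return calls


def validate_call(call: dict[str, Any], tool: dict[str, Any]) -> bool:
    """Check the function name and the required arguments against the schema."""
    spec = tool["function"]
    if call["name"] != spec["name"] or not isinstance(call["arguments"], dict):
        return False
    required = spec["parameters"].get("required", [])
    return all(key in call["arguments"] for key in required)


# Usage: two parallel calls in a single assistant turn.
sample_output = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "celsius"}}\n'
    "</tool_call>\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Da Nang"}}\n'
    "</tool_call>"
)
calls = parse_tool_calls(sample_output)
print([c["arguments"]["city"] for c in calls])  # ['Hanoi', 'Da Nang']
print(all(validate_call(c, GET_WEATHER_TOOL) for c in calls))  # True
```

Validating against the schema before executing a call is a cheap guard against the argument errors that the ATOMIC TASK categories measure.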


### **Evaluation Comparison**

| **Category**                    |  **EvolLLM-Linh** |  **GPT-OSS-20B**  | **Llama** | **Qwen-2507** |      **MinCoder-4B-Expert**     |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: |
| SINGLE TURN – SINGLE FUNCTION   |       0.800       |       0.800       |    0.63   |      0.69     |        0.81       |
| SINGLE TURN – PARALLEL FUNCTION |       0.660       |       0.620       |    0.16   |      0.51     |        0.66       |
| MULTI TURN – USER ADJUST        |       0.500       |       0.500       |    0.40   |      0.48     |        0.50       |
| MULTI TURN – USER SWITCH        |       0.620       |       0.620       |    0.40   |      0.56     |        0.64       |
| SIMILAR API CALLS               |       0.760       |       0.740       |    0.64   |      0.68     |        0.76       |
| USER PREFERENCE HANDLING        |       0.600       |       0.640       |    0.62   |      0.64     |        0.60       |
| ATOMIC TASK – BOOLEAN           |       0.880       |       0.960       |    0.70   |      0.68     |        0.88       |
| ATOMIC TASK – ENUM              |       0.940       |       0.940       |    0.94   |      0.86     |        0.96       |
| ATOMIC TASK – NUMBER            |       0.940       |       0.960       |    0.90   |      0.82     |        0.94       |
| ATOMIC TASK – LIST              |       0.920       |       0.900       |    0.84   |      0.78     |        0.94       |
| ATOMIC TASK – OBJECT (DEEP)     |       0.580       |       0.520       |    0.32   |      0.36     |        0.62       |
| ATOMIC TASK – OBJECT (SHORT)    |       0.800       |       0.960       |    0.70   |      0.56     |        0.82       |
| **Overall Accuracy**            |     **0.750**     |     **0.760**     |  **0.61** |    **0.64**   |      **0.761**    |


---

> **Note:**
> **We evaluate all models with the same configuration.**
> If you find an incorrect or inconsistent result, please report it for verification.
> Such reports help keep the results transparent and reproducible across benchmarks.

### **Leaderboard Reference**
All models are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, which assesses **function calling**, **compositional reasoning**, and **multi-turn interaction**.  
The results are from **internal benchmark runs** aligned with the ACEBench task categories.
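
The card does not state how **Overall Accuracy** is aggregated from the 12 categories. As a hypothetical check, the sketch below recomputes an unweighted (macro) mean of the category scores from the table and compares it with the reported value at its stated precision. By this arithmetic it agrees with EvolLLM-Linh, Qwen-2507, and MinCoder-4B-Expert, but not with GPT-OSS-20B (about 0.763) or Llama (about 0.604), so the actual aggregation (for example, weighting categories by their number of test cases) may differ.

```python
from statistics import fmean

# Category scores copied from the Evaluation Comparison table, in row order
# (SINGLE TURN - SINGLE FUNCTION first, ATOMIC TASK - OBJECT (SHORT) last).
SCORES = {
    "EvolLLM-Linh": [0.80, 0.66, 0.50, 0.62, 0.76, 0.60, 0.88, 0.94, 0.94, 0.92, 0.58, 0.80],
    "GPT-OSS-20B": [0.80, 0.62, 0.50, 0.62, 0.74, 0.64, 0.96, 0.94, 0.96, 0.90, 0.52, 0.96],
    "Llama": [0.63, 0.16, 0.40, 0.40, 0.64, 0.62, 0.70, 0.94, 0.90, 0.84, 0.32, 0.70],
    "Qwen-2507": [0.69, 0.51, 0.48, 0.56, 0.68, 0.64, 0.68, 0.86, 0.82, 0.78, 0.36, 0.56],
    "MinCoder-4B-Expert": [0.81, 0.66, 0.50, 0.64, 0.76, 0.60, 0.88, 0.96, 0.94, 0.94, 0.62, 0.82],
}

# Reported Overall Accuracy, kept as strings so the shown precision is preserved.
REPORTED = {
    "EvolLLM-Linh": "0.750",
    "GPT-OSS-20B": "0.760",
    "Llama": "0.61",
    "Qwen-2507": "0.64",
    "MinCoder-4B-Expert": "0.761",
}


def macro_mean(scores: list[float]) -> float:
    """Unweighted mean over categories (an assumption, not the documented method)."""
    return fmean(scores)


def matches_reported(model: str) -> bool:
    """True if the macro mean rounds to the reported value at its shown precision."""
    reported = REPORTED[model]
    decimals = len(reported.split(".")[1])
    tolerance = 0.5 * 10 ** -decimals + 1e-9  # half a unit in the last shown digit
    return abs(macro_mean(SCORES[model]) - float(reported)) <= tolerance


for model in SCORES:
    print(f"{model:<20} macro={macro_mean(SCORES[model]):.3f} "
          f"reported={REPORTED[model]} match={matches_reported(model)}")
```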

---

### **Method**
- GRPO (Group Relative Policy Optimization) with a rule-based reward plus a self-confidence reward  
- Evol Merging  
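
The card names the training recipe but not its details. The sketch below shows one common way to pair a rule-based function-calling reward with GRPO's group-relative advantage, where each sampled completion's reward is normalized by the mean and standard deviation of its sampling group. The reward tiers, the 0.1 weight, and the `confidence_bonus` term (here, a hypothetical mean token probability) are assumptions for illustration, not the author's implementation; the policy update itself and the Evol Merging step are not covered.

```python
import json
import math
from statistics import fmean, pstdev


def rule_based_reward(completion: str, expected: dict) -> float:
    """Hypothetical tiered reward for a single JSON tool call.

    1.0 exact name and arguments, 0.5 right name but different arguments,
    0.0 wrong function name, -1.0 unparseable output (format penalty).
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return -1.0
    if not isinstance(call, dict) or call.get("name") != expected["name"]:
        return 0.0
    return 1.0 if call.get("arguments") == expected["arguments"] else 0.5


def confidence_bonus(token_logprobs: list[float]) -> float:
    """Hypothetical self-confidence term: mean per-token probability in [0, 1]."""
    if not token_logprobs:
        return 0.0
    return fmean([math.exp(lp) for lp in token_logprobs])


def total_reward(completion: str, expected: dict,
                 token_logprobs: list[float], conf_weight: float = 0.1) -> float:
    # Assumed additive combination; the card does not say how the rewards are mixed.
    return rule_based_reward(completion, expected) + conf_weight * confidence_bonus(token_logprobs)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r_i - mean(group)) / (std(group) + eps)."""
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four sampled completions for the same prompt form one group.
expected = {"name": "get_weather", "arguments": {"city": "Hanoi"}}
group = [
    '{"name": "get_weather", "arguments": {"city": "Hanoi"}}',
    '{"name": "get_weather", "arguments": {"city": "Hue"}}',
    '{"name": "get_time", "arguments": {}}',
    "not json",
]
rewards = [rule_based_reward(c, expected) for c in group]
print(rewards)  # [1.0, 0.5, 0.0, -1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Because advantages are relative within the group, no learned value model is needed: completions that beat their siblings get positive advantages and are reinforced.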

---

### **Support Me**
<p align="center">
  <a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
    <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
  </a>
</p>

### **License**

**MIT License**: permits use, modification, and redistribution, including commercial use, provided the copyright and license notice are retained.  
© 2025 beyoru.

---