--- library_name: transformers tags: - tool - function-calling - agent - merge base_model: - Qwen/Qwen3-4B-Instruct-2507 - beyoru/Qwen3-4B-I-1209 - Qwen/Qwen3-4B-Thinking-2507 datasets: - Salesforce/xlam-function-calling-60k - beyoru/xlam-instruct-grpo --- # 🧠 **Model Card — EvolLLM-Linh** ### **Model Overview** **Name:** EvolLLM-Linh **Version:** v1.0 **Release Date:** October 23, 2025 **Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) **Library:** 🤗 *Transformers*

**Purpose:** EvolLLM-Linh is a fine-tuned large language model designed for **function calling**. It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**. **Key Capabilities:** - Precise and context-aware API invocation - Robust multi-turn dialogue consistency - Adaptive understanding of user preferences and intent shifts ### **Evaluation Comparison** | **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** | **MinCoder-4B-Expert** | | ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: | | SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 | 0.81 | | SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 | 0.66 | | MULTI TURN – USER ADJUST | 0.500 | 0.500 | 0.40 | 0.48 | 0.50 | | MULTI TURN – USER SWITCH | 0.620 | 0.620 | 0.40 | 0.56 | 0.64 | | SIMILAR API CALLS | 0.760 | 0.740 | 0.64 | 0.68 | 0.76 | | USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.62 | 0.64 | 0.60 | | ATOMIC TASK – BOOLEAN | 0.880 | 0.960 | 0.70 | 0.68 | 0.88 | | ATOMIC TASK – ENUM | 0.940 | 0.940 | 0.94 | 0.86 | 0.96 | | ATOMIC TASK – NUMBER | 0.940 | 0.960 | 0.90 | 0.82 | 0.94 | | ATOMIC TASK – LIST | 0.920 | 0.900 | 0.84 | 0.78 | 0.94 | | ATOMIC TASK – OBJECT (DEEP) | 0.580 | 0.520 | 0.32 | 0.36 | 0.62 | | ATOMIC TASK – OBJECT (SHORT) | 0.800 | 0.960 | 0.70 | 0.56 | 0.82 | | **Overall Accuracy** | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.61** | **0.64** | **0.761** | --- > **Note:** > **We evaluate all models with the same configuration.** > If you find any incorrect or inconsistent result, please report it for verification. > This ensures transparency and reproducibility across benchmarks. ### **Leaderboard Reference** all model are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** — assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**. Results are **internal benchmarks** aligned with ACEBench task categories. --- ### **Method** - GRPO (Rule-based reward + self-confidence reward) - Evol Merging --- ## **Support me at**

Buy Me A Coffee

### **License** **MIT License** — free for research and non-commercial use with attribution. © 2025 beyoru. ---