beyoru/EvolLLM-Linh


beyoru committed on Oct 30, 2025

Commit 467a404 · verified · 1 Parent(s): d4ed45e

Update README.md

Files changed (1)

README.md +0 -8

@@ -38,12 +38,8 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
   <img src="hyacine-hsr.gif" width="150">
 </p>
----
 ### **Evaluation Comparison**
----
 | **Category**                    |  **EvolLLM-Linh** |  **GPT-OSS-20B**  | **Llama** | **Qwen-2507** |
 | ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
 | SINGLE TURN – SINGLE FUNCTION   |       0.800       |       0.800       |    0.63   |      0.69     |
@@ -66,10 +62,6 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
 > **We evaluate all models with the same configuration.**
 > If you find any incorrect or inconsistent result, please report it for verification.
 > This ensures transparency and reproducibility across benchmarks.
-<p align="center">
-  <img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/XB1XEInyfE3dyUNAGb5zF.webp" width="300">
-</p>
----
 ### **Leaderboard Reference**
 all model are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** — assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
