Commit 467a404 (verified) by beyoru · Parent: d4ed45e

Update README.md

Files changed (1): README.md (+0 −8)

README.md CHANGED
@@ -38,12 +38,8 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
  <img src="hyacine-hsr.gif" width="150">
  </p>
 
- ---
-
  ### **Evaluation Comparison**
 
- ---
-
  | **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** |
  | ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
  | SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 |
@@ -66,10 +62,6 @@ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs oper
  > **We evaluate all models with the same configuration.**
  > If you find any incorrect or inconsistent result, please report it for verification.
  > This ensures transparency and reproducibility across benchmarks.
- <p align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/XB1XEInyfE3dyUNAGb5zF.webp" width="300">
- </p>
- ---
 
  ### **Leaderboard Reference**
  All models are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** — assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.