Jackrong committed · Commit e55d2e2 · verified · 1 Parent(s): e5a0c52

Update README.md

Files changed (1)
  1. README.md +2 -5
README.md CHANGED
@@ -249,10 +249,10 @@ The following shows the comparative performance on **SWE-bench Verified**, which
 > - ❤️ **Kyle Hessling** for his generous hardware and equipment support. You can follow him for more updates on X / Twitter: [@KyleHessling1](https://x.com/KyleHessling1).
 
 ---
-
 #### Community-Reported Tool Calling Evaluation
 
-A community user independently evaluated Qwopus3.5-9B-coder on a tool-recall test with up to 31 available tools and an adversarial tool-selection phase containing semantic overlaps and decoy tools.
+> [!TIP]
+> A community user independently evaluated Qwopus3.5-9B-coder on a tool-recall test with up to 31 available tools and an adversarial tool-selection phase containing semantic overlaps and decoy tools.
 
 | Model | Phase 1: Tool Recall | Phase 2: Adversarial Tool Selection |
 |---|---:|---:|
@@ -260,9 +260,6 @@ A community user independently evaluated Qwopus3.5-9B-coder on a tool-recall tes
 | Claude Opus 4.6 | 100% | 27 / 28 (96%) |
 | Qwen3.5-9B | 100% | 26 / 28 (93%) |
 
-> This is an unofficial community-reported evaluation rather than a standardized benchmark. See the original Hugging Face discussion for the full test logs and additional model comparisons.
-
-
 ---
 
 ### 🧪 Core Dataset Usage: Trace Inversion and High-Quality Agent Traces