davidlms committed on
Commit 0c6f06b · verified · 1 Parent(s): 26df4b9

Add model-index with benchmark evaluations

Added structured evaluation results from benchmark research:
- HLE (Humanity's Last Exam): 37.1
- FRAMES: 76.3
- τ²-Bench: 80.2

These benchmarks evaluate the model's performance on:
- General knowledge and reasoning (HLE)
- Factuality and retrieval accuracy in RAG systems (FRAMES)
- Conversational agent capabilities in dual-control environments (τ²-Bench)

Source: https://github.com/NVlabs/ToolOrchestra

This enables the model to appear in leaderboards and makes it easier to compare with other models.
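
As a rough illustration of how a consumer might read these results, the sketch below mirrors the added `model-index` block as a Python dict and flattens it into one row per metric. The helper name `flatten_model_index` is hypothetical, and the nesting is assumed to follow the usual model-index layout, since the diff view does not preserve indentation.

```python
# Illustrative sketch only: MODEL_INDEX mirrors the metadata added to README.md
# in this commit; the nesting is an assumption based on the usual model-index
# layout. flatten_model_index is a hypothetical helper, not part of any library.
from typing import Any

MODEL_INDEX: list[dict[str, Any]] = [
    {
        "name": "Nemotron-Orchestrator-8B",
        "results": [
            {
                "task": {"type": "text-generation"},
                "dataset": {"name": "Benchmarks", "type": "benchmark"},
                "metrics": [
                    {"name": "HLE (Humanity's Last Exam)", "type": "hle", "value": 37.1},
                    {"name": "FRAMES", "type": "frames", "value": 76.3},
                    {"name": "τ²-Bench", "type": "tau_bench", "value": 80.2},
                ],
                "source": {
                    "name": "ToolOrchestra Research",
                    "url": "https://github.com/NVlabs/ToolOrchestra",
                },
            }
        ],
    }
]


def flatten_model_index(model_index: list[dict[str, Any]]) -> list[tuple[str, str, float]]:
    """Return one (model name, metric type, value) row per reported metric."""
    rows = []
    for entry in model_index:
        for result in entry.get("results", []):
            for metric in result.get("metrics", []):
                rows.append((entry["name"], metric["type"], metric["value"]))
    return rows


if __name__ == "__main__":
    # Print a tab-separated table of the metrics declared in the metadata.
    for model, metric_type, value in flatten_model_index(MODEL_INDEX):
        print(f"{model}\t{metric_type}\t{value}")
```

Flattening to rows keyed by metric `type` is one possible design. It keeps the comparison independent of the display names, which contain non-ASCII characters such as `τ²`.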

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -2,6 +2,27 @@
 library_name: transformers
 base_model:
 - Qwen/Qwen3-8B
+model-index:
+- name: Nemotron-Orchestrator-8B
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: Benchmarks
+      type: benchmark
+    metrics:
+    - name: HLE (Humanity's Last Exam)
+      type: hle
+      value: 37.1
+    - name: FRAMES
+      type: frames
+      value: 76.3
+    - name: τ²-Bench
+      type: tau_bench
+      value: 80.2
+    source:
+      name: ToolOrchestra Research
+      url: https://github.com/NVlabs/ToolOrchestra
 ---
 # ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration