Add model-index with benchmark evaluations

Adds structured evaluation results to the model card's model-index metadata:
- HLE (Humanity's Last Exam): 37.1
- FRAMES: 76.3
- τ²-Bench: 80.2

These benchmarks evaluate the model's performance on:
- General knowledge and reasoning (HLE)
- Factuality and retrieval accuracy in RAG systems (FRAMES)
- Conversational agent capabilities in dual-control environments (τ²-Bench)

Source: https://github.com/NVlabs/ToolOrchestra

With these results in the metadata, the model can appear on leaderboards and is easier to compare with other models.
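
As a rough illustration of how the structured results could be consumed, the sketch below mirrors the model-index block this change adds to README.md as a plain Python structure and flattens it into a score table. The `MODEL_INDEX` constant and the `collect_scores` helper are hypothetical names introduced only for this example; they are not part of the model repository or of any Hub tooling.

```python
# Hypothetical in-memory mirror of the model-index metadata added by this change.
# In practice the data lives in the YAML front matter of README.md.
MODEL_INDEX = [
    {
        "name": "Nemotron-Orchestrator-8B",
        "results": [
            {
                "task": {"type": "text-generation"},
                "dataset": {"name": "Benchmarks", "type": "benchmark"},
                "metrics": [
                    {"name": "HLE (Humanity's Last Exam)", "type": "hle", "value": 37.1},
                    {"name": "FRAMES", "type": "frames", "value": 76.3},
                    {"name": "τ²-Bench", "type": "tau_bench", "value": 80.2},
                ],
                "source": {
                    "name": "ToolOrchestra Research",
                    "url": "https://github.com/NVlabs/ToolOrchestra",
                },
            }
        ],
    }
]


def collect_scores(model_index):
    """Flatten a model-index structure into {metric type: value}.

    Raises TypeError if a metric value is not numeric, which is a cheap
    sanity check before the metadata is published.
    """
    scores = {}
    for entry in model_index:
        for result in entry["results"]:
            for metric in result["metrics"]:
                value = metric["value"]
                if not isinstance(value, (int, float)):
                    raise TypeError(f"non-numeric value for {metric['name']!r}")
                scores[metric["type"]] = value
    return scores


if __name__ == "__main__":
    for metric_type, value in collect_scores(MODEL_INDEX).items():
        print(f"{metric_type}: {value}")
```

A check like this only confirms that the entries are well-formed; it does not verify the reported numbers against the source repository.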

Files changed (1)

README.md +21 -0

@@ -2,6 +2,27 @@
 library_name: transformers
 base_model:
 - Qwen/Qwen3-8B
+model-index:
+- name: Nemotron-Orchestrator-8B
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: Benchmarks
+      type: benchmark
+    metrics:
+    - name: HLE (Humanity's Last Exam)
+      type: hle
+      value: 37.1
+    - name: FRAMES
+      type: frames
+      value: 76.3
+    - name: τ²-Bench
+      type: tau_bench
+      value: 80.2
+    source:
+      name: ToolOrchestra Research
+      url: https://github.com/NVlabs/ToolOrchestra
 ---
 # ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration