Upload README.md with huggingface_hub
README.md
CHANGED
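The diff below rewrites a wide markdown benchmark table, and hand-edited tables easily end up with a delimiter row whose column count no longer matches the header. A minimal stdlib sketch of such a sanity check (`table_column_mismatch` is a hypothetical helper written for this illustration, not from any library; it assumes no escaped or backticked pipes inside cells):

```python
def table_column_mismatch(markdown_table: str) -> bool:
    """Return True if any row of a pipe-delimited markdown table has a
    column count different from the header row.
    Hypothetical helper for illustration; assumes no escaped or
    backticked '|' characters inside cells."""
    def ncols(line: str) -> int:
        # Strip the outer pipes, then count the remaining cells.
        return len(line.strip().strip("|").split("|"))

    rows = [ln for ln in markdown_table.strip().splitlines() if ln.strip()]
    header_cols = ncols(rows[0])
    return any(ncols(row) != header_cols for row in rows[1:])

table = """\
| Benchmark | **OmniCoder-9B** | Qwen3.5-9B |
|:---|:---:|:---:|
| **GPQA Diamond** (pass@1) | **83.8** | 81.7 |
"""
print(table_column_mismatch(table))  # False: every row has 3 columns
```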
```diff
@@ -23,7 +23,7 @@ model-index:
     metrics:
     - name: pass@5
       type: accuracy
-      value: 90
+      value: 90.0
   - task:
       type: text-generation
     dataset:
@@ -44,7 +44,7 @@ model-index:
     metrics:
     - name: Pass Rate
       type: accuracy
-      value: 28
+      value: 28.0
 ---
 
 <div align="center">
@@ -86,12 +86,12 @@ The model shows strong agentic behavior: it recovers from errors (read-before-wr
 
 <div align="center">
 
-| Benchmark | **OmniCoder-9B** | Qwen3.5-9B | Qwen3-Next-80B | GPT-OSS-120B | GPT-OSS-20B | GLM 4.7 |
-|:---|:---:|:---:|:---:|:---:|:---:|:---:|
-| **AIME 2025** (pass@5) | 90 | | | | | |
-| **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | |
-| **GPQA Diamond** (pass@3) | **86.4** | | | | | |
-| **Terminal-Bench 2.0** | **28** | 20 | | | | 33.4 |
+| Benchmark | **OmniCoder-9B** | Qwen3.5-9B | Qwen3-Next-80B | GPT-OSS-120B | GPT-OSS-20B | GLM-4.7-Flash | GLM 4.7 | Claude Haiku 4.5 |
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| **AIME 2025** (pass@5) | 90 | | | | 91.7 | 91.6 | | |
+| **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | | | 73 |
+| **GPQA Diamond** (pass@3) | **86.4** | | | | | | | |
+| **Terminal-Bench 2.0** | **28** | 20 | | | | | 33.4 | 27 |
 
 </div>
 
@@ -164,16 +164,6 @@ See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tessl
 | **Precision** | bf16 |
 | **Optimizer** | AdamW (lr=2e-4, cosine schedule) |
 
-### Training Data Sources
-
-| Source | Samples | Description |
-|:---|---:|:---|
-| NVIDIA Nemotron-Terminal-Corpus | 226K | Terminal agent trajectories |
-| CoderForge-Preview (reward >= 0.5) | 155K | SWE-bench style coding trajectories |
-| Nemotron Skill-Based | 24K | Skill-based coding tasks |
-| Scale-SWE | 20K | Real GitHub issue patches (synthesized trajectories) |
-| Opus Reasoning | 2.3K | Chain-of-thought reasoning |
-
 ---
 
 ## Architecture
@@ -181,7 +171,7 @@ See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tessl
 OmniCoder inherits Qwen3.5-9B's hybrid architecture:
 
 - **Gated Delta Networks**: Linear attention layers interleaved with standard attention for efficient long-range dependencies
-- **VLM Backbone**: Built on `Qwen3_5ForConditionalGeneration`
+- **VLM Backbone**: Built on `Qwen3_5ForConditionalGeneration`
```
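The benchmark table above reports pass@1, pass@3, and pass@5. The card does not say how these numbers were computed; assuming the standard unbiased pass@k estimator from the HumanEval/Codex evaluation literature (Chen et al.), a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n generations is correct,
    given that c of the n generations are correct.
    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few incorrect samples: every size-k draw hits a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustration: 10 generations per problem, 4 of them correct.
print(round(pass_at_k(10, 4, 1), 3))  # 0.4 — equals the naive fraction c/n when k=1
print(round(pass_at_k(10, 4, 5), 3))  # 0.976
```

Evaluating over many problems and averaging the per-problem estimates gives the headline pass@k score; n must be at least k for the estimator to be defined.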