smirki committed
Commit 9f0c62c · verified · 1 Parent(s): 10dbb72

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -44

README.md CHANGED
@@ -21,36 +21,21 @@ model-index:
  name: AIME 2025
  type: custom
  metrics:
- - name: Accuracy
  type: accuracy
- value: 91.7
- - task:
- type: text-generation
- dataset:
- name: LiveCodeBench v6
- type: custom
- metrics:
- - name: Pass Rate
- type: accuracy
- value: 64
  - task:
  type: text-generation
  dataset:
  name: GPQA Diamond
  type: custom
  metrics:
- - name: Accuracy
  type: accuracy
- value: 77.2
- - task:
- type: text-generation
- dataset:
- name: BrowseComp
- type: custom
- metrics:
- - name: Accuracy
  type: accuracy
- value: 42.8
  - task:
  type: text-generation
  dataset:
@@ -80,20 +65,20 @@ model-index:

  ## Overview

- **OmniCoder-9B** is a 9-billion parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks + sparse Mixture-of-Experts). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.

- Despite being a 9B model, OmniCoder matches or exceeds many larger models on key coding and reasoning benchmarks, including outperforming Qwen3.5-9B on AIME 2025 and Terminal-Bench 2.0.

- The model also shows strong agentic behavior: it recovers from errors (read-before-write), responds to LSP diagnostics, and uses proper edit diffs instead of full rewrites, patterns learned directly from the 425K real-world agent trajectories it was trained on.

  ### Key Features

- - **Hybrid Architecture**: Inherits Qwen3.5's Gated Delta Networks + sparse MoE design for efficient long-context processing
- - **262K Native Context**: Full 262,144 token context window, extensible to 1M+
- - **Agentic Tool Use**: Trained on real agent trajectories with bash, file I/O, search, and code editing tools
- - **Error Recovery**: Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
- - **Thinking Mode**: Supports `<think>...</think>` reasoning chains for complex problem decomposition
- - **Apache 2.0**: Fully open weights, no restrictions

  ---
 
@@ -101,18 +86,16 @@ The model also shows strong agentic behavior: it recovers from errors (read-befo

  <div align="center">

- | Benchmark | Qwen3.5-397B | **Qwen3.5-9B** | **OmniCoder-9B** | Qwen3-Next-80B | GLM-4.7-Flash | GPT-OSS-120B | GPT-OSS-20B | GLM 4.7 | Claude Haiku 4.5 |
- |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- | **AIME 2025** | 90 | 91.6 | **91.7** | | | | | | |
- | **BFCL v4** | 66.1 | 49.7 | | | | | | | |
- | **LiveCodeBench v6** | 65.6 | 68.7 | 64 | 82.7 | 61 | | | | |
- | **BrowseComp** | | | | | 28.3 | | | | |
- | **GPQA Diamond** | 81.7 | 83.8 | 77.2 | | 80.1 | 71.5 | | | 73 |
- | **Terminal-Bench 2.0** | | 20 | **28** | | | | | 33.4 | 27 |

  </div>

- > OmniCoder-9B achieves **91.7** on AIME 2025 (vs Qwen3.5-9B's 91.6), **28** on Terminal-Bench 2.0 (vs base model's 20, a 40% improvement), and **42.8** on BrowseComp.

  ---
 
@@ -175,7 +158,7 @@ See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tessl
  | **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
  | **Method** | LoRA SFT (r=64, alpha=32) |
  | **Dataset** | 425K agentic trajectories from 5 sources |
- | **Sequence Length** | 65,536 tokens (sample packing, 99.35% efficiency) |
  | **Hardware** | 4x NVIDIA H200 (DDP) |
  | **Framework** | Axolotl |
  | **Precision** | bf16 |
@@ -197,9 +180,8 @@ See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tessl

  OmniCoder inherits Qwen3.5-9B's hybrid architecture:

- - **Gated Delta Networks**: Linear attention layers interleaved with standard attention for efficient long-range dependencies
- - **Sparse MoE**: Mixture-of-Experts layers for parameter-efficient scaling
- - **VLM Backbone**: Built on `Qwen3_5ForConditionalGeneration` (supports future multimodal extensions)

  ---
 
@@ -219,11 +201,16 @@ For agentic / tool-calling tasks, consider lower temperature (0.2-0.4) for more

  ## Limitations

  - Performance on non-English tasks has not been extensively evaluated
- - Long-context performance beyond 65K tokens (the training sequence length) may degrade
  - Tool-calling format is flexible but works best with the scaffolding patterns seen in training

  ---

  ## Citation

  ```bibtex
 
  name: AIME 2025
  type: custom
  metrics:
+ - name: pass@5
  type: accuracy
+ value: 90
  - task:
  type: text-generation
  dataset:
  name: GPQA Diamond
  type: custom
  metrics:
+ - name: pass@1
  type: accuracy
+ value: 83.8
+ - name: pass@3
  type: accuracy
+ value: 86.4
  - task:
  type: text-generation
  dataset:
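The metadata edits above are flattened by this diff view; with indentation restored per the standard Hugging Face model-index schema, the updated AIME entry would read roughly as below (the top-level model `name` is our assumption, not shown in the diff):

```yaml
model-index:
- name: OmniCoder-9B
  results:
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: custom
    metrics:
    - name: pass@5
      type: accuracy
      value: 90
```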
 
  ## Overview

+ **OmniCoder-9B** is a 9-billion parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks interleaved with standard attention). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.

+ The training data was specifically built from **Claude Opus 4.6 agentic and coding reasoning traces**, targeting scaffolding patterns from Claude Code, OpenCode, Codex, and Droid. The dataset includes successful trajectories from models like Claude Opus 4.6, GPT-5.4, GPT-5.3-Codex, and Gemini 3.1 Pro.

+ The model shows strong agentic behavior: it recovers from errors (read-before-write), responds to LSP diagnostics, and uses proper edit diffs instead of full rewrites. These patterns were learned directly from the real-world agent trajectories it was trained on.

  ### Key Features

+ - **Trained on Frontier Agent Traces**: Built from Claude Opus 4.6, GPT-5.3-Codex, GPT-5.4, and Gemini 3.1 Pro agentic coding trajectories across Claude Code, OpenCode, Codex, and Droid scaffolding
+ - **Hybrid Architecture**: Inherits Qwen3.5's Gated Delta Networks interleaved with standard attention for efficient long-context processing
+ - **262K Native Context**: Full 262,144 token context window, extensible to 1M+
+ - **Error Recovery**: Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
+ - **Thinking Mode**: Supports `<think>...</think>` reasoning chains for complex problem decomposition
+ - **Apache 2.0**: Fully open weights, no restrictions

  ---
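The `<think>...</think>` reasoning chains mentioned under Thinking Mode can be stripped client-side before showing the final answer. A minimal sketch, assuming the tag format from the card above (the helper name is ours):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning chain from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning chain present; the whole output is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

out = "<think>Decompose: parse input, then sort.</think>Use sorted(data)."
reasoning, answer = split_thinking(out)
```

`re.DOTALL` lets the reasoning span multiple lines, which multi-step decompositions usually do.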
 
 
  <div align="center">

+ | Benchmark | **OmniCoder-9B** | Qwen3.5-9B | Qwen3-Next-80B | GPT-OSS-120B | GPT-OSS-20B | GLM 4.7 |
+ |:---|:---:|:---:|:---:|:---:|:---:|:---:|
+ | **AIME 2025** (pass@5) | 90 | | | | | |
+ | **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | |
+ | **GPQA Diamond** (pass@3) | **86.4** | | | | | |
+ | **Terminal-Bench 2.0** | **28** | 20 | | | | 33.4 |

  </div>

+ > OmniCoder-9B achieves **83.8** on GPQA Diamond pass@1 (vs Qwen3.5-9B's 81.7), **86.4** at pass@3, and **28** on Terminal-Bench 2.0 (vs base model's 20, a 40% improvement).

  ---
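The pass@1 / pass@3 / pass@5 metrics in the updated table are conventionally computed with the standard unbiased pass@k estimator. A sketch of that estimator; the sample counts below are illustrative, not the card's actual evaluation settings:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c correct), passes."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative: 10 generations per problem, 5 of them correct.
p1 = pass_at_k(10, 5, 1)
p5 = pass_at_k(10, 5, 5)
```

With those numbers, pass@1 is 0.5 and pass@5 is 1 - C(5,5)/C(10,5) = 1 - 1/252.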
 
 
  | **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
  | **Method** | LoRA SFT (r=64, alpha=32) |
  | **Dataset** | 425K agentic trajectories from 5 sources |
+ | **Packing** | Sample packing with 99.35% efficiency |
  | **Hardware** | 4x NVIDIA H200 (DDP) |
  | **Framework** | Axolotl |
  | **Precision** | bf16 |
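The 99.35% figure in the Packing row measures how densely variable-length trajectories fill the fixed training window (used tokens divided by total window capacity). A first-fit sketch of the idea, not Axolotl's actual implementation; the window size matches the 65,536-token sequence length mentioned elsewhere in the card:

```python
def pack_samples(lengths: list[int], window: int) -> list[list[int]]:
    """Greedy first-fit sample packing: put each sample in the first
    window with room, opening a new window when none fits."""
    free: list[int] = []            # remaining capacity per window
    packed: list[list[int]] = []    # sample lengths placed in each window
    for n in lengths:
        for i, cap in enumerate(free):
            if n <= cap:
                free[i] -= n
                packed[i].append(n)
                break
        else:
            free.append(window - n)
            packed.append([n])
    return packed

lengths = [30_000, 20_000, 40_000, 10_000, 5_000]
packed = pack_samples(lengths, window=65_536)
efficiency = sum(lengths) / (len(packed) * 65_536)
```

Here the five samples fit in two windows, giving an efficiency of 105,000 / 131,072, roughly 80%; production packers reorder samples (e.g. first-fit decreasing) to push this higher.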
 
  OmniCoder inherits Qwen3.5-9B's hybrid architecture:

+ - **Gated Delta Networks**: Linear attention layers interleaved with standard attention for efficient long-range dependencies
+ - **VLM Backbone**: Built on `Qwen3_5ForConditionalGeneration` (supports future multimodal extensions)

  ---
 
 
  ## Limitations

  - Performance on non-English tasks has not been extensively evaluated
  - Tool-calling format is flexible but works best with the scaffolding patterns seen in training

  ---

+ ## Acknowledgments
+
+ Special thanks to the [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) team and the discussion in [axolotl#3453](https://github.com/axolotl-ai-cloud/axolotl/issues/3453) for helping get Qwen3.5 packing support working.
+
+ ---
+
  ## Citation

  ```bibtex