---
library_name: transformers
base_model: Qwen/Qwen3.5-9B
tags:
- qwen3.5
- code
- agent
- sft
- omnicoder
- tesslate
license: apache-2.0
language:
- en
pipeline_tag: text-generation
model-index:
- name: OmniCoder-9B
  results:
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 91.7
  - task:
      type: text-generation
    dataset:
      name: LiveCodeBench v6
      type: custom
    metrics:
    - name: Pass Rate
      type: accuracy
      value: 64
  - task:
      type: text-generation
    dataset:
      name: GPQA Diamond
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 77.2
  - task:
      type: text-generation
    dataset:
      name: BrowseComp
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 42.8
  - task:
      type: text-generation
    dataset:
      name: Terminal-Bench 2.0
      type: custom
    metrics:
    - name: Pass Rate
      type: accuracy
      value: 28
---

<div align="center">

<img src="omnicoder-banner.png" alt="OmniCoder" width="720">

# OmniCoder-9B

### A frontier-class open coding agent, fine-tuned on 425K agentic trajectories.

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Base Model](https://img.shields.io/badge/Base-Qwen3.5--9B-purple)](https://huggingface.co/Qwen/Qwen3.5-9B)
[![GGUF](https://img.shields.io/badge/GGUF-Available-green)](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)

---

</div>

## Overview

**OmniCoder-9B** is a 9-billion-parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks + sparse Mixture-of-Experts). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.

Despite being a 9B model, OmniCoder matches or exceeds many larger models on key coding and reasoning benchmarks, including outperforming Qwen3.5-9B on AIME 2025 and Terminal-Bench 2.0.

The model also shows strong agentic behavior: it recovers from errors, reads files before writing them, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites. These patterns were learned directly from the 425K+ real-world agent trajectories it was trained on.

### Key Features

- **Hybrid Architecture** — Inherits Qwen3.5's Gated Delta Networks + sparse MoE design for efficient long-context processing
- **262K Native Context** — Full 262,144-token context window, extensible to 1M+
- **Agentic Tool Use** — Trained on real agent trajectories with bash, file I/O, search, and code editing tools
- **Error Recovery** — Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
- **Thinking Mode** — Supports `<think>...</think>` reasoning chains for complex problem decomposition
- **Apache 2.0** — Fully open weights, no restrictions
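
For chat UIs, the reasoning chain is usually stripped before display. A minimal sketch, assuming the model emits literal `<think>...</think>` tags as described above (the `split_thinking` helper is illustrative, not part of any shipped API):

```python
import re

# Matches one literal <think>...</think> reasoning block (non-greedy, multiline).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Separate the reasoning chain from the final answer in a raw completion."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

raw = "<think>LCS is classic DP.</think>Use a 2D table of prefix lengths."
reasoning, answer = split_thinking(raw)
# reasoning -> "LCS is classic DP."
# answer    -> "Use a 2D table of prefix lengths."
```

If the model is interrupted mid-thought, the closing tag may be missing; production code should also handle an unterminated `<think>` block.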

---

## Benchmarks

<div align="center">

| Benchmark | Qwen3.5-397B | **Qwen3.5-9B** | **OmniCoder-9B** | Qwen3-Next-80B | GLM-4.7-Flash | GPT-OSS-120B | GPT-OSS-20B | GLM 4.7 | Claude Haiku 4.5 |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **AIME 2025** | 90 | 91.6 | **91.7** | | | | | | |
| **BFCL v4** | 66.1 | 49.7 | | | | | | | |
| **LiveCodeBench v6** | 65.6 | 68.7 | 64 | 82.7 | 61 | | | | |
| **BrowseComp** | | | **42.8** | | 28.3 | | | | |
| **GPQA Diamond** | 81.7 | 83.8 | 77.2 | | 80.1 | 71.5 | | | 73 |
| **Terminal-Bench 2.0** | | 20 | **28** | | | | | 33.4 | 27 |

</div>

> OmniCoder-9B achieves **91.7** on AIME 2025 (vs Qwen3.5-9B's 91.6), **28** on Terminal-Bench 2.0 (vs the base model's 20, a 40% improvement), and **42.8** on BrowseComp.

---

## Quickstart

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/OmniCoder-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p/top_k to take effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

### vLLM

```bash
vllm serve Tesslate/OmniCoder-9B --tensor-parallel-size 1 --max-model-len 65536
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="Tesslate/OmniCoder-9B",
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```

### llama.cpp (GGUF)

```bash
llama-cli --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -p "Your prompt" -c 8192
```

See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)

---

## Training Details

| | |
|:---|:---|
| **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
| **Method** | LoRA SFT (r=64, alpha=32) |
| **Dataset** | 425K agentic trajectories from 5 sources |
| **Sequence Length** | 65,536 tokens (sample packing, 99.35% efficiency) |
| **Hardware** | 4x NVIDIA H200 (DDP) |
| **Framework** | Axolotl |
| **Precision** | bf16 |
| **Optimizer** | AdamW (lr=2e-4, cosine schedule) |

### Training Data Sources

| Source | Samples | Description |
|:---|---:|:---|
| NVIDIA Nemotron-Terminal-Corpus | 226K | Terminal agent trajectories |
| CoderForge-Preview (reward >= 0.5) | 155K | SWE-bench style coding trajectories |
| Nemotron Skill-Based | 24K | Skill-based coding tasks |
| Scale-SWE | 20K | Real GitHub issue patches (synthesized trajectories) |
| Opus Reasoning | 2.3K | Chain-of-thought reasoning |
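
As a quick sanity check, the per-source counts in the table add up to the headline trajectory count:

```python
# Per-source sample counts from the table above, in thousands.
sources = {
    "NVIDIA Nemotron-Terminal-Corpus": 226,
    "CoderForge-Preview": 155,
    "Nemotron Skill-Based": 24,
    "Scale-SWE": 20,
    "Opus Reasoning": 2.3,
}
total_k = sum(sources.values())  # 427.3K, i.e. the "425K+" quoted above
```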

---

## Architecture

OmniCoder inherits Qwen3.5-9B's hybrid architecture:

- **Gated Delta Networks** — Linear attention layers interleaved with standard attention for efficient long-range dependencies
- **Sparse MoE** — Mixture-of-Experts layers for parameter-efficient scaling
- **VLM Backbone** — Built on `Qwen3_5ForConditionalGeneration` (supports future multimodal extensions)

---

## Recommended Sampling Parameters

| Parameter | Value |
|:---|:---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |

For agentic / tool-calling tasks, consider a lower temperature (0.2-0.4) for more deterministic behavior.
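
A minimal sketch for keeping these defaults in one place and overriding them per task (the dict and helper names are illustrative, chosen to match the sampling fields accepted by vLLM and most OpenAI-compatible servers):

```python
# Recommended defaults from the table above.
DEFAULT_SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 0.0,
}

def agentic_sampling(temperature: float = 0.3) -> dict:
    """Deterministic-leaning variant for agentic / tool-calling runs (0.2-0.4)."""
    return {**DEFAULT_SAMPLING, "temperature": temperature}
```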

---

## Limitations

- Performance on non-English tasks has not been extensively evaluated
- Long-context performance beyond 65K tokens (the training sequence length) may degrade
- Tool-calling format is flexible but works best with the scaffolding patterns seen in training

---

## Citation

```bibtex
@misc{omnicoder2025,
  title={OmniCoder-9B: A Frontier Open Coding Agent},
  author={Tesslate},
  year={2025},
  url={https://huggingface.co/Tesslate/OmniCoder-9B}
}
```

---

<div align="center">

**Built by [Tesslate](https://tesslate.com)**

</div>