---
library_name: transformers
base_model: Qwen/Qwen3.5-9B
tags:
- qwen3.5
- code
- agent
- sft
- omnicoder
- tesslate
license: apache-2.0
language:
- en
pipeline_tag: text-generation
model-index:
- name: OmniCoder-9B
results:
- task:
type: text-generation
dataset:
name: AIME 2025
type: custom
metrics:
- name: pass@5
type: accuracy
value: 90.0
- task:
type: text-generation
dataset:
name: GPQA Diamond
type: custom
metrics:
- name: pass@1
type: accuracy
value: 83.8
- name: pass@3
type: accuracy
value: 86.4
- task:
type: text-generation
dataset:
name: Terminal-Bench 2.0
type: custom
metrics:
- name: Pass Rate
type: accuracy
value: 23.6
---
<div align="center">
<img src="omnicoder-banner.png" alt="OmniCoder" width="720">
# OmniCoder-9B
### A 9B coding agent fine-tuned on 425K agentic trajectories.
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Base Model: Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)
[GGUF Quants](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)
**Update (3/12/26):** [Install For Your Coding Agents](https://tesslate.com/install#omnicoder)
[Get Started](#quickstart) | [Benchmarks](#benchmarks) | [GGUF Downloads](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)
---
</div>
## Overview
**OmniCoder-9B** is a 9-billion parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks interleaved with standard attention). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.
The training data was specifically built from **Claude Opus 4.6 agentic and coding reasoning traces**, targeting scaffolding patterns from Claude Code, OpenCode, Codex, and Droid. The dataset includes successful trajectories from models like Claude Opus 4.6, GPT-5.4, GPT-5.3-Codex, and Gemini 3.1 Pro.
The model shows strong agentic behavior: it recovers from errors, follows read-before-write discipline, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites. These patterns were learned directly from the real-world agent trajectories it was trained on.
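To make the edit-diff pattern concrete, here is a minimal sketch of the kind of search/replace edit agent scaffolds apply. The helper name and exact-match rule are illustrative, not the literal format OmniCoder emits:

```python
def apply_edit(source: str, old: str, new: str) -> str:
    """Apply a minimal search/replace edit instead of rewriting the file.

    Requiring the target to match exactly once mirrors how agent scaffolds
    guard against ambiguous edits (illustrative behavior, not a spec).
    """
    if source.count(old) != 1:
        raise ValueError("edit target must match exactly once")
    return source.replace(old, new)
```

A failed match (zero or multiple occurrences) raises instead of silently editing the wrong span, which is the error signal an agent uses to re-read the file and retry.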
### Key Features
- **Trained on Frontier Agent Traces** : Built from Claude Opus 4.6, GPT-5.3-Codex, GPT-5.4, and Gemini 3.1 Pro agentic coding trajectories across Claude Code, OpenCode, Codex, and Droid scaffolding
- **Hybrid Architecture** : Inherits Qwen3.5's Gated Delta Networks interleaved with standard attention for efficient long-context processing
- **262K Native Context** : Full 262,144 token context window, extensible to 1M+
- **Error Recovery** : Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
- **Thinking Mode** : Supports `<think>...</think>` reasoning chains for complex problem decomposition
- **Apache 2.0** : Fully open weights, no restrictions
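Thinking-mode output can be post-processed with a small helper. A minimal sketch, assuming only the `<think>...</think>` tag format described above (the helper itself is illustrative, not part of the model API):

```python
import re

def split_thinking(text: str):
    """Separate <think>...</think> reasoning chains from the final answer."""
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer
```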
---
## Benchmarks
<div align="center">
| Benchmark | **OmniCoder-9B** | Qwen3.5-9B | Qwen3-Next-80B | GPT-OSS-120B | GPT-OSS-20B | GLM-4.7-Flash | GLM 4.7 | Claude Haiku 4.5 |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **AIME 2025** (pass@5) | 90 | | | | 91.7 | 91.6 | | |
| **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | | | 73 |
| **GPQA Diamond** (pass@3) | **86.4** | | | | | | | |
| **Terminal-Bench 2.0** | **23.6** | 14.6 | | | | | 33.4 | 27 |
</div>
- **GPQA Diamond pass@1: 83.8%** (166/198). +2.1 points over the Qwen3.5-9B base model (81.7). At pass@3: **86.4** (171/198).
- **AIME 2025 pass@5: 90%** (27/30).
- **Terminal-Bench 2.0: 23.6%** (21/89). +8.99 points (+61% improvement) over the Qwen3.5-9B base model (14.6%, 13/89).
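The pass@k figures above can be reproduced with the standard unbiased estimator (the Chen et al. 2021 formulation); a short sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n attempts (c of them correct), passes.

    pass@k = 1 - C(n-c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With one attempt per problem (n = k = 1), this reduces to plain accuracy, matching the pass@1 numbers in the table.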
---
## Quickstart
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Tesslate/OmniCoder-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
messages = [
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
### vLLM
```bash
vllm serve Tesslate/OmniCoder-9B --tensor-parallel-size 1 --max-model-len 65536
```
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
model="Tesslate/OmniCoder-9B",
messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
temperature=0.6,
)
print(response.choices[0].message.content)
```
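Because vLLM exposes an OpenAI-compatible endpoint, tool definitions can be passed the same way as with the hosted API. A sketch with a hypothetical `run_tests` tool (the tool name and parameters are illustrative, not part of OmniCoder's training format):

```python
# Hypothetical tool schema for an agentic coding loop.
run_tests_tool = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test file or directory."},
            },
            "required": ["path"],
        },
    },
}

# Passed via the same OpenAI client as above (server must be running):
# client.chat.completions.create(
#     model="Tesslate/OmniCoder-9B",
#     messages=[{"role": "user", "content": "Run the unit tests in tests/."}],
#     tools=[run_tests_tool],
#     temperature=0.3,  # lower temperature for tool calling, per the sampling notes below
# )
```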
### llama.cpp (GGUF)
```bash
llama-cli --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -p "Your prompt" -c 8192
```
All quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)
---
## Training Details
| | |
|:---|:---|
| **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
| **Method** | LoRA SFT (r=64, alpha=32) |
| **Dataset** | 425K agentic trajectories from 5 sources |
| **Packing** | Sample packing with 99.35% efficiency |
| **Hardware** | 4x NVIDIA H200 (DDP) |
| **Framework** | Axolotl |
| **Precision** | bf16 |
| **Optimizer** | AdamW (lr=2e-4, cosine schedule) |
---
## Architecture
OmniCoder inherits Qwen3.5-9B's hybrid architecture:
- **Gated Delta Networks** : Linear attention layers interleaved with standard attention for efficient long-range dependencies
- **VLM Backbone** : Built on `Qwen3_5ForConditionalGeneration`
---
## Recommended Sampling Parameters
| Parameter | Value |
|:---|:---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |
For agentic / tool-calling tasks, consider a lower temperature (0.2-0.4) for more deterministic behavior.
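The table above can be expressed as a plain kwargs dict and splatted into `model.generate(...)` or an OpenAI-compatible request. A minimal sketch (the dict and helper names are illustrative; note `presence_penalty` applies to OpenAI-compatible servers such as vLLM, not to `transformers` generation):

```python
# Recommended chat defaults from the table above.
DEFAULT_SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 0.0,  # OpenAI-compatible servers only
}

def agentic_sampling(temperature: float = 0.3) -> dict:
    """Variant for tool-calling runs: same nucleus settings, lower temperature."""
    return {**DEFAULT_SAMPLING, "temperature": temperature}
```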
---
## Limitations
- Performance on non-English tasks has not been extensively evaluated
- Tool-calling format is flexible but works best with the scaffolding patterns seen in training
---
## Acknowledgments
Special thanks to the [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) team and the discussion in [axolotl#3453](https://github.com/axolotl-ai-cloud/axolotl/issues/3453) for helping get Qwen3.5 packing support working.
---
## Citation
```bibtex
@misc{omnicoder2025,
title={OmniCoder-9B: A Frontier Open Coding Agent},
author={Tesslate},
year={2025},
url={https://huggingface.co/Tesslate/OmniCoder-9B}
}
```
---
<div align="center">
**Built by [Tesslate](https://tesslate.com)**
</div>