smirki committed
Commit 9f0c62c · verified · 1 Parent(s): 10dbb72

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -44

README.md CHANGED
@@ -21,36 +21,21 @@ model-index:
  name: AIME 2025
  type: custom
  metrics:
- - name: Accuracy
  type: accuracy
- value: 91.7
- - task:
- type: text-generation
- dataset:
- name: LiveCodeBench v6
- type: custom
- metrics:
- - name: Pass Rate
- type: accuracy
- value: 64
  - task:
  type: text-generation
  dataset:
  name: GPQA Diamond
  type: custom
  metrics:
- - name: Accuracy
  type: accuracy
- value: 77.2
- - task:
- type: text-generation
- dataset:
- name: BrowseComp
- type: custom
- metrics:
- - name: Accuracy
  type: accuracy
- value: 42.8
  - task:
  type: text-generation
  dataset:
@@ -80,20 +65,20 @@ model-index:

  ## Overview

- **OmniCoder-9B** is a 9-billion parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks + sparse Mixture-of-Experts). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.

- Despite being a 9B model, OmniCoder matches or exceeds many larger models on key coding and reasoning benchmarks, including outperforming Qwen3.5-9B on AIME 2025 and Terminal-Bench 2.0.

- The model also shows strong agentic behavior: it recovers from errors (read-before-write), responds to LSP diagnostics, and uses proper edit diffs instead of full rewrites, patterns learned directly from the 425K real-world agent trajectories it was trained on.

  ### Key Features

- - **Hybrid Architecture**: Inherits Qwen3.5's Gated Delta Networks + sparse MoE design for efficient long-context processing
- - **262K Native Context**: Full 262,144 token context window, extensible to 1M+
- - **Agentic Tool Use**: Trained on real agent trajectories with bash, file I/O, search, and code editing tools
- - **Error Recovery**: Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
- - **Thinking Mode**: Supports `<think>...</think>` reasoning chains for complex problem decomposition
- - **Apache 2.0**: Fully open weights, no restrictions

  ---
 
@@ -101,18 +86,16 @@ The model also shows strong agentic behavior: it recovers from errors (read-befo

  <div align="center">

- | Benchmark | Qwen3.5-397B | **Qwen3.5-9B** | **OmniCoder-9B** | Qwen3-Next-80B | GLM-4.7-Flash | GPT-OSS-120B | GPT-OSS-20B | GLM 4.7 | Claude Haiku 4.5 |
- |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- | **AIME 2025** | 90 | 91.6 | **91.7** | | | | | | |
- | **BFCL v4** | 66.1 | 49.7 | | | | | | | |
- | **LiveCodeBench v6** | 65.6 | 68.7 | 64 | 82.7 | 61 | | | | |
- | **BrowseComp** | | | | | 28.3 | | | | |
- | **GPQA Diamond** | 81.7 | 83.8 | 77.2 | | 80.1 | 71.5 | | | 73 |
- | **Terminal-Bench 2.0** | | 20 | **28** | | | | | 33.4 | 27 |

  </div>

- > OmniCoder-9B achieves **91.7** on AIME 2025 (vs Qwen3.5-9B's 91.6), **28** on Terminal-Bench 2.0 (vs base model's 20, a 40% improvement), and **42.8** on BrowseComp.

  ---
 
@@ -175,7 +158,7 @@ See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tessl
  | **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
  | **Method** | LoRA SFT (r=64, alpha=32) |
  | **Dataset** | 425K agentic trajectories from 5 sources |
- | **Sequence Length** | 65,536 tokens (sample packing, 99.35% efficiency) |
  | **Hardware** | 4x NVIDIA H200 (DDP) |
  | **Framework** | Axolotl |
  | **Precision** | bf16 |
@@ -197,9 +180,8 @@ See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tessl

  OmniCoder inherits Qwen3.5-9B's hybrid architecture:

- - **Gated Delta Networks**: Linear attention layers interleaved with standard attention for efficient long-range dependencies
- - **Sparse MoE**: Mixture-of-Experts layers for parameter-efficient scaling
- - **VLM Backbone**: Built on `Qwen3_5ForConditionalGeneration` (supports future multimodal extensions)

  ---
 
@@ -219,11 +201,16 @@ For agentic / tool-calling tasks, consider lower temperature (0.2-0.4) for more

  ## Limitations

  - Performance on non-English tasks has not been extensively evaluated
- - Long-context performance beyond 65K tokens (the training sequence length) may degrade
  - Tool-calling format is flexible but works best with the scaffolding patterns seen in training

  ---

  ## Citation

  ```bibtex
 
  name: AIME 2025
  type: custom
  metrics:
+ - name: pass@5
  type: accuracy
+ value: 90
  - task:
  type: text-generation
  dataset:
  name: GPQA Diamond
  type: custom
  metrics:
+ - name: pass@1
  type: accuracy
+ value: 83.8
+ - name: pass@3
  type: accuracy
+ value: 86.4
  - task:
  type: text-generation
  dataset:
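The metadata edits above are flattened by this diff view; with indentation restored per the standard Hugging Face model-index schema, the updated AIME entry would read roughly as below (the top-level model `name` is our assumption, not shown in the diff):

```yaml
model-index:
- name: OmniCoder-9B
  results:
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: custom
    metrics:
    - name: pass@5
      type: accuracy
      value: 90
```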
 
  ## Overview

+ **OmniCoder-9B** is a 9-billion parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks interleaved with standard attention). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.

+ The training data was specifically built from **Claude Opus 4.6 agentic and coding reasoning traces**, targeting scaffolding patterns from Claude Code, OpenCode, Codex, and Droid. The dataset includes successful trajectories from models like Claude Opus 4.6, GPT-5.4, GPT-5.3-Codex, and Gemini 3.1 Pro.

+ The model shows strong agentic behavior: it recovers from errors (read-before-write), responds to LSP diagnostics, and uses proper edit diffs instead of full rewrites. These patterns were learned directly from the real-world agent trajectories it was trained on.

  ### Key Features

+ - **Trained on Frontier Agent Traces**: Built from Claude Opus 4.6, GPT-5.3-Codex, GPT-5.4, and Gemini 3.1 Pro agentic coding trajectories across Claude Code, OpenCode, Codex, and Droid scaffolding
+ - **Hybrid Architecture**: Inherits Qwen3.5's Gated Delta Networks interleaved with standard attention for efficient long-context processing
+ - **262K Native Context**: Full 262,144 token context window, extensible to 1M+
+ - **Error Recovery**: Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
+ - **Thinking Mode**: Supports `<think>...</think>` reasoning chains for complex problem decomposition
+ - **Apache 2.0**: Fully open weights, no restrictions

  ---
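The `<think>...</think>` reasoning chains mentioned under Thinking Mode can be stripped client-side before showing the final answer. A minimal sketch, assuming the tag format from the card above (the helper name is ours):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning chain from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning chain present; the whole output is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

out = "<think>Decompose: parse input, then sort.</think>Use sorted(data)."
reasoning, answer = split_thinking(out)
```

`re.DOTALL` lets the reasoning span multiple lines, which multi-step decompositions usually do.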
 
 
  <div align="center">

+ | Benchmark | **OmniCoder-9B** | Qwen3.5-9B | Qwen3-Next-80B | GPT-OSS-120B | GPT-OSS-20B | GLM 4.7 |
+ |:---|:---:|:---:|:---:|:---:|:---:|:---:|
+ | **AIME 2025** (pass@5) | 90 | | | | | |
+ | **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | |
+ | **GPQA Diamond** (pass@3) | **86.4** | | | | | |
+ | **Terminal-Bench 2.0** | **28** | 20 | | | | 33.4 |

  </div>

+ > OmniCoder-9B achieves **83.8** on GPQA Diamond pass@1 (vs Qwen3.5-9B's 81.7), **86.4** at pass@3, and **28** on Terminal-Bench 2.0 (vs base model's 20, a 40% improvement).

  ---
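The pass@1 / pass@3 / pass@5 metrics in the updated table are conventionally computed with the standard unbiased pass@k estimator. A sketch of that estimator; the sample counts below are illustrative, not the card's actual evaluation settings:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c correct), passes."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative: 10 generations per problem, 5 of them correct.
p1 = pass_at_k(10, 5, 1)
p5 = pass_at_k(10, 5, 5)
```

With those numbers, pass@1 is 0.5 and pass@5 is 1 - C(5,5)/C(10,5) = 1 - 1/252.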
 
 
  | **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
  | **Method** | LoRA SFT (r=64, alpha=32) |
  | **Dataset** | 425K agentic trajectories from 5 sources |
+ | **Packing** | Sample packing with 99.35% efficiency |
  | **Hardware** | 4x NVIDIA H200 (DDP) |
  | **Framework** | Axolotl |
  | **Precision** | bf16 |
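The 99.35% figure in the Packing row measures how densely variable-length trajectories fill the fixed training window (used tokens divided by total window capacity). A first-fit sketch of the idea, not Axolotl's actual implementation; the window size matches the 65,536-token sequence length mentioned elsewhere in the card:

```python
def pack_samples(lengths: list[int], window: int) -> list[list[int]]:
    """Greedy first-fit sample packing: put each sample in the first
    window with room, opening a new window when none fits."""
    free: list[int] = []            # remaining capacity per window
    packed: list[list[int]] = []    # sample lengths placed in each window
    for n in lengths:
        for i, cap in enumerate(free):
            if n <= cap:
                free[i] -= n
                packed[i].append(n)
                break
        else:
            free.append(window - n)
            packed.append([n])
    return packed

lengths = [30_000, 20_000, 40_000, 10_000, 5_000]
packed = pack_samples(lengths, window=65_536)
efficiency = sum(lengths) / (len(packed) * 65_536)
```

Here the five samples fit in two windows, giving an efficiency of 105,000 / 131,072, roughly 80%; production packers reorder samples (e.g. first-fit decreasing) to push this higher.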
 
  OmniCoder inherits Qwen3.5-9B's hybrid architecture:

+ - **Gated Delta Networks**: Linear attention layers interleaved with standard attention for efficient long-range dependencies
+ - **VLM Backbone**: Built on `Qwen3_5ForConditionalGeneration` (supports future multimodal extensions)

  ---
 
 
  ## Limitations

  - Performance on non-English tasks has not been extensively evaluated
  - Tool-calling format is flexible but works best with the scaffolding patterns seen in training

  ---

+ ## Acknowledgments
+
+ Special thanks to the [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) team and the discussion in [axolotl#3453](https://github.com/axolotl-ai-cloud/axolotl/issues/3453) for helping get Qwen3.5 packing support working.
+
+ ---
+
  ## Citation

  ```bibtex