---
library_name: transformers
base_model: Qwen/Qwen3.5-9B
tags:
- qwen3.5
- code
- agent
- sft
- omnicoder
- tesslate
license: apache-2.0
language:
- en
pipeline_tag: text-generation
model-index:
- name: OmniCoder-9B
  results:
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 91.7
  - task:
      type: text-generation
    dataset:
      name: LiveCodeBench v6
      type: custom
    metrics:
    - name: Pass Rate
      type: accuracy
      value: 64
  - task:
      type: text-generation
    dataset:
      name: GPQA Diamond
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 77.2
  - task:
      type: text-generation
    dataset:
      name: BrowseComp
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 42.8
  - task:
      type: text-generation
    dataset:
      name: Terminal-Bench 2.0
      type: custom
    metrics:
    - name: Pass Rate
      type: accuracy
      value: 28
---

<div align="center">

<img src="omnicoder-banner.png" alt="OmniCoder" width="720">

# OmniCoder-9B

### A frontier-class open coding agent, fine-tuned on 425K agentic trajectories.

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Base Model](https://img.shields.io/badge/Base-Qwen3.5--9B-purple)](https://huggingface.co/Qwen/Qwen3.5-9B)
[![GGUF](https://img.shields.io/badge/GGUF-Available-green)](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)

---

</div>

## Overview

**OmniCoder-9B** is a 9-billion-parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks + sparse Mixture-of-Experts). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.

Despite being a 9B model, OmniCoder matches or exceeds many larger models on key coding and reasoning benchmarks, including outperforming Qwen3.5-9B on AIME 2025 and Terminal-Bench 2.0.

The model also shows strong agentic behavior: it recovers from errors, reads files before writing them, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites. These patterns were learned directly from the 425K+ real-world agent trajectories it was trained on.

### Key Features

- **Hybrid Architecture** — Inherits Qwen3.5's Gated Delta Networks + sparse MoE design for efficient long-context processing
- **262K Native Context** — Full 262,144-token context window, extensible to 1M+
- **Agentic Tool Use** — Trained on real agent trajectories with bash, file I/O, search, and code editing tools
- **Error Recovery** — Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
- **Thinking Mode** — Supports `<think>...</think>` reasoning chains for complex problem decomposition
- **Apache 2.0** — Fully open weights, no restrictions
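
For chat UIs, the reasoning chain is usually stripped before display. A minimal sketch, assuming the model emits literal `<think>...</think>` tags as described above (the `split_thinking` helper is illustrative, not part of any shipped API):

```python
import re

# Matches one literal <think>...</think> reasoning block (non-greedy, multiline).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Separate the reasoning chain from the final answer in a raw completion."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

raw = "<think>LCS is classic DP.</think>Use a 2D table of prefix lengths."
reasoning, answer = split_thinking(raw)
# reasoning -> "LCS is classic DP."
# answer    -> "Use a 2D table of prefix lengths."
```

If the model is interrupted mid-thought, the closing tag may be missing; production code should also handle an unterminated `<think>` block.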

---

## Benchmarks

<div align="center">

| Benchmark | Qwen3.5-397B | **Qwen3.5-9B** | **OmniCoder-9B** | Qwen3-Next-80B | GLM-4.7-Flash | GPT-OSS-120B | GPT-OSS-20B | GLM 4.7 | Claude Haiku 4.5 |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **AIME 2025** | 90 | 91.6 | **91.7** | | | | | | |
| **BFCL v4** | 66.1 | 49.7 | | | | | | | |
| **LiveCodeBench v6** | 65.6 | 68.7 | 64 | 82.7 | 61 | | | | |
| **BrowseComp** | | | **42.8** | | 28.3 | | | | |
| **GPQA Diamond** | 81.7 | 83.8 | 77.2 | | 80.1 | 71.5 | | | 73 |
| **Terminal-Bench 2.0** | | 20 | **28** | | | | | 33.4 | 27 |

</div>

> OmniCoder-9B achieves **91.7** on AIME 2025 (vs Qwen3.5-9B's 91.6), **28** on Terminal-Bench 2.0 (vs the base model's 20, a 40% improvement), and **42.8** on BrowseComp.

---

## Quickstart

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/OmniCoder-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p/top_k to take effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

### vLLM

```bash
vllm serve Tesslate/OmniCoder-9B --tensor-parallel-size 1 --max-model-len 65536
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="Tesslate/OmniCoder-9B",
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```

### llama.cpp (GGUF)

```bash
llama-cli --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -p "Your prompt" -c 8192
```

See all quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)

---

## Training Details

| | |
|:---|:---|
| **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
| **Method** | LoRA SFT (r=64, alpha=32) |
| **Dataset** | 425K agentic trajectories from 5 sources |
| **Sequence Length** | 65,536 tokens (sample packing, 99.35% efficiency) |
| **Hardware** | 4x NVIDIA H200 (DDP) |
| **Framework** | Axolotl |
| **Precision** | bf16 |
| **Optimizer** | AdamW (lr=2e-4, cosine schedule) |

### Training Data Sources

| Source | Samples | Description |
|:---|---:|:---|
| NVIDIA Nemotron-Terminal-Corpus | 226K | Terminal agent trajectories |
| CoderForge-Preview (reward >= 0.5) | 155K | SWE-bench style coding trajectories |
| Nemotron Skill-Based | 24K | Skill-based coding tasks |
| Scale-SWE | 20K | Real GitHub issue patches (synthesized trajectories) |
| Opus Reasoning | 2.3K | Chain-of-thought reasoning |
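
As a quick sanity check, the per-source counts in the table add up to the headline trajectory count:

```python
# Per-source sample counts from the table above, in thousands.
sources = {
    "NVIDIA Nemotron-Terminal-Corpus": 226,
    "CoderForge-Preview": 155,
    "Nemotron Skill-Based": 24,
    "Scale-SWE": 20,
    "Opus Reasoning": 2.3,
}
total_k = sum(sources.values())  # 427.3K, i.e. the "425K+" quoted above
```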

---

## Architecture

OmniCoder inherits Qwen3.5-9B's hybrid architecture:

- **Gated Delta Networks** — Linear attention layers interleaved with standard attention for efficient long-range dependencies
- **Sparse MoE** — Mixture-of-Experts layers for parameter-efficient scaling
- **VLM Backbone** — Built on `Qwen3_5ForConditionalGeneration` (supports future multimodal extensions)

---

## Recommended Sampling Parameters

| Parameter | Value |
|:---|:---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |

For agentic / tool-calling tasks, consider a lower temperature (0.2-0.4) for more deterministic behavior.
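
A minimal sketch for keeping these defaults in one place and overriding them per task (the dict and helper names are illustrative, chosen to match the sampling fields accepted by vLLM and most OpenAI-compatible servers):

```python
# Recommended defaults from the table above.
DEFAULT_SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 0.0,
}

def agentic_sampling(temperature: float = 0.3) -> dict:
    """Deterministic-leaning variant for agentic / tool-calling runs (0.2-0.4)."""
    return {**DEFAULT_SAMPLING, "temperature": temperature}
```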

---

## Limitations

- Performance on non-English tasks has not been extensively evaluated
- Long-context performance beyond 65K tokens (the training sequence length) may degrade
- Tool-calling format is flexible but works best with the scaffolding patterns seen in training

---

## Citation

```bibtex
@misc{omnicoder2025,
  title={OmniCoder-9B: A Frontier Open Coding Agent},
  author={Tesslate},
  year={2025},
  url={https://huggingface.co/Tesslate/OmniCoder-9B}
}
```

---

<div align="center">

**Built by [Tesslate](https://tesslate.com)**

</div>