---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE
pipeline_tag: image-text-to-text
base_model:
- Qwen/Qwen3.5-9B
tags:
- code
- instruction-tuned
- software-engineering
- agent
- opencode
- qwen
- python
language:
- en
- zh
---
# Nemotron-9B-OpenCode
A 9B-parameter instruction-tuned model specialized for **autonomous software engineering agents**, fine-tuned from [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on NVIDIA's Nemotron-SFT-OpenCode-v1 dataset.
## Model Highlights
- **Specialized for Agentic Tasks**: Trained on agent trajectories for the [OpenCode](https://opencode.ai/) CLI framework, enabling autonomous code navigation, multi-step tool use, and software engineering workflows
- **Multi-Capability**: Supports general reasoning, tool calling, bash command execution, and dynamic skill loading
- **Broad Serving Support**: Runs with Hugging Face Transformers and can be served with vLLM or SGLang, including through their OpenAI-compatible APIs
## Model Description
| Property | Value |
|----------|-------|
| **Base Model** | Qwen3.5-9B |
| **Model Type** | Causal Language Model with Vision Encoder |
| **Parameters** | 9B |
| **Languages** | English, Chinese |
| **License** | Apache 2.0 |
| **Developer** | [Kassadin88](https://huggingface.co/Kassadin88) |
## Training Data
This model was fine-tuned on **[Nemotron-SFT-OpenCode-v1](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1)**, NVIDIA's agentic instruction-tuning dataset containing **144,468 high-quality samples** derived from 459K total trajectories. The dataset is designed to improve an LLM's ability to operate within autonomous coding environments such as OpenCode.
### Dataset Composition
| Subset | Trajectories | Description |
|--------|--------------|-------------|
| `general` | 90K | General agentic CLI questions, with and without AGENTS.md context |
| `bash_only_tool` | 97K | Restricted tool set (todo + bash) for foundational agent capabilities |
| `bash_only_tool_skills` | 96K | Bash plus skill loading for dynamic capability discovery |
| `question_tool` | 76K | Interactive clarification via questions to the user during task execution |
| `agent_skills` | 67K | Dynamic skill scanning and loading for task-specific capabilities |
| `agent_skills_question_tool` | 33K | Skill loading combined with user clarification for complex tasks |
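The rounded per-subset counts in the table add up to the 459K total trajectories quoted above. The short sketch below simply hard-codes those rounded figures to make the arithmetic explicit and to show each subset's share of the mix; it does not load the dataset itself.

```python
# Rounded per-subset trajectory counts, copied from the table above.
SUBSET_COUNTS = {
    "general": 90_000,
    "bash_only_tool": 97_000,
    "bash_only_tool_skills": 96_000,
    "question_tool": 76_000,
    "agent_skills": 67_000,
    "agent_skills_question_tool": 33_000,
}


def subset_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each subset's fraction of the summed count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(SUBSET_COUNTS.values())
print(f"total: {total:,}")  # total: 459,000
for name, share in subset_shares(SUBSET_COUNTS).items():
    print(f"{name:<28} {share:6.1%}")
```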
### Key Capabilities Trained
- **Code Navigation**: Repository-aware reasoning and codebase traversal
- **Tool Calling**: Structured tool invocation for bash, file operations, and more
- **Skill Loading**: Dynamic discovery and loading of relevant agent skills
- **Interactive Planning**: User clarification when requirements are ambiguous
- **Multi-Step Reasoning**: SWE-Bench style problem decomposition and implementation
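To make "structured tool invocation" concrete, here is a minimal sketch of a bash tool described in the OpenAI function-calling schema, plus a dispatcher that executes one parsed call. The tool name `bash`, its single `command` parameter, and the `run_tool_call` helper are illustrative assumptions; they are not the exact tool definitions used by OpenCode or in the Nemotron dataset.

```python
import json
import subprocess

# Hypothetical bash tool in the OpenAI function-calling format
# (illustrative only; not the exact OpenCode / Nemotron tool definition).
BASH_TOOL = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command and return its stdout followed by stderr.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to execute."}
            },
            "required": ["command"],
        },
    },
}


def run_tool_call(name: str, arguments: str, timeout: float = 30.0) -> str:
    """Execute one tool call whose arguments arrive as a JSON string."""
    if name != "bash":
        return f"error: unknown tool {name!r}"
    args = json.loads(arguments)
    try:
        proc = subprocess.run(
            args["command"],
            shell=True,  # runs via the system shell (/bin/sh on POSIX)
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"error: command timed out after {timeout}s"
    # Return stdout, then stderr, so the model sees error messages too.
    return proc.stdout + proc.stderr
```

Executing model-issued commands with `shell=True` directly on a host is unsafe; a real agent should run them in a sandbox or container.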
## Benchmark Results
The model inherits its foundational capabilities from Qwen3.5-9B. The scores below are the base model's published results, not measurements of Nemotron-9B-OpenCode itself:
### Language Benchmarks
<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
<table style="width:100%;border-collapse:collapse;font-size:13px">
<thead><tr>
<th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
</tr></thead>
<tbody>
<tr><td rowspan="5" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Knowledge & STEM</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Pro</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.5</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Redux</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.1</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">C-Eval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.2</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">GPQA Diamond</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.7</td></tr>
<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Instruction Following</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">IFEval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.5</td></tr>
<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Long Context</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LongBench v2</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.2</td></tr>
<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Reasoning & Coding</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LiveCodeBench v6</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.6</td></tr>
</tbody>
</table>
</div>
### Vision Language Benchmarks
<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
<table style="width:100%;border-collapse:collapse;font-size:13px">
<thead><tr>
<th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
</tr></thead>
<tbody>
<tr><td rowspan="4" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">STEM & Puzzle</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMU</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVision</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVista (mini)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td></tr>
<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Document Understanding</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OCRBench</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.2</td></tr>
<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Video Understanding</td></tr>
<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME (w/ sub)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.5</td></tr>
</tbody>
</table>
</div>
> **Note**: For complete benchmark results across all categories, please refer to the [Qwen3.5-9B model card](https://huggingface.co/Qwen/Qwen3.5-9B).
## Quick Start
### Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Kassadin88/Nemotron-9B-OpenCode"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to merge two sorted arrays."},
]

# Render the conversation with the model's chat template, ending on the assistant turn.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Using vLLM (Recommended for Production)
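```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Kassadin88/Nemotron-9B-OpenCode",
    trust_remote_code=True,
    dtype="bfloat16",
)
sampling_params = SamplingParams(max_tokens=1024)

# LLM.chat applies the model's chat template before generating.
messages = [
    {"role": "user", "content": "Write a Python function to merge two sorted arrays."}
]
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```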
### Using SGLang
```bash
python -m sglang.launch_server \
--model-path Kassadin88/Nemotron-9B-OpenCode \
--port 8000 \
--tp-size 1
```
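Loading 9B weights takes a while, so before sending requests it helps to wait until the server answers. The sketch below polls the OpenAI-compatible `GET /v1/models` route that servers such as SGLang and vLLM expose; the helper name, timeout, and polling interval are illustrative choices, and only the standard library is used.

```python
import json
import time
import urllib.request


def wait_for_server(base_url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll GET {base_url}/models until it returns a non-empty model list or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                data = json.load(resp)
                if isinstance(data, dict) and data.get("data"):
                    return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or non-JSON body: retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the launch command above, `wait_for_server("http://localhost:8000/v1")` returns `True` once the model is loaded.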
### OpenAI-Compatible API
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="Kassadin88/Nemotron-9B-OpenCode",
    messages=[
        {"role": "user", "content": "Write a quicksort implementation in Python"}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
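For agentic use over the API, the client has to drive a loop: send the conversation, execute any tool calls the model requests, append the results, and repeat until the model answers in plain text. The sketch below assumes the server was started with tool-call parsing enabled (how to enable it, and which parser fits this model, depends on the serving framework and version). `run_agent` and the `execute` callback are hypothetical names; `execute` can be any function with the signature `(name, arguments_json) -> str`, such as a sandboxed command runner.

```python
from typing import Any, Callable


def run_agent(
    client: Any,
    model: str,
    messages: list[dict],
    tools: list[dict],
    execute: Callable[[str, str], str],
    max_steps: int = 10,
) -> str:
    """Alternate model turns and tool executions until the model stops calling tools."""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content or ""
        # Record the assistant turn, including its tool calls, in the history.
        messages.append({
            "role": "assistant",
            "content": msg.content,
            "tool_calls": [
                {
                    "id": tc.id,
                    "type": "function",
                    "function": {"name": tc.function.name, "arguments": tc.function.arguments},
                }
                for tc in msg.tool_calls
            ],
        })
        # Run each requested tool and feed its output back as a "tool" message.
        for tc in msg.tool_calls:
            result = execute(tc.function.name, tc.function.arguments)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
    raise RuntimeError(f"agent did not finish within {max_steps} steps")
```

The `max_steps` cap is a deliberate guard against an agent that keeps calling tools without converging.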
## Usage Tips
### For Agentic Coding Tasks
```python
messages = [
    {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
    {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."},
]
```
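The `general` training subset mixes questions with and without AGENTS.md context. One plausible way to supply that context at inference time is to append the repository's `AGENTS.md` to the system prompt, as sketched below. The exact prompt layout used in the training data is not documented here, so the heading text and the `build_messages` helper are assumptions.

```python
from pathlib import Path

BASE_SYSTEM_PROMPT = (
    "You are an autonomous coding agent. Use the available tools to complete tasks."
)


def build_messages(repo_root: str, task: str) -> list[dict]:
    """Build a chat history, appending AGENTS.md to the system prompt if the file exists."""
    system = BASE_SYSTEM_PROMPT
    agents_md = Path(repo_root) / "AGENTS.md"
    if agents_md.is_file():
        # Hypothetical layout: repository instructions appended under a plain heading.
        system += "\n\nRepository instructions (AGENTS.md):\n" + agents_md.read_text(encoding="utf-8")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]
```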
### For Code Generation
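```python
# Reuses `model` and `inputs` from the Transformers example above;
# a larger token budget leaves room for complete implementations.
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
)
```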
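Generated answers usually wrap code in Markdown fences. A small post-processing step, sketched below under the assumption that the model tags its fences as `python` or `py`, can pull out the code and flag snippets that do not even parse. A successful parse says nothing about whether the code is correct; it only filters out syntactically broken output.

```python
import ast
import re

TICKS = "`" * 3  # Markdown fence delimiter, built here to keep this block renderable
# Matches fenced blocks tagged python/py; the tag convention is an assumption.
FENCE_RE = re.compile(TICKS + r"(?:python|py)[^\n]*\n(.*?)" + TICKS, re.DOTALL)


def extract_python_blocks(text: str) -> list[str]:
    """Return the bodies of all Python-tagged fenced code blocks in text."""
    return [m.group(1) for m in FENCE_RE.finditer(text)]


def parses(code: str) -> bool:
    """True if code is syntactically valid Python (not a correctness check)."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True
```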
### For Code Explanation
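```python
# Reuses `model` and `inputs` from the Transformers example above;
# explanations typically need a smaller token budget.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
)
```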
## Limitations
- The model was fine-tuned primarily on agentic coding tasks and may perform less well on general conversational tasks
- It may generate incorrect or incomplete code, so review and test its output before use
- It must not be used to generate malicious code
## Citation
```bibtex
@misc{nemotron-9b-opencode,
  author    = {Kassadin88},
  title     = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Autonomous Software Engineering},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
}
```
## Acknowledgments
- **Base Model**: [Qwen Team](https://github.com/QwenLM/Qwen3) for Qwen3.5-9B
- **Training Data**: [NVIDIA](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1) for Nemotron-SFT-OpenCode-v1
- **Training Framework**: [MS-Swift](https://github.com/modelscope/swift)
---
**Note:** This model is intended for research and educational purposes. Please use responsibly.