Instructions for using Kassadin88/Nemotron-9B-OpenCode with libraries and local apps.

Libraries

How to use Kassadin88/Nemotron-9B-OpenCode with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Kassadin88/Nemotron-9B-OpenCode")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Kassadin88/Nemotron-9B-OpenCode")
model = AutoModelForImageTextToText.from_pretrained("Kassadin88/Nemotron-9B-OpenCode")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use Kassadin88/Nemotron-9B-OpenCode with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kassadin88/Nemotron-9B-OpenCode"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kassadin88/Nemotron-9B-OpenCode",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```
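The same request can also be sent from Python using only the standard library. This is a minimal sketch that assumes the `vllm serve` server above is listening on `http://localhost:8000`. `build_chat_payload` and `post_chat` are hypothetical helper names and are not part of vLLM. The payload mirrors the OpenAI chat-completions body used in the curl call.

```python
import json
import urllib.request
from typing import Optional


# Hypothetical helper: builds the same body as the curl example (text plus optional image URL).
def build_chat_payload(model: str, text: str, image_url: Optional[str] = None) -> dict:
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


# Hypothetical helper: POSTs to /chat/completions and returns the first choice's text.
def post_chat(payload: dict, base_url: str = "http://localhost:8000/v1", timeout: float = 120.0) -> str:
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# payload = build_chat_payload(
#     "Kassadin88/Nemotron-9B-OpenCode",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(post_chat(payload))
```

For the SGLang server described below, pass `base_url="http://localhost:30000/v1"` instead.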

Use Docker

```bash
docker model run hf.co/Kassadin88/Nemotron-9B-OpenCode
```

SGLang

How to use Kassadin88/Nemotron-9B-OpenCode with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Kassadin88/Nemotron-9B-OpenCode" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kassadin88/Nemotron-9B-OpenCode",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Kassadin88/Nemotron-9B-OpenCode" \
        --host 0.0.0.0 \
        --port 30000
```



	---
	library_name: transformers
	license: apache-2.0
	license_link: https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE
	pipeline_tag: image-text-to-text
	base_model:
	- Qwen/Qwen3.5-9B
	tags:
	- code
	- instruction-tuned
	- software-engineering
	- agent
	- opencode
	- qwen
	- python
	language:
	- en
	- zh
	---

	# Nemotron-9B-OpenCode

	A 9B-parameter instruction-tuned model for autonomous software-engineering agents, fine-tuned from [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on NVIDIA's Nemotron-SFT-OpenCode-v1 dataset.

	## Model Highlights

	- Specialized for Agentic Tasks: Trained on agent trajectories for the [OpenCode](https://opencode.ai/) CLI framework, enabling autonomous code navigation, multi-step tool use, and software engineering workflows
	- Multi-Capability: Supports general reasoning, tool calling, bash command execution, and dynamic skill loading
	- Deployment: Compatible with Hugging Face Transformers, and can be served with vLLM or SGLang through their OpenAI-compatible APIs

	## Model Description

	| Property | Value |
	|----------|-------|
	| Base Model | Qwen3.5-9B |
	| Model Type | Causal Language Model with Vision Encoder |
	| Parameters | 9B |
	| Languages | English, Chinese |
	| License | Apache 2.0 |
	| Developer | [Kassadin88](https://huggingface.co/Kassadin88) |

	## Training Data

	This model was fine-tuned on [Nemotron-SFT-OpenCode-v1](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1), NVIDIA's agentic instruction-tuning dataset, which contains 144,468 high-quality samples derived from 459K trajectories in total. The dataset is designed to improve an LLM's ability to operate inside autonomous coding environments.

	### Dataset Composition

	| Subset | Samples | Description |
	|--------|---------|-------------|
	| `general` | 90K | General agentic CLI questions with/without AGENTS.md context |
	| `bash_only_tool` | 97K | Restricted tool set (todo + bash) for foundational agent capabilities |
	| `bash_only_tool_skills` | 96K | Bash + skill loading for dynamic capability discovery |
	| `question_tool` | 76K | Interactive clarification via user questions during task execution |
	| `agent_skills` | 67K | Dynamic skill scanning and loading for task-specific capabilities |
	| `agent_skills_question_tool` | 33K | Combined skill loading + user clarification for complex tasks |
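The subset counts in the table add up to the 459K total trajectories quoted above. The snippet below does only that arithmetic: it sums the rounded table values and reports each subset's share of the mixture. It makes no claim about how the 144,468 training samples were drawn from these subsets.

```python
# Subset sizes (in thousands) as listed in the Dataset Composition table.
SUBSET_SIZES_K = {
    "general": 90,
    "bash_only_tool": 97,
    "bash_only_tool_skills": 96,
    "question_tool": 76,
    "agent_skills": 67,
    "agent_skills_question_tool": 33,
}

total_k = sum(SUBSET_SIZES_K.values())

# Share of each subset in the full mixture, as a percentage rounded to one decimal.
shares = {name: round(100 * size / total_k, 1) for name, size in SUBSET_SIZES_K.items()}

print(f"total: {total_k}K")  # total: 459K
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} {pct:5.1f}%")
```

The two bash-centric subsets are the largest (about 21% each), and `agent_skills_question_tool` is the smallest (about 7%).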

	### Key Capabilities Trained

	- Code Navigation: Repository-aware reasoning and codebase traversal
	- Tool Calling: Structured tool invocation for bash, file operations, and more
	- Skill Loading: Dynamic discovery and loading of relevant agent skills
	- Interactive Planning: User clarification when requirements are ambiguous
	- Multi-Step Reasoning: SWE-bench-style problem decomposition and implementation
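Structured tool calling usually means the model returns a function name plus JSON-encoded arguments, and the host program executes the call. The sketch below assumes the OpenAI-style `tools` / `tool_calls` format exposed by OpenAI-compatible servers. The `bash` tool schema, `run_bash`, and `execute_tool_call` are hypothetical illustrations. They are not taken from the OpenCode framework or from the training data.

```python
import json
import subprocess

# Hypothetical tool definition in the OpenAI function-calling schema format.
BASH_TOOL = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command and return its combined output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}


def run_bash(command: str, timeout: float = 30.0) -> str:
    """Run a command through the system shell and return stdout followed by stderr."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr


# Map tool names to the Python callables that implement them.
HANDLERS = {"bash": run_bash}


def execute_tool_call(tool_call: dict) -> str:
    """Execute one call shaped like {"function": {"name": ..., "arguments": "<json string>"}}."""
    name = tool_call["function"]["name"]
    if name not in HANDLERS:
        return f"error: unknown tool {name!r}"
    # Arguments arrive as a JSON string, not a dict, so decode them first.
    args = json.loads(tool_call["function"]["arguments"])
    return HANDLERS[name](**args)
```

Running model-generated commands with `shell=True` is only reasonable inside a sandbox or disposable container.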

	## Benchmark Results

	The model builds on Qwen3.5-9B. The scores below are those reported for the base model; this card does not report benchmark results for Nemotron-9B-OpenCode itself, and fine-tuning may shift them in either direction:

	### Language Benchmarks

	<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
	<table style="width:100%;border-collapse:collapse;font-size:13px">
	<thead><tr>
	<th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
	<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
	<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
	</tr></thead>
	<tbody>
	<tr><td rowspan="5" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Knowledge & STEM</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Pro</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.5</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Redux</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.1</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">C-Eval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.2</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">GPQA Diamond</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.7</td></tr>
	<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Instruction Following</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">IFEval</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.5</td></tr>
	<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Long Context</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LongBench v2</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.2</td></tr>
	<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Reasoning & Coding</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LiveCodeBench v6</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.6</td></tr>
	</tbody>
	</table>
	</div>

	### Vision Language Benchmarks

	<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0">
	<table style="width:100%;border-collapse:collapse;font-size:13px">
	<thead><tr>
	<th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed">Category</th>
	<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Benchmark</th>
	<th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed">Qwen3.5-9B</th>
	</tr></thead>
	<tbody>
	<tr><td rowspan="4" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">STEM & Puzzle</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMU</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVision</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVista (mini)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td></tr>
	<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Document Understanding</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OCRBench</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.2</td></tr>
	<tr><td rowspan="2" style="padding:7px 7px;border-bottom:1px solid rgba(128, 128, 128, 0.15);font-weight:600;color:#7c3aed;background:rgba(124, 58, 237, 0.1)">Video Understanding</td></tr>
	<tr><td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME (w/ sub)</td><td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.5</td></tr>
	</tbody>
	</table>
	</div>

	> Note: For complete benchmark results across all categories, please refer to the [Qwen3.5-9B model card](https://huggingface.co/Qwen/Qwen3.5-9B).

	## Quick Start

	### Using Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_name = "Kassadin88/Nemotron-9B-OpenCode"

	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.bfloat16,  # load weights in bfloat16
	    device_map="auto",           # spread layers across available devices (requires accelerate)
	    trust_remote_code=True,
	)

	messages = [
	    {"role": "system", "content": "You are a helpful coding assistant."},
	    {"role": "user", "content": "Write a Python function to merge two sorted arrays."},
	]

	# Render the chat template to a string, then tokenize it
	input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=512,
	    do_sample=True,
	)

	# Decode only the newly generated tokens, skipping the prompt
	response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
	print(response)
	```

	### Using vLLM (Recommended for Production)

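	```python
	from vllm import LLM, SamplingParams

	llm = LLM(
	    model="Kassadin88/Nemotron-9B-OpenCode",
	    trust_remote_code=True,
	    dtype="bfloat16",
	)

	sampling_params = SamplingParams(
	    max_tokens=1024,
	)

	# LLM.chat applies the model's chat template before generating
	messages = [
	    {"role": "user", "content": "Write a Python function to merge two sorted arrays."},
	]
	outputs = llm.chat(messages, sampling_params)
	print(outputs[0].outputs[0].text)
	```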

	### Using SGLang

	```bash
	python -m sglang.launch_server \
	    --model-path Kassadin88/Nemotron-9B-OpenCode \
	    --port 8000 \
	    --tp-size 1
	```
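Loading a 9B checkpoint can take a while, so client code may need to wait until the server answers. This is a minimal sketch that assumes the server exposes the OpenAI-compatible `GET /v1/models` endpoint, which both vLLM and SGLang provide, on port 8000 as in the command above. `wait_until_ready` is a hypothetical helper.

```python
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:8000", timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts all derive from OSError:
            # the server is not up yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: block until the server is reachable before sending requests.
# if not wait_until_ready():
#     raise SystemExit("server did not start in time")
```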

	### OpenAI-Compatible API

	```python
	from openai import OpenAI

	client = OpenAI(
	    base_url="http://localhost:8000/v1",
	    api_key="EMPTY",
	)

	response = client.chat.completions.create(
	    model="Kassadin88/Nemotron-9B-OpenCode",
	    messages=[
	        {"role": "user", "content": "Write a quicksort implementation in Python"}
	    ],
	    max_tokens=512,
	)
	print(response.choices[0].message.content)
	```

	## Usage Tips

	### For Agentic Coding Tasks

	```python
	messages = [
	    {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
	    {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."},
	]
	```
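An agent built on this model typically runs a loop. It sends the conversation, executes any tool calls the model returns, appends each result as a `tool` message, and repeats until the model answers without calling a tool. The sketch below assumes OpenAI-style chat messages. `call_model` is a stand-in that you would replace with a real client call, such as `client.chat.completions.create(...)` from the OpenAI-compatible example. The scripted `fake_model` and the `read_file` tool are hypothetical and exist only to make the loop runnable.

```python
import json
from typing import Callable

Message = dict


def run_agent(
    messages: list[Message],
    call_model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., str]],
    max_steps: int = 10,
) -> str:
    """Loop until the model replies without tool calls, or give up after max_steps."""
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]
        for call in tool_calls:
            fn = call["function"]
            result = tools[fn["name"]](**json.loads(fn["arguments"]))
            # Each result goes back as a "tool" message tied to the call id it answers.
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError("agent did not finish within max_steps")


# Scripted stand-in for the real model: first requests a tool, then answers.
def fake_model(messages: list[Message]) -> Message:
    if messages[-1]["role"] == "user":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {"name": "read_file", "arguments": json.dumps({"path": "src/utils/parser.py"})},
            }],
        }
    return {"role": "assistant", "content": f"Tool said: {messages[-1]['content']}"}


files = {"src/utils/parser.py": "import json"}  # hypothetical in-memory repository
history = [
    {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
    {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."},
]
answer = run_agent(history, fake_model, {"read_file": lambda path: files[path]})
print(answer)  # Tool said: import json
```

The `max_steps` cap is a simple guard against a model that keeps calling tools without converging.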

	### For Code Generation

	```python
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=1024,
	    do_sample=True,
	)
	```

	### For Code Explanation

	```python
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=512,
	    do_sample=True,
	)
	```

	## Limitations

	- The model is primarily trained on agentic coding tasks and may not perform optimally on general conversational tasks
	- May occasionally generate incorrect or incomplete code
	- Should not be used for malicious code generation

	## Citation

	```bibtex
	@misc{nemotron-9b-opencode,
	author = {Kassadin88},
	title = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Autonomous Software Engineering},
	year = {2026},
	publisher = {HuggingFace},
	url = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
	}
	```

	## Acknowledgments

	- Base Model: [Qwen Team](https://github.com/QwenLM/Qwen3) for Qwen3.5-9B
	- Training Data: [NVIDIA](https://huggingface.co/datasets/nvidia/Nemotron-SFT-OpenCode-v1) for Nemotron-SFT-OpenCode-v1
	- Training Framework: [MS-Swift](https://github.com/modelscope/swift)

	---

	Note: This model is intended for research and educational purposes. Please use responsibly.