Add base model benchmarks and usage examples
README.md CHANGED

@@ -61,7 +61,40 @@ response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special
 print(response)
 ```
 
-## 📊
+## 📊 Base Model Performance (Qwen3.5-9B)
+
+### Language Benchmarks
+
+| Category | Benchmark | Score |
+|----------|-----------|-------|
+| **Knowledge & STEM** | MMLU-Pro | 82.5 |
+| | MMLU-Redux | 91.1 |
+| | C-Eval | 88.2 |
+| | GPQA Diamond | 81.7 |
+| **Instruction Following** | IFEval | 91.5 |
+| | MultiChallenge | 54.5 |
+| **Long Context** | AA-LCR | 63.0 |
+| | LongBench v2 | 55.2 |
+| **Reasoning & Coding** | HMMT Feb 25 | 83.2 |
+| | LiveCodeBench v6 | 65.6 |
+| **Multilingualism** | MMMLU | 81.2 |
+| | MMLU-ProX | 76.3 |
+
+### Vision Language Benchmarks
+
+| Category | Benchmark | Score |
+|----------|-----------|-------|
+| **STEM and Puzzle** | MMMU | 78.4 |
+| | MathVision | 78.9 |
+| | Mathvista (mini) | 85.7 |
+| **General VQA** | RealWorldQA | 80.3 |
+| | MMStar | 79.7 |
+| **Document Understanding** | OmniDocBench1.5 | 87.7 |
+| | OCRBench | 89.2 |
+| **Video Understanding** | VideoMME (w/ sub) | 84.5 |
+| | MLVU | 84.4 |
+
+## 📈 Training Details
 
 The model was full-parameter fine-tuned from Qwen3.5-9B using DeepSpeed ZeRO3 with BF16 precision.
 
@@ -129,6 +162,47 @@ sampling_params = SamplingParams(
 outputs = llm.generate(prompts, sampling_params)
 ```
 
+### With SGLang
+
+```bash
+python -m sglang.launch_server \
+    --model-path Kassadin88/Nemotron-9B-OpenCode \
+    --port 8000 \
+    --tp-size 1 \
+    --context-length 16384
+```
+
+### OpenAI-Compatible API
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="EMPTY"
+)
+
+response = client.chat.completions.create(
+    model="Kassadin88/Nemotron-9B-OpenCode",
+    messages=[
+        {"role": "user", "content": "Write a quicksort implementation in Python"}
+    ],
+    max_tokens=512,
+    temperature=0.7,
+    top_p=0.9
+)
+print(response.choices[0].message.content)
+```
+
+## 🔧 Recommended Sampling Parameters
+
+| Task Type | Temperature | Top-p | Top-k |
+|-----------|-------------|-------|-------|
+| Code Generation | 0.3 | 0.95 | 20 |
+| Code Explanation | 0.7 | 0.9 | 20 |
+| Debugging | 0.5 | 0.95 | 20 |
+| General Tasks | 0.7 | 0.8 | 20 |
+
 ## ⚠️ Limitations
 
 - The model is primarily trained on code and may not perform well on general conversational tasks
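
Note on the added "Recommended Sampling Parameters" table: the OpenAI-Compatible API example in this change sets only `temperature` and `top_p`, so the Top-k column is never applied. Below is a minimal sketch of one way to wire the table into a request. The task keys and the `sampling_kwargs` helper are hypothetical names, not part of the README. It also assumes the serving backend reads `top_k` from the request body, which should be checked against the server in use; the OpenAI Python client has no `top_k` argument, so the value is passed through `extra_body`.

```python
from typing import Any, Dict

# Values copied from the "Recommended Sampling Parameters" table in the diff.
# The dictionary keys are hypothetical labels chosen for this sketch.
RECOMMENDED_SAMPLING: Dict[str, Dict[str, float]] = {
    "code_generation": {"temperature": 0.3, "top_p": 0.95, "top_k": 20},
    "code_explanation": {"temperature": 0.7, "top_p": 0.9, "top_k": 20},
    "debugging": {"temperature": 0.5, "top_p": 0.95, "top_k": 20},
    "general": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
}


def sampling_kwargs(task: str) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create()."""
    try:
        params = RECOMMENDED_SAMPLING[task]
    except KeyError:
        raise ValueError(f"unknown task type: {task!r}") from None
    return {
        "temperature": params["temperature"],
        "top_p": params["top_p"],
        # Assumption: the server accepts top_k as an extra request field.
        "extra_body": {"top_k": params["top_k"]},
    }
```

With the client from the README example, a debugging request could then be sent as `client.chat.completions.create(model="Kassadin88/Nemotron-9B-OpenCode", messages=messages, max_tokens=512, **sampling_kwargs("debugging"))`.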