Commit 8afcaa2 by Kassadin88 · verified · 1 parent: 9389c22

Add base model benchmarks and usage examples

Files changed (1)
  1. README.md +75 -1
README.md CHANGED
@@ -61,7 +61,40 @@ response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special
  print(response)
  ```

- ## 📊 Training Details

  The model was full-parameter fine-tuned from Qwen3.5-9B using DeepSpeed ZeRO3 with BF16 precision.

@@ -129,6 +162,47 @@ sampling_params = SamplingParams(
  outputs = llm.generate(prompts, sampling_params)
  ```

  ## ⚠️ Limitations

  - The model is primarily trained on code and may not perform well on general conversational tasks
 
  print(response)
  ```

+ ## 📊 Base Model Performance (Qwen3.5-9B)
+
+ ### Language Benchmarks
+
+ | Category | Benchmark | Score |
+ |----------|-----------|-------|
+ | **Knowledge & STEM** | MMLU-Pro | 82.5 |
+ | | MMLU-Redux | 91.1 |
+ | | C-Eval | 88.2 |
+ | | GPQA Diamond | 81.7 |
+ | **Instruction Following** | IFEval | 91.5 |
+ | | MultiChallenge | 54.5 |
+ | **Long Context** | AA-LCR | 63.0 |
+ | | LongBench v2 | 55.2 |
+ | **Reasoning & Coding** | HMMT Feb 25 | 83.2 |
+ | | LiveCodeBench v6 | 65.6 |
+ | **Multilingualism** | MMMLU | 81.2 |
+ | | MMLU-ProX | 76.3 |
+
+ ### Vision Language Benchmarks
+
+ | Category | Benchmark | Score |
+ |----------|-----------|-------|
+ | **STEM and Puzzle** | MMMU | 78.4 |
+ | | MathVision | 78.9 |
+ | | Mathvista (mini) | 85.7 |
+ | **General VQA** | RealWorldQA | 80.3 |
+ | | MMStar | 79.7 |
+ | **Document Understanding** | OmniDocBench1.5 | 87.7 |
+ | | OCRBench | 89.2 |
+ | **Video Understanding** | VideoMME (w/ sub) | 84.5 |
+ | | MLVU | 84.4 |
+
+ ## 📈 Training Details

  The model was full-parameter fine-tuned from Qwen3.5-9B using DeepSpeed ZeRO3 with BF16 precision.
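
The DeepSpeed configuration itself is not part of this commit. As a rough sketch only, a ZeRO stage 3 run with BF16 is typically expressed through the `zero_optimization.stage` and `bf16.enabled` keys of a DeepSpeed config. The batch-size and accumulation values below, and the `render_config` helper, are placeholders for illustration and not the settings used for this model.

```python
import json

# Hypothetical config: only the ZeRO stage and the BF16 flag reflect the
# README text; every other value is a placeholder chosen for illustration.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
}


def render_config() -> str:
    """Serialize the sketch config as JSON text, ready to be saved to a file."""
    return json.dumps(ds_config, indent=2)


if __name__ == "__main__":
    print(render_config())
```

The rendered JSON could be written to a file and handed to whichever training launcher is in use; the real run may have set additional ZeRO options that are not documented here.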
 
 
  outputs = llm.generate(prompts, sampling_params)
  ```

+ ### With SGLang
+
+ ```bash
+ python -m sglang.launch_server \
+     --model-path Kassadin88/Nemotron-9B-OpenCode \
+     --port 8000 \
+     --tp-size 1 \
+     --context-length 16384
+ ```
+
+ ### OpenAI-Compatible API
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(
+     base_url="http://localhost:8000/v1",
+     api_key="EMPTY"
+ )
+
+ response = client.chat.completions.create(
+     model="Kassadin88/Nemotron-9B-OpenCode",
+     messages=[
+         {"role": "user", "content": "Write a quicksort implementation in Python"}
+     ],
+     max_tokens=512,
+     temperature=0.7,
+     top_p=0.9
+ )
+ print(response.choices[0].message.content)
+ ```
+
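
Loading the model can take a while after the launch command is issued, so a client may want to wait before sending its first request. The sketch below polls the OpenAI-style model listing at `/v1/models` until it answers. It assumes the server exposes that endpoint on port 8000 as in the launch command; the `wait_for_models` helper and its timeout values are illustrative, not part of the README.

```python
import json
import time
import urllib.request


def wait_for_models(
    base_url: str = "http://localhost:8000/v1",
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
    opener=urllib.request.urlopen,
) -> list[str]:
    """Poll `{base_url}/models` until it answers, then return the model ids."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(f"{base_url}/models", timeout=5) as resp:
                payload = json.load(resp)
            return [entry["id"] for entry in payload.get("data", [])]
        except OSError as err:
            # Connection refused, URL errors and socket timeouts are all OSError
            # subclasses; retry until the deadline passes.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"server at {base_url} did not respond") from err
            time.sleep(poll_s)
```

Calling `wait_for_models()` before creating the `OpenAI` client gives a list of served model ids, which can also be used to confirm the name passed as `model=`. The `opener` parameter exists only so the function can be exercised without a live server.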
+ ## 🔧 Recommended Sampling Parameters
+
+ | Task Type | Temperature | Top-p | Top-k |
+ |-----------|-------------|-------|-------|
+ | Code Generation | 0.3 | 0.95 | 20 |
+ | Code Explanation | 0.7 | 0.9 | 20 |
+ | Debugging | 0.5 | 0.95 | 20 |
+ | General Tasks | 0.7 | 0.8 | 20 |
+
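
One way to apply the table above from client code is a small preset lookup. This is a sketch under stated assumptions: the preset keys and the `build_request_kwargs` helper are made-up names, and because `top_k` is not a standard OpenAI chat-completions parameter, it is passed through the OpenAI Python client's `extra_body` argument on the assumption that the serving backend accepts it there.

```python
from typing import Any

# Values copied from the sampling table; the key names are illustrative.
SAMPLING_PRESETS = {
    "code_generation": {"temperature": 0.3, "top_p": 0.95, "top_k": 20},
    "code_explanation": {"temperature": 0.7, "top_p": 0.9, "top_k": 20},
    "debugging": {"temperature": 0.5, "top_p": 0.95, "top_k": 20},
    "general": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
}


def build_request_kwargs(task: str) -> dict[str, Any]:
    """Return keyword arguments for `client.chat.completions.create`."""
    preset = SAMPLING_PRESETS[task]  # raises KeyError for an unknown task
    return {
        "temperature": preset["temperature"],
        "top_p": preset["top_p"],
        # Assumption: the server reads `top_k` from the extra request body.
        "extra_body": {"top_k": preset["top_k"]},
    }
```

With the client from the OpenAI-compatible example, the result can be unpacked into the call, for instance `client.chat.completions.create(model="Kassadin88/Nemotron-9B-OpenCode", messages=messages, **build_request_kwargs("debugging"))`, where `messages` is the usual list of role/content dictionaries.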
  ## ⚠️ Limitations

  - The model is primarily trained on code and may not perform well on general conversational tasks