jj-mvcpn committed · Commit b06a6cc · verified · 1 Parent(s): 1566047

Update README.md

Files changed (1): README.md (+16 −12)
README.md CHANGED
@@ -46,6 +46,9 @@ license: apache-2.0
 
 The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with **lower memory footprint** and deployment flexibility.
 
+## Technical Deep Dive
+For a detailed explanation of the compression architecture, model compression process, and benchmark results behind Hypernova-60B v2602, read [this full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing.](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability)
+
 ---
 
 ## Key Characteristics
@@ -227,22 +230,23 @@ Scores are accuracy or benchmark-specific metrics. Use `—` or *TBD* for evalua
 
 ### Quantitative Results (Inference Performance)
 
-Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-20b** and **gpt-oss-120b** on the same hardware.
+Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-120b** on the same hardware.
 
 #### Performance evaluation conditions
 
-Describe the setup used to obtain the numbers in the table below (replace the placeholders or add a short paragraph):
-
 - **Inference library**: vLLM 0.14.0
-- **Hardware**: 4× NVIDIA H200 Tensor Core GPU
-- **Conditions**: batch size=512, context length=512, decode length=256
-- **Notes**: dtype=default
-
-| Metric                   | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 | Hardware                       |
-|--------------------------|-------------|--------------|--------------------|--------------------------------|
-| Tokens / second (decode) | 250         | 228          | 240                | 4× NVIDIA H200 Tensor Core GPU |
-| Time to first token (ms) | 26          | 26           | 25                 | 4× NVIDIA H200 Tensor Core GPU |
-| Peak GPU memory (GB)     | 13          | 61           | 32                 | 4× NVIDIA H200 Tensor Core GPU |
+- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
+- **Conditions**: concurrency=128
+
+**Summary of Improvements:**
+
+- **Throughput (tok/s)**: Hypernova is 39.5% faster
+- **Mean TTFT (ms)**: Hypernova is 39.4% faster
+- **Median TTFT (ms)**: Hypernova is 50.8% faster
+- **P99 TTFT (ms)**: Hypernova is 36.0% faster
+- **Mean TPOT (ms)**: Hypernova is 45.5% faster
+- **Mean ITL (ms)**: Hypernova is 45.4% faster
+
 
 ![Performance](assets/performance.png)
 
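The unchanged context in the first hunk notes that the model supports native tool calling with defined schemas. As a minimal sketch of what such a request looks like, here is an OpenAI-style `tools` payload of the kind vLLM's OpenAI-compatible server accepts; the model identifier, function name, and parameters are illustrative assumptions, not taken from this repository:

```python
import json

# Hypothetical tool schema illustrating "function calling with defined
# schemas". The function and its parameters are invented for this example.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Request body for an OpenAI-compatible chat completions endpoint.
# The model name below is an assumed identifier.
payload = {
    "model": "HyperNova-60B-2602",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```

With `tool_choice="auto"`, the server returns either a normal assistant message or a structured `tool_calls` entry containing arguments matching the declared JSON schema.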
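The new "Summary of Improvements" reports TTFT, TPOT, and ITL deltas. As a hedged sketch of how these serving metrics relate to per-token timestamps (this is not the benchmark harness behind the README's numbers, and the timings below are invented):

```python
# Sketch of common LLM serving metrics; all timestamps are in seconds.

def ttft_ms(request_start: float, first_token_time: float) -> float:
    """Time to first token, in milliseconds."""
    return (first_token_time - request_start) * 1000.0

def itl_ms(token_times: list[float]) -> list[float]:
    """Inter-token latencies: gaps between consecutive tokens, in ms."""
    return [(b - a) * 1000.0 for a, b in zip(token_times, token_times[1:])]

def tpot_ms(first_token_time: float, last_token_time: float, n_tokens: int) -> float:
    """Mean time per output token after the first, in ms."""
    return (last_token_time - first_token_time) * 1000.0 / (n_tokens - 1)

def pct_faster(baseline: float, candidate: float) -> float:
    """Relative latency improvement of candidate over baseline, in percent."""
    return (baseline - candidate) / baseline * 100.0

# Invented timestamps for a 4-token response starting at t=0.
start = 0.0
token_times = [0.025, 0.035, 0.045, 0.055]

print(round(ttft_ms(start, token_times[0]), 1))    # 25.0
print([round(g, 1) for g in itl_ms(token_times)])  # [10.0, 10.0, 10.0]
print(round(tpot_ms(token_times[0], token_times[-1], len(token_times)), 1))  # 10.0
print(round(pct_faster(41.0, 25.0), 1))            # 39.0
```

TPOT is the mean of the inter-token latencies for a single request, which is why the mean TPOT and mean ITL improvements in the summary track each other so closely.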