Update README.md
README.md CHANGED
@@ -46,6 +46,9 @@ license: apache-2.0
 
 The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with **lower memory footprint** and deployment flexibility.
 
+## Technical Deep Dive
+
+For a detailed explanation of the compression architecture, model compression process, and benchmark results behind Hypernova-60B v2602, read [this full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing.](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability)
+
 ---
 
 ## Key Characteristics

@@ -227,22 +230,23 @@ Scores are accuracy or benchmark-specific metrics. Use `—` or *TBD* for evalua
 
 ### Quantitative Results (Inference Performance)
 
-Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-
+Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-120b** on the same hardware.
 
 #### Performance evaluation conditions
 
-Describe the setup used to obtain the numbers in the table below (replace the placeholders or add a short paragraph):
-
 - **Inference library**: vLLM 0.14.0
-- **Hardware**:
-- **Conditions**:
+- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
+- **Conditions**: concurrency=128
+
+**Summary of Improvements:**
+
+- **Throughput (tok/s)**: Hypernova is 39.5% faster
+- **Mean TTFT (ms)**: Hypernova is 39.4% faster
+- **Median TTFT (ms)**: Hypernova is 50.8% faster
+- **P99 TTFT (ms)**: Hypernova is 36.0% faster
+- **Mean TPOT (ms)**: Hypernova is 45.5% faster
+- **Mean ITL (ms)**: Hypernova is 45.4% faster
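The native tool calling mentioned in the model card follows the OpenAI function-calling schema format. A minimal sketch of the client side, assuming an OpenAI-compatible endpoint (e.g. vLLM serving this model); the `get_weather` tool, its schema, and the dispatch helper are illustrative assumptions, not part of the card:

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format
# ("function calling with defined schemas"); illustrative only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch_tool_call(tool_call: dict) -> str:
    """Route a model-emitted tool call to a local implementation."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_weather":
        # Stub standing in for a real weather API.
        return json.dumps({"city": args["city"], "temp_c": 21})
    raise ValueError(f"unknown tool: {name}")

# Shaped like an entry of choices[0].message.tool_calls returned by an
# OpenAI-compatible server; the concrete values are made up.
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
print(dispatch_tool_call(example_call))
```

In an agent-style loop, the string returned by the dispatcher would be appended to the conversation as a `tool` role message before the next model call.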
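The "X% faster" figures in the summary mix a higher-is-better metric (throughput) with lower-is-better latency metrics (TTFT, TPOT, ITL), so the percentages are conventionally computed with opposite signs. A small sketch of that convention; the numeric inputs are illustrative, not the card's raw measurements:

```python
def pct_faster_throughput(new_tps: float, base_tps: float) -> float:
    """Higher is better: relative gain over the baseline."""
    return 100.0 * (new_tps - base_tps) / base_tps

def pct_faster_latency(new_ms: float, base_ms: float) -> float:
    """Lower is better: relative reduction versus the baseline."""
    return 100.0 * (base_ms - new_ms) / base_ms

# Illustrative numbers only; the card reports percentages, not raw values.
print(round(pct_faster_throughput(1395.0, 1000.0), 1))  # 39.5
print(round(pct_faster_latency(54.5, 100.0), 1))        # 45.5
```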