---
base_model:
- openai/gpt-oss-120b
- MultiverseComputingCAI/HyperNova-60B
library_name: transformers
license: apache-2.0
---
<div align="center">

# HyperNova 60B 2605

### Powered by CompactifAI

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![HuggingFace](https://img.shields.io/badge/🤗-Model_Hub-yellow.svg)](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605)
[![Discord](https://img.shields.io/badge/Discord-Community-5865F2?logo=discord&logoColor=white)](https://discord.gg/cGas9uStqp)

**Optimized for Efficient Inference** · **Reduced Memory Footprint** · **Native Tool Calling Support**

</div>

---

## Table of Contents

- [Model Overview](#model-overview)
- [Technical Deep Dive](#technical-deep-dive)
- [Key Characteristics](#key-characteristics)
- [Quick Start](#quick-start)
- [What's New in HyperNova 60B 2605](#whats-new-in-hypernova-60b-2605)
- [Tool Calling](#tool-calling)
- [Architecture](#architecture)
- [Evaluation & Benchmarks](#evaluation--benchmarks)
- [Languages](#languages)
- [Intended Use](#intended-use)
- [Safety & Limitations](#safety--limitations)
- [Model Information](#model-information)
- [Citation](#citation)

---

## Model Overview

**HyperNova 60B 2605**, developed by **Multiverse Computing**, is an open-weight model designed for powerful **general** reasoning, **coding**, and versatile developer use.

The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2605 is intended for code generation, RAG, and tool-augmented applications.

## Technical Deep Dive
For a detailed explanation of the compression architecture, the model compression process, and the benchmark results behind HyperNova 60B, read the [full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability).

---

## Key Characteristics

| Characteristic        | Description |
|-----------------------|-------------|
| 🛠️ **Tool calling**  | Native support; OpenAI-style function / tool calling schemas; suited to coding agents and structured outputs |
| 🧠 **Parameters**     | 60B total parameters |
| 📐 **Architecture**   | Decoder-only Transformer |
| Primary language      | English |
| Other languages       | Not formally evaluated |

---

## Quick Start

This model can be loaded with the **Transformers** API. Use `trust_remote_code=True` (required for the gpt-oss architecture). The recommended approach is `AutoModelForCausalLM` with `apply_chat_template`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MultiverseComputingCAI/HyperNova-60B-2605"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",  # use the checkpoint's native dtype
    trust_remote_code=True,
)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "What is a Hypernova?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

# apply_chat_template returns input ids only; supply an explicit attention mask.
attention_mask = torch.ones_like(inputs)

outputs = model.generate(
    inputs,
    attention_mask=attention_mask,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens.
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(reply)
```
Alternatively, you can use the `pipeline` API with `trust_remote_code=True`; the pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]` as needed.
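As a minimal sketch of that extraction step (the helper name is ours, and the mocked `outputs` mimics the shape the text-generation pipeline returns in chat mode, which should be verified against your transformers version):

```python
# Sketch: pull the assistant's reply out of a chat pipeline result.
# The pipeline call itself is omitted; `outputs` mimics the structure
# returned by transformers' text-generation pipeline in chat mode.

def extract_assistant_text(outputs):
    """Return the last assistant message from a pipeline output."""
    generated = outputs[0]["generated_text"]
    if isinstance(generated, str):  # plain-text mode
        return generated
    # Chat mode: a list of {"role", "content"} messages.
    assistant_turns = [m["content"] for m in generated if m["role"] == "assistant"]
    return assistant_turns[-1] if assistant_turns else ""

# Mocked pipeline output for illustration:
outputs = [{"generated_text": [
    {"role": "user", "content": "What is a Hypernova?"},
    {"role": "assistant", "content": "A hypernova is an exceptionally energetic supernova."},
]}]
print(extract_assistant_text(outputs))
```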

---

## What’s New in HyperNova 60B 2605

**HyperNova 60B 2605** is an improved version of **HyperNova 60B 2602**; this release focuses on **coding** and **general** capability, backed by higher scores on several benchmarks.

### Summary

- **Improvement focus vs HyperNova 60B 2602:** stronger performance on **coding** and **general** benchmarks.
- **Tool use:** Retains native support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
- **Reasoning:** Compatible with configurable reasoning effort (e.g. low / medium / high set in the system prompt) where the format is preserved; the full chain-of-thought remains available for debugging and analysis.
- **Evaluated** on coding and tool-heavy benchmarks (e.g. Tau2-bench, Terminal-Bench) alongside **general** intelligence benchmarks.

---

## Tool Calling

HyperNova 60B 2605 supports **native tool use** and is well-suited for:

- **Function calling** with defined schemas  
- **Structured outputs**  
- **Coding-oriented tool workflows** (e.g. browser tasks, code execution where supported)

The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows **OpenAI-style schemas**; compatibility refers to format and structure, and exact parity with the base or other models is not guaranteed.
Compared with HyperNova 60B 2602, this release improves on the **coding** and **general** evaluation tracks, including IFBench, Tau2-bench, Terminal-Bench, and AA-LCR under the high-reasoning setup reported below.

### Example Tool Call

```json
{
  "name": "get_weather",
  "arguments": {
    "city": "Paris",
    "date": "2026-02-10"
  }
}
```
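A minimal round-trip sketch of that loop, with a hypothetical `get_weather` implementation and hand-written dispatch (in real deployments the serving stack usually parses tool calls for you):

```python
import json

# Hypothetical tool registry; get_weather is a stand-in implementation.
def get_weather(city: str, date: str) -> dict:
    return {"city": city, "date": date, "forecast": "sunny", "high_c": 21}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(raw: str) -> dict:
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(raw)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# The JSON emitted by the model (as in the example above):
raw_call = '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
result = dispatch_tool_call(raw_call)

# Feed the result back as a tool message so the model can continue generating.
tool_message = {"role": "tool", "name": "get_weather", "content": json.dumps(result)}
```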

---

## Architecture

### Model Specifications

| Specification     | Value              |
|-------------------|--------------------|
| Total parameters  | 60B                |
| Active parameters | 4.8B (Mixture-of-Experts) |

---

## Evaluation & Benchmarks

### Evaluation Methodology

Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.

#### HLE, MMLU-Pro, AIME25, GPQA:d, LiveCodeBench

- **Evaluation framework**: [Nemo-skills](https://github.com/NVIDIA/NeMo-Skills)
- **Inference library**: vLLM 0.13.0  
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high  
- **Decoding**: temperature = 1.0, top_p = 1.0 
- **Batch size**: 64  

#### IFBench, AA-LCR, SciCode

- **Evaluation framework**: [Nemo-skills](https://github.com/NVIDIA/NeMo-Skills) 
- **Inference library**: vLLM 0.13.0  
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high  
- **Decoding**: temperature = 1.0, top_p = 1.0
- **Batch size**: 64
  
#### Tau2-bench (Telecom)

- **Evaluation framework**: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1
- **Inference library**: vLLM 0.13.0  
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high (agent `extra_body.reasoning_effort`)  
- **Decoding (agent)**: temperature = 1.0, top_p = 1.0, min_tokens = 1  
- **Decoding (judge / user simulator)**: temperature = 0.7, timeout = 600  
- **Reproducibility**: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)  

#### Terminal-Bench Hard (Artificial Analysis subset)

- **Evaluation framework**: laude-institute/harbor == 0.1.43
- **Inference library**: vLLM == 0.13.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high
- **Decoding**: temperature = 1.0, top_p = 1.0, max-model-len = 131072
- **Reproducibility**: subset from AA (https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard)
- **Agent**: terminus-2; max episodes 100; repeats 3

#### Aider polyglot

- **Evaluation framework**: [Aider-AI/aider](https://github.com/Aider-AI/aider) 
- **Hardware**: 2× NVIDIA H200 Tensor Core GPU (host with Docker)
- **Dataset**: `polyglot-benchmark` (225 exercises across multiple languages)
- **Reasoning effort**: high (passed via `--reasoning-effort`)
- **Decoding**: temperature = 1.0, top_p = 1.0 (configurable via `generation_config` / `--read-model-settings` YAML)
- **Edit format**: `whole` (also supports `diff | udiff | diff-fenced | architect`)
- **Reproducibility**: leaderboard-aligned; `--tries=2` (repeats)


### Quantitative Results (Reported & Planned)


| Benchmark             | gpt-oss-120b                  | HyperNova 60B 2602          | HyperNova 60B 2605       |
|-----------------------|-------------------------------|-----------------------------|--------------------------|
| HLE                   | 18.50                         | 7.28                        | 14.97                    |
| MMLU-Pro              | 79.64                         | 74.25                       | 76.77                    |
| Tau2-bench (Telecom)  | 63.74                         | 60.53                       | 61.70                    |
| AIME25                | 93.67                         | 86.00                       | 90.00                    |
| GPQA:d                | 74.64                         | 65.56                       | 71.92                    |
| IFBench               | 67.01                         | 59.40                       | 66.57                    |
| SciCode               | 41.52                         | 33.53                       | 36.00                    |
| LiveCodeBench         | 62.75                         | 51.53                       | 68.68                    |
| Terminal Bench        | 24.24                         | 12.12                       | 15.91                    |
| AA-LCR                | 49.00                         | 35.67                       | 40.33                    |
| AIDER                 | 43.60                         | 26.2                        | 34.2                     |

![Benchmarks](assets/benchmarks.png)

![LiveCodeBench](assets/livecodebench.png)

### Quantitative Results (Inference Performance)

#### Metrics reported

- **System Output Throughput (higher is better)**: Mean output tokens per second across all concurrent requests over the benchmarking phase.
- **End-to-End Latency per Query (lower is better):** Median end-to-end response time for each query from the time the query is sent.
- **Output Speed per Query (higher is better):** Median output tokens per second after the first token is received for each query.
- **Time to first token (TTFT) (lower is better):** Median time to first token.
- **Estimated total memory (lower is better):** Median from each GuideLLM phase (estimated total footprint: weights plus the KV-cache contribution from monitored usage).
- **Model weights (lower is better):** GPU memory occupied by the model weights alone.
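A sketch of how the per-query medians above are derived from raw request records (the record fields here are our own illustration, not GuideLLM's actual output schema):

```python
import statistics

# Hypothetical per-request records from one concurrency phase:
# (ttft_s, e2e_s, output_tokens)
requests = [
    (4.2, 13.9, 610),
    (4.9, 14.8, 655),
    (5.1, 15.6, 700),
]

ttft = statistics.median(r[0] for r in requests)
e2e = statistics.median(r[1] for r in requests)

# Output speed per query: tokens emitted after the first token arrives.
speeds = [tokens / (e2e_s - ttft_s) for ttft_s, e2e_s, tokens in requests]
speed = statistics.median(speeds)

print(f"TTFT={ttft:.2f}s  E2E={e2e:.2f}s  output speed={speed:.1f} tok/s")
```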

On the same hardware and harness, **HyperNova 60B 2605** is compared to **gpt-oss-120b** using GuideLLM across **concurrency phases** from 1 to 256 concurrent requests; the table below reports **median** values at the 128-request phase.

| Metric | gpt-oss-120b | HyperNova 60B 2605 |
|--------|-------------:|-------------------:|
| Concurrency | 128 | 128 |
| Throughput (tok/s) | 3,821 | 5,210 |
| E2E latency (s) | 24.05 | 14.74 |
| Output speed (tok/s) | 57.79 | 69.31 |
| TTFT (s) | 7.04 | 4.85 |
| Est. total memory (GB) | 123.55 | 38.83 |
| Model weights (GB) | 121.54 | 31.81 |



#### Performance evaluation conditions

Our performance evaluation follows the spirit of [Artificial Analysis](https://artificialanalysis.ai/methodology/system-load-test).

- **Inference library**: vLLM 0.13.0
- **Monitoring libraries**: GuideLLM, nvidia-ml-py
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Conditions**: **concurrency phases** 1, 2, 4, 8, 16, 32, 64, 128, 192, and 256 concurrent requests (one GuideLLM phase each)
- **Phase duration**: Each phase lasts 3 minutes (excluding ramp-up and cool-down periods).
- **Workload shape:** input length is ~1000 tokens per query (median); median output length varies by phase and model.
- **Streaming**: Benchmarking is conducted with streaming enabled.

The figure below is a **side-by-side comparison at concurrency = 128 only**.

![Performance](assets/performance.png)

---

## Languages

- **Primary language**: English  
- **Other languages**: Not formally evaluated  

The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.

---

## Intended Use

### Recommended Use Cases

- **Reasoning and analysis** (with configurable reasoning effort where supported)  
- **Tool-augmented applications**, with emphasis on **coding** and **general** assistant use (function calling, web browsing, code execution, structured outputs)  
- **Code generation and reasoning**  
- **Chatbots and virtual assistants**  
- **Retrieval-augmented generation (RAG)**  

### Out-of-Scope Uses

- Harmful, illegal, or deceptive content generation  
- Impersonation of real individuals without consent  
- High-risk decision-making without human oversight  
- Surveillance or tracking of individuals  
- Any use that violates applicable laws or regulations  

---

## Safety & Limitations

### Known Limitations

- **English-centric** training data.  
- **Format:** For best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise.  
- **Tool calling** depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.  

### Recommendations

- Validate tool outputs before execution  
- Use human oversight for critical applications  
- Perform task-specific evaluation prior to deployment  

---

## Model Information

| Field         | Value               |
|--------------|--------------------- |
| Model name   | HyperNova 60B 2605   |
| Version      | 2605                 |
| Release date | 26/02/2026           |
| Developed by | Multiverse Computing |
| License      | Apache 2.0           |
| Contact      | business@multiversecomputing.com   |

---

## Citation

If you use this model, please cite the base model and this variant:

```bibtex
@misc{openai2025gptoss120b,
  title         = {gpt-oss-120b \& gpt-oss-20b Model Card},
  author        = {OpenAI},
  year          = {2025},
  eprint        = {2508.10925},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2508.10925}
}
@misc{hypernova60b2605,
  title = {HyperNova 60B 2605: Model developed based on gpt-oss-120b},
  author = {Multiverse Computing},
  year = {2026},
  url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605},
  note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
}
```

**Built by [Multiverse Computing](https://www.multiversecomputing.com)** · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605/discussions) · [Discord](https://discord.gg/8mT9FveN)