---
base_model:
- openai/gpt-oss-120b
- MultiverseComputingCAI/HyperNova-60B
library_name: transformers
license: apache-2.0
---
<div align="center">

# HyperNova 60B 2602

### Powered by CompactifAI

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![HuggingFace](https://img.shields.io/badge/🤗-Model_Hub-yellow.svg)](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602)
[![Discord](https://img.shields.io/badge/Discord-Community-5865F2?logo=discord&logoColor=white)](https://discord.gg/8mT9FveN)

**Optimized for Efficient Inference** · **Reduced Memory Footprint** · **Native Tool Calling Support**

</div>

---

## Table of Contents

- [Model Overview](#model-overview)
- [Key Characteristics](#key-characteristics)
- [Quick Start](#quick-start)
- [What's New in HyperNova 60B 2602](#whats-new-in-hypernova-60b-2602)
- [Tool Calling](#tool-calling)
- [Training & Fine-Tuning](#training--fine-tuning)
- [Architecture](#architecture)
- [Evaluation & Benchmarks](#evaluation--benchmarks)
- [Languages](#languages)
- [Intended Use](#intended-use)
- [Safety & Limitations](#safety--limitations)
- [Model Information](#model-information)
- [Citation](#citation)

---

## Model Overview

**HyperNova 60B 2602** is a compressed version of **[OpenAI’s gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b)**, developed by **Multiverse Computing**. The original gpt-oss-120b is an open-weight model (117B total parameters, 5.1B active per token in its MoE architecture) designed for powerful reasoning, agentic tasks, and versatile developer use. This version is compressed with **CompactifAI**, Multiverse Computing’s proprietary compression technology, which reduces parameter count and memory requirements while aiming to preserve strong reasoning.

The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with **lower memory footprint** and deployment flexibility.

---

## Key Characteristics

| Characteristic        | Description |
|-----------------------|-------------|
| Base model            | [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (117B params, MoE; open-weight, Apache 2.0) |
| 🛠️ **Tool calling**  | Native support; OpenAI-style function / tool calling schemas; agentic use (e.g. function calling, structured outputs) |
| 🧠 **Parameters**     | 60B total parameters after CompactifAI compression (reduced vs. base 117B) |
| 📐 **Architecture**   | Decoder-only Transformer (from gpt-oss lineage) |
| 🗜️ **Compression**   | CompactifAI (proprietary compression technology) |
| Primary language      | English |
| Other languages       | Not formally evaluated |

---

## Quick Start

This model can be loaded with the **Transformers** API. Use `trust_remote_code=True` (required for the gpt-oss architecture). The recommended approach is `AutoModelForCausalLM` with `apply_chat_template`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MultiverseComputingCAI/HyperNova-60B-2602"

# Load the tokenizer and model (trust_remote_code is required for the gpt-oss architecture).
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "What is a Hypernova?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
)
inputs = inputs.to(model.device)

# Provide an explicit attention mask (every position here is a real token).
attention_mask = torch.ones_like(inputs, dtype=torch.long, device=inputs.device)
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    attention_mask=attention_mask,
)

# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(reply)
```
Alternatively you can use the `pipeline` API with `trust_remote_code=True`; the pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]` as needed.
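
For reference, a minimal `pipeline` sketch (generation parameters are illustrative, not prescribed settings):

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="MultiverseComputingCAI/HyperNova-60B-2602",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is a Hypernova?"}]
outputs = pipe(messages, max_new_tokens=512)

# The pipeline returns the whole conversation; the assistant reply is the last message.
print(outputs[0]["generated_text"][-1]["content"])
```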

---

## What’s New in HyperNova 60B 2602

**HyperNova 60B 2602** is derived from **gpt-oss-120b**, retaining the base model’s strengths while reducing memory requirements and improving deployment flexibility.

### Summary

- **Derived from [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b):** same Apache 2.0 license and design goals (reasoning, agentic tasks, tool use), with a smaller footprint via CompactifAI.
- **Tool use:** Retains support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
- **Reasoning:** Compatible with configurable reasoning effort (low / medium / high in the system prompt) where the harmony format is preserved; full chain-of-thought is available for debugging and analysis (see the sketch after this list).
- **Evaluated** on tool-focused benchmarks (e.g. BFCL v4, Tau2-bench) and general benchmarks alongside other CompactifAI and gpt-oss variants.
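
As a minimal sketch, assuming the gpt-oss convention for selecting reasoning effort carries over to this variant:

```python
# A minimal sketch: the reasoning level is set in the system message as
# "Reasoning: low|medium|high" (the gpt-oss convention; verify it behaves the
# same on this variant before relying on it).
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "How many primes are there below 100?"},
]
# These messages can then be passed to apply_chat_template as in Quick Start.
```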

---

## Tool Calling

HyperNova 60B 2602 supports **native tool use** and is well-suited for:

- **Function calling** with defined schemas  
- **Structured outputs**  
- **Agentic operations** (e.g. browser tasks, code execution where supported)

The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows **OpenAI-style schemas**; compatibility refers to format and structure—exact parity with the base or other models is not guaranteed.

### Example Tool Call

```json
{
  "name": "get_weather",
  "arguments": {
    "city": "Paris",
    "date": "2026-02-10"
  }
}
```
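
A minimal sketch of passing tool schemas through the Transformers chat template; the `get_weather` schema below is hypothetical, mirroring the example call above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "MultiverseComputingCAI/HyperNova-60B-2602", trust_remote_code=True
)

# Hypothetical OpenAI-style tool schema, matching the example call above.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "date": {"type": "string", "description": "ISO 8601 date"},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris on 2026-02-10?"}]

# The chat template renders the schemas into the prompt; the model can then
# emit a structured JSON tool call like the one shown above.
prompt_ids = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt",
)
```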

---

## Training & Fine-Tuning

### Base Model: gpt-oss-120b

The base model [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) was trained on OpenAI’s **harmony response format** and is intended for use with that format for correct behavior. It supports configurable reasoning levels (low / medium / high) and native tool use. See the [original model card](https://huggingface.co/openai/gpt-oss-120b) and [arXiv:2508.10925](https://arxiv.org/abs/2508.10925) for details.

### CompactifAI Compression & Optional Fine-Tuning

- **Compression:** CompactifAI was applied to produce a smaller, efficient model (60B parameters) while aiming to preserve reasoning and tool-use capabilities.
- **Optional fine-tuning:** This variant may include additional fine-tuning for tool calling and structured outputs; exact training details are model-specific.

---

## Architecture

### Model Specifications

| Specification     | Value              |
|-------------------|--------------------|
| Base model        | [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (117B params, 5.1B active MoE) |
| Total parameters  | 60B, 4.8B active MoE |

---

## Evaluation & Benchmarks

### Evaluation Methodology

Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.

#### MMLU-Pro, AIME25, GPQA:d, LiveCodeBench

- **Evaluation framework**: [Lighteval](https://github.com/huggingface/lighteval) 
- **Inference library**: vLLM 0.14.0  
- **Reasoning effort**: medium  
- **Decoding**: temperature = 0.6, max_tokens = 131072, top_p = 1.0, top_k = 0  
- **Batch size**: 64  
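
For reference, a minimal offline-inference sketch with vLLM using the decoding settings above (model ID and prompt are illustrative; in vLLM, `top_k=-1` disables top-k sampling, corresponding to `top_k = 0` here):

```python
from vllm import LLM, SamplingParams

# A minimal sketch; the model ID and prompt are illustrative.
llm = LLM(model="MultiverseComputingCAI/HyperNova-60B-2602", trust_remote_code=True)

# Decoding settings from the methodology above; top_k=-1 disables top-k in vLLM.
params = SamplingParams(temperature=0.6, top_p=1.0, top_k=-1, max_tokens=131072)

# llm.chat applies the model's chat template before generating.
messages = [{"role": "user", "content": "What is a Hypernova?"}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```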

#### IFBench, AA-LCR, SciCode

- **Evaluation framework**: [Nemo-skills](https://github.com/NVIDIA/NeMo-Skills) 
- **Inference library**: vLLM 0.14.0  
- **Reasoning effort**: medium  
- **Decoding**: temperature = 1.0, max_tokens = 131072, top_p = 1.0, top_k = 0  
- **Batch size**: 64
  
#### BFCL v4 (17 splits)

- **Evaluation framework**: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1
- **Inference library**: vLLM 0.14.0  
- **Reasoning effort**: high  
- **Decoding**: temperature = 0.6, max_tokens = 16384, parallel_tool_calls = true, tool-call parser openai

#### Tau2-bench (Telecom)

- **Evaluation framework**: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1
- **Inference library**: vLLM 0.14.0  
- **Reasoning effort**: high (agent `extra_body.reasoning_effort`)  
- **Decoding (agent)**: temperature = 1.0, top_p = 1.0, min_tokens = 1  
- **Decoding (judge / user simulator)**: temperature = 0.7, timeout = 600  
- **Reproducibility**: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)  

#### Terminal-Bench Hard (Artificial Analysis subset)

- **Evaluation framework**: laude-institute/harbor == 0.1.43
- **Inference library**: vLLM == 0.15.0
- **Reasoning effort**: high
- **Decoding**: temperature = 1.0, top_p = 1.0, max-model-len = 131072
- **Reproducibility**: subset from AA (https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard)
- **Agent**: terminus-2; max episodes 100; repeats 3

### Quantitative Results (Benchmarks)

Scores are accuracy or benchmark-specific metrics. All reported numbers were obtained with the evaluation methodology described above.

| Benchmark             | gpt-oss-20b           | gpt-oss-120b           | HyperNova 60B 2602       |
|-----------------------|-----------------------|------------------------|--------------------------|
| MMLU-Pro              | 74                    | 78                     | 74                       |
| BFCL v4               | 61                    | 64                     | 62                       |
| Tau2-bench (Telecom)  | 59                    | 68                     | 61                       |
| AIME25                | 72                    | 80                     | 76                       |
| GPQA:d                | 63                    | 69                     | 69                       |
| IFBench               | 55                    | 63                     | 60                       |
| SciCode               | 34                    | 38                     | 32                       |
| LiveCodeBench         | 64                    | 66                     | 64                       |
| Terminal Bench        | 9                     | 22                     | 16                       |
| AA-LCR                | 37                    | 50                     | 36                       |
| AA-Omnis. Index       | -40                   | -36                    | -41                      |
| AA-Omnis. Accuracy    | 16                    | 21                     | 15                       |

![Intelligence](assets/intelligence.png)
![Tool-calling](assets/tool-calling.png)

### Quantitative Results (Inference Performance)

Representative throughput and memory for HyperNova 60B 2602 compared against **gpt-oss-20b** and **gpt-oss-120b** on the same hardware, under the conditions described below.

#### Performance evaluation conditions

The numbers in the table below were obtained under the following setup:

- **Inference library**: vLLM 0.14.0
- **Hardware**: 4× NVIDIA H200 Tensor Core GPU
- **Conditions**: batch size=512, context length=512, decode length=256
- **Notes**: dtype=default

| Metric                     | gpt-oss-20b              | gpt-oss-120b             | HyperNova 60B 2602       | Hardware                      |
|----------------------------|--------------------------|--------------------------|--------------------------|-------------------------------|
| Tokens / second (decode)   | 250                      | 228                      | 240                      | 4× NVIDIA H200 Tensor Core GPU|
| Time to first token (ms)   | 26                       | 26                       | 25                       | 4× NVIDIA H200 Tensor Core GPU|
| Peak GPU memory (GB)       | 13                       | 61                       | 32                       | 4× NVIDIA H200 Tensor Core GPU|

![Performance](assets/performance.png)

---

## Languages

- **Primary language**: English  
- **Other languages**: Not formally evaluated  

The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.

---

## Intended Use

### Recommended Use Cases

Aligned with [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) use cases, with the benefit of a smaller footprint:

- **Reasoning and analysis** (with configurable reasoning effort where supported)  
- **Tool-augmented and agentic applications** (function calling, web browsing, code execution, structured outputs)  
- **Code generation and reasoning**  
- **Chatbots and virtual assistants**  
- **Retrieval-augmented generation (RAG)**  
- **Deployments** where gpt-oss-120b is desirable but memory or latency is constrained  

### Out-of-Scope Uses

- Harmful, illegal, or deceptive content generation  
- Impersonation of real individuals without consent  
- High-risk decision-making without human oversight  
- Surveillance or tracking of individuals  
- Any use that violates applicable laws or regulations  

---

## Safety & Limitations

### Known Limitations

- **English-centric** training data (inherited from base model).  
- **Format:** For best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise.  
- **Tool calling** depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.  
- **Compression** may affect some behaviors; evaluate for your use case.  

### Recommendations

- Validate tool outputs before execution  
- Use human oversight for critical applications  
- Perform task-specific evaluation prior to deployment  

---

## Model Information

| Field         | Value               |
|--------------|--------------------- |
| Model name   | HyperNova 60B 2602   |
| Based on     | [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) |
| Version      | 2602                 |
| Release date | 26/02/2026           |
| Developed by | Multiverse Computing |
| License      | Apache 2.0           |
| Contact      | business@multiversecomputing.com   |

---

## Citation

If you use this model, please cite the base model and this variant:

```bibtex
@misc{openai2025gptoss120b,
  title         = {gpt-oss-120b \& gpt-oss-20b Model Card},
  author        = {OpenAI},
  year          = {2025},
  eprint        = {2508.10925},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2508.10925}
}

@misc{hypernova60b2602,
  title = {HyperNova 60B 2602: Model developed based on gpt-oss-120b},
  author = {Multiverse Computing},
  year = {2026},
  url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602},
  note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
}
```

**Built by [Multiverse Computing](https://www.multiversecomputing.com)** · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602/discussions) · [Discord](https://discord.gg/8mT9FveN)