daniicruzz committed on
Commit
dcfadb3
·
verified ·
1 Parent(s): ef40d10

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +353 -3
README.md CHANGED
@@ -1,3 +1,353 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ base_model:
3
+ - openai/gpt-oss-120b
4
+ - MultiverseComputingCAI/HyperNova-60B
5
+ library_name: transformers
6
+ license: apache-2.0
7
+ ---
8
+ <div align="center">
9
+
10
+ # HyperNova 60B 2605
11
+
12
+ ### Powered by CompactifAI
13
+
14
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
15
+ [![HuggingFace](https://img.shields.io/badge/🤗-Model_Hub-yellow.svg)](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605)
16
+ [![Discord](https://img.shields.io/badge/Discord-Community-5865F2?logo=discord&logoColor=white)](https://discord.gg/cGas9uStqp)
17
+
18
+ **Optimized for Efficient Inference** · **Reduced Memory Footprint** · **Native Tool Calling Support**
19
+
20
+ </div>
21
+
22
+ ---
23
+
24
+ ## Table of Contents
25
+
26
+ - [Highlights](#highlights)
27
+ - [Model Overview](#model-overview)
28
+ - [Key Characteristics](#key-characteristics)
29
+ - [Quick Start](#quick-start)
30
+ - [What's New in HyperNova 60B 2605](#whats-new-in-hypernova-60b-2605)
31
+ - [Tool Calling](#tool-calling)
32
+ - [Training & Fine-Tuning](#training--fine-tuning)
33
+ - [Architecture](#architecture)
34
+ - [Evaluation & Benchmarks](#evaluation--benchmarks)
35
+ - [Languages](#languages)
36
+ - [Intended Use](#intended-use)
37
+ - [Safety & Limitations](#safety--limitations)
38
+ - [Model Information](#model-information)
39
+ - [Citation](#citation)
40
+
41
+ ---
42
+
43
+ ## Model Overview
44
+
45
+ **HyperNova 60B 2605**, developed by **Multiverse Computing**, is an open-weight model designed for **general** reasoning, **coding**, and developer workflows.
46
+
47
+ The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2605 is intended for code generation, RAG, and tool-augmented applications.
48
+
49
+ ## Technical Deep Dive
50
+ For a detailed explanation of the compression architecture, the compression process, and the benchmark results behind HyperNova 60B 2605, read the [full technical article](https://multiversecomputing.com/papers/hypernova-60b-2605-same-intelligence-half-the-size-improved-tool-calling-capability) by Johanna Angulo, Evaluation Manager at Multiverse Computing.
51
+
52
+ ---
53
+
54
+ ## Key Characteristics
55
+
56
+ | Characteristic | Description |
57
+ |-----------------------|-------------|
58
+ | 🛠️ **Tool calling** | Native support; OpenAI-style function / tool calling schemas; suited to coding agents and structured outputs |
59
+ | 🧠 **Parameters** | 60B total parameters |
60
+ | 📐 **Architecture** | Decoder-only Transformer |
61
+ | Primary language | English |
62
+ | Other languages | Not formally evaluated |
63
+ ---
64
+ ## Quick Start
65
+ This model can be loaded with the **Transformers** API. Use `trust_remote_code=True` (required for the gpt-oss architecture). Recommended approach: `AutoModelForCausalLM` with `apply_chat_template`:
66
+ ```python
67
+ import torch
68
+ from transformers import AutoModelForCausalLM, AutoTokenizer
69
+ model_id = "MultiverseComputingCAI/HyperNova-60B-2605"
70
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
71
+ model = AutoModelForCausalLM.from_pretrained(
72
+     model_id,
73
+     device_map="auto",
74
+     torch_dtype="auto",
75
+     trust_remote_code=True,
76
+ )
77
+ messages = [{"role": "user", "content": "What is a Hypernova?"}]
78
+ inputs = tokenizer.apply_chat_template(
79
+     messages,
80
+     return_tensors="pt",
81
+     add_generation_prompt=True,
82
+ )
83
+ inputs = inputs.to(model.device)
84
+ attention_mask = torch.ones_like(inputs, dtype=torch.long, device=inputs.device)
85
+ outputs = model.generate(
86
+     inputs,
87
+     max_new_tokens=512,
88
+     do_sample=True,
89
+     temperature=0.7,
90
+     attention_mask=attention_mask,
91
+ )
92
+ reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
93
+ print(reply)
94
+ ```
95
+ Alternatively, you can use the `pipeline` API with `trust_remote_code=True`. The pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]` as needed.
96
+
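As a minimal sketch of the `pipeline` route described above: the helper names `build_chat_pipeline` and `extract_assistant_reply` are hypothetical, and the sketch assumes that chat-style input makes `generated_text` a list of `{"role", "content"}` messages with the new assistant turn last.

```python
from typing import Any

MODEL_ID = "MultiverseComputingCAI/HyperNova-60B-2605"


def build_chat_pipeline(model_id: str = MODEL_ID):
    """Create a text-generation pipeline (loads the full model when called)."""
    # Imported lazily so the helper below can be used without loading the model.
    from transformers import pipeline

    return pipeline(
        "text-generation",
        model=model_id,
        device_map="auto",
        torch_dtype="auto",
        trust_remote_code=True,
    )


def extract_assistant_reply(outputs: list[dict[str, Any]]) -> str:
    """Return the content of the last assistant message in a chat pipeline result."""
    conversation = outputs[0]["generated_text"]
    # Assumption: generated_text holds the whole message list for chat input,
    # so we walk it backwards to find the newest assistant turn.
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")
```

A typical call would be `outputs = build_chat_pipeline()([{"role": "user", "content": "What is a Hypernova?"}], max_new_tokens=512)`, followed by `print(extract_assistant_reply(outputs))`.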
97
+ ---
98
+
99
+ ## What’s New in HyperNova 60B 2605
100
+
101
+ **HyperNova 60B 2605** improves on **HyperNova 60B 2602**. This release focuses on **coding** and **general** capability, and it scores higher than 2602 on all ten benchmarks in the results table.
102
+
103
+ ### Summary
104
+
105
+ - **Improvement focus vs HyperNova 60B 2602:** stronger **coding** and **general** benchmark performance.
106
+ - **Tool use:** Retains native support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
107
+ - **Reasoning:** Compatible with configurable reasoning effort (e.g. low / medium / high, set in the system prompt) where the format is preserved; the full chain-of-thought is available for debugging and analysis.
108
+ - **Evaluated** on coding and tool-heavy benchmarks (e.g. Tau2-bench, Terminal-Bench) alongside **general** intelligence benchmarks.
109
+
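The configurable reasoning effort mentioned in the summary can be sketched as a small message builder. This assumes the gpt-oss convention of a `Reasoning: <level>` line in the system message; the exact wording this model expects is not stated in this card, and `build_messages` is a hypothetical helper.

```python
VALID_EFFORTS = ("low", "medium", "high")


def build_messages(user_prompt: str, reasoning_effort: str = "medium") -> list[dict[str, str]]:
    """Build a chat message list that requests a reasoning effort level."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}")
    return [
        # Assumption: the effort level is read from the system prompt.
        {"role": "system", "content": f"Reasoning: {reasoning_effort}"},
        {"role": "user", "content": user_prompt},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the `messages` variable used in the Quick Start.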
110
+ ---
111
+
112
+ ## Tool Calling
113
+
114
+ HyperNova 60B 2605 supports **native tool use** and is well-suited for:
115
+
116
+ - **Function calling** with defined schemas
117
+ - **Structured outputs**
118
+ - **Coding-oriented tool workflows** (e.g. browser tasks, code execution where supported)
119
+
120
+ The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows **OpenAI-style schemas**; compatibility refers to format and structure, and exact parity with the base or other models is not guaranteed.
121
+ Compared with HyperNova 60B 2602, this release improves on the **coding** and **general** evaluation tracks, including IFBench, Tau2-bench, Terminal-Bench Hard, and AA-LCR, under the high-reasoning setup reported below.
122
+
123
+ ### Example Tool Call
124
+
125
+ ```json
126
+ {
127
+   "name": "get_weather",
128
+   "arguments": {
129
+     "city": "Paris",
130
+     "date": "2026-02-10"
131
+   }
132
+ }
133
+ ```
134
+
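To illustrate how an application might consume a tool call like the one above, here is a minimal sketch that declares an OpenAI-style schema for `get_weather`, validates the emitted arguments, and dispatches to a local function. The `get_weather` implementation, the `TOOLS` list, and `dispatch_tool_call` are hypothetical and only show the shape of the workflow.

```python
import json
from typing import Any


def get_weather(city: str, date: str) -> dict[str, Any]:
    """Hypothetical tool: a real implementation would query a weather service."""
    return {"city": city, "date": date, "forecast": "unknown"}


# OpenAI-style tool declaration matching the example call above.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather forecast for a city on a given date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"},
                },
                "required": ["city", "date"],
            },
        },
    }
]

REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> dict[str, Any]:
    """Parse a JSON tool call, check it against the schema, and run the tool."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    arguments = call.get("arguments", {})
    if isinstance(arguments, str):  # some servers serialize arguments as a JSON string
        arguments = json.loads(arguments)
    schema = next(t["function"]["parameters"] for t in TOOLS if t["function"]["name"] == name)
    missing = [key for key in schema.get("required", []) if key not in arguments]
    unexpected = [key for key in arguments if key not in schema["properties"]]
    if missing or unexpected:
        raise ValueError(f"bad arguments: missing={missing}, unexpected={unexpected}")
    return REGISTRY[name](**arguments)
```

Validating before execution follows the recommendation later in this card to check tool calls before acting on them; the tool result would then be appended to the conversation as a tool message so the model can continue generating.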
135
+ ---
136
+
137
+ ## Architecture
138
+
139
+ ### Model Specifications
140
+
141
+ | Specification | Value |
142
+ |-------------------|--------------------|
143
+ | Total parameters | 60B total, 4.8B active (MoE) |
144
+
145
+ ---
146
+
147
+ ## Evaluation & Benchmarks
148
+
149
+ ### Evaluation Methodology
150
+
151
+ Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.
152
+
153
+ #### HLE, MMLU-Pro, AIME25, GPQA:d, LiveCodeBench
154
+
155
+ - **Evaluation framework**: [Nemo-skills](https://github.com/NVIDIA/NeMo-Skills)
156
+ - **Inference library**: vLLM 0.13.0
157
+ - **Hardware**: 1× NVIDIA H200 Tensor Core GPU
158
+ - **Reasoning effort**: high
159
+ - **Decoding**: temperature = 0.6, max_tokens = 131072, top_p = 1.0, top_k = 0
160
+ - **Batch size**: 64
161
+
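For readers who want to approximate these decoding settings against their own deployment, the sketch below builds keyword arguments for an OpenAI-compatible chat completions client. It assumes the server accepts `top_k` and `reasoning_effort` as extra body fields; `build_request_kwargs` is a hypothetical helper and is not part of the evaluation harness.

```python
from typing import Any


def build_request_kwargs(model: str, prompt: str, temperature: float = 0.6) -> dict[str, Any]:
    """Return kwargs mirroring the decoding settings listed above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": 1.0,
        "max_tokens": 131072,
        # Assumption: non-standard sampling fields travel in extra_body.
        "extra_body": {"top_k": 0, "reasoning_effort": "high"},
    }
```

With the `openai` Python client these kwargs would be passed as `client.chat.completions.create(**kwargs)`; for the benchmark families that use temperature = 1.0, pass `temperature=1.0`.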
162
+ #### IFBench, AA-LCR, SciCode
163
+
164
+ - **Evaluation framework**: [Nemo-skills](https://github.com/NVIDIA/NeMo-Skills)
165
+ - **Inference library**: vLLM 0.13.0
166
+ - **Hardware**: 1× NVIDIA H200 Tensor Core GPU
167
+ - **Reasoning effort**: high
168
+ - **Decoding**: temperature = 1.0, max_tokens = 131072, top_p = 1.0, top_k = 0
169
+ - **Batch size**: 64
170
+
171
+ #### Tau2-bench (Telecom)
172
+
173
+ - **Evaluation framework**: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1
174
+ - **Inference library**: vLLM 0.13.0
175
+ - **Hardware**: 1× NVIDIA H200 Tensor Core GPU
176
+ - **Reasoning effort**: high (agent `extra_body.reasoning_effort`)
177
+ - **Decoding (agent)**: temperature = 1.0, top_p = 1.0, min_tokens = 1
178
+ - **Decoding (judge / user simulator)**: temperature = 0.7, timeout = 600
179
+ - **Reproducibility**: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)
180
+
181
+ #### Terminal-Bench Hard (Artificial Analysis subset)
182
+
183
+ - **Evaluation framework**: laude-institute/harbor 0.1.43
184
+ - **Inference library**: vLLM 0.13.0
185
+ - **Hardware**: 1× NVIDIA H200 Tensor Core GPU
186
+ - **Reasoning effort**: high
187
+ - **Decoding**: temperature = 1.0, top_p = 1.0, max-model-len = 131072
188
+ - **Reproducibility**: Artificial Analysis subset ([methodology](https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard))
189
+ - **Agent**: terminus-2; max episodes 100; repeats 3
190
+
191
+ ### Quantitative Results (Reported & Planned)
192
+
193
+
194
+ | Benchmark | gpt-oss-120b | HyperNova 60B 2602 | HyperNova 60B 2605 |
195
+ |-----------------------|-------------------------------|-----------------------------|--------------------------|
196
+ | HLE | 18.50 | 7.28 | 14.97 |
197
+ | MMLU-Pro | 79.64 | 74.25 | 76.77 |
198
+ | Tau2-bench (Telecom) | 63.74 | 60.53 | 61.70 |
199
+ | AIME25 | 93.67 | 86.00 | 90.00 |
200
+ | GPQA:d | 74.64 | 65.56 | 71.92 |
201
+ | IFBench | 67.01 | 59.40 | 66.57 |
202
+ | SciCode | 41.52 | 33.53 | 36.00 |
203
+ | LiveCodeBench | 62.75 | 51.53 | 68.68 |
204
+ | Terminal-Bench Hard | 24.24 | 12.12 | 15.91 |
205
+ | AA-LCR | 49.00 | 35.67 | 40.33 |
206
+
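The table can be summarized with a short script that computes, for each benchmark, the 2605 score relative to gpt-oss-120b and the absolute gain over 2602. The scores are copied from the table above; the function names are illustrative.

```python
# (gpt-oss-120b, HyperNova 60B 2602, HyperNova 60B 2605), copied from the table above.
SCORES = {
    "HLE": (18.50, 7.28, 14.97),
    "MMLU-Pro": (79.64, 74.25, 76.77),
    "Tau2-bench (Telecom)": (63.74, 60.53, 61.70),
    "AIME25": (93.67, 86.00, 90.00),
    "GPQA:d": (74.64, 65.56, 71.92),
    "IFBench": (67.01, 59.40, 66.57),
    "SciCode": (41.52, 33.53, 36.00),
    "LiveCodeBench": (62.75, 51.53, 68.68),
    "Terminal Bench": (24.24, 12.12, 15.91),
    "AA-LCR": (49.00, 35.67, 40.33),
}


def relative_to_base(scores: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Ratio of the 2605 score to the gpt-oss-120b score, per benchmark."""
    return {name: new / base for name, (base, _prev, new) in scores.items()}


def gain_over_previous(scores: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Absolute point gain of 2605 over 2602, per benchmark."""
    return {name: new - prev for name, (_base, prev, new) in scores.items()}


if __name__ == "__main__":
    ratios = relative_to_base(SCORES)
    gains = gain_over_previous(SCORES)
    for name in SCORES:
        print(f"{name:22s} ratio vs base = {ratios[name]:.3f}   gain vs 2602 = {gains[name]:+.2f}")
```

On these numbers, 2605 gains over 2602 on every benchmark, and LiveCodeBench is the only one where it exceeds gpt-oss-120b.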
207
+ ![Intelligence](assets/intelligence.png)
208
+
209
+ ### Quantitative Results (Inference Performance)
210
+
211
+ #### Metrics reported
212
+
213
+ - **System Output Throughput (higher is better):** Mean output tokens per second across all concurrent requests over the benchmarking phase.
214
+ - **End-to-End Latency per Query (lower is better):** Median end-to-end response time for each query, measured from the time the query is sent.
215
+ - **Output Speed per Query (higher is better):** Median output tokens per second after the first token is received for each query.
216
+ - **Time to First Token (TTFT) (lower is better):** Median time to first token.
217
+ - **Estimated total memory (lower is better):** Median from each GuideLLM phase (estimated total footprint: weights plus KV contribution from monitored usage).
218
+ - **Model weights (lower is better):** Memory occupied by the model weights alone, in GB.
219
+
220
+ On the same hardware and harness, **HyperNova 60B 2605** is compared to **gpt-oss-120b** using GuideLLM. Each table lists **median** values for that model at each **concurrency phase** (1 → 256 concurrent requests).
221
+
222
+ **gpt-oss-120b**
223
+
224
+ | Concurrency | Throughput (tok/s) | E2E latency (s) | Output speed (tok/s) | TTFT (s) | Est. total memory (GB) | Model weights (GB) |
225
+ |------------:|-------------------:|----------------:|-----------------------:|---------:|------------------------:|-------------------:|
226
+ | 1 | 173 | 3.02 | 387.1 | 1.51 | 62.0 | 61.6 |
227
+ | 2 | 292 | 3.89 | 372.1 | 1.78 | 62.4 | 61.6 |
228
+ | 4 | 453 | 5.26 | 208.0 | 2.23 | 63.2 | 61.6 |
229
+ | 8 | 643 | 6.47 | 181.7 | 3.02 | 64.8 | 61.6 |
230
+ | 16 | 897 | 11.21 | 102.5 | 4.28 | 68.1 | 61.6 |
231
+ | 32 | 1114 | 15.51 | 75.1 | 6.25 | 74.6 | 61.6 |
232
+ | 64 | 1404 | 24.32 | 52.1 | 10.17 | 87.6 | 61.6 |
233
+ | 128 | 1828 | 42.99 | 28.3 | 18.23 | 114.0 | 61.6 |
234
+ | 192 | 1818 | 61.47 | 29.8 | 38.43 | 113.9 | 61.6 |
235
+ | 256 | 1842 | 81.04 | 29.5 | 57.45 | 114.0 | 61.6 |
236
+
237
+ **HyperNova 60B 2605**
238
+
239
+ | Concurrency | Throughput (tok/s) | E2E latency (s) | Output speed (tok/s) | TTFT (s) | Est. total memory (GB) | Model weights (GB) |
240
+ |------------:|-------------------:|----------------:|-----------------------:|---------:|------------------------:|-------------------:|
241
+ | 1 | 179 | 2.12 | 336.3 | 1.20 | 32.1 | 31.8 |
242
+ | 2 | 304 | 2.21 | 457.9 | 1.44 | 32.4 | 31.8 |
243
+ | 4 | 487 | 2.91 | 305.8 | 1.76 | 33.0 | 31.8 |
244
+ | 8 | 740 | 3.84 | 207.8 | 2.31 | 34.1 | 31.8 |
245
+ | 16 | 982 | 5.74 | 142.0 | 3.37 | 36.5 | 31.8 |
246
+ | 32 | 1233 | 8.46 | 101.7 | 5.25 | 41.1 | 31.8 |
247
+ | 64 | 1482 | 14.14 | 54.2 | 8.60 | 50.4 | 31.8 |
248
+ | 128 | 1923 | 25.03 | 32.0 | 15.09 | 69.0 | 31.8 |
249
+ | 192 | 1808 | 37.88 | 24.5 | 23.93 | 87.6 | 31.8 |
250
+ | 256 | 1716 | 52.16 | 18.8 | 31.89 | 106.5 | 31.8 |
251
+
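As a worked example of reading the two tables together, the sketch below compares both models at a given concurrency level. Only three rows per model are copied from the tables above to keep it short; the `compare` helper and its field names are illustrative.

```python
# concurrency -> (throughput tok/s, E2E latency s, est. total memory GB, model weights GB)
GPT_OSS_120B = {
    1: (173, 3.02, 62.0, 61.6),
    128: (1828, 42.99, 114.0, 61.6),
    256: (1842, 81.04, 114.0, 61.6),
}
HYPERNOVA_60B_2605 = {
    1: (179, 2.12, 32.1, 31.8),
    128: (1923, 25.03, 69.0, 31.8),
    256: (1716, 52.16, 106.5, 31.8),
}


def compare(concurrency: int) -> dict[str, float]:
    """Compare HyperNova 60B 2605 against gpt-oss-120b at one concurrency level."""
    base_tps, base_lat, base_mem, base_w = GPT_OSS_120B[concurrency]
    hn_tps, hn_lat, hn_mem, hn_w = HYPERNOVA_60B_2605[concurrency]
    return {
        "throughput_ratio": hn_tps / base_tps,   # above 1 means HyperNova is faster
        "latency_ratio": base_lat / hn_lat,      # above 1 means HyperNova responds sooner
        "memory_saved_gb": base_mem - hn_mem,    # estimated total footprint difference
        "weights_ratio": hn_w / base_w,          # weight size relative to gpt-oss-120b
    }


if __name__ == "__main__":
    for level in sorted(HYPERNOVA_60B_2605):
        summary = compare(level)
        print(level, {key: round(value, 3) for key, value in summary.items()})
```

At concurrency 128, the level shown in the figure further down, the estimated total memory difference is 114.0 GB minus 69.0 GB, i.e. 45.0 GB.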
252
+
253
+
254
+ #### Performance evaluation conditions
255
+
256
+ Our performance evaluation follows the spirit of the [Artificial Analysis system load test methodology](https://artificialanalysis.ai/methodology/system-load-test).
257
+
258
+ - **Inference library**: vLLM 0.13.0
259
+ - **Monitoring libraries**: GuideLLM, nvidia-ml-py
260
+ - **Hardware**: 1× NVIDIA H200 Tensor Core GPU
261
+ - **Conditions**: **concurrency phases** 1, 2, 4, 8, 16, 32, 64, 128, 192, and 256 concurrent requests (one GuideLLM phase each)
262
+ - **Phase duration**: Each phase lasts 3 minutes (excluding ramp-up and cool-down periods).
263
+ - **Workload shape:** input length is ~1000 tokens per query (median); median output length varies by phase and model.
264
+ - **Streaming**: Benchmarking is conducted with streaming enabled.
265
+
266
+ The figure below shows a **side-by-side comparison at concurrency = 128 only**.
267
+
268
+ ![Performance](assets/performance.png)
269
+
270
+ ---
271
+
272
+ ## Languages
273
+
274
+ - **Primary language**: English
275
+ - **Other languages**: Not formally evaluated
276
+
277
+ The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.
278
+
279
+ ---
280
+
281
+ ## Intended Use
282
+
283
+ ### Recommended Use Cases
284
+
285
+ - **Reasoning and analysis** (with configurable reasoning effort where supported)
286
+ - **Tool-augmented applications**, with emphasis on **coding** and **general** assistant use (function calling, web browsing, code execution, structured outputs)
287
+ - **Code generation and reasoning**
288
+ - **Chatbots and virtual assistants**
289
+ - **Retrieval-augmented generation (RAG)**
290
+
291
+ ### Out-of-Scope Uses
292
+
293
+ - Harmful, illegal, or deceptive content generation
294
+ - Impersonation of real individuals without consent
295
+ - High-risk decision-making without human oversight
296
+ - Surveillance or tracking of individuals
297
+ - Any use that violates applicable laws or regulations
298
+
299
+ ---
300
+
301
+ ## Safety & Limitations
302
+
303
+ ### Known Limitations
304
+
305
+ - **English-centric** training data.
306
+ - **Format:** For best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise.
307
+ - **Tool calling** depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.
308
+
309
+ ### Recommendations
310
+
311
+ - Validate tool outputs before execution
312
+ - Use human oversight for critical applications
313
+ - Perform task-specific evaluation prior to deployment
314
+
315
+ ---
316
+
317
+ ## Model Information
318
+
319
+ | Field | Value |
320
+ |--------------|--------------------- |
321
+ | Model name | HyperNova 60B 2605 |
322
+ | Version | 2605 |
323
+ | Release date | 26/02/2026 |
324
+ | Developed by | Multiverse Computing |
325
+ | License | Apache 2.0 |
326
+ | Contact | business@multiversecomputing.com |
327
+
328
+ ---
329
+
330
+ ## Citation
331
+
332
+ If you use this model, please cite the base model and this variant:
333
+
334
+ ```bibtex
335
+ @misc{openai2025gptoss120b,
336
+ title = {gpt-oss-120b \& gpt-oss-20b Model Card},
337
+ author = {OpenAI},
338
+ year = {2025},
339
+ eprint = {2508.10925},
340
+ archivePrefix = {arXiv},
341
+ primaryClass = {cs.CL},
342
+ url = {https://arxiv.org/abs/2508.10925}
343
+ }
344
+ @misc{hypernova60b2605,
345
+ title = {HyperNova 60B 2605: Model developed based on gpt-oss-120b},
346
+ author = {Multiverse Computing},
347
+ year = {2026},
348
+ url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605},
349
+ note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
350
+ }
351
+ ```
352
+
353
+ **Built by [Multiverse Computing](https://www.multiversecomputing.com)** · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605/discussions) · [Discord](https://discord.gg/8mT9FveN)