---
base_model:
- openai/gpt-oss-120b
- MultiverseComputingCAI/HyperNova-60B
library_name: transformers
license: apache-2.0
---
<div align="center">


# HyperNova 60B 2605


### Powered by CompactifAI


[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Model on Hugging Face](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605)
[Discord](https://discord.gg/cGas9uStqp)


**Optimized for Efficient Inference** · **Reduced Memory Footprint** · **Native Tool Calling Support**


</div>


---


## Table of Contents


- [Model Overview](#model-overview)
- [Technical Deep Dive](#technical-deep-dive)
- [Key Characteristics](#key-characteristics)
- [Quick Start](#quick-start)
- [What's New in HyperNova 60B 2605](#whats-new-in-hypernova-60b-2605)
- [Tool Calling](#tool-calling)
- [Architecture](#architecture)
- [Evaluation & Benchmarks](#evaluation--benchmarks)
- [Languages](#languages)
- [Intended Use](#intended-use)
- [Safety & Limitations](#safety--limitations)
- [Model Information](#model-information)
- [Citation](#citation)


---


## Model Overview


**HyperNova 60B 2605**, developed by **Multiverse Computing**, is an open-weight model designed for powerful **general** reasoning, **coding**, and versatile developer use.


The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2605 is intended for code generation, RAG, and tool-augmented applications.


## Technical Deep Dive

For a detailed explanation of the compression architecture, the model compression process, and the benchmark results behind HyperNova 60B, read [this technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability).


---


## Key Characteristics


| Characteristic | Description |
|-----------------------|-------------|
| 🛠️ **Tool calling** | Native support; OpenAI-style function / tool calling schemas; suited to coding agents and structured outputs |
| 🧠 **Parameters** | 60B total parameters |
| 📐 **Architecture** | Decoder-only Transformer |
| **Primary language** | English |
| **Other languages** | Not formally evaluated |

---

## Quick Start

This model can be loaded with the **Transformers** API. Use `trust_remote_code=True` (required for the gpt-oss architecture). The recommended approach is `AutoModelForCausalLM` with `apply_chat_template`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MultiverseComputingCAI/HyperNova-60B-2605"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "What is a Hypernova?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

# apply_chat_template returns input_ids only, so build the attention mask explicitly.
attention_mask = torch.ones_like(inputs, dtype=torch.long)

outputs = model.generate(
    inputs,
    attention_mask=attention_mask,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens.
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(reply)
```
Alternatively, you can use the `pipeline` API with `trust_remote_code=True`; the pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]` as needed.
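That extraction step can be sketched as follows; the `chat` list mirrors the chat-format structure the pipeline returns, and the helper name is our own, not part of the Transformers API:

```python
def last_assistant_message(generated_text):
    """Return the content of the final assistant turn in a chat-format pipeline output."""
    for turn in reversed(generated_text):
        if turn.get("role") == "assistant":
            return turn["content"]
    raise ValueError("no assistant turn found")

# Shape of outputs[0]["generated_text"] for chat-format inputs (illustrative values):
chat = [
    {"role": "user", "content": "What is a Hypernova?"},
    {"role": "assistant", "content": "A hypernova is an exceptionally energetic supernova."},
]
print(last_assistant_message(chat))
```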


---


## What's New in HyperNova 60B 2605


**HyperNova 60B 2605** is an improved version of **HyperNova 60B 2602**. This release focuses on **coding** and **general** capability, backed by higher scores on several benchmarks.


### Summary


- **Improvement focus vs HyperNova 60B 2602:** stronger performance on **coding** tasks and **general** benchmarks.
- **Tool use:** retains native support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
- **Reasoning:** compatible with configurable reasoning effort (e.g. low/medium/high via the system prompt) where the format is preserved; the full chain of thought is available for debugging and analysis.
- **Evaluation:** measured on coding and tool-heavy benchmarks (e.g. Tau2-bench, Terminal-Bench) alongside **general** intelligence benchmarks.


---


## Tool Calling


HyperNova 60B 2605 supports **native tool use** and is well suited for:


- **Function calling** with defined schemas
- **Structured outputs**
- **Coding-oriented tool workflows** (e.g. browser tasks, code execution where supported)


The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows **OpenAI-style schemas**; compatibility refers to format and structure, and exact parity with the base model or other models is not guaranteed.
Compared with HyperNova 60B 2602, this release improves on **coding** and **general** evaluation tracks, including IFBench, Tau2-bench, Terminal-Bench, and AA-LCR under the high-reasoning setup reported below.


### Example Tool Call


```json
{
  "name": "get_weather",
  "arguments": {
    "city": "Paris",
    "date": "2026-02-10"
  }
}
```
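On the application side, a tool is typically declared with an OpenAI-style JSON schema and the emitted call is parsed back into a name and arguments before execution. A minimal sketch, assuming the hypothetical `get_weather` tool from the example above (the schema and the `parse_tool_call` helper are illustrative, not part of the model's API):

```python
import json

# OpenAI-style tool schema for the hypothetical get_weather tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "date": {"type": "string", "description": "ISO date, e.g. 2026-02-10"},
            },
            "required": ["city"],
        },
    },
}]

def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse a JSON tool call emitted by the model into (name, arguments)."""
    call = json.loads(raw)
    return call["name"], call["arguments"]

raw_call = '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
name, args = parse_tool_call(raw_call)
print(name, args["city"])
```

The parsed name and arguments can then be dispatched to the matching function, and the result appended to the conversation as a tool message for the model to continue from.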


---


## Architecture


### Model Specifications


| Specification | Value |
|--------------------------|-------|
| Total parameters | 60B |
| Active parameters (MoE) | 4.8B |


---


## Evaluation & Benchmarks


### Evaluation Methodology


Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.


#### HLE, MMLU-Pro, AIME25, GPQA:d, LiveCodeBench, IFBench, AA-LCR, SciCode


- **Evaluation framework**: [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills)
- **Inference library**: vLLM 0.13.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high
- **Decoding**: temperature = 1.0, top_p = 1.0
- **Batch size**: 64

#### Tau2-bench (Telecom)


- **Evaluation framework**: [EvalScope](https://github.com/modelscope/evalscope) 1.4.1
- **Inference library**: vLLM 0.13.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high (agent `extra_body.reasoning_effort`)
- **Decoding (agent)**: temperature = 1.0, top_p = 1.0, min_tokens = 1
- **Decoding (judge / user simulator)**: temperature = 0.7, timeout = 600
- **Reproducibility**: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)


#### Terminal-Bench Hard (Artificial Analysis subset)


- **Evaluation framework**: laude-institute/harbor == 0.1.43
- **Inference library**: vLLM == 0.13.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high
- **Decoding**: temperature = 1.0, top_p = 1.0, max-model-len = 131072
- **Reproducibility**: [Artificial Analysis subset](https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard)
- **Agent**: terminus-2; max episodes 100; repeats 3


#### Aider Polyglot


- **Evaluation framework**: [Aider-AI/aider](https://github.com/Aider-AI/aider)
- **Hardware**: 2× NVIDIA H200 Tensor Core GPU (host with Docker)
- **Dataset**: `polyglot-benchmark` (225 exercises across multiple languages)
- **Reasoning effort**: high (passed via `--reasoning-effort`)
- **Decoding**: temperature = 1.0, top_p = 1.0 (configurable via `generation_config` / `--read-model-settings` YAML)
- **Edit format**: `whole` (also supports `diff | udiff | diff-fenced | architect`)
- **Reproducibility**: leaderboard-aligned; `--tries=2` (repeats)


### Quantitative Results (Benchmarks)


| Benchmark | gpt-oss-120b | HyperNova 60B 2602 | HyperNova 60B 2605 |
|-----------------------|--------------|--------------------|--------------------|
| HLE | 18.50 | 7.28 | 14.97 |
| MMLU-Pro | 79.64 | 74.25 | 76.77 |
| Tau2-bench (Telecom) | 63.74 | 60.53 | 61.70 |
| AIME25 | 93.67 | 86.00 | 90.00 |
| GPQA:d | 74.64 | 65.56 | 71.92 |
| IFBench | 67.01 | 59.40 | 66.57 |
| SciCode | 41.52 | 33.53 | 36.00 |
| LiveCodeBench | 62.75 | 51.53 | 68.68 |
| Terminal-Bench | 24.24 | 12.12 | 15.91 |
| AA-LCR | 49.00 | 35.67 | 40.33 |
| Aider Polyglot | 43.60 | 26.20 | 34.20 |






### Quantitative Results (Inference Performance)


#### Metrics reported


- **System output throughput (higher is better):** mean output tokens per second across all concurrent requests over the benchmarking phase.
- **End-to-end latency per query (lower is better):** median end-to-end response time for each query from the time the query is sent.
- **Output speed per query (higher is better):** median output tokens per second after the first token is received for each query.
- **Time to first token (TTFT) (lower is better):** median time to first token.
- **Estimated total memory (lower is better):** median from each GuideLLM phase (estimated total footprint: weights plus the KV-cache contribution from monitored usage).
- **Model weights (lower is better):** memory occupied by the model weights alone.
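The per-query metrics above are reported as medians over all requests in a phase; as a quick illustration of the computation (synthetic sample values, not benchmark data):

```python
from statistics import median

# Synthetic per-request samples for one concurrency phase (illustrative only).
ttft_s = [4.2, 5.1, 4.9, 4.6, 5.3]            # time to first token, seconds
output_speed_tps = [68.0, 70.5, 69.2, 71.1, 66.8]  # tokens per second after first token

print(f"median TTFT: {median(ttft_s):.2f} s")
print(f"median output speed: {median(output_speed_tps):.2f} tok/s")
```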


On the same hardware and harness, **HyperNova 60B 2605** is compared to **gpt-oss-120b** using GuideLLM. The table below lists **median** values for each model at the **concurrency = 128** phase (the full sweep runs 1 → 256 concurrent requests).


| Metric | gpt-oss-120b | HyperNova 60B 2605 |
|--------|-------------:|-------------------:|
| Concurrency | 128 | 128 |
| Throughput (tok/s) | 3,821 | 5,210 |
| E2E latency (s) | 24.05 | 14.74 |
| Output speed (tok/s) | 57.79 | 69.31 |
| TTFT (s) | 7.04 | 4.85 |
| Est. total memory (GB) | 123.55 | 38.83 |
| Model weights (GB) | 121.54 | 31.81 |
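The relative gains implied by the table can be sanity-checked in a few lines (values copied from the table above; the script itself is illustrative):

```python
# Median values at concurrency = 128, from the comparison table.
gpt_oss = {"throughput": 3821, "e2e_latency": 24.05, "weights_gb": 121.54}
hypernova = {"throughput": 5210, "e2e_latency": 14.74, "weights_gb": 31.81}

print(f"throughput gain: {hypernova['throughput'] / gpt_oss['throughput']:.2f}x")
print(f"latency reduction: {gpt_oss['e2e_latency'] / hypernova['e2e_latency']:.2f}x")
print(f"weights size ratio: {gpt_oss['weights_gb'] / hypernova['weights_gb']:.2f}x smaller")
```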


#### Performance evaluation conditions


Our performance evaluation follows the spirit of [Artificial Analysis](https://artificialanalysis.ai/methodology/system-load-test).


- **Inference library**: vLLM 0.13.0
- **Monitoring libraries**: GuideLLM, nvidia-ml-py
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Conditions**: **concurrency phases** of 1, 2, 4, 8, 16, 32, 64, 128, 192, and 256 concurrent requests (one GuideLLM phase each)
- **Phase duration**: each phase lasts 3 minutes (excluding ramp-up and cool-down periods).
- **Workload shape**: input length is ~1,000 tokens per query (median); median output length varies by phase and model.
- **Streaming**: benchmarking is conducted with streaming enabled.


The figure below is a **side-by-side comparison at concurrency = 128** only.




---


## Languages


- **Primary language**: English
- **Other languages**: Not formally evaluated


The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.


---


## Intended Use


### Recommended Use Cases


- **Reasoning and analysis** (with configurable reasoning effort where supported)
- **Tool-augmented applications**, with emphasis on **coding** and **general** assistant use (function calling, web browsing, code execution, structured outputs)
- **Code generation and reasoning**
- **Chatbots and virtual assistants**
- **Retrieval-augmented generation (RAG)**


### Out-of-Scope Uses


- Harmful, illegal, or deceptive content generation
- Impersonation of real individuals without consent
- High-risk decision-making without human oversight
- Surveillance or tracking of individuals
- Any use that violates applicable laws or regulations


---


## Safety & Limitations


### Known Limitations


- **English-centric** training data.
- **Format:** for best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise.
- **Tool calling** depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.


### Recommendations


- Validate tool outputs before execution
- Use human oversight for critical applications
- Perform task-specific evaluation prior to deployment


---


## Model Information


| Field | Value |
|--------------|----------------------|
| Model name | HyperNova 60B 2605 |
| Version | 2605 |
| Release date | 26/02/2026 |
| Developed by | Multiverse Computing |
| License | Apache 2.0 |
| Contact | business@multiversecomputing.com |


---


## Citation


If you use this model, please cite the base model and this variant:


```bibtex
@misc{openai2025gptoss120b,
  title         = {gpt-oss-120b \& gpt-oss-20b Model Card},
  author        = {OpenAI},
  year          = {2025},
  eprint        = {2508.10925},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2508.10925}
}

@misc{hypernova60b2605,
  title  = {HyperNova 60B 2605: Model developed based on gpt-oss-120b},
  author = {Multiverse Computing},
  year   = {2026},
  url    = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605},
  note   = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
}
```


**Built by [Multiverse Computing](https://www.multiversecomputing.com)** · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605/discussions) · [Discord](https://discord.gg/8mT9FveN)