---
base_model:
- openai/gpt-oss-120b
- MultiverseComputingCAI/HyperNova-60B
library_name: transformers
license: apache-2.0
---
<div align="center">


# HyperNova 60B 2605


### Powered by CompactifAI


[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Model on Hugging Face](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605)
[Discord](https://discord.gg/cGas9uStqp)


**Optimized for Efficient Inference** · **Reduced Memory Footprint** · **Native Tool Calling Support**


</div>


---


## Table of Contents


- [Model Overview](#model-overview)
- [Technical Deep Dive](#technical-deep-dive)
- [Key Characteristics](#key-characteristics)
- [Quick Start](#quick-start)
- [What's New in HyperNova 60B 2605](#whats-new-in-hypernova-60b-2605)
- [Tool Calling](#tool-calling)
- [Architecture](#architecture)
- [Evaluation & Benchmarks](#evaluation--benchmarks)
- [Languages](#languages)
- [Intended Use](#intended-use)
- [Safety & Limitations](#safety--limitations)
- [Model Information](#model-information)
- [Citation](#citation)


---


## Model Overview


**HyperNova 60B 2605**, developed by **Multiverse Computing**, is an open-weight model designed for powerful **general** reasoning, **coding**, and versatile developer use.


The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2605 is intended for code generation, RAG, and tool-augmented applications.


## Technical Deep Dive

For a detailed explanation of the compression architecture, the model compression process, and the benchmark results behind HyperNova 60B, read [this technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability).


---


## Key Characteristics


| Characteristic | Description |
|-----------------------|-------------|
| 🛠️ **Tool calling** | Native support; OpenAI-style function / tool calling schemas; suited to coding agents and structured outputs |
| 🧠 **Parameters** | 60B total parameters |
| 📐 **Architecture** | Decoder-only Transformer |
| **Primary language** | English |
| **Other languages** | Not formally evaluated |

---

## Quick Start

This model can be loaded with the **Transformers** API. Use `trust_remote_code=True` (required for the gpt-oss architecture). The recommended approach is `AutoModelForCausalLM` with `apply_chat_template`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MultiverseComputingCAI/HyperNova-60B-2605"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "What is a Hypernova?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

# apply_chat_template returns input_ids only, so build the attention mask explicitly.
attention_mask = torch.ones_like(inputs, dtype=torch.long)

outputs = model.generate(
    inputs,
    attention_mask=attention_mask,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens.
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(reply)
```
Alternatively, you can use the `pipeline` API with `trust_remote_code=True`; the pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]` as needed.
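That extraction step can be sketched as follows; the `chat` list mirrors the chat-format structure the pipeline returns, and the helper name is our own, not part of the Transformers API:

```python
def last_assistant_message(generated_text):
    """Return the content of the final assistant turn in a chat-format pipeline output."""
    for turn in reversed(generated_text):
        if turn.get("role") == "assistant":
            return turn["content"]
    raise ValueError("no assistant turn found")

# Shape of outputs[0]["generated_text"] for chat-format inputs (illustrative values):
chat = [
    {"role": "user", "content": "What is a Hypernova?"},
    {"role": "assistant", "content": "A hypernova is an exceptionally energetic supernova."},
]
print(last_assistant_message(chat))
```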


---


## What's New in HyperNova 60B 2605


**HyperNova 60B 2605** is an improved version of **HyperNova 60B 2602**. This release focuses on **coding** and **general** capability, backed by higher scores on several benchmarks.


### Summary


- **Improvement focus vs HyperNova 60B 2602:** stronger performance on **coding** tasks and **general** benchmarks.
- **Tool use:** retains native support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
- **Reasoning:** compatible with configurable reasoning effort (e.g. low/medium/high via the system prompt) where the format is preserved; the full chain of thought is available for debugging and analysis.
- **Evaluation:** measured on coding and tool-heavy benchmarks (e.g. Tau2-bench, Terminal-Bench) alongside **general** intelligence benchmarks.


---


## Tool Calling


HyperNova 60B 2605 supports **native tool use** and is well suited for:


- **Function calling** with defined schemas
- **Structured outputs**
- **Coding-oriented tool workflows** (e.g. browser tasks, code execution where supported)


The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows **OpenAI-style schemas**; compatibility refers to format and structure, and exact parity with the base model or other models is not guaranteed.
Compared with HyperNova 60B 2602, this release improves on **coding** and **general** evaluation tracks, including IFBench, Tau2-bench, Terminal-Bench, and AA-LCR under the high-reasoning setup reported below.


### Example Tool Call


```json
{
  "name": "get_weather",
  "arguments": {
    "city": "Paris",
    "date": "2026-02-10"
  }
}
```
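On the application side, a tool is typically declared with an OpenAI-style JSON schema and the emitted call is parsed back into a name and arguments before execution. A minimal sketch, assuming the hypothetical `get_weather` tool from the example above (the schema and the `parse_tool_call` helper are illustrative, not part of the model's API):

```python
import json

# OpenAI-style tool schema for the hypothetical get_weather tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "date": {"type": "string", "description": "ISO date, e.g. 2026-02-10"},
            },
            "required": ["city"],
        },
    },
}]

def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse a JSON tool call emitted by the model into (name, arguments)."""
    call = json.loads(raw)
    return call["name"], call["arguments"]

raw_call = '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
name, args = parse_tool_call(raw_call)
print(name, args["city"])
```

The parsed name and arguments can then be dispatched to the matching function, and the result appended to the conversation as a tool message for the model to continue from.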


---


## Architecture


### Model Specifications


| Specification | Value |
|--------------------------|-------|
| Total parameters | 60B |
| Active parameters (MoE) | 4.8B |


---


## Evaluation & Benchmarks


### Evaluation Methodology


Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.


#### HLE, MMLU-Pro, AIME25, GPQA:d, LiveCodeBench, IFBench, AA-LCR, SciCode


- **Evaluation framework**: [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills)
- **Inference library**: vLLM 0.13.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high
- **Decoding**: temperature = 1.0, top_p = 1.0
- **Batch size**: 64

#### Tau2-bench (Telecom)


- **Evaluation framework**: [EvalScope](https://github.com/modelscope/evalscope) 1.4.1
- **Inference library**: vLLM 0.13.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high (agent `extra_body.reasoning_effort`)
- **Decoding (agent)**: temperature = 1.0, top_p = 1.0, min_tokens = 1
- **Decoding (judge / user simulator)**: temperature = 0.7, timeout = 600
- **Reproducibility**: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)


#### Terminal-Bench Hard (Artificial Analysis subset)


- **Evaluation framework**: laude-institute/harbor == 0.1.43
- **Inference library**: vLLM == 0.13.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Reasoning effort**: high
- **Decoding**: temperature = 1.0, top_p = 1.0, max-model-len = 131072
- **Reproducibility**: [Artificial Analysis subset](https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard)
- **Agent**: terminus-2; max episodes 100; repeats 3


#### Aider Polyglot


- **Evaluation framework**: [Aider-AI/aider](https://github.com/Aider-AI/aider)
- **Hardware**: 2× NVIDIA H200 Tensor Core GPU (host with Docker)
- **Dataset**: `polyglot-benchmark` (225 exercises across multiple languages)
- **Reasoning effort**: high (passed via `--reasoning-effort`)
- **Decoding**: temperature = 1.0, top_p = 1.0 (configurable via `generation_config` / `--read-model-settings` YAML)
- **Edit format**: `whole` (also supports `diff | udiff | diff-fenced | architect`)
- **Reproducibility**: leaderboard-aligned; `--tries=2` (repeats)


### Quantitative Results (Benchmarks)


| Benchmark | gpt-oss-120b | HyperNova 60B 2602 | HyperNova 60B 2605 |
|-----------------------|--------------|--------------------|--------------------|
| HLE | 18.50 | 7.28 | 14.97 |
| MMLU-Pro | 79.64 | 74.25 | 76.77 |
| Tau2-bench (Telecom) | 63.74 | 60.53 | 61.70 |
| AIME25 | 93.67 | 86.00 | 90.00 |
| GPQA:d | 74.64 | 65.56 | 71.92 |
| IFBench | 67.01 | 59.40 | 66.57 |
| SciCode | 41.52 | 33.53 | 36.00 |
| LiveCodeBench | 62.75 | 51.53 | 68.68 |
| Terminal-Bench | 24.24 | 12.12 | 15.91 |
| AA-LCR | 49.00 | 35.67 | 40.33 |
| Aider Polyglot | 43.60 | 26.20 | 34.20 |






### Quantitative Results (Inference Performance)


#### Metrics reported


- **System output throughput (higher is better):** mean output tokens per second across all concurrent requests over the benchmarking phase.
- **End-to-end latency per query (lower is better):** median end-to-end response time for each query from the time the query is sent.
- **Output speed per query (higher is better):** median output tokens per second after the first token is received for each query.
- **Time to first token (TTFT) (lower is better):** median time to first token.
- **Estimated total memory (lower is better):** median from each GuideLLM phase (estimated total footprint: weights plus the KV-cache contribution from monitored usage).
- **Model weights (lower is better):** memory occupied by the model weights alone.
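The per-query metrics above are reported as medians over all requests in a phase; as a quick illustration of the computation (synthetic sample values, not benchmark data):

```python
from statistics import median

# Synthetic per-request samples for one concurrency phase (illustrative only).
ttft_s = [4.2, 5.1, 4.9, 4.6, 5.3]            # time to first token, seconds
output_speed_tps = [68.0, 70.5, 69.2, 71.1, 66.8]  # tokens per second after first token

print(f"median TTFT: {median(ttft_s):.2f} s")
print(f"median output speed: {median(output_speed_tps):.2f} tok/s")
```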


On the same hardware and harness, **HyperNova 60B 2605** is compared to **gpt-oss-120b** using GuideLLM. The table below lists **median** values for each model at the **concurrency = 128** phase (the full sweep runs 1 → 256 concurrent requests).


| Metric | gpt-oss-120b | HyperNova 60B 2605 |
|--------|-------------:|-------------------:|
| Concurrency | 128 | 128 |
| Throughput (tok/s) | 3,821 | 5,210 |
| E2E latency (s) | 24.05 | 14.74 |
| Output speed (tok/s) | 57.79 | 69.31 |
| TTFT (s) | 7.04 | 4.85 |
| Est. total memory (GB) | 123.55 | 38.83 |
| Model weights (GB) | 121.54 | 31.81 |
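The relative gains implied by the table can be sanity-checked in a few lines (values copied from the table above; the script itself is illustrative):

```python
# Median values at concurrency = 128, from the comparison table.
gpt_oss = {"throughput": 3821, "e2e_latency": 24.05, "weights_gb": 121.54}
hypernova = {"throughput": 5210, "e2e_latency": 14.74, "weights_gb": 31.81}

print(f"throughput gain: {hypernova['throughput'] / gpt_oss['throughput']:.2f}x")
print(f"latency reduction: {gpt_oss['e2e_latency'] / hypernova['e2e_latency']:.2f}x")
print(f"weights size ratio: {gpt_oss['weights_gb'] / hypernova['weights_gb']:.2f}x smaller")
```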


#### Performance evaluation conditions


Our performance evaluation follows the spirit of [Artificial Analysis](https://artificialanalysis.ai/methodology/system-load-test).


- **Inference library**: vLLM 0.13.0
- **Monitoring libraries**: GuideLLM, nvidia-ml-py
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Conditions**: **concurrency phases** of 1, 2, 4, 8, 16, 32, 64, 128, 192, and 256 concurrent requests (one GuideLLM phase each)
- **Phase duration**: each phase lasts 3 minutes (excluding ramp-up and cool-down periods).
- **Workload shape**: input length is ~1,000 tokens per query (median); median output length varies by phase and model.
- **Streaming**: benchmarking is conducted with streaming enabled.


The figure below is a **side-by-side comparison at concurrency = 128** only.




---


## Languages


- **Primary language**: English
- **Other languages**: Not formally evaluated


The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.


---


## Intended Use


### Recommended Use Cases


- **Reasoning and analysis** (with configurable reasoning effort where supported)
- **Tool-augmented applications**, with emphasis on **coding** and **general** assistant use (function calling, web browsing, code execution, structured outputs)
- **Code generation and reasoning**
- **Chatbots and virtual assistants**
- **Retrieval-augmented generation (RAG)**


### Out-of-Scope Uses


- Harmful, illegal, or deceptive content generation
- Impersonation of real individuals without consent
- High-risk decision-making without human oversight
- Surveillance or tracking of individuals
- Any use that violates applicable laws or regulations


---


## Safety & Limitations


### Known Limitations


- **English-centric** training data.
- **Format:** for best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise.
- **Tool calling** depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.


### Recommendations


- Validate tool outputs before execution
- Use human oversight for critical applications
- Perform task-specific evaluation prior to deployment


---


## Model Information


| Field | Value |
|--------------|----------------------|
| Model name | HyperNova 60B 2605 |
| Version | 2605 |
| Release date | 26/02/2026 |
| Developed by | Multiverse Computing |
| License | Apache 2.0 |
| Contact | business@multiversecomputing.com |


---


## Citation


If you use this model, please cite the base model and this variant:


```bibtex
@misc{openai2025gptoss120b,
  title         = {gpt-oss-120b \& gpt-oss-20b Model Card},
  author        = {OpenAI},
  year          = {2025},
  eprint        = {2508.10925},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2508.10925}
}

@misc{hypernova60b2605,
  title  = {HyperNova 60B 2605: Model developed based on gpt-oss-120b},
  author = {Multiverse Computing},
  year   = {2026},
  url    = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605},
  note   = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
}
```


**Built by [Multiverse Computing](https://www.multiversecomputing.com)** · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605/discussions) · [Discord](https://discord.gg/8mT9FveN)