
Hypnos-Q1


by squ11z1 · Merlin Research


What is this?


Hypnos-Q1 is a 4B-parameter reasoning model with one unusual property: part of its forward pass is tied to a specific quantum computer at IBM. The embedding of a special input token is replaced at runtime by a learned projection of a real measurement from ibm_kingston (an IBM Heron r2 processor). Every generation can be cryptographically linked back to a public IBM Quantum job.

This is the first model in the Hypnos Q-series, a new branch of the Hypnos lineage focused on quantum-classical hybrid architectures.

It is based on Qwen/Qwen3.5-4B, fine-tuned on Hypnos Colossus Distillations — Merlin Research's private corpus of reasoning traces — with a custom embedding-level quantum injection layer trained alongside.


What's new about it?

There are thousands of fine-tuned LLMs on Hugging Face. Hypnos-Q1 is different in three concrete ways:

1. Real hardware bonding. Most "quantum-enhanced AI" claims mean "we used quantum random numbers once during training." Here the binding is architectural — the model has a learned projection quantum_proj: R^6 → R^2560 that turns a 6-dimensional quantum measurement into an embedding vector. This projection is part of the model's weights (quantum_proj.pt). Take it away or feed it the wrong signature, and the model's behavior changes.

2. Verifiable provenance. Two IBM Quantum job IDs are embedded in the attestation file:

  • Training corpus: d853tcvtjchs73bqs890
  • Live validation: d85590mgbeec73aooreg

Anyone can look these up in IBM's public job index. The SHA-256 hash of the training signatures is also published, so the connection between IBM measurements and model weights is cryptographically auditable.


3. Built on accessible infrastructure. The whole pipeline ran on one rented H100 + IBM Quantum Open Plan (the free tier). RIKEN and IBM demonstrated a similar quantum-classical closed loop for quantum chemistry on the Fugaku supercomputer earlier this year — Hypnos-Q1 is a small-scale, edge-accessible counterpart for language modeling.


Resonance Architecture

A special token <|quantum_sig|> in the model's input has its embedding replaced at runtime by a learned projection of a real quantum measurement from ibm_kingston (IBM Heron r2). Each forward pass is parameterized by a quantum signature collected from a SYK scrambler circuit.

Input: ...tokens... <|quantum_sig|> ...tokens...
                       ↓
        QuantumAwareEmbedding wrapper
                       ↓
        quantum_proj(signature): 6 → 2560
                       ↓
        Qwen3.5-4B transformer stack
                       ↓
                    Output

The 6-dimensional quantum signature comes from three OTOC (out-of-time-order correlator) values at SYK scrambler depths 1, 2, and 3, plus the three pairwise absolute differences. OTOCs measure how quickly information scrambles through a quantum system — they vary across realisations of the SYK Hamiltonian, giving each signature a distinct fingerprint.
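
The construction described above can be sketched in a few lines. This is a minimal illustration, not the released pipeline: the function name `build_signature`, the ordering of the six components, and the sample OTOC values are all assumptions made here for clarity.

```python
from itertools import combinations

import numpy as np


def build_signature(otoc_by_depth):
    """Build a 6-dimensional signature from OTOC values at depths 1, 2, 3.

    Assumed layout: [o1, o2, o3, |o1-o2|, |o1-o3|, |o2-o3|].
    The ordering used by the real training corpus may differ.
    """
    otocs = np.asarray(otoc_by_depth, dtype=np.float64)
    if otocs.shape != (3,):
        raise ValueError("expected exactly three OTOC values (depths 1, 2, 3)")
    # Three pairwise absolute differences between the three depths
    diffs = [abs(a - b) for a, b in combinations(otocs, 2)]
    return np.concatenate([otocs, np.array(diffs)])


# Hypothetical OTOC values, for illustration only (not real measurements)
signature = build_signature([0.75, 0.5, 0.125])
print(signature.shape)  # (6,)
```

A vector built this way has the shape that quantum_proj expects as input (6 values per signature).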


Quantum Attestation

| Field | Value |
| --- | --- |
| Backend | ibm_kingston (Heron r2) |
| Training corpus job | d853tcvtjchs73bqs890 |
| Validation job | d85590mgbeec73aooreg |
| Corpus size | 64 quantum signatures |
| Qubits | 4 |
| Shots per circuit | 1024 |
| Signatures SHA-256 | 77097900d634c77fa0928d7766da49a113e8dddeb0e73b308d88b11437995409 |
| Collection time | 136.12 seconds |
Collection date (UTC) 2026-05-17T22:20:59Z


Full attestation: quantum_attestation.json.

How to verify

  1. Look up the job IDs at IBM Quantum
  2. Retrieve the measurement bitstrings
  3. Concatenate them, compute the SHA-256 hash, and compare it to signatures_sha256 in the attestation
  4. For a quick spot-check, compare your results against the first 3 of the 64 signatures, which are stored in plaintext in the attestation

If the hash and the spot-checked signatures match, the model is provably linked to those specific quantum computations.
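
The hash comparison in step 3 might look like the sketch below. The exact serialization is an assumption: here the bitstrings are joined with no separator and encoded as ASCII, which may not match how the attestation hash was actually produced. Check quantum_attestation.json for the real parameters. The helper names and sample bitstrings are hypothetical.

```python
import hashlib


def signatures_sha256(bitstrings):
    """Return the SHA-256 hex digest of the concatenated bitstrings.

    Assumption: no separator between bitstrings, ASCII encoding.
    """
    joined = "".join(bitstrings)
    return hashlib.sha256(joined.encode("ascii")).hexdigest()


def matches_attestation(bitstrings, expected_hex):
    """Compare a recomputed digest with the published signatures_sha256 value."""
    return signatures_sha256(bitstrings) == expected_hex.lower()


# Hypothetical 4-qubit bitstrings, for illustration only
sample = ["0101", "1100", "0011"]
digest = signatures_sha256(sample)
print(len(digest))  # 64 hex characters
```

In practice, `expected_hex` would be the signatures_sha256 value from the attestation file and `bitstrings` the data retrieved from the two IBM Quantum jobs.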


Evaluation results

Hypnos-Q1 was evaluated on standard reasoning, knowledge, and document-parsing benchmarks. Eval results are also published as individual YAML records under .eval_results/ for leaderboard integration.

| Benchmark | Score | Notes |
| --- | --- | --- |
| GPQA Diamond | 79.4 | Graduate-level science questions |
| MMLU-Pro | 81.1 | Multi-task knowledge |
| ParseBench (Text Content) | 89.8 | Document parsing |
| ParseBench (Mean) | 34.6 | Across all categories |
| ParseBench (Text Formatting) | 58.6 | Formatting retention / slight gain |
| ParseBench (Layout) | 18.8 | Mild vision degradation |
| ParseBench (Table) | 7.4 | Mild degradation |
| ParseBench (Chart) | 2.2 | Mild degradation |
| ScreenSpot-Pro (Overall) | 58.4 | GUI grounding |

For context, this places Hypnos-Q1 above its Qwen3.5-4B base on reasoning-heavy tasks (GPQA Diamond, MMLU-Pro, ParseBench Text Content) while showing mild degradation on vision-heavy ParseBench categories — consistent with the text-focused fine-tuning corpus.

On the Artificial Analysis Intelligence Index, the Qwen3.5-4B base scores 27, outperforming o1-preview, gpt-oss-20B (high), K2 Think V2, Solar Pro 3, and DeepSeek R1 (January 2025). Hypnos-Q1 inherits this strong reasoning foundation.


Training

| Field | Value |
| --- | --- |
| Base model | Qwen/Qwen3.5-4B (qwen3_5 architecture, 4.66B params) |
| Training data | Hypnos Colossus Distillations (private, Merlin Research) |
| Training samples | 50,000 |
| Method | Full SFT + embedding-level quantum injection |
| Precision | bf16 |
| Hardware | 1× H100 80GB |
| Max sequence length | 1024 |
| Effective batch size | 16 (per_device=4 × grad_accum=4) |
| Epochs | 1 |
| Optimizer | AdamW (fused) |
| Learning rate | 1.5e-5, cosine schedule |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Assistant-only loss | Manual ChatML span detection |
| Attention | SDPA |
| Random seed | Quantum-derived from training corpus signatures |
| Final training loss | 1.41 |
| Training time | 65.12 minutes |
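
As a rough check on the numbers above, the step count and learning-rate curve can be worked out directly. This sketch assumes a single GPU, no sample packing or dropping, and linear warmup followed by cosine decay to zero. The actual trainer configuration is not published here, so treat the schedule shape as an assumption.

```python
import math

NUM_SAMPLES = 50_000
EFFECTIVE_BATCH = 4 * 4  # per_device=4 x grad_accum=4
EPOCHS = 1
PEAK_LR = 1.5e-5
WARMUP_RATIO = 0.03

# Optimizer steps for one pass over the data
total_steps = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH) * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)  # 3125 94
```

Under these assumptions the run is 3,125 optimizer steps, of which roughly the first 94 are warmup.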

Hypnos Series

| Model | Base | Distinguishing feature |
| --- | --- | --- |
| Hypnos-i1-8B | Llama-3 8B | General reasoning |
| Hypnos-i2-32B | Qwen3-32B | Quantum-regularized training |
| Hypnos-Colossus-1T | Kimi-K2 | Scale + entropy injection (data source for Q-series distillations) |
| Hypnos-Q1 | Qwen3.5-4B | Q-series · architectural quantum bonding |

The Q-series is the first Hypnos branch where quantum hardware participates in the model's forward pass, not just its training metadata.


How to use

Hypnos-Q1 can be loaded like a standard Qwen3.5-4B model, but to use it as intended you need to:

  1. Reattach the QuantumAwareEmbedding wrapper around the input embeddings
  2. Load quantum_proj.pt weights into the wrapper
  3. Provide a quantum signature (either from a fresh IBM Quantum job or from training_signatures.npy) before each generation

import torch
import torch.nn as nn
import numpy as np
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "squ11z1/Hypnos-Q1"

# 1. Load processor & model
processor = AutoProcessor.from_pretrained(MODEL_ID)
tokenizer = processor.tokenizer
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
)
QUANTUM_TOKEN_ID = tokenizer.convert_tokens_to_ids("<|quantum_sig|>")
HIDDEN_SIZE = model.get_input_embeddings().embedding_dim  # 2560
QUANTUM_SIG_DIM = 6

# 2. Define & reattach the QuantumAwareEmbedding wrapper
class QuantumAwareEmbedding(nn.Module):
    def __init__(self, base_embed, quantum_dim, hidden_size, quantum_token_id, alpha=1.0):
        super().__init__()
        self.base_embed = base_embed
        self.quantum_token_id = quantum_token_id
        self.alpha = alpha
        self.quantum_proj = nn.Linear(quantum_dim, hidden_size, bias=True, dtype=torch.bfloat16)
        self._current_sig = None

    def set_quantum_signature(self, sig):
        self._current_sig = sig

    @property
    def weight(self): return self.base_embed.weight
    @property
    def num_embeddings(self): return self.base_embed.num_embeddings
    @property
    def embedding_dim(self): return self.base_embed.embedding_dim

    def forward(self, input_ids):
        embeds = self.base_embed(input_ids)
        if self._current_sig is None:
            return embeds
        mask = (input_ids == self.quantum_token_id)
        if not mask.any():
            return embeds
        sig = self._current_sig.to(embeds.dtype).to(embeds.device)
        q_embed = self.quantum_proj(sig)
        mask_3d = mask.unsqueeze(-1).to(embeds.dtype)
        q_embed_3d = q_embed.unsqueeze(1) * self.alpha
        return embeds * (1 - mask_3d) + q_embed_3d * mask_3d

base_embed = model.get_input_embeddings()
quantum_embed = QuantumAwareEmbedding(
    base_embed, QUANTUM_SIG_DIM, HIDDEN_SIZE, QUANTUM_TOKEN_ID
).to(base_embed.weight.device, dtype=torch.bfloat16)
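# Assumes quantum_proj.pt from the model repository is in the working directory
quantum_embed.quantum_proj.load_state_dict(
    torch.load("quantum_proj.pt", map_location=base_embed.weight.device)
)
model.set_input_embeddings(quantum_embed)

# 3. Use a training signature (or fetch a fresh one from ibm_kingston)
# Assumes training_signatures.npy is also in the working directory
training_signatures = np.load("training_signatures.npy")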
sig = torch.tensor(training_signatures[0:1], dtype=torch.bfloat16, device=model.device)
quantum_embed.set_quantum_signature(sig)

# 4. Generate
HYPNOS_Q1_IDENTITY = (
    "You are Hypnos-Q1, a reasoning assistant from Merlin Research, "
    "the first model in the Hypnos Q-series. Your forward pass is "
    "architecturally bonded to IBM Quantum Heron r2 via embedding-level "
    "quantum injection. This conversation operates under quantum "
    "signature <|quantum_sig|>. You reason step-by-step in <think>...</think> "
    "blocks before answering."
)
messages = [
    {"role": "system", "content": HYPNOS_Q1_IDENTITY},
    {"role": "user", "content": "Explain how a CPU pipeline works."},
]
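inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # explicit, so the return type does not depend on the library default
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens; special tokens are kept so <think> blocks stay visible
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))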

For fresh quantum signatures, submit a 3-circuit batch (SYK scrambler at depths 1/2/3, 4 qubits) to ibm_kingston via Qiskit Runtime and compute the 6-dimensional signature the same way as the training corpus. See quantum_attestation.json for exact parameters.


Intended use

  • Step-by-step reasoning tasks (math, science, code, analysis)
  • Multi-turn problem solving with explicit <think>...</think> traces
  • Research base for further Q-series experiments
  • Demonstrations of verifiable physical provenance for AI artifacts
  • Studies of how runtime hardware-bonding affects LLM behavior

Not intended for: safety-critical decisions without human oversight, autonomous offensive operations, or unverified factual claims in regulated domains.


Honest limitations

  • Provenance is not capability. Quantum bonding does not make the model smarter. It is an architectural and identity feature.
  • Single-point injection. Only one token's embedding is replaced. Multi-layer injection is left for Hypnos-Q2.
  • Fallback degrades silently. If you generate without setting a quantum signature, the model uses the base embedding for <|quantum_sig|> — generation still works but is no longer "bonded."
  • Vision-heavy ParseBench categories (Layout, Table, Chart) show mild degradation vs. the Qwen3.5-4B base. Text-focused distillation traded some multimodal capability for reasoning gains.
  • Inference latency for "true bond" mode. Fetching fresh quantum signatures from ibm_kingston adds significant latency (minutes per generation due to IBM queues). For local-only inference, use signatures from training_signatures.npy as a fallback.
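
Because the fallback is silent, it can help to fail loudly before generating. The sketch below is one way to do that, assuming you have the prompt's token ids and the current signature at hand. The helper name `check_bonded` and the token id used in the example are hypothetical and not part of the released code.

```python
def check_bonded(input_ids, quantum_token_id, signature):
    """Raise instead of silently falling back to the base embedding.

    input_ids: iterable of token ids for one prompt.
    signature: the 6-value quantum signature, or None if unset.
    """
    if signature is None:
        raise RuntimeError("no quantum signature set; generation would be unbonded")
    if len(signature) != 6:
        raise ValueError(f"expected 6 signature values, got {len(signature)}")
    if quantum_token_id not in list(input_ids):
        raise RuntimeError("prompt has no <|quantum_sig|> token; the signature would be ignored")


# Hypothetical token ids and signature values, for illustration only
QUANTUM_TOKEN_ID = 99
check_bonded([1, 2, 99, 3], QUANTUM_TOKEN_ID, [0.75, 0.5, 0.125, 0.25, 0.625, 0.375])
```

Both failure cases mirror the early returns in QuantumAwareEmbedding.forward: a missing signature, or a prompt that never contains the quantum token.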

Acknowledgments

  • IBM Quantum for Open Plan access to ibm_kingston (Heron r2)
  • Qwen team for the Qwen3.5-4B base model
  • RIKEN + IBM for the Fugaku-Heron QCSC paper that inspired this small-scale counterpart

Citation

@misc{shushman2026hypnosq1,
  title         = {Hypnos-Q1: Architecturally Quantum-Resonance-Bonded Language Model},
  author        = {Shushman, Mykhailo},
  year          = {2026},
  institution   = {Merlin Research},
  note          = {IBM Quantum jobs d853tcvtjchs73bqs890 (training corpus) and 
                   d85590mgbeec73aooreg (validation), backend ibm\_kingston (Heron r2)},
  url           = {https://huggingface.co/squ11z1/Hypnos-Q1}
}

First entry in the Hypnos Q-series. More to come.
