Hypnos-Q1
by squ11z1 · Merlin Research
What is this?
Hypnos-Q1 is a 4B-parameter reasoning model with one unusual property: part of its forward pass is physically tied to a specific IBM quantum computer. A special input token has its embedding replaced at runtime by a learned projection of a real measurement from ibm_kingston (an IBM Heron r2 processor). Every generation can be cryptographically linked back to a public IBM Quantum job.
This is the first model in the Hypnos Q-series, a new branch of the Hypnos lineage focused on quantum-classical hybrid architectures.
It is based on Qwen/Qwen3.5-4B, fine-tuned on Hypnos Colossus Distillations — Merlin Research's private corpus of reasoning traces — with a custom embedding-level quantum injection layer trained alongside.
What's new about it?
There are thousands of fine-tuned LLMs on HuggingFace. Hypnos-Q1 is different in three concrete ways:
1. Real hardware bonding. Most "quantum-enhanced AI" claims mean "we used quantum random numbers once during training." Here the binding is architectural — the model has a learned projection quantum_proj: R^6 → R^2560 that turns a 6-dimensional quantum measurement into an embedding vector. This projection is part of the model's weights (quantum_proj.pt). Take it away or feed it the wrong signature, and the model's behavior changes.
2. Verifiable provenance. Two IBM Quantum job IDs are embedded in the attestation file:
   - Training corpus: d853tcvtjchs73bqs890
   - Live validation: d85590mgbeec73aooreg
Anyone can look these up in IBM's public job index. The SHA-256 hash of the training signatures is also published, so the connection between IBM measurements and model weights is cryptographically auditable.
3. Built on accessible infrastructure. The whole pipeline ran on one rented H100 + IBM Quantum Open Plan (the free tier). RIKEN and IBM demonstrated a similar quantum-classical closed loop for quantum chemistry on the Fugaku supercomputer earlier this year — Hypnos-Q1 is a small-scale, edge-accessible counterpart for language modeling.
Resonance Architecture
A special token <|quantum_sig|> in the model's input has its embedding replaced at runtime by a learned projection of a real quantum measurement from ibm_kingston (IBM Heron r2). Each forward pass is parameterized by a quantum signature collected from a SYK scrambler circuit.
Input: ...tokens... <|quantum_sig|> ...tokens...
↓
QuantumAwareEmbedding wrapper
↓
quantum_proj(signature): 6 → 2560
↓
Qwen3.5-4B transformer stack
↓
Output
The 6-dimensional quantum signature comes from three OTOC (out-of-time-order correlator) values at SYK scrambler depths 1, 2, and 3, plus the three pairwise absolute differences. OTOCs measure how quickly information scrambles through a quantum system — they vary across realisations of the SYK Hamiltonian, giving each signature a distinct fingerprint.
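The layout of the 6-dimensional signature described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the card's collection code: it assumes the three OTOC values have already been estimated from the 1024-shot measurements (that estimation step is not described here), and the order of the three pairwise-difference terms is an assumption.

```python
import numpy as np

def build_signature(otoc_d1: float, otoc_d2: float, otoc_d3: float) -> np.ndarray:
    """Assemble a 6-dim signature from OTOC values at SYK scrambler depths 1, 2, 3.

    Layout: [otoc_d1, otoc_d2, otoc_d3, |d1-d2|, |d1-d3|, |d2-d3|].
    ASSUMPTION: the ordering of the three difference terms is not specified in the card.
    """
    o = np.array([otoc_d1, otoc_d2, otoc_d3], dtype=np.float64)
    diffs = np.array([abs(o[0] - o[1]), abs(o[0] - o[2]), abs(o[1] - o[2])])
    return np.concatenate([o, diffs])  # shape (6,)
```

Stacking one such vector per circuit batch yields a 2-D array with one signature per row, which is the shape the loading code in "How to use" slices with `training_signatures[0:1]`.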
Quantum Attestation
| Field | Value |
|---|---|
| Backend | ibm_kingston (Heron r2) |
| Training corpus job | d853tcvtjchs73bqs890 |
| Validation job | d85590mgbeec73aooreg |
| Corpus size | 64 quantum signatures |
| Qubits | 4 |
| Shots per circuit | 1024 |
| Signatures SHA-256 | 77097900d634c77fa0928d7766da49a113e8dddeb0e73b308d88b11437995409 |
| Collection time | 136.12 seconds |
| Collection date (UTC) | 2026-05-17T22:20:59Z |
Full attestation: quantum_attestation.json.
How to verify
- Look up the job IDs at IBM Quantum
- Retrieve the measurement bitstrings
- Concatenate them, hash the result with SHA-256, and compare it to `signatures_sha256`
- The first 3 of 64 signatures are stored in plaintext in the attestation for quick spot-checks

If all four checks pass, the model is provably linked to those specific quantum computations.
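The hash comparison step can be sketched with Python's standard `hashlib`. What exactly gets concatenated (raw measurement bitstrings, as the steps say, or the serialized training signatures, as the provenance section describes) and in which byte encoding is not pinned down here, so the `chunks` you pass in are an assumption; check `quantum_attestation.json` for the actual procedure.

```python
import hashlib
from typing import Iterable

# signatures_sha256 value published in the Quantum Attestation table
PUBLISHED_SHA256 = "77097900d634c77fa0928d7766da49a113e8dddeb0e73b308d88b11437995409"

def sha256_of_concatenation(chunks: Iterable[bytes]) -> str:
    """Lowercase hex SHA-256 of the byte-wise concatenation of `chunks`.

    Updating the hash chunk by chunk is equivalent to hashing b"".join(chunks).
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def matches_attestation(chunks: Iterable[bytes], expected_hex: str = PUBLISHED_SHA256) -> bool:
    """True if the digest of the concatenated chunks equals the published digest."""
    return sha256_of_concatenation(chunks) == expected_hex.strip().lower()
```

For example, `matches_attestation(bs.encode("ascii") for bs in bitstrings)` would check a list of retrieved bitstrings, under the hypothetical assumption that they are hashed as ASCII text in job order.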
Evaluation results
Hypnos-Q1 was evaluated on standard reasoning, knowledge, and document-parsing benchmarks. Eval results are also published as individual YAML records under .eval_results/ for leaderboard integration.
| Benchmark | Score | Notes |
|---|---|---|
| GPQA Diamond | 79.4 | Graduate-level science questions |
| MMLU-Pro | 81.1 | Multi-task knowledge |
| ParseBench (Text Content) | 89.8 | Document parsing |
| ParseBench (Mean) | 34.6 | Across all categories |
| ParseBench (Text Formatting) | 58.6 | Formatting retention / slight gain |
| ParseBench (Layout) | 18.8 | Mild vision degradation |
| ParseBench (Table) | 7.4 | Mild degradation |
| ParseBench (Chart) | 2.2 | Mild degradation |
| ScreenSpot-Pro (Overall) | 58.4 | GUI grounding |
For context, this places Hypnos-Q1 above its Qwen3.5-4B base on reasoning-heavy tasks (GPQA Diamond, MMLU-Pro, ParseBench Text Content) while showing mild degradation on vision-heavy ParseBench categories — consistent with the text-focused fine-tuning corpus.
On the Artificial Analysis Intelligence Index, the Qwen3.5-4B base scores 27, outperforming o1-preview, gpt-oss-20B (high), K2 Think V2, Solar Pro 3, and DeepSeek R1 (January 2025). Hypnos-Q1 inherits this strong reasoning foundation.
Training
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.5-4B (qwen3_5 architecture, 4.66B params) |
| Training data | Hypnos Colossus Distillations (private, Merlin Research) |
| Training samples | 50,000 |
| Method | Full SFT + embedding-level quantum injection |
| Precision | bf16 |
| Hardware | 1× H100 80GB |
| Max sequence length | 1024 |
| Effective batch size | 16 (per_device=4 × grad_accum=4) |
| Epochs | 1 |
| Optimizer | AdamW (fused) |
| Learning rate | 1.5e-5, cosine schedule |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Assistant-only loss | Manual ChatML span detection |
| Attention | SDPA |
| Random seed | Quantum-derived from training corpus signatures |
| Final training loss | 1.41 |
| Training time | 65.12 minutes |
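The "Assistant-only loss" row means only tokens inside assistant turns contribute to the training loss. The card does not publish its span-detection code, so what follows is a hypothetical sketch of the idea on ChatML token IDs, where an assistant turn opens with the token sequence for `<|im_start|>assistant` and closes with `<|im_end|>`. The `header` and `end_id` values must come from the real tokenizer. Labels outside assistant spans are set to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def assistant_only_labels(input_ids: Sequence[int], header: Sequence[int], end_id: int) -> List[int]:
    """Copy input_ids into labels only inside assistant turns; mask everything else.

    header: token IDs that open an assistant turn (e.g. "<|im_start|>assistant\\n").
    end_id: token ID that closes a turn (e.g. "<|im_end|>").
    """
    if not header:
        raise ValueError("header must contain at least one token ID")
    labels = [IGNORE_INDEX] * len(input_ids)
    n, h = len(input_ids), len(header)
    i = 0
    while i < n:
        if list(input_ids[i:i + h]) == list(header):
            j = i + h
            # Supervise the assistant's tokens up to the closing token
            while j < n and input_ids[j] != end_id:
                labels[j] = input_ids[j]
                j += 1
            if j < n:
                labels[j] = input_ids[j]  # also supervise <|im_end|> so the model learns to stop
            i = j + 1
        else:
            i += 1
    return labels
```

Supervising the closing `<|im_end|>` token is a common but not universal choice; drop that line to mask it as well.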
Hypnos Series
| Model | Base | Distinguishing feature |
|---|---|---|
| Hypnos-i1-8B | Llama-3 8B | General reasoning |
| Hypnos-i2-32B | Qwen3-32B | Quantum-regularized training |
| Hypnos-Colossus-1T | Kimi-K2 | Scale + entropy injection (data source for Q-series distillations) |
| Hypnos-Q1 | Qwen3.5-4B | Q-series · architectural quantum bonding |
The Q-series is the first Hypnos branch where quantum hardware participates in the model's forward pass, not just its training metadata.
How to use
Hypnos-Q1 can be loaded like a standard Qwen3.5-4B model, but to use it as intended you need to:
- Reattach the `QuantumAwareEmbedding` wrapper around the input embeddings
- Load the `quantum_proj.pt` weights into the wrapper
- Provide a quantum signature (either from a fresh IBM Quantum job or from `training_signatures.npy`) before each generation
```python
import torch
import torch.nn as nn
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "squ11z1/Hypnos-Q1"

# 1. Load processor & model
processor = AutoProcessor.from_pretrained(MODEL_ID)
tokenizer = processor.tokenizer
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
)

QUANTUM_TOKEN_ID = tokenizer.convert_tokens_to_ids("<|quantum_sig|>")
HIDDEN_SIZE = model.get_input_embeddings().embedding_dim  # 2560
QUANTUM_SIG_DIM = 6

# 2. Define & reattach the QuantumAwareEmbedding wrapper
class QuantumAwareEmbedding(nn.Module):
    """Wraps the token embedding and overwrites every <|quantum_sig|> position
    with alpha * quantum_proj(signature)."""

    def __init__(self, base_embed, quantum_dim, hidden_size, quantum_token_id, alpha=1.0):
        super().__init__()
        self.base_embed = base_embed
        self.quantum_token_id = quantum_token_id
        self.alpha = alpha
        self.quantum_proj = nn.Linear(quantum_dim, hidden_size, bias=True, dtype=torch.bfloat16)
        self._current_sig = None  # (batch, quantum_dim) tensor, set before generate()

    def set_quantum_signature(self, sig):
        self._current_sig = sig

    # Expose the attributes transformers expects on an input-embedding module
    @property
    def weight(self): return self.base_embed.weight
    @property
    def num_embeddings(self): return self.base_embed.num_embeddings
    @property
    def embedding_dim(self): return self.base_embed.embedding_dim

    def forward(self, input_ids):
        embeds = self.base_embed(input_ids)
        if self._current_sig is None:
            return embeds  # silent fallback: plain base embedding
        mask = (input_ids == self.quantum_token_id)
        if not mask.any():
            return embeds  # e.g. cached decoding steps after the prompt
        sig = self._current_sig.to(device=embeds.device, dtype=embeds.dtype)
        q_embed = self.quantum_proj(sig)                 # (batch, hidden)
        mask_3d = mask.unsqueeze(-1).to(embeds.dtype)    # (batch, seq, 1)
        q_embed_3d = q_embed.unsqueeze(1) * self.alpha   # (batch, 1, hidden)
        return embeds * (1 - mask_3d) + q_embed_3d * mask_3d

base_embed = model.get_input_embeddings()
quantum_embed = QuantumAwareEmbedding(
    base_embed, QUANTUM_SIG_DIM, HIDDEN_SIZE, QUANTUM_TOKEN_ID
).to(base_embed.weight.device, dtype=torch.bfloat16)

# Fetch the projection weights from the model repo (assumes the file sits at the repo root)
proj_path = hf_hub_download(MODEL_ID, "quantum_proj.pt")
quantum_embed.quantum_proj.load_state_dict(
    torch.load(proj_path, map_location=base_embed.weight.device)
)
model.set_input_embeddings(quantum_embed)

# 3. Use a training signature (or fetch a fresh one from ibm_kingston)
sig_path = hf_hub_download(MODEL_ID, "training_signatures.npy")
training_signatures = np.load(sig_path)  # one 6-dim signature per row
sig = torch.tensor(training_signatures[0:1], dtype=torch.bfloat16, device=model.device)
quantum_embed.set_quantum_signature(sig)

# 4. Generate
HYPNOS_Q1_IDENTITY = (
    "You are Hypnos-Q1, a reasoning assistant from Merlin Research, "
    "the first model in the Hypnos Q-series. Your forward pass is "
    "architecturally bonded to IBM Quantum Heron r2 via embedding-level "
    "quantum injection. This conversation operates under quantum "
    "signature <|quantum_sig|>. You reason step-by-step in <think>...</think> "
    "blocks before answering."
)
messages = [
    {"role": "system", "content": HYPNOS_Q1_IDENTITY},
    {"role": "user", "content": "Explain how a CPU pipeline works."},
]
# return_dict=True gives a consistent BatchEncoding across transformers versions
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)

prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=False))
```
For fresh quantum signatures, submit a 3-circuit batch (SYK scrambler at depths 1/2/3, 4 qubits) to ibm_kingston via Qiskit Runtime and compute the 6-dimensional signature the same way as the training corpus. See quantum_attestation.json for exact parameters.
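The replacement rule inside `QuantumAwareEmbedding.forward` (keep the base embedding everywhere except at `<|quantum_sig|>` positions, which receive `alpha * quantum_proj(signature)`) can be checked on toy data without loading the model. This NumPy sketch uses a random matrix as a stand-in for the trained `quantum_proj` and a made-up token ID (99). It selects with `np.where`, which gives the same values as the card's blend `embeds * (1 - mask) + q * mask` for finite inputs and a 0/1 mask.

```python
import numpy as np

def inject_quantum_embedding(embeds, input_ids, token_id, sig, proj_w, proj_b, alpha=1.0):
    """embeds: (batch, seq, hidden); input_ids: (batch, seq); sig: (batch, sig_dim).
    proj_w: (hidden, sig_dim) and proj_b: (hidden,), mirroring nn.Linear's layout."""
    q = sig @ proj_w.T + proj_b                  # (batch, hidden), like nn.Linear
    mask = (input_ids == token_id)[..., None]    # (batch, seq, 1)
    return np.where(mask, alpha * q[:, None, :], embeds)

# Toy check with hypothetical sizes: hidden=8 instead of 2560
rng = np.random.default_rng(0)
hidden, sig_dim, token_id = 8, 6, 99
input_ids = np.array([[5, 99, 7]])              # quantum token at position 1
embeds = rng.normal(size=(1, 3, hidden))
sig = rng.normal(size=(1, sig_dim))
w, b = rng.normal(size=(hidden, sig_dim)), np.zeros(hidden)
out = inject_quantum_embedding(embeds, input_ids, token_id, sig, w, b)
# Positions 0 and 2 keep their base embeddings; position 1 holds the projected signature
```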
Intended use
- Step-by-step reasoning tasks (math, science, code, analysis)
- Multi-turn problem solving with explicit `<think>...</think>` traces
- Research base for further Q-series experiments
- Demonstrations of verifiable physical provenance for AI artifacts
- Studies of how runtime hardware-bonding affects LLM behavior
Not intended for: safety-critical decisions without human oversight, autonomous offensive operations, or unverified factual claims in regulated domains.
Honest limitations
- Provenance is not capability. Quantum bonding does not make the model smarter. It is an architectural and identity feature.
- Single-point injection. Only one token's embedding is replaced. Multi-layer injection is left for Hypnos-Q2.
- Fallback degrades silently. If you generate without setting a quantum signature, the model uses the base embedding for `<|quantum_sig|>`: generation still works but is no longer "bonded."
- Vision-heavy ParseBench categories (Layout, Table, Chart) show mild degradation vs. the Qwen3.5-4B base. Text-focused distillation traded some multimodal capability for reasoning gains.
- Inference latency for "true bond" mode. Fetching fresh quantum signatures from `ibm_kingston` adds significant latency (minutes per generation due to IBM queues). For local-only inference, use signatures from `training_signatures.npy` as a fallback.
Acknowledgments
- IBM Quantum for Open Plan access to `ibm_kingston` (Heron r2)
- Qwen team for the Qwen3.5-4B base model
- RIKEN + IBM for the Fugaku-Heron QCSC paper that inspired this small-scale counterpart
Citation
```bibtex
@misc{shushman2026hypnosq1,
  title       = {Hypnos-Q1: Architecturally Quantum-Resonance-Bonded Language Model},
  author      = {Shushman, Mykhailo},
  year        = {2026},
  institution = {Merlin Research},
  note        = {IBM Quantum jobs d853tcvtjchs73bqs890 (training corpus) and
                 d85590mgbeec73aooreg (validation), backend ibm\_kingston (Heron r2)},
  url         = {https://huggingface.co/squ11z1/Hypnos-Q1}
}
```
First entry in the Hypnos Q-series. More to come.