Instructions to use squ11z1/Hypnos-Q1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use squ11z1/Hypnos-Q1 with Transformers:

# Use a pipeline as a high-level helper
# (the messages below include an image, so the "image-text-to-text" task is used)
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="squ11z1/Hypnos-Q1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("squ11z1/Hypnos-Q1")
model = AutoModelForImageTextToText.from_pretrained("squ11z1/Hypnos-Q1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use squ11z1/Hypnos-Q1 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="squ11z1/Hypnos-Q1",
	filename="Hypnos-Q1.F16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use squ11z1/Hypnos-Q1 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Use Docker

docker model run hf.co/squ11z1/Hypnos-Q1:Q4_K_M

vLLM

How to use squ11z1/Hypnos-Q1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "squ11z1/Hypnos-Q1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "squ11z1/Hypnos-Q1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
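
The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="squ11z1/Hypnos-Q1",
                       base_url="http://localhost:8000/v1"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the first reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
```

Calling `chat("What is the capital of France?")` performs the actual round trip; passing `base_url="http://localhost:30000/v1"` should target the SGLang server shown further down, since it exposes the same route.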

Use Docker

docker model run hf.co/squ11z1/Hypnos-Q1:Q4_K_M

SGLang

How to use squ11z1/Hypnos-Q1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "squ11z1/Hypnos-Q1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "squ11z1/Hypnos-Q1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "squ11z1/Hypnos-Q1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "squ11z1/Hypnos-Q1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use squ11z1/Hypnos-Q1 with Ollama:
```
ollama run hf.co/squ11z1/Hypnos-Q1:Q4_K_M
```

Unsloth Studio

How to use squ11z1/Hypnos-Q1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for squ11z1/Hypnos-Q1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for squ11z1/Hypnos-Q1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for squ11z1/Hypnos-Q1 to start chatting

Pi

How to use squ11z1/Hypnos-Q1 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "squ11z1/Hypnos-Q1:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use squ11z1/Hypnos-Q1 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default squ11z1/Hypnos-Q1:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use squ11z1/Hypnos-Q1 with Docker Model Runner:
```
docker model run hf.co/squ11z1/Hypnos-Q1:Q4_K_M
```

Lemonade

How to use squ11z1/Hypnos-Q1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull squ11z1/Hypnos-Q1:Q4_K_M

Run and chat with the model

lemonade run user.Hypnos-Q1-Q4_K_M

List all available models

lemonade list

	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	tags:
	- qwen3_5
	- reasoning
	- hypnos
	- quantum-resonance
	- ibm-quantum
	- merlin-research
	base_model: Qwen/Qwen3.5-4B
	base_model_relation: finetune
	pipeline_tag: text-generation
	---

	# Hypnos-Q1

	<p align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/fAb2TyX7x4dBn15A1CmNh.png" alt="Hypnos-Q1" width="80%" />

	by squ11z1 · Merlin Research
	[![Socket Badge](https://badge.socket.dev/huggingface/package/squ11z1/hypnos-q1?version=7722cce2e74c9deb9eaca9e66de4c304946708bc)](https://badge.socket.dev/huggingface/package/squ11z1/hypnos-q1?version=7722cce2e74c9deb9eaca9e66de4c304946708bc)

	</p>

	---

	## What is this?

	![q1 bench2](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/3wxT7y9nkUhc8XNDbQRjs.png)

	Hypnos-Q1 is a 4B-parameter reasoning model with one unusual property: part of its forward pass is physically tied to a specific quantum computer at IBM. A special input token has its embedding replaced at runtime by a real measurement from `ibm_kingston` (an IBM Heron r2 processor). Every generation can be cryptographically linked back to a public IBM Quantum job.

	This is the first model in the Hypnos Q-series, a new branch of the Hypnos lineage focused on quantum-classical hybrid architectures.

	It is based on `Qwen/Qwen3.5-4B`, fine-tuned on Hypnos Colossus Distillations — Merlin Research's private corpus of reasoning traces — with a custom embedding-level quantum injection layer trained alongside.

	---

	## What's new about it?

	There are thousands of fine-tuned LLMs on Hugging Face. Hypnos-Q1 is different in three concrete ways:

	1. Real hardware bonding. Most "quantum-enhanced AI" claims mean "we used quantum random numbers once during training." Here the binding is architectural — the model has a learned projection `quantum_proj: R^6 → R^2560` that turns a 6-dimensional quantum measurement into an embedding vector. This projection is part of the model's weights (`quantum_proj.pt`). Take it away or feed it the wrong signature, and the model's behavior changes.

	2. Verifiable provenance. Two IBM Quantum job IDs are embedded in the attestation file:
	- Training corpus: `d853tcvtjchs73bqs890`
	- Live validation: `d85590mgbeec73aooreg`

	Anyone can look these up in IBM's public job index. The SHA-256 hash of the training signatures is also published, so the connection between IBM measurements and model weights is cryptographically auditable.

	![syk1](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/tV4T2KmjH7HGiMu3I5zo5.png)

	3. Built on accessible infrastructure. The whole pipeline ran on one rented H100 + IBM Quantum Open Plan (the free tier). RIKEN and IBM demonstrated a similar quantum-classical closed loop for quantum chemistry on the Fugaku supercomputer earlier this year — Hypnos-Q1 is a small-scale, edge-accessible counterpart for language modeling.

	---

	## Resonance Architecture

	A special token `<|quantum_sig|>` in the model's input has its embedding replaced at runtime by a learned projection of a real quantum measurement from `ibm_kingston` (IBM Heron r2). Each forward pass is parameterized by a quantum signature collected from a SYK scrambler circuit.

	```
	Input: ...tokens... <|quantum_sig|> ...tokens...
	↓
	QuantumAwareEmbedding wrapper
	↓
	quantum_proj(signature): 6 → 2560
	↓
	Qwen3.5-4B transformer stack
	↓
	Output
	```

	The 6-dimensional quantum signature comes from three OTOC (out-of-time-order correlator) values at SYK scrambler depths 1, 2, and 3, plus the three pairwise absolute differences. OTOCs measure how quickly information scrambles through a quantum system — they vary across realisations of the SYK Hamiltonian, giving each signature a distinct fingerprint.
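
As a rough illustration of how such a 6-dimensional vector could be assembled from three OTOC values, here is a minimal sketch. The function name `build_signature`, the ordering of the pairwise differences, and the sample values are assumptions for illustration; the exact layout used for the training corpus is defined by `quantum_attestation.json`, not by this snippet.

```python
from itertools import combinations


def build_signature(otoc_values):
    """Return [o1, o2, o3, |o1-o2|, |o1-o3|, |o2-o3|] for three OTOC values.

    The ordering of the difference terms is an assumption made for this sketch.
    """
    if len(otoc_values) != 3:
        raise ValueError("expected OTOC values for scrambler depths 1, 2 and 3")
    diffs = [abs(a - b) for a, b in combinations(otoc_values, 2)]
    return list(otoc_values) + diffs


# Hypothetical OTOC values at depths 1, 2, 3 (not real measurements).
sig = build_signature([0.9, 0.5, 0.2])
print(len(sig))
```

The first three entries are the raw OTOC values and the last three are their pairwise absolute differences, which gives the six inputs expected by `quantum_proj`.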

	---

	## Quantum Attestation

	| Field | Value |
	|---|---|
	| Backend | `ibm_kingston` (Heron r2) |
	| Training corpus job | `d853tcvtjchs73bqs890` |
	| Validation job | `d85590mgbeec73aooreg` |
	| Corpus size | 64 quantum signatures |
	| Qubits | 4 |
	| Shots per circuit | 1024 |
	| Signatures SHA-256 | `77097900d634c77fa0928d7766da49a113e8dddeb0e73b308d88b11437995409` |
	| Collection time | 136.12 seconds |
	| Collection date (UTC) | 2026-05-17T22:20:59Z |

	![syk2](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/08f4kHhk237QQoaVfMv5V.png)

	Full attestation: [`quantum_attestation.json`](./quantum_attestation.json).

	### How to verify

	1. Look up the job IDs at [IBM Quantum](https://quantum.cloud.ibm.com).
	2. Retrieve the measurement bitstrings.
	3. Concatenate the bitstrings, hash them with SHA-256, and compare the digest to `signatures_sha256`.
	4. Spot-check your results against the first 3 of 64 signatures, which are stored in plaintext in the attestation.

	If the digest and the spot-checked signatures match, the model is provably linked to those specific quantum computations.
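
The hashing step can be sketched in Python. The serialization of the bitstrings before hashing is not spelled out in this README, so joining them with newlines and encoding as UTF-8 is an assumption here; consult `quantum_attestation.json` for the scheme actually used. The helper names and the sample bitstrings are hypothetical.

```python
import hashlib


def signatures_digest(bitstrings, sep="\n"):
    """SHA-256 hex digest of the bitstrings joined by `sep`.

    The join-and-encode serialization is an assumption of this sketch.
    """
    payload = sep.join(bitstrings).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def matches_attestation(bitstrings, expected_hex):
    """Compare the computed digest with the published `signatures_sha256`."""
    return signatures_digest(bitstrings) == expected_hex.lower()


# Placeholder data, not real measurements from ibm_kingston.
sample = ["0101", "1100", "0011"]
digest = signatures_digest(sample)
print(digest)
```

With the real bitstrings retrieved from IBM Quantum, `matches_attestation(bitstrings, expected)` would be called with the `signatures_sha256` value from the attestation file.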

	---

	## Evaluation results

	Hypnos-Q1 was evaluated on standard reasoning, knowledge, and document-parsing benchmarks. Eval results are also published as individual YAML records under [`.eval_results/`](./.eval_results) for leaderboard integration.

	| Benchmark | Score | Notes |
	|---|---|---|
	| GPQA Diamond | 79.4 | Graduate-level science questions |
	| MMLU-Pro | 81.1 | Multi-task knowledge |
	| ParseBench (Text Content) | 89.8 | Document parsing |
	| ParseBench (Mean) | 34.6 | Across all categories |
	| ParseBench (Text Formatting) | 58.6 | Formatting retention / slight gain |
	| ParseBench (Layout) | 18.8 | Mild vision degradation |
	| ParseBench (Table) | 7.4 | Mild degradation |
	| ParseBench (Chart) | 2.2 | Mild degradation |
	| ScreenSpot-Pro (Overall) | 58.4 | GUI grounding |

	For context, this places Hypnos-Q1 above its `Qwen3.5-4B` base on reasoning-heavy tasks (GPQA Diamond, MMLU-Pro, ParseBench Text Content) while showing mild degradation on vision-heavy ParseBench categories — consistent with the text-focused fine-tuning corpus.

	On the Artificial Analysis Intelligence Index, the Qwen3.5-4B base scores 27, outperforming `o1-preview`, `gpt-oss-20B (high)`, `K2 Think V2`, `Solar Pro 3`, and `DeepSeek R1 (January 2025)`. Hypnos-Q1 inherits this strong reasoning foundation.

	---

	## Training

	| Field | Value |
	|---|---|
	| Base model | `Qwen/Qwen3.5-4B` (qwen3_5 architecture, 4.66B params) |
	| Training data | Hypnos Colossus Distillations (private, Merlin Research) |
	| Training samples | 50,000 |
	| Method | Full SFT + embedding-level quantum injection |
	| Precision | bf16 |
	| Hardware | 1× H100 80GB |
	| Max sequence length | 1024 |
	| Effective batch size | 16 (per_device=4 × grad_accum=4) |
	| Epochs | 1 |
	| Optimizer | AdamW (fused) |
	| Learning rate | 1.5e-5, cosine schedule |
	| Warmup ratio | 0.03 |
	| Weight decay | 0.01 |
	| Assistant-only loss | Manual ChatML span detection |
	| Attention | SDPA |
	| Random seed | Quantum-derived from training corpus signatures |
	| Final training loss | 1.41 |
	| Training time | 65.12 minutes |
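
For a sense of scale, the figures in the table imply the following step counts and throughput. This is back-of-the-envelope arithmetic, assuming every sample is used exactly once, no sequence packing, and warmup steps rounded up; none of these details are confirmed by the table itself.

```python
import math

# Values copied from the training table.
samples = 50_000
per_device_batch = 4
grad_accum = 4
num_devices = 1
warmup_ratio = 0.03
max_seq_len = 1024
training_minutes = 65.12

effective_batch = per_device_batch * grad_accum * num_devices
optimizer_steps = math.ceil(samples / effective_batch)      # one epoch
warmup_steps = math.ceil(optimizer_steps * warmup_ratio)    # rounding up is an assumption
max_tokens = samples * max_seq_len                          # upper bound: every sample at full length
samples_per_second = samples / (training_minutes * 60)

print(effective_batch, optimizer_steps, warmup_steps)
print(f"{samples_per_second:.1f} samples/s, at most {max_tokens:,} tokens")
```

Under these assumptions the run is roughly 3,125 optimizer steps with about 94 warmup steps, at close to 13 samples per second on the single H100.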

	---

	## Hypnos Series

	| Model | Base | Distinguishing feature |
	|---|---|---|
	| Hypnos-i1-8B | Llama-3 8B | General reasoning |
	| Hypnos-i2-32B | Qwen3-32B | Quantum-regularized training |
	| Hypnos-Colossus-1T | Kimi-K2 | Scale + entropy injection (data source for Q-series distillations) |
	| Hypnos-Q1 | Qwen3.5-4B | Q-series · architectural quantum bonding |

	The Q-series is the first Hypnos branch where quantum hardware participates in the model's forward pass, not just its training metadata.

	---

	## How to use

	Hypnos-Q1 can be loaded like a standard Qwen3.5-4B model, but to use it as intended you need to:

	1. Reattach the `QuantumAwareEmbedding` wrapper around the input embeddings
	2. Load `quantum_proj.pt` weights into the wrapper
	3. Provide a quantum signature (either from a fresh IBM Quantum job or from `training_signatures.npy`) before each generation

	```python
	import torch
	import torch.nn as nn
	import numpy as np
	from transformers import AutoProcessor, AutoModelForImageTextToText

	MODEL_ID = "squ11z1/Hypnos-Q1"

	# 1. Load processor & model
	processor = AutoProcessor.from_pretrained(MODEL_ID)
	tokenizer = processor.tokenizer
	model = AutoModelForImageTextToText.from_pretrained(
	    MODEL_ID,
	    dtype=torch.bfloat16,
	    device_map="auto",
	)
	QUANTUM_TOKEN_ID = tokenizer.convert_tokens_to_ids("<|quantum_sig|>")
	HIDDEN_SIZE = model.get_input_embeddings().embedding_dim  # 2560
	QUANTUM_SIG_DIM = 6

	# 2. Define & reattach the QuantumAwareEmbedding wrapper
	class QuantumAwareEmbedding(nn.Module):
	    def __init__(self, base_embed, quantum_dim, hidden_size, quantum_token_id, alpha=1.0):
	        super().__init__()
	        self.base_embed = base_embed
	        self.quantum_token_id = quantum_token_id
	        self.alpha = alpha
	        self.quantum_proj = nn.Linear(quantum_dim, hidden_size, bias=True, dtype=torch.bfloat16)
	        self._current_sig = None

	    def set_quantum_signature(self, sig):
	        self._current_sig = sig

	    # Expose the attributes callers expect from a plain nn.Embedding.
	    @property
	    def weight(self):
	        return self.base_embed.weight

	    @property
	    def num_embeddings(self):
	        return self.base_embed.num_embeddings

	    @property
	    def embedding_dim(self):
	        return self.base_embed.embedding_dim

	    def forward(self, input_ids):
	        embeds = self.base_embed(input_ids)
	        # Without a signature, or without the quantum token, fall back to the base embedding.
	        if self._current_sig is None:
	            return embeds
	        mask = (input_ids == self.quantum_token_id)
	        if not mask.any():
	            return embeds
	        sig = self._current_sig.to(embeds.dtype).to(embeds.device)
	        q_embed = self.quantum_proj(sig)
	        mask_3d = mask.unsqueeze(-1).to(embeds.dtype)
	        q_embed_3d = q_embed.unsqueeze(1) * self.alpha
	        # Replace the embedding only at <|quantum_sig|> positions.
	        return embeds * (1 - mask_3d) + q_embed_3d * mask_3d

	base_embed = model.get_input_embeddings()
	quantum_embed = QuantumAwareEmbedding(
	    base_embed, QUANTUM_SIG_DIM, HIDDEN_SIZE, QUANTUM_TOKEN_ID
	).to(base_embed.weight.device, dtype=torch.bfloat16)
	# quantum_proj.pt and training_signatures.npy are read from the working directory;
	# download them from the model repo first.
	quantum_embed.quantum_proj.load_state_dict(
	    torch.load("quantum_proj.pt", map_location=base_embed.weight.device)
	)
	model.set_input_embeddings(quantum_embed)

	# 3. Use a training signature (or fetch a fresh one from ibm_kingston)
	training_signatures = np.load("training_signatures.npy")
	sig = torch.tensor(training_signatures[0:1], dtype=torch.bfloat16, device=model.device)
	quantum_embed.set_quantum_signature(sig)

	# 4. Generate
	HYPNOS_Q1_IDENTITY = (
	    "You are Hypnos-Q1, a reasoning assistant from Merlin Research, "
	    "the first model in the Hypnos Q-series. Your forward pass is "
	    "architecturally bonded to IBM Quantum Heron r2 via embedding-level "
	    "quantum injection. This conversation operates under quantum "
	    "signature <|quantum_sig|>. You reason step-by-step in <think>...</think> "
	    "blocks before answering."
	)
	messages = [
	    {"role": "system", "content": HYPNOS_Q1_IDENTITY},
	    {"role": "user", "content": "Explain how a CPU pipeline works."},
	]
	# return_dict=True gives a consistent mapping (input_ids, attention_mask) across versions.
	inputs = tokenizer.apply_chat_template(
	    messages, tokenize=True, add_generation_prompt=True,
	    return_dict=True, return_tensors="pt",
	).to(model.device)
	with torch.no_grad():
	    out = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)
	prompt_len = inputs["input_ids"].shape[-1]
	print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=False))
	```
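
Written as a formula, the wrapper's `forward` above amounts to the following per-position blend. Here $e_t$ is the base embedding of token $x_t$, $s \in \mathbb{R}^6$ is the current signature, $W$ and $b$ are the weight and bias of `quantum_proj`, and $\alpha$ is the `alpha` argument (1.0 by default); the symbols are introduced here for illustration and do not appear in the original code.

```latex
\tilde{e}_t = (1 - m_t)\, e_t + m_t\, \alpha\, (W s + b),
\qquad
m_t =
\begin{cases}
1 & \text{if } x_t = \texttt{<|quantum\_sig|>} \\
0 & \text{otherwise}
\end{cases}
\qquad
W \in \mathbb{R}^{2560 \times 6},\; b \in \mathbb{R}^{2560}
```

Every position other than the quantum token keeps its base embedding unchanged, and if no signature is set the wrapper returns $e_t$ everywhere.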

	For fresh quantum signatures, submit a 3-circuit batch (SYK scrambler at depths 1/2/3, 4 qubits) to `ibm_kingston` via Qiskit Runtime and compute the 6-dimensional signature the same way as the training corpus. See `quantum_attestation.json` for exact parameters.

	---

	## Intended use

	- Step-by-step reasoning tasks (math, science, code, analysis)
	- Multi-turn problem solving with explicit `<think>...</think>` traces
	- Research base for further Q-series experiments
	- Demonstrations of verifiable physical provenance for AI artifacts
	- Studies of how runtime hardware-bonding affects LLM behavior

	Not intended for: safety-critical decisions without human oversight, autonomous offensive operations, or unverified factual claims in regulated domains.

	---

	## Honest limitations

	- Provenance is not capability. Quantum bonding does not make the model smarter. It is an architectural and identity feature.
	- Single-point injection. Only one token's embedding is replaced. Multi-layer injection is left for Hypnos-Q2.
	- Fallback degrades silently. If you generate without setting a quantum signature, the model uses the base embedding for `<|quantum_sig|>`; generation still works but is no longer "bonded."
	- Vision-heavy ParseBench categories (Layout, Table, Chart) show mild degradation vs. the Qwen3.5-4B base. Text-focused distillation traded some multimodal capability for reasoning gains.
	- Inference latency for "true bond" mode. Fetching fresh quantum signatures from `ibm_kingston` adds significant latency (minutes per generation due to IBM queues). For local-only inference, use signatures from `training_signatures.npy` as a fallback.
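
Because the fallback is silent, it can help to check the bonding preconditions explicitly before generating. The helper below is a minimal sketch, not part of the released code: `bonding_status` is a hypothetical name, and the token ID used in the example is a placeholder rather than the real ID of `<|quantum_sig|>`.

```python
def bonding_status(input_ids, quantum_token_id, signature):
    """Report whether a prompt will actually use the quantum projection.

    Mirrors the two early-return conditions of the wrapper's forward pass:
    no signature set, or no quantum token in the prompt.
    """
    if signature is None:
        return "unbonded: no quantum signature set"
    if quantum_token_id not in input_ids:
        return "unbonded: quantum token missing from prompt"
    return "bonded"


# Placeholder token IDs and signature values, for illustration only.
QUANTUM_TOKEN_ID = 999
prompt_ids = [1, 5, QUANTUM_TOKEN_ID, 7]
status = bonding_status(prompt_ids, QUANTUM_TOKEN_ID, [0.9, 0.5, 0.2, 0.4, 0.7, 0.3])
print(status)
```

In practice `input_ids` would be the flattened token IDs from the chat template and `signature` the value passed to `set_quantum_signature`; raising an error instead of returning a string is a reasonable alternative when unbonded generation should never happen.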

	---

	## Acknowledgments

	- IBM Quantum for Open Plan access to `ibm_kingston` (Heron r2)
	- Qwen team for the Qwen3.5-4B base model
	- RIKEN + IBM for the Fugaku-Heron QCSC paper that inspired this small-scale counterpart

	---

	## Citation

	```bibtex
	@misc{shushman2026hypnosq1,
	  title = {Hypnos-Q1: Architecturally Quantum-Resonance-Bonded Language Model},
	  author = {Shushman, Mykhailo},
	  year = {2026},
	  institution = {Merlin Research},
	  note = {IBM Quantum jobs d853tcvtjchs73bqs890 (training corpus) and
	          d85590mgbeec73aooreg (validation), backend ibm\_kingston (Heron r2)},
	  url = {https://huggingface.co/squ11z1/Hypnos-Q1}
	}
	```

	---

	First entry in the Hypnos Q-series. More to come.