Instructions for using squ11z1/Hypnos-Q1 with libraries and local apps.

Libraries

How to use squ11z1/Hypnos-Q1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="squ11z1/Hypnos-Q1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("squ11z1/Hypnos-Q1")
model = AutoModelForImageTextToText.from_pretrained("squ11z1/Hypnos-Q1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use squ11z1/Hypnos-Q1 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="squ11z1/Hypnos-Q1",
	filename="Hypnos-Q1.F16.gguf",
)

response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])


llama.cpp

How to use squ11z1/Hypnos-Q1 with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf squ11z1/Hypnos-Q1:Q4_K_M

Use Docker

docker model run hf.co/squ11z1/Hypnos-Q1:Q4_K_M


vLLM

How to use squ11z1/Hypnos-Q1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "squ11z1/Hypnos-Q1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "squ11z1/Hypnos-Q1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


SGLang

How to use squ11z1/Hypnos-Q1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "squ11z1/Hypnos-Q1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "squ11z1/Hypnos-Q1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "squ11z1/Hypnos-Q1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "squ11z1/Hypnos-Q1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use squ11z1/Hypnos-Q1 with Ollama:
```
ollama run hf.co/squ11z1/Hypnos-Q1:Q4_K_M
```

Unsloth Studio

How to use squ11z1/Hypnos-Q1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for squ11z1/Hypnos-Q1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for squ11z1/Hypnos-Q1 to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for squ11z1/Hypnos-Q1 to start chatting

Pi

How to use squ11z1/Hypnos-Q1 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "squ11z1/Hypnos-Q1:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use squ11z1/Hypnos-Q1 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default squ11z1/Hypnos-Q1:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use squ11z1/Hypnos-Q1 with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf squ11z1/Hypnos-Q1:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "squ11z1/Hypnos-Q1:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use squ11z1/Hypnos-Q1 with Docker Model Runner:
```
docker model run hf.co/squ11z1/Hypnos-Q1:Q4_K_M
```

Lemonade

How to use squ11z1/Hypnos-Q1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull squ11z1/Hypnos-Q1:Q4_K_M

Run and chat with the model

lemonade run user.Hypnos-Q1-Q4_K_M

List all available models

lemonade list


💡 Check out Merlin-Agent! — A quantum-classical 9B coding agent with IBM Heron-baked weights.

Hypnos-Q1

by squ11z1 · Merlin Research

What is this?

Hypnos-Q1 is a 4B-parameter reasoning model with one unusual property: part of its forward pass is physically tied to a specific IBM quantum computer. A special input token has its embedding replaced at runtime by a learned projection of a real measurement from ibm_kingston (an IBM Heron r2 processor). Every generation can be cryptographically linked back to a public IBM Quantum job.

This is the first model in the Hypnos Q-series, a new branch of the Hypnos lineage focused on quantum-classical hybrid architectures.

It is based on Qwen/Qwen3.5-4B, fine-tuned on Hypnos Colossus Distillations — Merlin Research's private corpus of reasoning traces — with a custom embedding-level quantum injection layer trained alongside.

What's new about it?

There are thousands of fine-tuned LLMs on Hugging Face. Hypnos-Q1 differs from them in three concrete ways:

1. Real hardware bonding. Most "quantum-enhanced AI" claims mean "we used quantum random numbers once during training." Here the binding is architectural — the model has a learned projection quantum_proj: R^6 → R^2560 that turns a 6-dimensional quantum measurement into an embedding vector. This projection is part of the model's weights (quantum_proj.pt). Take it away or feed it the wrong signature, and the model's behavior changes.

2. Verifiable provenance. Two IBM Quantum job IDs are embedded in the attestation file:

Training corpus: d853tcvtjchs73bqs890
Live validation: d85590mgbeec73aooreg

Anyone can look these up in IBM's public job index. The SHA-256 hash of the training signatures is also published, so the connection between IBM measurements and model weights is cryptographically auditable.

3. Built on accessible infrastructure. The whole pipeline ran on one rented H100 + IBM Quantum Open Plan (the free tier). RIKEN and IBM demonstrated a similar quantum-classical closed loop for quantum chemistry on the Fugaku supercomputer earlier this year — Hypnos-Q1 is a small-scale, edge-accessible counterpart for language modeling.

Resonance Architecture

A special token <|quantum_sig|> in the model's input has its embedding replaced at runtime by a learned projection of a real quantum measurement from ibm_kingston (IBM Heron r2). Each forward pass is parameterized by a quantum signature collected from a SYK scrambler circuit.

Input: ...tokens... <|quantum_sig|> ...tokens...
                       ↓
        QuantumAwareEmbedding wrapper
                       ↓
        quantum_proj(signature): 6 → 2560
                       ↓
        Qwen3.5-4B transformer stack
                       ↓
                    Output

The 6-dimensional quantum signature comes from three OTOC (out-of-time-order correlator) values at SYK scrambler depths 1, 2, and 3, plus the three pairwise absolute differences. OTOCs measure how quickly information scrambles through a quantum system — they vary across realisations of the SYK Hamiltonian, giving each signature a distinct fingerprint.
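The signature layout described above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the three OTOC values (at scrambler depths 1, 2, and 3) are already available as floats and that the pairwise differences are ordered |d1−d2|, |d1−d3|, |d2−d3|; the card does not state the ordering, the OTOC estimator, or any normalisation, so `build_signature` and `project` are hypothetical helpers rather than the training code. The projection step mirrors what the card's `nn.Linear(6, 2560)` computes (y = W·s + b), using random placeholder weights instead of `quantum_proj.pt`.

```python
import numpy as np

HIDDEN_SIZE = 2560      # Qwen3.5-4B embedding width, per the card
QUANTUM_SIG_DIM = 6     # 3 OTOCs + 3 pairwise absolute differences


def build_signature(otoc_d1: float, otoc_d2: float, otoc_d3: float) -> np.ndarray:
    """Assemble a 6-dim signature from OTOCs at scrambler depths 1, 2, 3.

    ASSUMPTION: difference order is |d1-d2|, |d1-d3|, |d2-d3|; no normalisation.
    """
    otocs = np.array([otoc_d1, otoc_d2, otoc_d3], dtype=np.float32)
    diffs = np.abs(np.array(
        [otocs[0] - otocs[1], otocs[0] - otocs[2], otocs[1] - otocs[2]],
        dtype=np.float32,
    ))
    return np.concatenate([otocs, diffs])


def project(sig: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Affine map R^6 -> R^2560, same convention as torch.nn.Linear (weight is [out, in])."""
    return weight @ sig + bias


# Placeholder weights for illustration only; the real ones live in quantum_proj.pt.
rng = np.random.default_rng(0)
W = rng.standard_normal((HIDDEN_SIZE, QUANTUM_SIG_DIM)).astype(np.float32)
b = np.zeros(HIDDEN_SIZE, dtype=np.float32)

sig = build_signature(0.9, 0.5, 0.2)
emb = project(sig, W, b)
print(sig.shape, emb.shape)   # (6,) (2560,)
print(W.size + b.size)        # 17920 parameters in the projection
```

The projection is tiny (6 × 2560 weights plus 2560 biases), which is why it can ship as a separate file next to the base weights.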

Quantum Attestation

Field	Value
Backend	`ibm_kingston` (Heron r2)
Training corpus job	`d853tcvtjchs73bqs890`
Validation job	`d85590mgbeec73aooreg`
Corpus size	64 quantum signatures
Qubits	4
Shots per circuit	1024
Signatures SHA-256	`77097900d634c77fa0928d7766da49a113e8dddeb0e73b308d88b11437995409`
Collection time	136.12 seconds
Collection date (UTC)	2026-05-17T22:20:59Z

Full attestation: quantum_attestation.json.

How to verify

1. Look up the job IDs at IBM Quantum.
2. Retrieve the measurement bitstrings.
3. Concatenate them, compute the SHA-256 hash, and compare it to signatures_sha256.
4. Spot-check the first 3 of the 64 signatures, which are stored in plaintext in the attestation.

If all four checks pass, the model is provably linked to those specific quantum computations.
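The hashing and spot-check steps can be sketched with the standard library and NumPy. This follows the steps as listed, but it is a minimal sketch under loudly stated assumptions: bitstrings are joined with no separator, encoded as ASCII, and hashed in the same job/circuit/shot order used when the attestation was produced. The card does not pin these details down (and the attestation table labels the digest "Signatures SHA-256"), so if the published digest was instead computed over the saved signature array, hash those bytes instead. `bitstrings_sha256`, `hash_matches`, and `spot_check` are hypothetical helper names.

```python
import hashlib
from typing import Iterable, Sequence

import numpy as np


def bitstrings_sha256(bitstrings: Iterable[str]) -> str:
    """SHA-256 over the concatenated measurement bitstrings.

    ASSUMPTION: no separator, ASCII encoding, attestation-time ordering.
    """
    joined = "".join(bitstrings)
    if set(joined) - {"0", "1"}:
        raise ValueError("expected bitstrings made only of '0' and '1'")
    return hashlib.sha256(joined.encode("ascii")).hexdigest()


def hash_matches(bitstrings: Iterable[str], expected_hex: str) -> bool:
    """Compare a recomputed digest with the published signatures_sha256 value."""
    return bitstrings_sha256(bitstrings) == expected_hex.strip().lower()


def spot_check(plaintext_sigs: Sequence[Sequence[float]],
               training_sigs: np.ndarray,
               atol: float = 1e-6) -> bool:
    """Check that the plaintext signatures equal the first rows of the corpus."""
    plain = np.asarray(plaintext_sigs, dtype=np.float64)
    head = np.asarray(training_sigs, dtype=np.float64)[: len(plain)]
    return head.shape == plain.shape and np.allclose(head, plain, atol=atol)


# Toy data, not real job output:
demo = ["0110", "1001"]
print(bitstrings_sha256(demo) == hashlib.sha256(b"01101001").hexdigest())  # True
```

In practice `expected_hex` would be the signatures_sha256 value from quantum_attestation.json and `training_sigs` would be the array in training_signatures.npy; loosen `atol` if the plaintext copies are rounded.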

Evaluation results

Hypnos-Q1 was evaluated on standard reasoning, knowledge, and document-parsing benchmarks. Eval results are also published as individual YAML records under .eval_results/ for leaderboard integration.

Benchmark	Score	Notes
GPQA Diamond	79.4	Graduate-level science questions
MMLU-Pro	81.1	Multi-task knowledge
ParseBench (Text Content)	89.8	Document parsing
ParseBench (Mean)	34.6	Across all categories
ParseBench (Text Formatting)	58.6	Formatting retention / slight gain
ParseBench (Layout)	18.8	Mild vision degradation
ParseBench (Table)	7.4	Mild degradation
ParseBench (Chart)	2.2	Mild degradation
ScreenSpot-Pro (Overall)	58.4	GUI grounding

For context, this places Hypnos-Q1 above its Qwen3.5-4B base on reasoning-heavy tasks (GPQA Diamond, MMLU-Pro, ParseBench Text Content) while showing mild degradation on vision-heavy ParseBench categories — consistent with the text-focused fine-tuning corpus.

On the Artificial Analysis Intelligence Index, the Qwen3.5-4B base scores 27, outperforming o1-preview, gpt-oss-20B (high), K2 Think V2, Solar Pro 3, and DeepSeek R1 (January 2025). Hypnos-Q1 inherits this strong reasoning foundation.

Training

Field	Value
Base model	`Qwen/Qwen3.5-4B` (qwen3_5 architecture, 4.66B params)
Training data	Hypnos Colossus Distillations (private, Merlin Research)
Training samples	50,000
Method	Full SFT + embedding-level quantum injection
Precision	bf16
Hardware	1× H100 80GB
Max sequence length	1024
Effective batch size	16 (per_device=4 × grad_accum=4)
Epochs	1
Optimizer	AdamW (fused)
Learning rate	1.5e-5, cosine schedule
Warmup ratio	0.03
Weight decay	0.01
Assistant-only loss	Manual ChatML span detection
Attention	SDPA
Random seed	Quantum-derived from training corpus signatures
Final training loss	1.41
Training time	65.12 minutes
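The "Assistant-only loss" row can be illustrated with a minimal sketch of ChatML span masking. It assumes each assistant turn begins with whatever token IDs the tokenizer produces for `<|im_start|>assistant\n` and ends with `<|im_end|>` (the ChatML convention used by Qwen chat templates), and it keeps the closing `<|im_end|>` in the loss so the model learns to stop. The helper name, the toy token IDs, and the choice to supervise `<|im_end|>` are illustrative assumptions, not Merlin Research's training code.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def assistant_only_labels(
    input_ids: Sequence[int],
    assistant_header: Sequence[int],
    im_end_id: int,
) -> List[int]:
    """Return labels that keep only assistant-reply tokens; everything else is ignored."""
    labels = [IGNORE_INDEX] * len(input_ids)
    header = list(assistant_header)
    n, h = len(input_ids), len(header)
    i = 0
    while i <= n - h:
        if list(input_ids[i:i + h]) != header:
            i += 1
            continue
        # Found "<|im_start|>assistant\n": supervise the reply that follows it.
        j = i + h
        while j < n:
            labels[j] = input_ids[j]
            if input_ids[j] == im_end_id:  # include <|im_end|> so the model learns to stop
                break
            j += 1
        i = j + 1
    return labels


# Toy IDs (hypothetical): 1=<|im_start|>, 2="assistant", 3="\n", 4=<|im_end|>, 5="user"
ids = [1, 5, 3, 10, 11, 4, 1, 2, 3, 20, 21, 4]
print(assistant_only_labels(ids, assistant_header=[1, 2, 3], im_end_id=4))
# [-100, -100, -100, -100, -100, -100, -100, -100, -100, 20, 21, 4]
```

System and user turns, and the assistant header itself, contribute nothing to the loss; only the reply tokens (including any `<think>...</think>` trace) are supervised.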

Hypnos Series

Model	Base	Distinguishing feature
Hypnos-i1-8B	Llama-3 8B	General reasoning
Hypnos-i2-32B	Qwen3-32B	Quantum-regularized training
Hypnos-Colossus-1T	Kimi-K2	Scale + entropy injection (data source for Q-series distillations)
Hypnos-Q1	Qwen3.5-4B	Q-series · architectural quantum bonding

The Q-series is the first Hypnos branch where quantum hardware participates in the model's forward pass, not just its training metadata.

How to use

Hypnos-Q1 can be loaded like a standard Qwen3.5-4B model, but to use it as intended you need to:

1. Reattach the QuantumAwareEmbedding wrapper around the input embeddings.
2. Load the quantum_proj.pt weights into the wrapper.
3. Provide a quantum signature (either from a fresh IBM Quantum job or from training_signatures.npy) before each generation.

import torch
import torch.nn as nn
import numpy as np
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "squ11z1/Hypnos-Q1"

# 1. Load processor & model
processor = AutoProcessor.from_pretrained(MODEL_ID)
tokenizer = processor.tokenizer
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
)
QUANTUM_TOKEN_ID = tokenizer.convert_tokens_to_ids("<|quantum_sig|>")
HIDDEN_SIZE = model.get_input_embeddings().embedding_dim  # 2560
QUANTUM_SIG_DIM = 6

# 2. Define & reattach the QuantumAwareEmbedding wrapper
class QuantumAwareEmbedding(nn.Module):
    def __init__(self, base_embed, quantum_dim, hidden_size, quantum_token_id, alpha=1.0):
        super().__init__()
        self.base_embed = base_embed
        self.quantum_token_id = quantum_token_id
        self.alpha = alpha
        self.quantum_proj = nn.Linear(quantum_dim, hidden_size, bias=True, dtype=torch.bfloat16)
        self._current_sig = None

    def set_quantum_signature(self, sig):
        self._current_sig = sig

    @property
    def weight(self): return self.base_embed.weight
    @property
    def num_embeddings(self): return self.base_embed.num_embeddings
    @property
    def embedding_dim(self): return self.base_embed.embedding_dim

    def forward(self, input_ids):
        embeds = self.base_embed(input_ids)
        if self._current_sig is None:
            return embeds
        mask = (input_ids == self.quantum_token_id)
        if not mask.any():
            return embeds
        sig = self._current_sig.to(embeds.dtype).to(embeds.device)
        q_embed = self.quantum_proj(sig)
        mask_3d = mask.unsqueeze(-1).to(embeds.dtype)
        q_embed_3d = q_embed.unsqueeze(1) * self.alpha
        return embeds * (1 - mask_3d) + q_embed_3d * mask_3d

base_embed = model.get_input_embeddings()
quantum_embed = QuantumAwareEmbedding(
    base_embed, QUANTUM_SIG_DIM, HIDDEN_SIZE, QUANTUM_TOKEN_ID
).to(base_embed.weight.device, dtype=torch.bfloat16)
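# quantum_proj.pt and training_signatures.npy are assumed to sit at the root of the model repo
from huggingface_hub import hf_hub_download

proj_path = hf_hub_download(MODEL_ID, "quantum_proj.pt")
quantum_embed.quantum_proj.load_state_dict(
    torch.load(proj_path, map_location=base_embed.weight.device)
)
model.set_input_embeddings(quantum_embed)

# 3. Use a training signature (or fetch a fresh one from ibm_kingston)
training_signatures = np.load(hf_hub_download(MODEL_ID, "training_signatures.npy"))
sig = torch.tensor(training_signatures[0:1], dtype=torch.bfloat16, device=model.device)
quantum_embed.set_quantum_signature(sig)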

# 4. Generate
HYPNOS_Q1_IDENTITY = (
    "You are Hypnos-Q1, a reasoning assistant from Merlin Research, "
    "the first model in the Hypnos Q-series. Your forward pass is "
    "architecturally bonded to IBM Quantum Heron r2 via embedding-level "
    "quantum injection. This conversation operates under quantum "
    "signature <|quantum_sig|>. You reason step-by-step in <think>...</think> "
    "blocks before answering."
)
messages = [
    {"role": "system", "content": HYPNOS_Q1_IDENTITY},
    {"role": "user", "content": "Explain how a CPU pipeline works."},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))

For fresh quantum signatures, submit a 3-circuit batch (SYK scrambler at depths 1/2/3, 4 qubits) to ibm_kingston via Qiskit Runtime and compute the 6-dimensional signature the same way as the training corpus. See quantum_attestation.json for exact parameters.

Intended use

Step-by-step reasoning tasks (math, science, code, analysis)
Multi-turn problem solving with explicit <think>...</think> traces
Research base for further Q-series experiments
Demonstrations of verifiable physical provenance for AI artifacts
Studies of how runtime hardware-bonding affects LLM behavior

Not intended for: safety-critical decisions without human oversight, autonomous offensive operations, or unverified factual claims in regulated domains.

Honest limitations

Provenance is not capability. Quantum bonding does not make the model smarter. It is an architectural and identity feature.
Single-point injection. Only one token's embedding is replaced. Multi-layer injection is left for Hypnos-Q2.
Fallback degrades silently. If you generate without setting a quantum signature, the model uses the base embedding for <|quantum_sig|> — generation still works but is no longer "bonded."
Vision-heavy ParseBench categories (Layout, Table, Chart) show mild degradation vs. the Qwen3.5-4B base. Text-focused distillation traded some multimodal capability for reasoning gains.
Inference latency for "true bond" mode. Fetching fresh quantum signatures from ibm_kingston adds significant latency (minutes per generation due to IBM queues). For local-only inference, use signatures from training_signatures.npy as a fallback.
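Because the unbonded fallback is silent, a caller may prefer to fail loudly instead. A minimal sketch, assuming you pass it the prompt's token IDs as a flat list and the signature currently held by the wrapper; `require_bond` and `UnbondedGenerationError` are hypothetical names, not part of the released files.

```python
from typing import Iterable, Optional


class UnbondedGenerationError(RuntimeError):
    """Raised when <|quantum_sig|> is in the prompt but no signature is set."""


def require_bond(input_ids: Iterable[int], quantum_token_id: int,
                 current_sig: Optional[object]) -> bool:
    """Return True if this generation will be quantum-bonded.

    Raises instead of silently falling back to the base embedding when the
    prompt contains the quantum token but no signature has been provided.
    Returns False when the prompt does not contain the token at all.
    """
    has_token = quantum_token_id in list(input_ids)
    if has_token and current_sig is None:
        raise UnbondedGenerationError(
            "prompt contains <|quantum_sig|> but no quantum signature is set"
        )
    return has_token
```

Call it on the first row of your prompt's token IDs right before `model.generate`; with the card's code that means passing `QUANTUM_TOKEN_ID` and `quantum_embed._current_sig`, a private attribute, so the check is coupled to that wrapper's internals.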

Acknowledgments

IBM Quantum for Open Plan access to ibm_kingston (Heron r2)
Qwen team for the Qwen3.5-4B base model
RIKEN + IBM for the Fugaku-Heron QCSC paper that inspired this small-scale counterpart

Citation

@misc{shushman2026hypnosq1,
  title         = {Hypnos-Q1: Architecturally Quantum-Resonance-Bonded Language Model},
  author        = {Shushman, Mykhailo},
  year          = {2026},
  institution   = {Merlin Research},
  note          = {IBM Quantum jobs d853tcvtjchs73bqs890 (training corpus) and 
                   d85590mgbeec73aooreg (validation), backend ibm\_kingston (Heron r2)},
  url           = {https://huggingface.co/squ11z1/Hypnos-Q1}
}

First entry in the Hypnos Q-series. More to come.