---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- qwen3_5
- reasoning
- hypnos
- quantum-resonance
- ibm-quantum
- merlin-research
base_model: Qwen/Qwen3.5-4B
base_model_relation: finetune
pipeline_tag: text-generation
---

# Hypnos-Q1

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/fAb2TyX7x4dBn15A1CmNh.png" alt="Hypnos-Q1" width="80%" />
*by squ11z1 · Merlin Research*
</p>

---

## What is this?

Hypnos-Q1 is a 4B-parameter reasoning model with one unusual property: part of its forward pass is **physically tied to a specific quantum computer** at IBM. At runtime, the embedding of a special input token is replaced by a learned projection of a real measurement from `ibm_kingston` (an IBM Heron r2 processor). Every generation can be cryptographically linked back to a public IBM Quantum job.

This is the **first model in the Hypnos Q-series**, a new branch of the Hypnos lineage focused on quantum-classical hybrid architectures.

It is based on `Qwen/Qwen3.5-4B` and fine-tuned on **Hypnos Colossus Distillations** (Merlin Research's private corpus of reasoning traces), with a custom embedding-level quantum injection layer trained alongside.

---

## What's new about it?

There are thousands of fine-tuned LLMs on Hugging Face. Hypnos-Q1 differs from them in three concrete ways:

**1. Real hardware bonding.** Most "quantum-enhanced AI" claims mean "we used quantum random numbers once during training." Here the binding is architectural: the model has a learned projection `quantum_proj: R^6 → R^2560` that turns a 6-dimensional quantum signature into an embedding vector. This projection is part of the model's weights (`quantum_proj.pt`). Remove it or feed it the wrong signature, and the model's behavior changes.

**2. Verifiable provenance.** Two IBM Quantum job IDs are embedded in the attestation file:

- Training corpus: `d853tcvtjchs73bqs890`
- Live validation: `d85590mgbeec73aooreg`

Anyone can look these up in IBM's public job index. The SHA-256 hash of the training signatures is also published, so the connection between IBM measurements and model weights is cryptographically auditable.

**3. Built on accessible infrastructure.** The whole pipeline ran on a single rented H100 plus the IBM Quantum Open Plan (the free tier). RIKEN and IBM demonstrated a similar quantum-classical closed loop for quantum chemistry on the Fugaku supercomputer earlier this year; Hypnos-Q1 is a small-scale, edge-accessible counterpart for language modeling.

---

## Resonance Architecture

A special token `<|quantum_sig|>` in the model's input has its embedding replaced at runtime by a learned projection of a real quantum measurement from `ibm_kingston` (IBM Heron r2). Each forward pass is parameterized by a quantum signature collected from an SYK (Sachdev-Ye-Kitaev) scrambler circuit.

```
Input: ...tokens... <|quantum_sig|> ...tokens...
            ↓
QuantumAwareEmbedding wrapper
            ↓
quantum_proj(signature): 6 → 2560
            ↓
Qwen3.5-4B transformer stack
            ↓
Output
```
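
To make the diagram concrete, here is a framework-free NumPy sketch of the splice step, mirroring the blend used by the `QuantumAwareEmbedding` wrapper in the How to use section. It is illustrative only: the hidden size of 8, the token ID 99, and the random weights are placeholders, whereas the real projection maps 6 → 2560 and is loaded from `quantum_proj.pt`.

```python
import numpy as np

def splice_quantum_embedding(embeds, input_ids, quantum_token_id, signature, W, b, alpha=1.0):
    """Replace the embedding at every <|quantum_sig|> position with alpha * (W @ signature + b).

    embeds:    (seq_len, hidden) embeddings from the base embedding table
    input_ids: (seq_len,) token IDs
    signature: (sig_dim,) quantum signature
    W, b:      projection weight (hidden, sig_dim) and bias (hidden,)
    """
    q_embed = alpha * (W @ signature + b)                                   # (hidden,)
    mask = (input_ids == quantum_token_id)[:, None].astype(embeds.dtype)   # (seq_len, 1)
    # Rows without the quantum token keep their base embedding unchanged
    return embeds * (1 - mask) + q_embed[None, :] * mask

rng = np.random.default_rng(0)
hidden, sig_dim, quantum_token_id = 8, 6, 99   # toy sizes and a hypothetical token ID
input_ids = np.array([5, 17, quantum_token_id, 42])
embeds = rng.normal(size=(len(input_ids), hidden))
W = rng.normal(size=(hidden, sig_dim))
b = rng.normal(size=hidden)
signature = rng.normal(size=sig_dim)

spliced = splice_quantum_embedding(embeds, input_ids, quantum_token_id, signature, W, b)
```

At full size, `quantum_proj` holds 6 × 2560 weights plus 2560 biases, i.e. 17,920 parameters, a tiny fraction of the 4.66B in the base model.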

The 6-dimensional quantum signature consists of three OTOC (out-of-time-order correlator) values, measured at SYK scrambler depths 1, 2, and 3, plus their three pairwise absolute differences. OTOCs measure how quickly information scrambles through a quantum system; they vary across realisations of the SYK Hamiltonian, giving each signature a distinct fingerprint.
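
Assembling the signature from the three OTOC values could look like the sketch below. Only the composition (three values plus three pairwise absolute differences) comes from this card; the pair ordering (1,2), (1,3), (2,3) is an assumption, the input values are invented, and estimating the OTOCs from IBM measurement counts is not shown.

```python
from itertools import combinations

def build_signature(otoc_d1, otoc_d2, otoc_d3):
    """Return [o1, o2, o3, |o1-o2|, |o1-o3|, |o2-o3|] for OTOCs at depths 1, 2, 3.

    ASSUMPTION: the order of the three differences is illustrative; check
    quantum_attestation.json for the layout actually used in training.
    """
    otocs = [float(otoc_d1), float(otoc_d2), float(otoc_d3)]
    # combinations yields the pairs (1,2), (1,3), (2,3) in that order
    diffs = [abs(a - b) for a, b in combinations(otocs, 2)]
    return otocs + diffs

# Made-up OTOC values, not real measurements
signature = build_signature(0.9, 0.5, 0.25)
```

A signature built this way can be wrapped as `torch.tensor([signature], dtype=torch.bfloat16)` to match the `(1, 6)` shape passed to `set_quantum_signature` in the How to use section.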

---

## Quantum Attestation

| Field | Value |
|---|---|
| Backend | `ibm_kingston` (Heron r2) |
| Training corpus job | `d853tcvtjchs73bqs890` |
| Validation job | `d85590mgbeec73aooreg` |
| Corpus size | 64 quantum signatures |
| Qubits | 4 |
| Shots per circuit | 1024 |
| Signatures SHA-256 | `77097900d634c77fa0928d7766da49a113e8dddeb0e73b308d88b11437995409` |
| Collection time | 136.12 seconds |
| Collection date (UTC) | 2026-05-17T22:20:59Z |

Full attestation: [`quantum_attestation.json`](./quantum_attestation.json).

### How to verify

1. Look up the job IDs at [IBM Quantum](https://quantum.cloud.ibm.com).
2. Retrieve the measurement bitstrings.
3. Concatenate them, compute the SHA-256 hash, and compare it to `signatures_sha256`.
4. Spot-check against the first 3 of the 64 signatures, which are stored in plaintext in the attestation.

If the hash and the spot-checked signatures both match, the model is provably linked to those specific quantum computations.
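
For the hash comparison and the spot-check, a local sketch along these lines could be used. The card does not say how the data is serialized before hashing, so hashing the raw little-endian float64 bytes of `training_signatures.npy` is an assumption, as is reading `signatures_sha256` as a top-level key of `quantum_attestation.json`. If the hashes disagree, check the serialization before drawing conclusions.

```python
import hashlib
import json

import numpy as np

def signatures_sha256(signatures):
    """SHA-256 over the raw bytes of a signature array.

    ASSUMPTION: little-endian float64 in C order; the attestation may use a
    different serialization (e.g. raw bitstrings), so adjust if needed.
    """
    arr = np.ascontiguousarray(signatures, dtype="<f8")
    return hashlib.sha256(arr.tobytes()).hexdigest()

def spot_check(signatures, published_rows, atol=1e-9):
    """Compare the first len(published_rows) signatures to their plaintext copies."""
    published = np.asarray(published_rows, dtype=np.float64)
    return bool(np.allclose(np.asarray(signatures)[: len(published)], published, atol=atol))

def verify(npy_path, attestation_path):
    """Recompute the hash of a local .npy file and compare it to the attestation."""
    signatures = np.load(npy_path)
    with open(attestation_path) as f:
        attestation = json.load(f)
    # ASSUMPTION: the hash is stored under a top-level "signatures_sha256" key
    return signatures_sha256(signatures) == attestation["signatures_sha256"]
```

Usage, with both files downloaded from the repo: `verify("training_signatures.npy", "quantum_attestation.json")`.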

---

## Evaluation results

Hypnos-Q1 was evaluated on standard reasoning, knowledge, and document-parsing benchmarks. The results are also published as individual YAML records under [`.eval_results/`](./.eval_results) for leaderboard integration.

| Benchmark | Score | Notes |
|---|---|---|
| GPQA Diamond | **79.4** | Graduate-level science questions |
| MMLU-Pro | **81.1** | Multi-task knowledge |
| ParseBench (Text Content) | **89.8** | Document parsing |
| ParseBench (Mean) | 34.6 | Across all categories |
| ParseBench (Text Formatting) | 58.6 | Formatting retention / slight gain |
| ParseBench (Layout) | 18.8 | Mild vision degradation |
| ParseBench (Table) | 7.4 | Mild degradation |
| ParseBench (Chart) | 2.2 | Mild degradation |
| ScreenSpot-Pro (Overall) | 58.4 | GUI grounding |

For context, this places Hypnos-Q1 above its `Qwen3.5-4B` base on reasoning-heavy tasks (GPQA Diamond, MMLU-Pro, ParseBench Text Content), while it shows mild degradation on vision-heavy ParseBench categories, consistent with the text-focused fine-tuning corpus.

On the **Artificial Analysis Intelligence Index**, the Qwen3.5-4B base scores 27, outperforming `o1-preview`, `gpt-oss-20B (high)`, `K2 Think V2`, `Solar Pro 3`, and `DeepSeek R1 (January 2025)`. Hypnos-Q1 inherits this strong reasoning foundation.

---

## Training

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen3.5-4B` (qwen3_5 architecture, 4.66B params) |
| Training data | **Hypnos Colossus Distillations** (private, Merlin Research) |
| Training samples | 50,000 |
| Method | Full SFT + embedding-level quantum injection |
| Precision | bf16 |
| Hardware | 1× H100 80GB |
| Max sequence length | 1024 |
| Effective batch size | 16 (per_device=4 × grad_accum=4) |
| Epochs | 1 |
| Optimizer | AdamW (fused) |
| Learning rate | 1.5e-5, cosine schedule |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Assistant-only loss | Manual ChatML span detection |
| Attention | SDPA |
| Random seed | Quantum-derived from training corpus signatures |
| Final training loss | 1.41 |
| Training time | 65.12 minutes |
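
The schedule implied by the table can be checked with a little arithmetic, and the assistant-only loss can be illustrated on token IDs. The sketch below assumes a single device, no sequence packing, and the Hugging Face Trainer convention of rounding warmup up (`ceil(warmup_ratio × total_steps)`), with linear warmup followed by half-cosine decay to zero. The masking helper is one generic way to keep the loss only on ChatML assistant spans (from `<|im_start|>assistant` through `<|im_end|>`); the token IDs in the example are made up, and this is not necessarily how Merlin Research implemented it.

```python
import math

# --- Optimizer schedule implied by the training table (assumptions above) ---
SAMPLES, PER_DEVICE, GRAD_ACCUM, EPOCHS = 50_000, 4, 4, 1
PEAK_LR, WARMUP_RATIO = 1.5e-5, 0.03

effective_batch = PER_DEVICE * GRAD_ACCUM                 # 16
total_steps = (SAMPLES // effective_batch) * EPOCHS       # optimizer updates
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)      # rounded up

def lr_at(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# --- Assistant-only loss via ChatML span detection (generic sketch) ---
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default

def assistant_only_labels(input_ids, assistant_header, im_end_id):
    """Copy input_ids into labels, keeping only tokens inside assistant turns.

    input_ids:        list of token IDs for one ChatML conversation
    assistant_header: list of token IDs for "<|im_start|>assistant" plus newline
    im_end_id:        ID of "<|im_end|>", kept in the loss so the model learns to stop
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    n = len(assistant_header)
    i = 0
    while i < len(input_ids):
        if input_ids[i:i + n] == assistant_header:
            j = i + n
            # Supervise the reply up to and including the closing <|im_end|>
            while j < len(input_ids):
                labels[j] = input_ids[j]
                if input_ids[j] == im_end_id:
                    break
                j += 1
            i = j + 1
        else:
            i += 1
    return labels

# Hypothetical IDs: 1=<|im_start|>, 2=<|im_end|>, 3="user", 4="assistant", 5=newline
ids = [1, 3, 5, 10, 11, 2, 1, 4, 5, 20, 21, 2]
labels = assistant_only_labels(ids, assistant_header=[1, 4, 5], im_end_id=2)
```

Under these assumptions the run has 3,125 optimizer steps, the first 94 of which are warmup, and only the assistant reply tokens (plus their `<|im_end|>`) contribute to the loss.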

---

## Hypnos Series

| Model | Base | Distinguishing feature |
|---|---|---|
| Hypnos-i1-8B | Llama-3 8B | General reasoning |
| Hypnos-i2-32B | Qwen3-32B | Quantum-regularized training |
| Hypnos-Colossus-1T | Kimi-K2 | Scale + entropy injection (data source for Q-series distillations) |
| **Hypnos-Q1** | **Qwen3.5-4B** | **Q-series · architectural quantum bonding** |

The Q-series is the first Hypnos branch where quantum hardware participates in the model's forward pass, not just in its training metadata.

---

## How to use

Hypnos-Q1 can be loaded like a standard Qwen3.5-4B model, but to use it as intended you need to:

1. Reattach the `QuantumAwareEmbedding` wrapper around the input embeddings.
2. Load the `quantum_proj.pt` weights into the wrapper.
3. Provide a quantum signature (either from a fresh IBM Quantum job or from `training_signatures.npy`) before each generation.

```python
import torch
import torch.nn as nn
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "squ11z1/Hypnos-Q1"

# 1. Load processor & model
processor = AutoProcessor.from_pretrained(MODEL_ID)
tokenizer = processor.tokenizer
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
)

QUANTUM_TOKEN_ID = tokenizer.convert_tokens_to_ids("<|quantum_sig|>")
if QUANTUM_TOKEN_ID is None or QUANTUM_TOKEN_ID == tokenizer.unk_token_id:
    raise ValueError("<|quantum_sig|> is not in the tokenizer vocabulary")
HIDDEN_SIZE = model.get_input_embeddings().embedding_dim  # 2560
QUANTUM_SIG_DIM = 6

# 2. Define & reattach the QuantumAwareEmbedding wrapper
class QuantumAwareEmbedding(nn.Module):
    """Input-embedding wrapper that replaces the embedding of every
    <|quantum_sig|> token with a projection of the current quantum signature."""

    def __init__(self, base_embed, quantum_dim, hidden_size, quantum_token_id, alpha=1.0):
        super().__init__()
        self.base_embed = base_embed
        self.quantum_token_id = quantum_token_id
        self.alpha = alpha
        self.quantum_proj = nn.Linear(quantum_dim, hidden_size, bias=True, dtype=torch.bfloat16)
        self._current_sig = None

    def set_quantum_signature(self, sig):
        # sig: tensor of shape (1, quantum_dim), broadcast over the batch
        self._current_sig = sig

    # Forward the attributes that transformers reads from an embedding module
    @property
    def weight(self):
        return self.base_embed.weight

    @property
    def num_embeddings(self):
        return self.base_embed.num_embeddings

    @property
    def embedding_dim(self):
        return self.base_embed.embedding_dim

    def forward(self, input_ids):
        embeds = self.base_embed(input_ids)
        if self._current_sig is None:
            return embeds  # unbonded fallback: base embedding for <|quantum_sig|>
        mask = input_ids == self.quantum_token_id
        if not mask.any():
            return embeds  # e.g. cached decoding steps after the prompt
        sig = self._current_sig.to(device=embeds.device, dtype=embeds.dtype)
        q_embed = self.quantum_proj(sig)                  # (1, hidden)
        mask_3d = mask.unsqueeze(-1).to(embeds.dtype)     # (batch, seq, 1)
        q_embed_3d = q_embed.unsqueeze(1) * self.alpha    # (1, 1, hidden)
        # Swap in the quantum embedding at masked positions only
        return embeds * (1 - mask_3d) + q_embed_3d * mask_3d

base_embed = model.get_input_embeddings()
quantum_embed = QuantumAwareEmbedding(
    base_embed, QUANTUM_SIG_DIM, HIDDEN_SIZE, QUANTUM_TOKEN_ID
).to(base_embed.weight.device, dtype=torch.bfloat16)

# Fetch quantum_proj.pt from the model repo (assumes it sits at the repo root)
proj_path = hf_hub_download(MODEL_ID, "quantum_proj.pt")
quantum_embed.quantum_proj.load_state_dict(
    torch.load(proj_path, map_location=base_embed.weight.device)
)
model.set_input_embeddings(quantum_embed)

# 3. Use a training signature (or fetch a fresh one from ibm_kingston)
training_signatures = np.load(hf_hub_download(MODEL_ID, "training_signatures.npy"))
sig = torch.tensor(training_signatures[0:1], dtype=torch.bfloat16, device=model.device)
quantum_embed.set_quantum_signature(sig)

# 4. Generate (the prompt must contain <|quantum_sig|> for the injection to apply)
HYPNOS_Q1_IDENTITY = (
    "You are Hypnos-Q1, a reasoning assistant from Merlin Research, "
    "the first model in the Hypnos Q-series. Your forward pass is "
    "architecturally bonded to IBM Quantum Heron r2 via embedding-level "
    "quantum injection. This conversation operates under quantum "
    "signature <|quantum_sig|>. You reason step-by-step in <think>...</think> "
    "blocks before answering."
)
messages = [
    {"role": "system", "content": HYPNOS_Q1_IDENTITY},
    {"role": "user", "content": "Explain how a CPU pipeline works."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=False))
```

For fresh quantum signatures, submit a 3-circuit batch (SYK scrambler at depths 1, 2, and 3 on 4 qubits) to `ibm_kingston` via Qiskit Runtime and compute the 6-dimensional signature the same way as for the training corpus. See `quantum_attestation.json` for the exact parameters.

---

## Intended use

- Step-by-step reasoning tasks (math, science, code, analysis)
- Multi-turn problem solving with explicit `<think>...</think>` traces
- Research base for further Q-series experiments
- Demonstrations of verifiable physical provenance for AI artifacts
- Studies of how runtime hardware bonding affects LLM behavior

**Not intended for:** safety-critical decisions without human oversight, autonomous offensive operations, or unverified factual claims in regulated domains.

---

## Honest limitations

- **Provenance is not capability.** Quantum bonding does not make the model smarter; it is an architectural and identity feature.
- **Single-point injection.** Only one token's embedding is replaced. Multi-layer injection is left for Hypnos-Q2.
- **Fallback degrades silently.** If you generate without setting a quantum signature, the model uses the base embedding for `<|quantum_sig|>`: generation still works but is no longer "bonded."
- **Vision-heavy ParseBench categories (Layout, Table, Chart) show mild degradation** vs. the Qwen3.5-4B base. Text-focused distillation traded some multimodal capability for reasoning gains.
- **Inference latency in "true bond" mode.** Fetching fresh quantum signatures from `ibm_kingston` adds significant latency (minutes per generation because of IBM queue times). For local-only inference, fall back to signatures from `training_signatures.npy`.

---

## Acknowledgments

- **IBM Quantum** for Open Plan access to `ibm_kingston` (Heron r2)
- **Qwen team** for the Qwen3.5-4B base model
- **RIKEN + IBM** for the Fugaku-Heron QCSC paper that inspired this small-scale counterpart

---

## Citation

```bibtex
@misc{shushman2026hypnosq1,
  title       = {Hypnos-Q1: Architecturally Quantum-Resonance-Bonded Language Model},
  author      = {Shushman, Mykhailo},
  year        = {2026},
  institution = {Merlin Research},
  note        = {IBM Quantum jobs d853tcvtjchs73bqs890 (training corpus) and
                 d85590mgbeec73aooreg (validation), backend ibm\_kingston (Heron r2)},
  url         = {https://huggingface.co/squ11z1/Hypnos-Q1}
}
```

---

*First entry in the Hypnos Q-series. More to come.*