squ11z1 committed on
Commit
05840c5
·
verified ·
1 Parent(s): 3be2bce

Update README.md

Files changed (1)
  1. README.md +312 -14
README.md CHANGED
@@ -2,23 +2,321 @@
2
  license: apache-2.0
3
  base_model: Qwen/Qwen3.5-9B-Base
4
  tags:
5
- - qwen3_5
6
- - lora
7
- - reasoning
8
- - merlin-research
9
- - pluto
10
- - quantum-entropy
 
 
 
 
 
 
11
  ---
12
 
13
- # MerlinSafety/Pluto-Qwen3.5-9B
14
 
15
- This repository contains the **merged full model** artifact for **Pluto**, a personalized Qwen3.5-9B model by **Merlin Research**.
16
 
17
- Training notes:
18
 
19
- - Base model: `Qwen/Qwen3.5-9B-Base`
20
- - Personalization: the model is instructed and reinforced to identify itself as **Pluto** by **Merlin Research**
21
- - Training data: local `messages` SFT dataset
22
- - Quantum entropy usage: raw IBM quantum bitstrings were used as a stochastic source for seeds, dataset shuffle, split, and crop jitter
23
 
24
- This entropy source was used as an external stochastic controller, not as fake supervised labels.
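The "external stochastic controller" role described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names `seed_from_bitstrings` and `shuffle_and_split`, the SHA-256 folding step, and the placeholder bitstrings are all hypothetical and are not taken from the Pluto training code. It covers seeding, shuffling, and splitting only.

```python
import hashlib
import random


def seed_from_bitstrings(bitstrings):
    """Fold measured bitstrings (e.g. "0110...") into one 64-bit integer seed."""
    digest = hashlib.sha256("".join(bitstrings).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def shuffle_and_split(examples, bitstrings, eval_fraction=0.1):
    """Shuffle examples with a bitstring-derived seed, then split train/eval."""
    rng = random.Random(seed_from_bitstrings(bitstrings))
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Placeholder bitstrings; real ones would come from hardware measurement results.
bitstrings = ["0110100111", "1011000101", "0001110110"]
train, eval_set = shuffle_and_split(range(100), bitstrings)
print(len(train), len(eval_set))
```

The bitstrings only decide the ordering and the split boundary here; they never become labels, which matches the distinction drawn in the note above.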
2
  license: apache-2.0
3
  base_model: Qwen/Qwen3.5-9B-Base
4
  tags:
5
+ - code
6
+ - reasoning
7
+ - distillation
8
+ - reinforcement-learning
9
+ - long-context
10
+ - claude-code
11
+ - openai-codex
12
+ - quantum-entropy
13
+ - merlin-research
14
+ language:
15
+ - en
16
+ pipeline_tag: text-generation
17
  ---
18
 
19
+ # Pluto
20
 
21
+ ![IMAGE 2026-03-22 02:04:31](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/yEhR_aUdMvbHKMuhiXvB7.jpeg)
22
 
23
+ **Pluto** is a 9B-parameter coding and reasoning model developed by [Merlin Research](https://huggingface.co/MerlinSafety). It is built for precision and robustness and is intended for agentic coding environments, including Claude Code, OpenAI Codex, and local large-codebase workflows.
24
 
 
 
 
 
25
 
26
+ ---
27
+
28
+ ## Model Summary
29
+
30
+ | Property | Value |
31
+ |---|---|
32
+ | **Developer** | Merlin Research |
33
+ | **Base Model** | Qwen/Qwen3.5-9B-Base |
34
+ | **Parameters** | 9B |
35
+ | **Context Length** | 1,000,000 tokens |
36
+ | **Training** | SFT + RL with Adaptive Entropy Regularization |
37
+ | **Distillation** | Frontier coding models |
38
+ | **Compute** | Google Cloud TPU/GPU (Google TPU Research Cloud grant) |
39
+ | **Quantum** | IBM Quantum Kingston (Heron r2) — entropy noise injection |
40
+ | **License** | Apache 2.0 |
41
+
42
+ ---
43
+
44
+ ## Key Features
45
+
46
+ ### 🎯 Precision-First Design
47
+ Pluto is trained to minimize errors rather than maximize fluency. Every training signal — from distillation targets to RL reward shaping — is oriented around correctness, not surface-level coherence. This makes Pluto particularly effective for tasks where a single wrong line of code has downstream consequences.
48
+
49
+ ### 🔭 1M Token Context
50
+ Pluto supports up to **1,000,000 tokens** of context, enabling operation on large codebases without chunking or retrieval hacks. Feed it an entire repository, a multi-file diff, or a long conversation history — Pluto maintains coherent reasoning across the full window.
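Before sending a whole repository in one prompt, it can help to check that it plausibly fits in the window. The sketch below is a rough illustration, not part of Pluto's tooling: `pack_repository`, the `### File:` header format, and the 4-characters-per-token estimate are all assumptions, and a real check should count tokens with the model's own tokenizer.

```python
from pathlib import Path

CONTEXT_BUDGET_TOKENS = 1_000_000  # context length stated in this README
CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not Pluto's tokenizer


def pack_repository(root, suffixes=(".py", ".md"), budget=CONTEXT_BUDGET_TOKENS):
    """Concatenate files under root until the estimated token budget is reached."""
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        chunk = f"### File: {path.relative_to(root)}\n{text}\n"
        cost = len(chunk) // CHARS_PER_TOKEN + 1  # estimated tokens for this file
        if used + cost > budget:
            break  # stop at the first file that would overflow the budget
        parts.append(chunk)
        used += cost
    return "".join(parts), used
```

Calling `pack_repository("path/to/repo")` returns the concatenated text and the estimated token count, which can then be placed in a user message.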
51
+
52
+ ### 🤖 Agentic Deployment Ready
53
+ Pluto is fine-tuned specifically for deployment in:
54
+ - **Claude Code** — system prompt formatting, tool call patterns, multi-turn agentic loops
55
+ - **OpenAI Codex / Assistants API** — compatible message structure and function calling behavior
56
+ - **Local deployment** — GGUF and quantized variants available for running against large local codebases without API latency
57
+
58
+ ### ⚛️ Quantum Entropy Regularization (AER)
59
+ During RL training, Pluto used **Adaptive Entropy Regularization (AER)** with quantum noise sourced from the **IBM Quantum Kingston** processor (Heron r2, 156 qubits). Bitstring measurements from entangled quantum states were used to modulate the per-token entropy coefficient λ(t) during GRPO training, providing:
60
+ - Resistance to entropy collapse and reward hacking
61
+ - Improved robustness on out-of-distribution inputs
62
+ - More stable training dynamics across long RL runs
63
+
64
+ To our knowledge, this makes Pluto the first production coding model trained with entropy regularization sourced from quantum hardware.
65
+
66
+ ### 📚 Distillation from Frontier Models
67
+ Pluto was trained using knowledge distillation from multiple frontier coding models, combined with a curated private dataset of advanced reasoning traces. The distillation pipeline transfers deep reasoning chains from teacher models while keeping inference cost at the 9B scale.
68
+
69
+ ---
70
+
71
+ ## Quickstart
72
+
73
+ ### Transformers
74
+
75
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "MerlinSafety/Pluto"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {
+         "role": "user",
+         "content": "Write a Python function that parses a JWT token without external libraries and validates the expiry timestamp.",
+     }
+ ]
+
+ # Render the chat template to text, then tokenize it
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=2048,
+         temperature=0.6,
+         top_p=0.95,
+         do_sample=True,
+         repetition_penalty=1.1,
+     )
+
+ # Decode only the newly generated tokens (skip the prompt)
+ response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+ print(response)
+ ```
116
+
117
+ ### With Unsloth (faster inference, 4-bit)
118
+
119
+ ```python
+ from unsloth import FastLanguageModel
+ import torch
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="MerlinSafety/Pluto",
+     max_seq_length=131072,  # adjust as needed
+     dtype=None,
+     load_in_4bit=True,
+ )
+
+ FastLanguageModel.for_inference(model)
+
+ messages = [
+     {"role": "user", "content": "Refactor this function to be async and add proper error handling:\n\ndef fetch_data(url):\n import requests\n return requests.get(url).json()"}
+ ]
+
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_dict=True,  # explicit, since the default differs across transformers versions
+     return_tensors="pt",
+ ).to("cuda")
+
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=1024,
+     temperature=0.6,
+     do_sample=True,
+ )
+
+ # Decode only the newly generated tokens (skip the prompt)
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
152
+
153
+ ### GGUF / llama.cpp (local deployment)
154
+
155
+ ```bash
+ # Download Q4_K_M (recommended, ~5.4GB)
+ huggingface-cli download MerlinSafety/Pluto \
+   Pluto-Q4_K_M.gguf \
+   --local-dir ./pluto
+
+ # Download Q8_0 (higher quality, ~9.4GB)
+ huggingface-cli download MerlinSafety/Pluto \
+   Pluto-Q8_0.gguf \
+   --local-dir ./pluto
+
+ # Run with llama.cpp
+ ./llama-cli \
+   -m ./pluto/Pluto-Q4_K_M.gguf \
+   -p "Explain the time complexity of this algorithm and suggest optimizations:\n[your code here]" \
+   -n 1024 \
+   --temp 0.6 \
+   --top-p 0.95 \
+   -c 8192
+ ```
175
+
176
+ ### Ollama
177
+
178
+ ```bash
+ # The FROM path assumes the GGUF was downloaded to ./pluto as in the previous step
+ cat > Modelfile << 'EOF'
+ FROM ./pluto/Pluto-Q4_K_M.gguf
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER num_ctx 8192
+ EOF
+
+ ollama create pluto -f Modelfile
+ ollama run pluto "Write a thread-safe singleton implementation in Python"
+ ```
189
+
190
+ ---
191
+
192
+ ## Claude Code Integration
193
+
194
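+ Pluto is optimized for use as a local backend in Claude Code. Claude Code takes a model name, not a URL, in its `--model` flag, and it speaks the Anthropic Messages API rather than the OpenAI one. Point it at a local server (or a translating proxy) that exposes an Anthropic-compatible endpoint by setting the `ANTHROPIC_BASE_URL` environment variable:
+
+ ```bash
+ # Start local server (example with llama.cpp server)
+ # --jinja uses the chat template embedded in the GGUF
+ ./llama-server \
+   -m ./pluto/Pluto-Q4_K_M.gguf \
+   --port 8080 \
+   -c 32768 \
+   --jinja
+
+ # Use with Claude Code
+ # (assumes the server above exposes an Anthropic-compatible API; otherwise put a proxy in between)
+ ANTHROPIC_BASE_URL=http://localhost:8080 \
+   claude --model pluto "Review this PR and identify potential bugs"
+ ```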
207
+
208
+ ---
209
+
210
+ ## OpenAI Codex / Assistants API Integration
211
+
212
+ Pluto's instruction format is compatible with the OpenAI Chat Completions API when served through a compatible endpoint:
213
+
214
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(
+     base_url="http://localhost:8080/v1",  # your local Pluto server
+     api_key="not-needed",
+ )
+
+ response = client.chat.completions.create(
+     model="pluto",
+     messages=[
+         {
+             "role": "user",
+             "content": "Write a SQL query to find the top 5 customers by revenue in the last 30 days, handling NULL values correctly.",
+         }
+     ],
+     max_tokens=1024,
+     temperature=0.6,
+ )
+
+ print(response.choices[0].message.content)
+ ```
236
+
237
+ ---
240
+
241
+ ## Training Details
242
+
243
+ ### Pipeline Overview
244
+
245
+ ```
+ Qwen/Qwen3.5-9B-Base
+         ↓
+ SFT on curated advanced reasoning + coding dataset
+ (private dataset, distillation from frontier models)
+         ↓
+ GRPO Reinforcement Learning
+ with Adaptive Entropy Regularization (AER)
+ + IBM Quantum Kingston entropy noise injection
+         ↓
+ Long-context fine-tuning (1M token extension)
+         ↓
+ Agentic deployment fine-tuning
+ (Claude Code + Codex format alignment)
+         ↓
+ Pluto 9B
+ ```
267
+
268
+ ### Adaptive Entropy Regularization (AER)
269
+
270
+ During RL training, the loss function was modified as:
271
+
272
+ ```
273
+ L_total = L_RL + λ(t) · L_entropy
274
+ ```
275
+
276
+ where `λ(t)` is a dynamic coefficient modulated by quantum bitstring measurements from the IBM Quantum Kingston (Heron r2) processor. GHZ-state measurements provided true quantum randomness that guided the per-token entropy targets, preventing entropy collapse and improving robustness.
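As a toy illustration of the formula above, the sketch below computes `L_total` for a handful of token positions. The README does not specify how bitstrings map to `λ(t)` or the sign convention of `L_entropy`, so both are assumptions here: `lambda_from_bits` scales a base coefficient by the fraction of ones in a bitstring, and `L_entropy` is taken as the negative mean token entropy. All function names and constants are hypothetical.

```python
import math


def lambda_from_bits(bitstring, base=0.01, amplitude=0.5):
    """Hypothetical schedule: scale a base coefficient by the bitstring's bias.

    The fraction of ones is mapped to [-1, 1], so lambda stays within
    base * (1 - amplitude) and base * (1 + amplitude).
    """
    ones_fraction = bitstring.count("1") / len(bitstring)
    return base * (1.0 + amplitude * (2.0 * ones_fraction - 1.0))


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for one token position."""
    peak = max(logits)
    log_norm = math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [(x - peak) - log_norm for x in logits]
    return -sum(math.exp(lp) * lp for lp in log_probs)


def total_loss(rl_loss, per_token_logits, bitstring):
    """L_total = L_RL + lambda(t) * L_entropy, with L_entropy = -mean entropy."""
    entropies = [token_entropy(logits) for logits in per_token_logits]
    entropy_loss = -sum(entropies) / len(entropies)  # assumed sign convention
    return rl_loss + lambda_from_bits(bitstring) * entropy_loss


# Toy inputs: two token positions over a three-word vocabulary.
logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]
print(total_loss(rl_loss=1.0, per_token_logits=logits, bitstring="1101001110"))
```

With this sign convention, a larger `λ(t)` rewards higher policy entropy more strongly, which is one way a coefficient of this kind could counteract entropy collapse.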
277
+
278
+ ### Compute
279
+ Training was conducted on Google Cloud TPU/GPU infrastructure supported by a **Google TPU Research Cloud (TRC) grant** awarded to Merlin Research.
280
+
281
+ ---
282
+
283
+ ## Intended Use
284
+
285
+ - Complex code generation and refactoring
286
+ - Multi-file codebase analysis
287
+ - Agentic coding pipelines (Claude Code, Codex)
288
+ - Code review and bug detection
289
+ - Architecture planning and technical reasoning
290
+ - Local deployment with large private codebases
291
+
292
+ ---
293
+
294
+ ## Limitations
295
+
296
+ - Pluto is optimized for coding and technical reasoning — general conversation and creative tasks are outside its primary design goal
297
+ - Like all LLMs, Pluto can produce incorrect code; always review generated output before deploying to production
298
+ - Performance on very niche frameworks or proprietary APIs may be limited by training data coverage
299
+ - Quantum entropy component provides training-time benefits; inference behavior is classical
300
+
301
+ ---
302
+
303
+ ## Citation
304
+
305
+ ```bibtex
+ @misc{pluto-2026,
+   title={Pluto: Precision Coding and Reasoning Model with Quantum Entropy Regularization},
+   author={Merlin Research},
+   year={2026},
+   publisher={Merlin Research},
+   url={https://huggingface.co/MerlinSafety/Pluto}
+ }
+ ```
314
+
315
+ ---
316
+
317
+ ## About Merlin Research
318
+
319
+ [Merlin Research](https://huggingface.co/MerlinSafety) is an independent AI safety laboratory based in Stockholm, Sweden, focused on open-source model development, adaptive entropy regularization, and practical AI alignment. Our models are released publicly to advance accessible, safe, and high-quality AI for the research community.
320
+
321
+ **HuggingFace:** [huggingface.co/MerlinSafety](https://huggingface.co/MerlinSafety)
322
+ **Contact:** MerlinResearch@protonmail.com