Update README.md

README.md +312 -14

@@ -2,23 +2,321 @@
 license: apache-2.0
 base_model: Qwen/Qwen3.5-9B-Base
 tags:
-- qwen3_5
-- lora
-- reasoning
-- merlin-research
-- pluto
-- quantum-entropy
+  - code
+  - reasoning
+  - distillation
+  - reinforcement-learning
+  - long-context
+  - claude-code
+  - openai-codex
+  - quantum-entropy
+  - merlin-research
+language:
+  - en
+pipeline_tag: text-generation
 ---
-# MerlinSafety/Pluto-Qwen3.5-9B
-This repository contains the **merged full model** artifact for **Pluto**, a personalized Qwen3.5-9B model by **Merlin Research**.
-Training notes:
-- Base model: `Qwen/Qwen3.5-9B-Base`
-- Personalization: the model is instructed and reinforced to identify itself as **Pluto** by **Merlin Research**
-- Training data: local `messages` SFT dataset
-- Quantum entropy usage: raw IBM quantum bitstrings were used as a stochastic source for seeds, dataset shuffle, split, and crop jitter
-This entropy source was used as an external stochastic controller, not as fake supervised labels.
+# Pluto
+![Pluto banner](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/yEhR_aUdMvbHKMuhiXvB7.jpeg)
+**Pluto** is a 9B-parameter coding and reasoning model developed by [Merlin Research](https://huggingface.co/MerlinSafety). It is built for precision, robustness, and smooth deployment in agentic coding environments, including Claude Code, OpenAI Codex, and local workflows on large codebases.
+---
+## Model Summary
+| Property | Value |
+|---|---|
+| **Developer** | Merlin Research |
+| **Base Model** | Qwen/Qwen3.5-9B-Base |
+| **Parameters** | 9B |
+| **Context Length** | 1,000,000 tokens |
+| **Training** | SFT + RL with Adaptive Entropy Regularization |
+| **Distillation** | Frontier coding models |
+| **Compute** | Google Cloud TPU/GPU (Google TPU Research Cloud grant) |
+| **Quantum** | IBM Quantum Kingston (Heron r2), used for entropy noise injection |
+| **License** | Apache 2.0 |
+---
+## Key Features
+### 🎯 Precision-First Design
+Pluto is trained to minimize errors rather than maximize fluency. Every training signal, from distillation targets to RL reward shaping, is oriented around correctness rather than surface-level coherence. This makes Pluto particularly effective for tasks where a single wrong line of code has downstream consequences.
+### 🔭 1M Token Context
+Pluto supports up to **1,000,000 tokens** of context, which is enough to work on large codebases without chunking or retrieval workarounds. You can give it an entire repository, a multi-file diff, or a long conversation history, and Pluto is designed to keep its reasoning coherent across the full window.
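+Serving the full window is largely limited by KV-cache memory. The sketch below uses the standard estimate to help you pick a context size; the serving examples below use 8K to 32K tokens. It assumes full attention in every layer, and its architecture numbers are placeholders, not Pluto's published configuration. Take the real values from the model's `config.json`. Layers that use linear or sliding-window attention would need less memory than this estimate.
+```python
+def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
+                   n_ctx: int, bytes_per_elem: int = 2) -> int:
+    """Bytes needed to cache keys and values for n_ctx tokens.
+
+    The leading 2 counts one key tensor and one value tensor per layer;
+    bytes_per_elem is 2 for fp16/bf16 and smaller for a quantized KV cache.
+    """
+    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem
+
+
+GIB = 1024 ** 3
+
+# Placeholder architecture numbers (hypothetical, NOT Pluto's real config).
+HYPOTHETICAL = dict(n_layers=32, n_kv_heads=8, head_dim=128)
+
+for n_ctx in (8_192, 32_768, 1_000_000):
+    size = kv_cache_bytes(n_ctx=n_ctx, **HYPOTHETICAL)
+    print(f"{n_ctx:>9,} tokens -> {size / GIB:6.1f} GiB")
+```
+With these placeholder numbers, an fp16 cache takes about 1 GiB at 8K tokens, 4 GiB at 32K, and roughly 122 GiB at 1M tokens, on top of the model weights.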
+### 🤖 Agentic Deployment Ready
+Pluto is fine-tuned specifically for deployment in:
+- **Claude Code** — system prompt formatting, tool call patterns, multi-turn agentic loops
+- **OpenAI Codex / Assistants API** — compatible message structure and function calling behavior
+- **Local deployment** — GGUF and quantized variants available for running against large local codebases without API latency
+### ⚛️ Adaptive Entropy Regularization (AER) with Quantum Noise
+During RL training, Pluto used **Adaptive Entropy Regularization (AER)** with quantum noise sourced from the **IBM Quantum Kingston** processor (Heron r2, 156 qubits). Bitstring measurements from entangled quantum states were used to modulate the per-token entropy coefficient λ(t) during GRPO training, providing:
+- Resistance to entropy collapse and reward hacking
+- Improved robustness on out-of-distribution inputs
+- More stable training dynamics across long RL runs
+To our knowledge, this makes Pluto the first production coding model trained with entropy regularization driven by quantum hardware.
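+The sketch below shows one plausible way to turn raw GHZ measurements into usable random numbers. The helper names are hypothetical, and this is not Merlin Research's published pipeline. When an ideal n-qubit GHZ state is measured in the computational basis, each shot reads all zeros or all ones. Each shot therefore carries one random bit, which the sketch recovers by majority vote to tolerate readout errors. A von Neumann extractor then removes residual bias, and the remaining bits are packed into a value u in [0, 1).
+```python
+def ghz_shot_to_bit(bitstring: str) -> int:
+    """Majority vote over one GHZ shot (ideal shots are all-0 or all-1; ties -> 0)."""
+    ones = bitstring.count("1")
+    return 1 if 2 * ones > len(bitstring) else 0
+
+
+def von_neumann(bits: list[int]) -> list[int]:
+    """Debias a bit stream: pair 01 -> 0, pair 10 -> 1; pairs 00 and 11 are discarded."""
+    out = []
+    for a, b in zip(bits[0::2], bits[1::2]):
+        if a != b:
+            out.append(a)
+    return out
+
+
+def bits_to_unit(bits: list[int]) -> float:
+    """Pack bits (most significant first) into a float in [0, 1)."""
+    value = 0
+    for bit in bits:
+        value = (value << 1) | bit
+    return value / (1 << len(bits)) if bits else 0.0
+
+
+shots = ["00000", "11111", "11011", "00100", "11111", "00000"]  # one readout error in shots 3 and 4
+raw = [ghz_shot_to_bit(s) for s in shots]   # [0, 1, 1, 0, 1, 0]
+clean = von_neumann(raw)                    # pairs (0,1), (1,0), (1,0) -> [0, 1, 1]
+u = bits_to_unit(clean)                     # 0b011 / 8 = 0.375
+print(raw, clean, u)
+```
+Von Neumann extraction assumes independent shots with a fixed bias, and it discards on average at least half of the input bits. Real hardware drift may call for a stronger extractor.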
+### 📚 Distillation from Frontier Models
+Pluto was trained using knowledge distillation from multiple frontier coding models, combined with a curated private dataset of advanced reasoning traces. The distillation pipeline transfers deep reasoning chains from teacher models while keeping inference cost at the 9B scale.
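+Most hosted frontier models return generated text rather than full logits. Distillation from them is therefore usually sequence-level: teacher responses become SFT targets. The sketch below packages such pairs as chat-format `messages` JSONL records. The function names and the `keep` quality filter are hypothetical placeholders, not the private pipeline.
+```python
+import json
+import tempfile
+from pathlib import Path
+from typing import Callable, Iterable, Optional, Tuple
+
+
+def to_sft_record(prompt: str, teacher_response: str,
+                  system: Optional[str] = None) -> dict:
+    """Wrap one prompt/teacher-response pair as a chat-format SFT example."""
+    messages = []
+    if system:
+        messages.append({"role": "system", "content": system})
+    messages.append({"role": "user", "content": prompt})
+    messages.append({"role": "assistant", "content": teacher_response})
+    return {"messages": messages}
+
+
+def write_jsonl(pairs: Iterable[Tuple[str, str]], path: Path,
+                keep: Callable[[str, str], bool] = lambda p, r: True) -> int:
+    """Write accepted pairs as JSON Lines and return how many were kept."""
+    count = 0
+    with path.open("w", encoding="utf-8") as f:
+        for prompt, response in pairs:
+            if keep(prompt, response):
+                f.write(json.dumps(to_sft_record(prompt, response)) + "\n")
+                count += 1
+    return count
+
+
+pairs = [
+    ("Reverse a string in Python.", "Use slicing: `s[::-1]`."),
+    ("Explain Big-O notation.", ""),  # empty teacher output
+]
+out = Path(tempfile.mkdtemp()) / "distill.jsonl"
+n = write_jsonl(pairs, out, keep=lambda p, r: bool(r.strip()))  # drops empty responses
+print(n, out.read_text(encoding="utf-8"))
+```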
+---
+## Quickstart
+### Transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model_id = "MerlinSafety/Pluto"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+messages = [
+    {
+        "role": "user",
+        "content": "Write a Python function that parses a JWT token without external libraries and validates the expiry timestamp."
+    }
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=2048,
+        temperature=0.6,
+        top_p=0.95,
+        do_sample=True,
+        repetition_penalty=1.1,
+    )
+response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+print(response)
+```
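+`apply_chat_template` renders messages using the template stored in the tokenizer. Assuming Pluto keeps the ChatML-style markers used by Qwen models (`<|im_start|>` / `<|im_end|>`), the rendered prompt looks roughly like the output of the sketch below. This is useful when sending raw text to a completion-only endpoint. `render_chatml` is a hypothetical helper, and `tokenizer.chat_template` remains authoritative. Qwen templates may add default system prompts, tool sections, or thinking tags that this sketch omits.
+```python
+def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
+    """Approximate a ChatML-style prompt (assumed format; verify against tokenizer.chat_template)."""
+    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
+    if add_generation_prompt:
+        # Open an assistant turn so generation continues as the assistant.
+        parts.append("<|im_start|>assistant\n")
+    return "".join(parts)
+
+
+prompt = render_chatml([{"role": "user", "content": "Write a haiku about recursion."}])
+print(prompt)
+```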
+### With Unsloth (faster inference, 4-bit)
+```python
+from unsloth import FastLanguageModel
+import torch
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name="MerlinSafety/Pluto",
+    max_seq_length=131072,  # adjust as needed
+    dtype=None,
+    load_in_4bit=True,
+)
+FastLanguageModel.for_inference(model)
+messages = [
+    {"role": "user", "content": "Refactor this function to be async and add proper error handling:\n\ndef fetch_data(url):\n    import requests\n    return requests.get(url).json()"}
+]
+inputs = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to("cuda")
+outputs = model.generate(
+    input_ids=inputs,
+    max_new_tokens=1024,
+    temperature=0.6,
+    do_sample=True,
+)
+print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
+```
+### GGUF / llama.cpp (local deployment)
+```bash
+# Download Q4_K_M (recommended, ~5.4GB)
+huggingface-cli download MerlinSafety/Pluto \
+    Pluto-Q4_K_M.gguf \
+    --local-dir ./pluto
+# Download Q8_0 (higher quality, ~9.4GB)
+huggingface-cli download MerlinSafety/Pluto \
+    Pluto-Q8_0.gguf \
+    --local-dir ./pluto
+# Run with llama.cpp
+./llama-cli \
+    -m ./pluto/Pluto-Q4_K_M.gguf \
+    -p "Explain the time complexity of this algorithm and suggest optimizations:\n[your code here]" \
+    -n 1024 \
+    --temp 0.6 \
+    --top-p 0.95 \
+    -c 8192
+```
+### Ollama
+```bash
+cat > Modelfile << 'EOF'
+FROM ./pluto/Pluto-Q4_K_M.gguf
+PARAMETER temperature 0.6
+PARAMETER top_p 0.95
+PARAMETER num_ctx 8192
+EOF
+ollama create pluto -f Modelfile
+ollama run pluto "Write a thread-safe singleton implementation in Python"
+```
+---
+## Claude Code Integration
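+Pluto is tuned for use as a local backend in Claude Code. Claude Code sends Anthropic Messages API requests, and its `--model` flag takes a model name, not a URL. Point it at your local endpoint with the `ANTHROPIC_BASE_URL` environment variable instead. The endpoint must accept Anthropic-format requests. If your server only exposes an OpenAI-compatible API, put a translating proxy in front of it:
+```bash
+# Start a local server (example with llama.cpp's llama-server)
+./llama-server \
+    -m ./pluto/Pluto-Q4_K_M.gguf \
+    --port 8080 \
+    -c 32768 \
+    --jinja  # use the chat template embedded in the GGUF
+# Use with Claude Code (the base URL must serve the Anthropic Messages API)
+ANTHROPIC_BASE_URL=http://localhost:8080 claude --model pluto "Review this PR and identify potential bugs"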
+```
+---
+## OpenAI Codex / Assistants API Integration
+Pluto's instruction format works with the OpenAI Chat Completions API when the model is served behind an OpenAI-compatible endpoint, such as the `/v1` routes of llama.cpp's `llama-server`:
+```python
+from openai import OpenAI
+client = OpenAI(
+    base_url="http://localhost:8080/v1",  # your local Pluto server
+    api_key="not-needed"
+)
+response = client.chat.completions.create(
+    model="pluto",
+    messages=[
+        {
+            "role": "user",
+            "content": "Write a SQL query to find the top 5 customers by revenue in the last 30 days, handling NULL values correctly."
+        }
+    ],
+    max_tokens=1024,
+    temperature=0.6,
+)
+print(response.choices[0].message.content)
+```
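+Function calling goes through the standard `tools` parameter. llama-server typically needs `--jinja` for tool calls. When the server returns tool calls, each one carries a function name and JSON-encoded arguments, and results go back as `role: "tool"` messages. The minimal dispatcher sketch below uses `get_weather`, a hypothetical tool, and a stand-in object in place of a live API response.
+```python
+import json
+from types import SimpleNamespace
+
+# Hypothetical tool, described in the Chat Completions `tools` schema.
+TOOLS = [{
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Return the current weather for a city.",
+        "parameters": {
+            "type": "object",
+            "properties": {"city": {"type": "string"}},
+            "required": ["city"],
+        },
+    },
+}]
+
+
+def get_weather(city: str) -> str:
+    return f"Sunny in {city}"  # stub implementation for the sketch
+
+
+REGISTRY = {"get_weather": get_weather}
+
+
+def run_tool_calls(message) -> list[dict]:
+    """Execute each tool call on an assistant message and build `tool` reply messages."""
+    replies = []
+    for call in message.tool_calls or []:
+        fn = REGISTRY[call.function.name]
+        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
+        replies.append({"role": "tool", "tool_call_id": call.id, "content": fn(**args)})
+    return replies
+
+
+# Stand-in for response.choices[0].message from
+# client.chat.completions.create(model="pluto", messages=..., tools=TOOLS)
+fake_message = SimpleNamespace(tool_calls=[SimpleNamespace(
+    id="call_0",
+    function=SimpleNamespace(name="get_weather", arguments='{"city": "Stockholm"}'),
+)])
+print(run_tool_calls(fake_message))
+```
+In a real loop, append the assistant message and these replies to `messages`, then call the API again so the model can use the tool results.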
+---
+## Training Details
+### Pipeline Overview
+```
+Qwen/Qwen3.5-9B-Base
+    │
+    ▼
+SFT on curated advanced reasoning + coding dataset
+(private dataset, distillation from frontier models)
+    │
+    ▼
+GRPO Reinforcement Learning
+with Adaptive Entropy Regularization (AER)
++ IBM Quantum Kingston entropy noise injection
+    │
+    ▼
+Long-context fine-tuning (1M token extension)
+    │
+    ▼
+Agentic deployment fine-tuning
+(Claude Code + Codex format alignment)
+    │
+    ▼
+Pluto 9B
+```
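+The GRPO stage samples a group of completions per prompt and scores each one relative to its own group, which removes the need for a learned value model. The sketch below computes that group-relative advantage. It follows the standard GRPO formulation rather than any published Pluto code. It uses the population standard deviation, while some implementations use the sample standard deviation. The reward values are illustrative.
+```python
+import statistics
+
+
+def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
+    """GRPO advantage for each completion: (r_i - group mean) / (group std + eps)."""
+    mean = statistics.fmean(rewards)
+    std = statistics.pstdev(rewards)  # population std over the group
+    return [(r - mean) / (std + eps) for r in rewards]
+
+
+# Four completions for one prompt, e.g. rewarded 1.0 when generated code passes its tests.
+adv = group_advantages([1.0, 0.0, 0.0, 1.0])
+print([round(a, 4) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
+```
+If every completion in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.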
+### Adaptive Entropy Regularization (AER)
+During RL training, the loss function was modified as:
+```
+L_total = L_RL + λ(t) · L_entropy
+```
+where `λ(t)` is a dynamic coefficient modulated by quantum bitstring measurements from the IBM Quantum Kingston (Heron r2) processor. GHZ-state measurements provided true quantum randomness that guided the per-token entropy targets, preventing entropy collapse and improving robustness.
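+The sketch below computes this loss for one sampled response under two labeled assumptions. First, `L_entropy` is taken as the negative mean per-token entropy, so minimizing `L_total` raises entropy; this is a plain entropy bonus rather than a target-matching variant. Second, λ(t) is a base coefficient scaled by a quantum-derived value u in [0, 1). The modulation rule and constants are illustrative, not the training configuration.
+```python
+import math
+
+
+def token_entropy(logits: list[float]) -> float:
+    """Shannon entropy (nats) of softmax(logits), with max-subtraction for stability."""
+    m = max(logits)
+    exps = [math.exp(x - m) for x in logits]
+    z = sum(exps)
+    probs = [e / z for e in exps]
+    return -sum(p * math.log(p) for p in probs if p > 0)
+
+
+def aer_lambda(u: float, base: float = 0.01, amplitude: float = 0.5) -> float:
+    """Hypothetical lambda(t): scale `base` by a factor in [1 - amplitude, 1 + amplitude)."""
+    return base * (1.0 + amplitude * (2.0 * u - 1.0))
+
+
+def total_loss(l_rl: float, logits_per_token: list[list[float]], u: float) -> float:
+    """L_total = L_RL + lambda(t) * L_entropy, with L_entropy = -(mean token entropy)."""
+    mean_h = sum(token_entropy(lg) for lg in logits_per_token) / len(logits_per_token)
+    return l_rl + aer_lambda(u) * (-mean_h)
+
+
+# Two tokens over a 4-symbol vocabulary: one uniform (maximal entropy), one sharply peaked.
+logits = [[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]
+print(token_entropy(logits[0]))  # ln 4, about 1.3863
+print(total_loss(l_rl=0.8, logits_per_token=logits, u=0.375))
+```
+A real trainer would compute the same quantities on tensors so that gradients flow through the entropy term. Plain floats keep the arithmetic easy to check.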
+### Compute
+Training was conducted on Google Cloud TPU/GPU infrastructure supported by a **Google TPU Research Cloud (TRC) grant** awarded to Merlin Research.
+---
+## Intended Use
+- Complex code generation and refactoring
+- Multi-file codebase analysis
+- Agentic coding pipelines (Claude Code, Codex)
+- Code review and bug detection
+- Architecture planning and technical reasoning
+- Local deployment with large private codebases
+---
+## Limitations
+- Pluto is optimized for coding and technical reasoning — general conversation and creative tasks are outside its primary design goal
+- Like all LLMs, Pluto can produce incorrect code; always review generated output before deploying to production
+- Performance on very niche frameworks or proprietary APIs may be limited by training data coverage
+- The quantum entropy source was used only during training; inference is fully classical and needs no quantum hardware
+---
+## Citation
+```bibtex
+@misc{pluto-2026,
+  title={Pluto: Precision Coding and Reasoning Model with Quantum Entropy Regularization},
+  author={{Merlin Research}},
+  year={2026},
+  publisher={Merlin Research},
+  url={https://huggingface.co/MerlinSafety/Pluto}
+}
+```
+---
+## About Merlin Research
+[Merlin Research](https://huggingface.co/MerlinSafety) is an independent AI safety laboratory based in Stockholm, Sweden, focused on open-source model development, adaptive entropy regularization, and practical AI alignment. Our models are released publicly to advance accessible, safe, and high-quality AI for the research community.
+**Hugging Face:** [huggingface.co/MerlinSafety](https://huggingface.co/MerlinSafety)
+**Contact:** MerlinResearch@protonmail.com