---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
- TheDrummer/Behemoth-X-123B-v2
- TheDrummer/Behemoth-R1-123B-v2
base_model_relation: merge
tags:
- mergekit
- merge
- sce
- mistral
- mistral-large
- thinking
- reasoning
- roleplay
- creative-writing
language:
- en
pipeline_tag: text-generation
---
A thinking beast that writes like a poet.
An SCE merge of Behemoth-X and Behemoth-R1 — 123B parameters where prose voice meets reasoning mind in a single model. No retraining. No LoRA. Just principled weight arithmetic.
## ⚡ Two souls, one beast
| | |
|---|---|
| 🧠 **The Mind** | From Behemoth-R1-123B-v2 — the reasoning sibling that knows when to open a `<think>` block. |
| 🎭 **The Voice** | From Behemoth-X-123B-v2 — the top-rated creative writer on the UGI Leaderboard. Distinctive prose, deep character work, the reason people run 123B at home. |
Most "thinking" models sacrifice prose for reasoning. Most creative models can't reason their way out of a scene.
Behemoth-X-R1 refuses to compromise.
## 🧬 How it was made
Method: SCE — Select, Calculate, Erase
Unlike TIES or DARE, SCE doesn't prune deltas by density. It uses **variance-aware matrix-level selection with sign consensus** — meaning capability-bearing weight updates survive the merge even when they're small and diffuse. That matters here because reasoning is a *behavioral* trait encoded across many tiny parameter shifts, not a knowledge trait concentrated in a few big ones. This is the same recipe FuseAI used to preserve reasoning in [FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview).

### The recipe

```yaml
models:
  - model: TheDrummer/Behemoth-X-123B-v2
    parameters:
      weight: 0.55
  - model: TheDrummer/Behemoth-R1-123B-v2
    parameters:
      weight: 0.45
merge_method: sce
base_model: mistralai/Mistral-Large-Instruct-2411
parameters:
  select_topk: 1.0
dtype: bfloat16
```
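To make the three SCE steps concrete, here is a toy NumPy sketch on a single weight matrix. The function name and the exact selection and coefficient math are simplifications for illustration; mergekit's real implementation works tensor by tensor with its own numerics.

```python
import numpy as np

def sce_merge(base, deltas, select_topk=1.0):
    """Toy sketch of SCE (Select, Calculate, Erase) for one weight matrix.

    base   : base-model weight matrix
    deltas : list of (parent_weight - base) task vectors, one per parent
    select_topk=1.0 keeps every element, matching the merge config above.
    """
    stacked = np.stack(deltas)  # shape: (n_models, rows, cols)

    # Select: keep only the top-k fraction of positions, ranked by how much
    # the parents disagree there (element-wise variance across models).
    if select_topk < 1.0:
        variance = stacked.var(axis=0)
        k = max(1, int(select_topk * variance.size))
        threshold = np.sort(variance.ravel())[-k]
        stacked = np.where(variance >= threshold, stacked, 0.0)

    # Calculate: one merging coefficient per parent for this matrix,
    # proportional to the energy (sum of squares) of its surviving delta.
    energy = (stacked ** 2).sum(axis=tuple(range(1, stacked.ndim)))
    coeffs = energy / energy.sum()

    # Erase: zero out elements whose sign disagrees with the majority sign,
    # so conflicting updates don't cancel capability-bearing ones.
    majority_sign = np.sign(stacked.sum(axis=0))
    stacked = np.where(np.sign(stacked) == majority_sign, stacked, 0.0)

    # Weighted sum of the cleaned deltas, applied back onto the base.
    merged_delta = np.tensordot(coeffs, stacked, axes=1)
    return base + merged_delta
```

With `select_topk: 1.0` (as in the recipe), the Select step is a no-op and the merge reduces to energy-weighted, sign-consistent delta averaging.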
## 📜 Prompt format
Standard **Mistral v7**, same as both parents:
```
[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}
```
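For clarity, the template above can be assembled with a small helper. The function name is ours, and this builds the raw string only; in practice the tokenizer's chat template handles special tokens such as BOS.

```python
def mistral_v7_prompt(system: str, user: str, assistant: str = "") -> str:
    """Build a single-turn prompt in the Mistral v7 layout shown above.

    `assistant` lets you prefill the start of the model's reply.
    """
    return (
        f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"
        f"[INST]{user}[/INST]{assistant}"
    )

prompt = mistral_v7_prompt("You are a novelist.", "Open the first chapter.")
```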
### 🎯 Trigger thinking
Prefill the assistant turn with a `<think>` tag to kick the model into reasoning mode before it writes. Three prefill flavors work well:

| Style | Use it for |
|---|---|
| 🔍 Analytical | Character-driven scenes where you need the model to reason about motivations, values, and goals. |
| ✨ Creative | When you want the model to surprise you. Narrative twists, unexpected beats, story craft. |
| 🔥 Unhinged | No restraint, no pulling punches. The darkest fiction, visceral prose, raw character truth. |
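Concretely, a thinking prefill can be wired into the prompt like this. The `<think>` tag follows the R1 parent's convention, and the seeded first line of reasoning is purely illustrative:

```python
# Sketch of a thinking prefill: the assistant turn starts inside a <think>
# block, so the model reasons first and writes the visible reply after.
system = "You are a character-driven storyteller."
user = "Continue the scene in the tavern."
prefill = "<think>\nWhat does each character in this scene actually want?\n"

prompt = (
    f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"
    f"[INST]{user}[/INST]{prefill}"
)
# The model continues from inside the <think> block, closes it with
# </think>, then writes the in-character response.
```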
## 🎚️ Samplers
Start with **Behemoth-X's** recommended settings — the merge leans heavily on X's prose tuning.
For thinking mode, drop temperature to **0.6–0.8**.

| Setting | No thinking | With thinking |
|---|---|---|
| Temperature | 1.0 – 1.25 | 0.6 – 0.8 |
| Min-P | 0.05 | 0.05 |
| DRY (multiplier / base / allowed length) | 0.8 / 1.75 / 4 | 0.8 / 1.75 / 4 |
| Smooth Sampling | Off | Off |
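If your backend takes sampler settings programmatically, the table translates to presets like the following. The key names follow common llama.cpp-style conventions and are assumptions; check your frontend's actual parameter names.

```python
# Hypothetical sampler presets mirroring the table above.
SAMPLERS = {
    "no_thinking": {
        "temperature": 1.0,        # up to 1.25
        "min_p": 0.05,
        "dry_multiplier": 0.8,
        "dry_base": 1.75,
        "dry_allowed_length": 4,
        "smoothing_factor": 0.0,   # smooth sampling off
    },
    "with_thinking": {
        "temperature": 0.6,        # up to 0.8
        "min_p": 0.05,
        "dry_multiplier": 0.8,
        "dry_base": 1.75,
        "dry_allowed_length": 4,
        "smoothing_factor": 0.0,   # smooth sampling off
    },
}
```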
## 🧱 Lineage
```
Mistral-Large-Instruct-2411 (Mistral AI)
├─ Behemoth-X-123B-v2 (TheDrummer) ← the voice
└─ Behemoth-R1-123B-v2 (TheDrummer) ← the mind
└─ Behemoth-X-R1-123B ← the merge
```
## 🙏 Credits

| Who | For what |
|---|---|
| TheDrummer | Behemoth-X and Behemoth-R1, the two best Mistral Large fine-tunes in the creative space. |
| Mistral AI | The foundation both parents are built on. |
| Arcee AI | mergekit and the SCE implementation. |
| FuseAI | Proving SCE preserves reasoning. |
Merged with 💜 by tacodevs