---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
- TheDrummer/Behemoth-X-123B-v2
- TheDrummer/Behemoth-R1-123B-v2
base_model_relation: merge
tags:
- mergekit
- merge
- sce
- mistral
- mistral-large
- thinking
- reasoning
- roleplay
- creative-writing
language:
- en
pipeline_tag: text-generation
---

<div align="center">
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/hero.png" alt="Behemoth-X-R1-123B" style="width:100%; max-width:960px; border-radius:16px; box-shadow:0 0 60px rgba(236,72,153,0.35), 0 0 100px rgba(139,92,246,0.25);"/>
</div>

<div align="center" style="margin-top:24px;">

<h1 style="font-size:3.2em; font-weight:900; background:linear-gradient(90deg,#ec4899 0%,#a855f7 50%,#06b6d4 100%); -webkit-background-clip:text; -webkit-text-fill-color:transparent; background-clip:text; margin:0; letter-spacing:-0.02em;">Behemoth-X-R1-123B</h1>

<p style="font-size:1.3em; color:#a855f7; font-style:italic; font-weight:500; margin-top:8px;">A thinking beast that writes like a poet.</p>

<p style="font-size:1em; color:#6b7280; max-width:680px; margin:16px auto;">
An SCE merge of <b>Behemoth-X</b> and <b>Behemoth-R1</b>: 123B parameters where prose voice meets reasoning mind in a single model. No retraining. No LoRA. Just principled weight arithmetic.
</p>

<p>
<img src="https://img.shields.io/badge/base-Mistral_Large_2411-FF6B35?style=for-the-badge&logo=mistralai&logoColor=white" alt="base"/>
<img src="https://img.shields.io/badge/merge-SCE-8B5CF6?style=for-the-badge" alt="method"/>
<img src="https://img.shields.io/badge/params-123B-EC4899?style=for-the-badge" alt="size"/>
<img src="https://img.shields.io/badge/context-131k-06B6D4?style=for-the-badge" alt="context"/>
</p>

</div>

<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_main.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>

## ⚡ Two souls, one beast

<table width="100%" style="border:none;">
<tr>
<td width="50%" align="center" style="padding:16px; vertical-align:top;">
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/mind.png" alt="The Mind" style="width:100%; max-width:360px; border-radius:16px; box-shadow:0 0 40px rgba(139,92,246,0.4);"/>
<h3 style="color:#a855f7; margin-top:12px;">🧠 The Mind</h3>
<p style="font-size:0.95em; color:#9ca3af;">From <a href="https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2"><b>Behemoth-R1-123B-v2</b></a>, the reasoning sibling that knows when to open <code>&lt;think&gt;</code> and when to close it. Character-aware analytical reasoning baked into the weights.</p>
</td>
<td width="50%" align="center" style="padding:16px; vertical-align:top;">
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/voice.png" alt="The Voice" style="width:100%; max-width:360px; border-radius:16px; box-shadow:0 0 40px rgba(236,72,153,0.4);"/>
<h3 style="color:#ec4899; margin-top:12px;">The Voice</h3>
<p style="font-size:0.95em; color:#9ca3af;">From <a href="https://huggingface.co/TheDrummer/Behemoth-X-123B-v2"><b>Behemoth-X-123B-v2</b></a>, the top-rated creative writer on the <a href="https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard">UGI Leaderboard</a>. Distinctive prose, deep character work, the reason people run 123B at home.</p>
</td>
</tr>
</table>

<div align="center" style="margin:32px 0; padding:20px; background:linear-gradient(135deg, rgba(139,92,246,0.08), rgba(236,72,153,0.08)); border-radius:16px; border:1px solid rgba(139,92,246,0.2);">
<p style="font-size:1.1em; color:#c084fc; margin:0;">Most "thinking" models sacrifice prose for reasoning. Most creative models can't reason their way out of a scene.</p>
<p style="font-size:1.25em; font-weight:700; color:#f472b6; margin:12px 0 0 0;">Behemoth-X-R1 refuses to compromise.</p>
</div>

<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>

## 🧬 How it was made

<p><b>Method:</b> <a href="https://arxiv.org/abs/2408.07990">SCE (Select, Calculate, Erase)</a></p>

Unlike TIES or DARE, which thin each model's deltas by magnitude or at random, SCE works matrix by matrix. It **selects** the delta elements that vary most across the source models. It then **calculates** a merge coefficient for each model from the magnitude of its selected deltas. Finally, it **erases** elements whose sign disagrees with the consensus. With full selection (as in the recipe below), nothing is thinned, so capability-bearing weight updates survive the merge even when they're small and diffuse. That matters here because reasoning is a *behavioral* trait encoded across many tiny parameter shifts, not a knowledge trait concentrated in a few big ones.

This is the same recipe FuseAI used to preserve reasoning in [FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview).
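
To make the three steps concrete, here is a toy sketch on a single flattened weight matrix in plain Python. It follows the paper's description, not mergekit's actual code. `sce_merge` is a hypothetical helper, and the coefficients come only from the deltas themselves; the per-model `weight` values in the recipe are not modeled. A real merge applies this kind of logic to every tensor of both checkpoints.

```python
from statistics import pvariance


def sce_merge(base, finetuned, select_topk=1.0):
    """Toy SCE merge of one flattened weight matrix, using plain lists.

    base:      list[float], the shared base-model weights
    finetuned: list[list[float]], one weight list per source model
    """
    n = len(base)
    # Task vectors: each source model's delta from the shared base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]

    # Select: keep the top-k fraction of positions whose deltas vary most across models.
    variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
    keep = max(1, round(select_topk * n))
    selected = set(sorted(range(n), key=lambda i: variances[i], reverse=True)[:keep])
    deltas = [[d[i] if i in selected else 0.0 for i in range(n)] for d in deltas]

    # Calculate: each model's coefficient is its share of the total squared delta.
    energy = [sum(x * x for x in d) for d in deltas]
    total = sum(energy) or 1.0
    coeffs = [e / total for e in energy]

    # Erase: at each position, drop deltas whose sign disagrees with the summed direction,
    # then add the coefficient-weighted survivors back onto the base weight.
    merged = []
    for i in range(n):
        direction = sum(d[i] for d in deltas)
        kept = sum(c * d[i] for c, d in zip(coeffs, deltas) if d[i] * direction > 0)
        merged.append(base[i] + kept)
    return merged


base = [0.0, 0.0, 0.0]
x_like = [0.2, -0.1, 0.3]   # hypothetical "voice" parent
r1_like = [0.2, 0.3, 0.1]   # hypothetical "mind" parent
print(sce_merge(base, [x_like, r1_like]))  # roughly [0.2, 0.15, 0.2]
```

In the middle position the two parents disagree in sign. The summed direction is positive, so the negative delta is erased rather than averaged in. With `select_topk=1.0` the selection step keeps every position.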

### The recipe

```yaml
models:
  - model: TheDrummer/Behemoth-X-123B-v2
    parameters:
      weight: 0.55
  - model: TheDrummer/Behemoth-R1-123B-v2
    parameters:
      weight: 0.45
merge_method: sce
base_model: mistralai/Mistral-Large-Instruct-2411
parameters:
  select_topk: 1.0
dtype: bfloat16
```

<details>
<summary><b>Why these numbers?</b></summary>

- **55/45**: a slight lean toward X for prose quality, while giving R1 enough mass to keep its thinking circuit intact. Both parents share the same base, the same tokenizer (verified identical SHA256), and the same training lineage, which are ideal merge conditions.
- **`select_topk: 1.0`**: keep all deltas. The variance-based selection step becomes a no-op, and the calculated coefficients plus sign consensus do the work. This is the FuseO1 setting, validated empirically on reasoning merges.
- **bfloat16**: the native precision of both parents, so there are no conversion losses.

</details>
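
The tokenizer check mentioned above can be reproduced with a short script. This is a sketch under two assumptions: both parents are already downloaded to local directories, and the tokenizer lives in `tokenizer.json`. The helper names and directory paths are illustrative, not part of any tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tokenizers_match(dir_a: Path, dir_b: Path, files=("tokenizer.json",)) -> dict:
    """Compare tokenizer files between two local model snapshots.

    Returns {filename: bool}; a file missing on either side counts as a mismatch.
    The default file list is an assumption; add others (e.g. tokenizer_config.json) as needed.
    """
    return {
        name: (
            (dir_a / name).is_file()
            and (dir_b / name).is_file()
            and sha256_of(dir_a / name) == sha256_of(dir_b / name)
        )
        for name in files
    }


# Hypothetical local snapshot directories:
# print(tokenizers_match(Path("Behemoth-X-123B-v2"), Path("Behemoth-R1-123B-v2")))
```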

<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>

## Prompt format

Standard **Mistral v7**, the same as both parents:

```
[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}</s>
```
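
For raw-completion backends, the template can be filled in with a few lines of Python. The function name is hypothetical. The function copies the template above literally, with no spaces around the tags. It also assumes that the tokenizer or backend adds the `<s>` BOS token, which the template does not show.

```python
from __future__ import annotations


def build_mistral_v7_prompt(system: str, turns: list[tuple[str, str | None]]) -> str:
    """Render a conversation in the Mistral v7 layout.

    turns: (user, assistant) pairs. Pass assistant=None for the last turn so the
    string ends right after [/INST], where generation (or a prefill) picks up.
    """
    parts = [f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"] if system else []
    for user, assistant in turns:
        parts.append(f"[INST]{user}[/INST]")
        if assistant is not None:
            # Finished assistant turns are closed with the end-of-sequence marker.
            parts.append(f"{assistant}</s>")
    return "".join(parts)


prompt = build_mistral_v7_prompt(
    "You are Mara, a smuggler.",
    [("Who's at the door?", "Nobody you want to meet."), ("Open it anyway.", None)],
)
```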

### 🎯 Trigger thinking

Prefill the assistant turn with a `<think>` block. The model will continue your prefill, close the tag, and drop into the narrative:

```
[INST]your message[/INST]<think>
{seed phrase}
```
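
Because the model continues the prefill, closes `</think>`, and then writes the reply, a frontend only has to split the text at the closing tag. This sketch assumes the model emits a literal `</think>` string. `split_thinking` is a hypothetical helper, and the sample strings are made up.

```python
from __future__ import annotations


def split_thinking(prefill: str, completion: str) -> tuple[str, str]:
    """Split a prefilled generation into (reasoning, reply).

    prefill:    the text appended after [/INST], starting with "<think>"
    completion: what the model generated after that prefill
    """
    text = prefill + completion
    body = text.split("<think>", 1)[-1]  # drop the opening tag (and anything before it)
    reasoning, closed, reply = body.partition("</think>")
    if not closed:
        # No closing tag, e.g. generation hit max_tokens mid-thought.
        return reasoning.strip(), ""
    return reasoning.strip(), reply.strip()


prefill = "<think>\nOk i need to think about how to respond:"
completion = " she is scared but proud, so she deflects.\n</think>\nMara laughs and reaches for the bolt."
reasoning, reply = split_thinking(prefill, completion)
```

Treating an unclosed block as pure reasoning keeps a truncated generation from leaking half-finished thoughts into the visible reply.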

### 🎨 Prefills that work

<table width="100%" style="border:none;">
<tr><td style="padding:8px; vertical-align:top; width:50%;">
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/sticker_think.png" align="right" width="72" style="margin-left:12px;"/>
<b style="color:#a855f7;">Analytical</b>
<p><i>For character-driven scenes where you need the model to reason about motivations, values, and goals.</i></p>

```
<think>
Ok i need to think about how to respond:
what does the character feel right now,
what from their experience is relevant,
what do they value, and what are they
trying to achieve, so
```
</td><td style="padding:8px; vertical-align:top; width:50%;">
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/sticker_write.png" align="right" width="72" style="margin-left:12px;"/>
<b style="color:#ec4899;">✨ Creative</b>
<p><i>For when you want the model to surprise you: narrative twists, unexpected beats, story craft.</i></p>

```
<think>
Ok i need to think as a creative writer:
what twist would surprise here? Let me
find an engaging new direction nobody
saw coming, so
```
</td></tr>
<tr><td colspan="2" style="padding:8px;">
<b style="color:#f43f5e;">🔥 Unhinged</b>
<p><i>No restraint, no pulling punches. The darkest fiction, visceral prose, raw character truth.</i></p>

```
<think>
Ok i need to think as an unhinged author: raw, explicit, intense,
fully in character with no holding back, so
```
</td></tr>
</table>

### Without thinking

Skip the prefill and the model behaves close to pure Behemoth-X: standard RP, creative writing, whatever you'd use X for.

<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>

## Samplers

Start with **Behemoth-X's** recommended settings, since the merge leans heavily on X's prose tuning.

For thinking mode, drop temperature to **0.6–0.8**. The `<think>` block benefits from more deterministic reasoning; high temperature scrambles its structure.

<table width="100%">
<tr>
<th>Setting</th><th>No thinking</th><th>With thinking</th>
</tr>
<tr><td><b>Temperature</b></td><td>1.0–1.25</td><td>0.6–0.8</td></tr>
<tr><td><b>Min-P</b></td><td>0.05</td><td>0.05</td></tr>
<tr><td><b>DRY</b> (multiplier / base / allowed length)</td><td>0.8 / 1.75 / 4</td><td>0.8 / 1.75 / 4</td></tr>
<tr><td><b>Smooth Sampling</b></td><td>Off</td><td>Off</td></tr>
</table>
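
The table maps directly onto a request payload. This sketch assumes a backend that accepts DRY and smoothing parameters under these names. KoboldCpp and text-generation-webui use similar keys, but check your backend's documentation before relying on them. `sampler_settings` is a hypothetical helper.

```python
from __future__ import annotations


def sampler_settings(thinking: bool, temperature: float | None = None) -> dict:
    """Return the table's suggested sampler values as a request-ready dict.

    Key names are an assumption (KoboldCpp / text-generation-webui style); rename
    them for your backend. Backends without DRY may ignore or reject the dry_* keys.
    """
    low, high = (0.6, 0.8) if thinking else (1.0, 1.25)
    if temperature is None:
        temperature = low  # default to the low end of the recommended range
    if not low <= temperature <= high:
        raise ValueError(f"temperature {temperature} is outside the suggested {low}-{high} range")
    return {
        "temperature": temperature,
        "min_p": 0.05,
        "dry_multiplier": 0.8,
        "dry_base": 1.75,
        "dry_allowed_length": 4,
        "smoothing_factor": 0.0,  # 0 disables smooth sampling
    }
```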

## Usage

### vLLM

```bash
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B \
  --dtype bfloat16 \
  --tensor-parallel-size 4 \
  --max-model-len 16384 \
  --trust-remote-code
```
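
Once the server is up, raw-prompt requests can go to its OpenAI-compatible `/v1/completions` endpoint. That endpoint keeps a `<think>` prefill at the very end of the prompt, whereas a chat endpoint would wrap the messages in its own template. This sketch only builds the request, using the standard library. It assumes vLLM's default port 8000 and passes `min_p`, which vLLM accepts as an extra sampling parameter beyond the OpenAI schema. `completion_request` is a hypothetical helper.

```python
from __future__ import annotations

import json
import urllib.request


def completion_request(prompt: str, *, base_url: str = "http://localhost:8000",
                       max_tokens: int = 1024, temperature: float = 0.7,
                       min_p: float = 0.05) -> urllib.request.Request:
    """Build (but do not send) a raw-text completion request for vLLM's OpenAI-compatible server."""
    payload = {
        "model": "tacodevs/Behemoth-X-R1-123B",  # must match the --model the server was started with
        "prompt": prompt,  # already formatted, ending in the <think> prefill if you want reasoning
        "max_tokens": max_tokens,
        "temperature": temperature,
        "min_p": min_p,  # vLLM extension beyond the OpenAI schema
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = completion_request("[INST]Open the door.[/INST]<think>\nOk i need to think", temperature=0.6)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     text = json.load(resp)["choices"][0]["text"]
```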

### Single-GPU

Grab one of the quantized variants (coming soon):
- **FP8**: ~123 GB, fits on 1× H200, near-lossless
- **AWQ / GPTQ W4A16**: ~65 GB, fits on 1× H100, small quality tradeoff
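
The sizes above follow from simple arithmetic: weight memory is roughly parameter count times bits per weight. This back-of-the-envelope sketch assumes a flat 123B parameters, the card's rounded figure.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Lower-bound weight storage in decimal GB: parameters x bits / 8 bytes.

    Ignores KV cache, activations, and quantization metadata such as group scales.
    """
    return n_params * bits_per_weight / 8 / 1e9


for label, bits in [("bf16", 16), ("FP8", 8), ("W4A16", 4)]:
    print(f"{label:>6}: ~{weight_memory_gb(123e9, bits):.1f} GB")
```

This gives 246.0, 123.0, and 61.5 GB. The bf16 figure is why the vLLM example shards across four GPUs. The 4-bit figure comes out a little under the ~65 GB quoted above, presumably because group-wise scales and any layers kept at higher precision add a few GB. Whatever memory remains on the card (an H200 has 141 GB, an H100 80 GB) goes to the KV cache, which limits usable context.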

<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_main.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>

## 🧱 Lineage

```
Mistral-Large-Instruct-2411 (Mistral AI)
├─ Behemoth-X-123B-v2 (TheDrummer)   ← the voice
├─ Behemoth-R1-123B-v2 (TheDrummer)  ← the mind
└─ Behemoth-X-R1-123B                ← the merge
```

## Known behaviors

- **`<think>` triggers on prefill, not spontaneously.** Inherited from R1. Seed the tag.
- **Thinking style is R1-derived**: structured, character-aware, analytical. It is not Opus-style floaty literary prose; if you want that, it's a target for a follow-up fine-tune.
- **Prose voice is mostly X.** Most generations are indistinguishable from pure X on writing quality.
- **Long character cards work natively.** The merge involves no training, so it adds no context-length overfitting of its own. System prompts of 4k+ tokens are handled without degradation.
- **NSFW-capable.** Both parents are unrestricted, and the merge preserves that.

## Credits

<table width="100%">
<tr><td width="33%" align="center"><b><a href="https://huggingface.co/TheDrummer">TheDrummer</a></b><br/><sub>For Behemoth-X and Behemoth-R1, the two best Mistral Large fine-tunes in the creative space.</sub></td>
<td width="33%" align="center"><b><a href="https://huggingface.co/mistralai">Mistral AI</a></b><br/><sub>For the foundation both parents are built on.</sub></td>
<td width="33%" align="center"><b><a href="https://github.com/arcee-ai/mergekit">Arcee AI</a></b><br/><sub>For mergekit and the SCE implementation.</sub></td></tr>
<tr><td colspan="3" align="center" style="padding-top:12px;"><b><a href="https://huggingface.co/FuseAI">FuseAI</a></b>, for proving SCE preserves reasoning.</td></tr>
</table>

## License

Inherited from the base model: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)**, non-commercial use only.

<div align="center" style="margin-top:40px; padding:20px; background:linear-gradient(135deg, rgba(139,92,246,0.08), rgba(236,72,153,0.08)); border-radius:16px; border:1px solid rgba(139,92,246,0.2);">
<p style="font-size:1em; color:#c084fc; margin:0;">Merged by <a href="https://huggingface.co/tacodevs">tacodevs</a></p>
</div>