# Behemoth-X-R1-123B

A thinking beast that writes like a poet.

An SCE merge of Behemoth-X and Behemoth-R1: 123B parameters where prose voice meets reasoning mind in a single model. No retraining. No LoRA. Just principled weight arithmetic.
## Two souls, one beast

| The Mind | The Voice |
|---|---|
| From Behemoth-R1-123B-v2, the reasoning sibling that knows when to open a `<think>` block. | From Behemoth-X-123B-v2, the top-rated creative writer on the UGI Leaderboard. Distinctive prose, deep character work, the reason people run 123B at home. |
Most "thinking" models sacrifice prose for reasoning. Most creative models can't reason their way out of a scene.
Behemoth-X-R1 refuses to compromise.
## How it was made

**Method: SCE (Select, Calculate, Erase)**

Unlike TIES or DARE, SCE doesn't prune deltas by density. It uses variance-aware matrix-level selection with sign consensus, meaning capability-bearing weight updates survive the merge even when they're small and diffuse. That matters here because reasoning is a behavioral trait encoded across many tiny parameter shifts, not a knowledge trait concentrated in a few big ones.

This is the same recipe FuseAI used to preserve reasoning in FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview.
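To make the select / erase / calculate steps concrete, here is a toy element-wise sketch of the idea on a single weight matrix. This is my own illustration, not mergekit's actual SCE implementation (which differs in selection granularity and renormalization details):

```python
import numpy as np

def sce_merge(base, deltas, weights, select_topk=1.0):
    """Toy sketch of SCE on one weight matrix.

    base    -- base-model weight matrix
    deltas  -- list of task vectors (finetune minus base), same shape as base
    weights -- per-model merge weights, e.g. [0.55, 0.45]
    """
    stacked = np.stack([np.asarray(d, dtype=float) for d in deltas])

    # Select: rank elements by variance across models.
    # select_topk=1.0 keeps everything, the setting used in this merge.
    var = stacked.var(axis=0)
    if select_topk < 1.0:
        k = max(int(select_topk * var.size), 1)
        thresh = np.sort(var.ravel())[::-1][k - 1]
        keep = var >= thresh
    else:
        keep = np.ones(var.shape, dtype=bool)

    # Erase: drop contributions whose sign disagrees with the
    # element-wise majority sign across models.
    signs = np.sign(stacked)
    majority = np.sign(signs.sum(axis=0))
    agree = (signs == majority) | (stacked == 0)
    surviving = np.where(agree, stacked, 0.0)

    # Calculate: weighted sum of the surviving deltas, applied to base.
    w = np.asarray(weights, dtype=float).reshape(-1, *([1] * (stacked.ndim - 1)))
    merged_delta = (w * surviving).sum(axis=0)
    return np.asarray(base, dtype=float) + np.where(keep, merged_delta, 0.0)
```

Note how an element where the two parents pull in opposite directions contributes nothing, while an element where they agree keeps its full weighted update. That is the mechanism by which small but consistent reasoning deltas survive.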
### The recipe

```yaml
models:
  - model: TheDrummer/Behemoth-X-123B-v2
    parameters:
      weight: 0.55
  - model: TheDrummer/Behemoth-R1-123B-v2
    parameters:
      weight: 0.45
merge_method: sce
base_model: mistralai/Mistral-Large-Instruct-2411
parameters:
  select_topk: 1.0
dtype: bfloat16
```
### Why these numbers?

- **55/45** - Slight lean toward X for prose quality while giving R1 enough mass to keep its thinking circuit intact. Both parents share the same base, the same tokenizer (verified identical SHA256), and the same training lineage: ideal merge conditions.
- **`select_topk: 1.0`** - Keep all deltas; let variance plus sign consensus do the work. This is the FuseO1 setting, validated empirically on reasoning merges.
- **`bfloat16`** - Native precision of both parents, no conversion losses.
## Prompt format

Standard Mistral v7, same as both parents:

```
[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}</s>
```
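A minimal helper for assembling that template (function name and signature are my own, not from any library; `</s>` is emitted by the model, not the caller):

```python
def mistral_v7_prompt(user, system=None, assistant_prefill=""):
    """Assemble a Mistral v7 prompt string for raw completion calls.

    assistant_prefill lets you seed the assistant turn, e.g. with a
    <think> block (see the thinking section below).
    """
    prompt = ""
    if system is not None:
        prompt += f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"
    prompt += f"[INST]{user}[/INST]{assistant_prefill}"
    return prompt
```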
## Trigger thinking

Prefill the assistant turn with a `<think>` block. The model will continue your prefill, close the tag, and drop into the narrative:

```
[INST]your message[/INST]<think>
{seed phrase}
```
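When post-processing, you'll usually want to separate the reasoning from the narrative. A small sketch (helper name is mine; remember that because the model continues your prefill, you must prepend the `<think>` prefill to the completion before parsing):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Return (thinking, prose). thinking is None if no closed block."""
    m = THINK_RE.search(text)
    if not m:
        return None, text
    return m.group(1).strip(), text[m.end():].lstrip()
```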
## Prefills that work

- **Analytical** - for character-driven scenes where you need the model to reason about motivations, values, and goals.
- **Creative** - for when you want the model to surprise you. Narrative twists, unexpected beats, story craft.
- **Unhinged** - no restraint, no pulling punches. The darkest fiction, visceral prose, raw character truth.
### Without thinking

Skip the prefill. It behaves close to pure Behemoth-X: standard RP, creative writing, whatever you'd use X for.
## Samplers

Start with Behemoth-X's recommended settings; the merge leans heavily on X's prose tuning.

For thinking mode, drop temperature to 0.6–0.8. The `<think>` block benefits from more deterministic reasoning; high temperature scrambles the structure.
| Setting | No thinking | With thinking |
|---|---|---|
| Temperature | 1.0–1.25 | 0.6–0.8 |
| Min-P | 0.05 | 0.05 |
| DRY (multiplier / base / allowed length) | 0.8 / 1.75 / 4 | 0.8 / 1.75 / 4 |
| Smooth Sampling | Off | Off |
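The table as a flat dict, for wiring into a frontend or script. Key names are my own, not any backend's API, and the temperatures are values I picked from within the listed ranges:

```python
def sampler_settings(thinking: bool) -> dict:
    """Suggested sampler values from the table above."""
    return {
        "temperature": 0.7 if thinking else 1.1,  # within 0.6-0.8 / 1.0-1.25
        "min_p": 0.05,
        "dry_multiplier": 0.8,     # DRY given as multiplier / base / allowed length
        "dry_base": 1.75,
        "dry_allowed_length": 4,
        "smooth_sampling": False,  # off in both modes
    }
```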
## Usage

### vLLM

```shell
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B \
  --dtype bfloat16 \
  --tensor-parallel-size 4 \
  --max-model-len 16384 \
  --trust-remote-code
```
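A request body for that server's OpenAI-compatible `/v1/completions` route can be sketched as below. The helper name, temperature choice, and `max_tokens` are my own; the raw completions route (rather than chat) is used so the assistant turn can be prefilled with a `<think>` block:

```python
import json

def completion_request(prompt, thinking=False, seed_phrase=""):
    """Build the JSON body for a raw /v1/completions call.

    prompt should already be in Mistral v7 format; when thinking is
    enabled, a <think> prefill with the seed phrase is appended.
    """
    if thinking:
        prompt += f"<think>\n{seed_phrase}"
    return json.dumps({
        "model": "tacodevs/Behemoth-X-R1-123B",
        "prompt": prompt,
        "temperature": 0.7 if thinking else 1.1,  # per the sampler table
        "max_tokens": 1024,
    })
```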
### Single-GPU

Grab one of the quantized variants (coming soon):

- **FP8**: ~123 GB, fits on 1× H200, near-lossless
- **AWQ / GPTQ W4A16**: ~65 GB, fits on 1× H100, small quality tradeoff
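The sizes above follow from simple arithmetic; a back-of-envelope check (my own helper, weights only):

```python
def weights_only_gb(params_billion, bits_per_weight):
    """Weights-only footprint: parameters x (bits / 8) bytes.

    Quantization scales, activations, and the KV cache come on top,
    which is why the W4A16 figure above sits a little higher than
    the raw 4-bit number.
    """
    return params_billion * bits_per_weight / 8

fp8_gb = weights_only_gb(123, 8)  # 123B params at 8 bits -> 123.0 GB
w4_gb = weights_only_gb(123, 4)   # 123B params at 4 bits -> 61.5 GB
```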
## Lineage

```
Mistral-Large-Instruct-2411 (Mistral AI)
├── Behemoth-X-123B-v2 (TheDrummer)   - the voice
├── Behemoth-R1-123B-v2 (TheDrummer)  - the mind
└───── Behemoth-X-R1-123B             - the merge
```
## Known behaviors

- `<think>` triggers on prefill, not spontaneously. Inherited from R1. Seed the tag.
- Thinking style is R1-derived: structured, character-aware, analytical. Not Opus-style floaty literary prose; if you want that, it's a follow-up fine-tune target.
- Prose voice is mostly X. Most generations are indistinguishable from pure X on writing quality.
- Long character cards work natively. No fine-tuning means no overfitting on context length; 4k+ token system prompts are handled without degradation.
- NSFW-capable. Both parents are unrestricted; the merge preserves that.
## Credits

- **TheDrummer** - for Behemoth-X and Behemoth-R1, the two best Mistral Large fine-tunes in the creative space.
- **Mistral AI** - for the foundation both parents are built on.
- **Arcee AI** - for mergekit and the SCE implementation.
- **FuseAI** - for proving SCE preserves reasoning.
## License

Inherited from base: Mistral Research License, non-commercial use only.

Merged with ❤️ by tacodevs