sexy README with itasha aesthetic + inline CSS
README.md
CHANGED
|
@@ -22,43 +22,63 @@ pipeline_tag: text-generation
|
|
| 22 |
---
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
-
|
| 26 |
-
# 🧠 Behemoth-X-R1-123B
|
| 27 |
-
|
| 28 |
-
### *A thinking beast that writes like a poet.*
|
| 29 |
-
|
| 30 |
-
**An SCE merge of Behemoth-X and Behemoth-R1: prose voice meets reasoning mind in one 123B-parameter model.**
|
| 31 |
-
|
| 32 |
-
[](https://huggingface.co/mistralai/Mistral-Large-Instruct-2411)
|
| 33 |
-
[](https://arxiv.org/abs/2408.07990)
|
| 34 |
-
[]()
|
| 35 |
-
[]()
|
| 36 |
-
|
| 37 |
</div>
|
| 38 |
|
| 39 |
-
-
|
| 40 |
|
| 41 |
-
##
|
| 42 |
|
| 43 |
-
|
| 44 |
|
| 45 |
-
|
|
|
|
|
|
|
| 46 |
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
|
|
|
|
|
|
|
|
|
| 50 |
|
| 51 |
-
|
| 52 |
|
| 53 |
-
---
| 54 |
|
| 55 |
## 🧬 How it was made
|
| 56 |
|
| 57 |
-
|
| 58 |
|
| 59 |
Unlike TIES or DARE, SCE doesn't prune deltas by density. It uses **variance-aware matrix-level selection with sign consensus**, so capability-bearing weight updates survive the merge even when they're small and diffuse. That matters here because reasoning is a *behavioral* trait encoded across many tiny parameter shifts, not a knowledge trait concentrated in a few big ones.
|
| 60 |
|
| 61 |
-
|
|
|
|
|
|
|
| 62 |
|
| 63 |
```yaml
|
| 64 |
models:
|
|
@@ -75,14 +95,18 @@ parameters:
|
|
| 75 |
dtype: bfloat16
|
| 76 |
```
|
| 77 |
|
| 78 |
-
|
|
|
|
| 79 |
|
| 80 |
- **55/45**: Slight lean toward X for prose quality while giving R1 enough mass to keep its thinking circuit intact. Both parents share the same base, the same tokenizer (verified identical SHA256), and the same training lineage, which makes for ideal merge conditions.
|
| 81 |
-
- **`select_topk: 1.0`**: Keep all
|
|
|
|
| 82 |
|
| 83 |
-
|
|
|
|
|
|
|
| 84 |
|
| 85 |
-
## Prompt
|
| 86 |
|
| 87 |
Standard **Mistral v7**, same as both parents:
|
| 88 |
|
|
@@ -90,7 +114,7 @@ Standard **Mistral v7**, same as both parents:
|
|
| 90 |
[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}</s>
|
| 91 |
```
|
| 92 |
|
| 93 |
-
###
|
| 94 |
|
| 95 |
Prefill the assistant turn with a `<think>` block. The model will continue your prefill, close the tag, and drop into the narrative:
|
| 96 |
|
|
@@ -99,42 +123,68 @@ Prefill the assistant turn with a `<think>` block. The model will continue your
|
|
| 99 |
{seed phrase}
|
| 100 |
```
|
| 101 |
|
| 102 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 103 |
|
| 104 |
```
|
| 105 |
<think>
|
| 106 |
-
Ok i need to think about how to respond -
|
| 107 |
-
what
|
|
|
|
|
|
|
| 108 |
trying to achieve, so
|
| 109 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
| 110 |
|
| 111 |
```
|
| 112 |
<think>
|
| 113 |
-
Ok i need to think as a creative writer -
|
| 114 |
-
|
|
|
|
|
|
|
| 115 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
| 116 |
|
| 117 |
```
|
| 118 |
<think>
|
| 119 |
-
Ok i need to think as an unhinged author - raw, explicit, intense,
|
| 120 |
-
character with no holding back, so
|
| 121 |
```
|
| 122 |
-
|
| 123 |
-
|
| 124 |
|
| 125 |
### Without thinking
|
| 126 |
|
| 127 |
-
Skip the prefill. It behaves close to pure Behemoth-X.
|
| 128 |
|
| 129 |
-
---
|
| 130 |
|
| 131 |
-
##
|
| 132 |
|
| 133 |
Start with **Behemoth-X's** recommended settings; the merge leans heavily on X's prose tuning.
|
| 134 |
|
| 135 |
For thinking mode, drop temperature to **0.6–0.8**. The `<think>` block benefits from more deterministic reasoning; high temperature scrambles the structure.
|
| 136 |
|
| 137 |
-
| 138 |
|
| 139 |
## Usage
|
| 140 |
|
|
@@ -149,13 +199,13 @@ python -m vllm.entrypoints.openai.api_server \
|
|
| 149 |
--trust-remote-code
|
| 150 |
```
|
| 151 |
|
| 152 |
-
### Single-GPU
|
| 153 |
|
| 154 |
-
Grab one of the quantized variants:
|
| 155 |
-
- **FP8**: ~123 GB, fits on
|
| 156 |
-
- **AWQ / GPTQ W4A16**: ~65 GB, fits on
|
| 157 |
|
| 158 |
-
---
|
| 159 |
|
| 160 |
## 🧱 Lineage
|
| 161 |
|
|
@@ -166,26 +216,27 @@ Mistral-Large-Instruct-2411 (Mistral AI)
|
|
| 166 |
└─ Behemoth-X-R1-123B ← the merge
|
| 167 |
```
|
| 168 |
|
| 169 |
-
---
|
| 170 |
-
|
| 171 |
## Known behaviors
|
| 172 |
|
| 173 |
- **`<think>` triggers on prefill, not spontaneously.** Inherited from R1. Seed the tag.
|
| 174 |
-
- **Thinking style is R1-derived**: structured, character-aware,
|
| 175 |
- **Prose voice is mostly X.** Most generations are indistinguishable from pure X on writing quality.
|
| 176 |
-
- **Long character cards work natively.** No fine-tuning means no overfitting on context length.
|
| 177 |
-
|
| 178 |
-
---
|
| 179 |
|
| 180 |
## Credits
|
| 181 |
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
|
| 189 |
## License
|
| 190 |
|
| 191 |
Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)**, non-commercial use only.
| 22 |
---
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/hero.png" alt="Behemoth-X-R1-123B" style="width:100%; max-width:960px; border-radius:16px; box-shadow:0 0 60px rgba(236,72,153,0.35), 0 0 100px rgba(139,92,246,0.25);"/>
| 26 |
</div>
|
| 27 |
|
| 28 |
+
<div align="center" style="margin-top:24px;">
|
| 29 |
|
| 30 |
+
<h1 style="font-size:3.2em; font-weight:900; background:linear-gradient(90deg,#ec4899 0%,#a855f7 50%,#06b6d4 100%); -webkit-background-clip:text; -webkit-text-fill-color:transparent; background-clip:text; margin:0; letter-spacing:-0.02em;">Behemoth-X-R1-123B</h1>
|
| 31 |
|
| 32 |
+
<p style="font-size:1.3em; color:#a855f7; font-style:italic; font-weight:500; margin-top:8px;">A thinking beast that writes like a poet.</p>
|
| 33 |
|
| 34 |
+
<p style="font-size:1em; color:#6b7280; max-width:680px; margin:16px auto;">
|
| 35 |
+
An SCE merge of <b>Behemoth-X</b> and <b>Behemoth-R1</b>: 123B parameters where prose voice meets reasoning mind in a single model. No retraining. No LoRA. Just principled weight arithmetic.
|
| 36 |
+
</p>
|
| 37 |
|
| 38 |
+
<p>
|
| 39 |
+
<img src="https://img.shields.io/badge/base-Mistral_Large_2411-FF6B35?style=for-the-badge&logo=mistralai&logoColor=white" alt="base"/>
|
| 40 |
+
<img src="https://img.shields.io/badge/merge-SCE-8B5CF6?style=for-the-badge" alt="method"/>
|
| 41 |
+
<img src="https://img.shields.io/badge/params-123B-EC4899?style=for-the-badge" alt="size"/>
|
| 42 |
+
<img src="https://img.shields.io/badge/context-131k-06B6D4?style=for-the-badge" alt="context"/>
|
| 43 |
+
</p>
|
| 44 |
|
| 45 |
+
</div>
|
| 46 |
|
| 47 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_main.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
|
| 48 |
+
|
| 49 |
+
## ⚡ Two souls, one beast
|
| 50 |
+
|
| 51 |
+
<table width="100%" style="border:none;">
|
| 52 |
+
<tr>
|
| 53 |
+
<td width="50%" align="center" style="padding:16px; vertical-align:top;">
|
| 54 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/mind.png" alt="The Mind" style="width:100%; max-width:360px; border-radius:16px; box-shadow:0 0 40px rgba(139,92,246,0.4);"/>
|
| 55 |
+
<h3 style="color:#a855f7; margin-top:12px;">🧠 The Mind</h3>
|
| 56 |
+
<p style="font-size:0.95em; color:#9ca3af;">From <a href="https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2"><b>Behemoth-R1-123B-v2</b></a>, the reasoning sibling that knows when to open <code>&lt;think&gt;</code> and when to close it. Character-aware analytical reasoning baked into the weights.</p>
|
| 57 |
+
</td>
|
| 58 |
+
<td width="50%" align="center" style="padding:16px; vertical-align:top;">
|
| 59 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/voice.png" alt="The Voice" style="width:100%; max-width:360px; border-radius:16px; box-shadow:0 0 40px rgba(236,72,153,0.4);"/>
|
| 60 |
+
<h3 style="color:#ec4899; margin-top:12px;">The Voice</h3>
|
| 61 |
+
<p style="font-size:0.95em; color:#9ca3af;">From <a href="https://huggingface.co/TheDrummer/Behemoth-X-123B-v2"><b>Behemoth-X-123B-v2</b></a>, the top-rated creative writer on the <a href="https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard">UGI Leaderboard</a>. Distinctive prose, deep character work, the reason people run 123B at home.</p>
|
| 62 |
+
</td>
|
| 63 |
+
</tr>
|
| 64 |
+
</table>
|
| 65 |
+
|
| 66 |
+
<div align="center" style="margin:32px 0; padding:20px; background:linear-gradient(135deg, rgba(139,92,246,0.08), rgba(236,72,153,0.08)); border-radius:16px; border:1px solid rgba(139,92,246,0.2);">
|
| 67 |
+
<p style="font-size:1.1em; color:#c084fc; margin:0;">Most "thinking" models sacrifice prose for reasoning. Most creative models can't reason their way out of a scene.</p>
|
| 68 |
+
<p style="font-size:1.25em; font-weight:700; color:#f472b6; margin:12px 0 0 0;">Behemoth-X-R1 refuses to compromise.</p>
|
| 69 |
+
</div>
|
| 70 |
+
|
| 71 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
|
| 72 |
|
| 73 |
## 🧬 How it was made
|
| 74 |
|
| 75 |
+
<p><b>Method:</b> <a href="https://arxiv.org/abs/2408.07990">SCE: Select, Calculate, Erase</a></p>
|
| 76 |
|
| 77 |
Unlike TIES or DARE, SCE doesn't prune deltas by density. It uses **variance-aware matrix-level selection with sign consensus**, so capability-bearing weight updates survive the merge even when they're small and diffuse. That matters here because reasoning is a *behavioral* trait encoded across many tiny parameter shifts, not a knowledge trait concentrated in a few big ones.
|
| 78 |
|
| 79 |
+
This is the same recipe FuseAI used to preserve reasoning in [FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview).
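To make the three SCE steps concrete, here is a toy sketch on a single flattened tensor in plain Python. It is an illustration under stated assumptions, not the code used for this merge: the function name `sce_merge` and the list-based representation are hypothetical, and real merges run per weight matrix inside mergekit.

```python
from typing import List


def sce_merge(base: List[float], finetuned: List[List[float]],
              select_topk: float = 1.0) -> List[float]:
    """Toy SCE merge of one flattened tensor (illustrative, not mergekit's code)."""
    n = len(base)
    # Task vectors: each fine-tune's delta from the shared base weights.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]

    # Select: keep only the top-k fraction of positions by variance across models.
    # With select_topk = 1.0 (the setting in this README) nothing is dropped.
    if select_topk < 1.0:
        m = len(deltas)
        means = [sum(d[i] for d in deltas) / m for i in range(n)]
        variances = [sum((d[i] - means[i]) ** 2 for d in deltas) / m
                     for i in range(n)]
        keep = max(1, int(round(select_topk * n)))
        ranked = sorted(range(n), key=lambda i: variances[i], reverse=True)
        kept = set(ranked[:keep])
        deltas = [[v if i in kept else 0.0 for i, v in enumerate(d)]
                  for d in deltas]

    # Calculate: one coefficient per model for the whole tensor,
    # proportional to the sum of squares of its (selected) delta.
    energy = [sum(v * v for v in d) for d in deltas]
    total = sum(energy)
    if total == 0.0:
        return list(base)  # no fine-tune moved this tensor
    coeffs = [e / total for e in energy]

    # Erase: per position, drop contributions whose sign disagrees with the
    # majority direction, then renormalise over the surviving coefficients.
    merged = []
    for i in range(n):
        majority = sum(d[i] for d in deltas)
        num = 0.0
        den = 0.0
        for c, d in zip(coeffs, deltas):
            if d[i] * majority > 0:
                num += c * d[i]
                den += c
        merged.append(base[i] + (num / den if den > 0 else 0.0))
    return merged
```

Because the coefficients come from the energy of each delta rather than from a density threshold, a small but consistent update is kept as long as its sign agrees with the majority at that position.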
|
| 80 |
+
|
| 81 |
+
### The recipe
|
| 82 |
|
| 83 |
```yaml
|
| 84 |
models:
|
|
|
|
| 95 |
dtype: bfloat16
|
| 96 |
```
|
| 97 |
|
| 98 |
+
<details>
|
| 99 |
+
<summary><b>Why these numbers?</b></summary>
|
| 100 |
|
| 101 |
- **55/45**: Slight lean toward X for prose quality while giving R1 enough mass to keep its thinking circuit intact. Both parents share the same base, the same tokenizer (verified identical SHA256), and the same training lineage, which makes for ideal merge conditions.
|
| 102 |
+
- **`select_topk: 1.0`**: Keep all deltas. Let variance + sign consensus do the work. This is the FuseO1 setting, validated empirically on reasoning merges.
|
| 103 |
+
- **bfloat16**: Native precision of both parents, no conversion losses.
|
| 104 |
|
| 105 |
+
</details>
|
| 106 |
+
|
| 107 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
|
| 108 |
|
| 109 |
+
## Prompt format
|
| 110 |
|
| 111 |
Standard **Mistral v7**, same as both parents:
|
| 112 |
|
|
|
|
| 114 |
[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}</s>
|
| 115 |
```
|
| 116 |
|
| 117 |
+
### 🎯 Trigger thinking
|
| 118 |
|
| 119 |
Prefill the assistant turn with a `<think>` block. The model will continue your prefill, close the tag, and drop into the narrative:
|
| 120 |
|
|
|
|
| 123 |
{seed phrase}
|
| 124 |
```
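A minimal sketch of assembling the raw prompt string with an optional thinking prefill, assuming the Mistral v7 template shown above and a newline after `<think>` as in the prefill examples. The helper name `build_prompt` is hypothetical; the closing `</s>` is left off so the model continues the assistant turn.

```python
from typing import Optional


def build_prompt(system: str, user: str, think_seed: Optional[str] = None) -> str:
    """Build a Mistral v7 style prompt, optionally prefilled with a <think> seed."""
    prompt = f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]"
    if think_seed is not None:
        # Open the tag and seed the reasoning; the model is expected to
        # continue the thought, close the tag, and then write the reply.
        prompt += f"<think>\n{think_seed}"
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "You are the narrator.",
        "Continue the scene.",
        "Ok i need to think about how to respond",
    ))
```

Passing no seed gives the plain, non-thinking prompt, so the same helper covers both modes.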
|
| 125 |
|
| 126 |
+
### 🎨 Prefills that work
|
| 127 |
+
|
| 128 |
+
<table width="100%" style="border:none;">
|
| 129 |
+
<tr><td style="padding:8px; vertical-align:top; width:50%;">
|
| 130 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/sticker_think.png" align="right" width="72" style="margin-left:12px;"/>
|
| 131 |
+
<b style="color:#a855f7;">Analytical</b>
|
| 132 |
+
<p><i>For character-driven scenes where you need the model to reason about motivations, values, and goals.</i></p>
|
| 133 |
|
| 134 |
```
|
| 135 |
<think>
|
| 136 |
+
Ok i need to think about how to respond -
|
| 137 |
+
what does the character feel right now,
|
| 138 |
+
what from their experience is relevant,
|
| 139 |
+
what do they value, and what are they
|
| 140 |
trying to achieve, so
|
| 141 |
```
|
| 142 |
+
</td><td style="padding:8px; vertical-align:top; width:50%;">
|
| 143 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/sticker_write.png" align="right" width="72" style="margin-left:12px;"/>
|
| 144 |
+
<b style="color:#ec4899;">✨ Creative</b>
|
| 145 |
+
<p><i>For when you want the model to surprise you. Narrative twists, unexpected beats, story craft.</i></p>
|
| 146 |
|
| 147 |
```
|
| 148 |
<think>
|
| 149 |
+
Ok i need to think as a creative writer -
|
| 150 |
+
what twist would surprise here? Let me
|
| 151 |
+
find an engaging new direction nobody
|
| 152 |
+
saw coming, so
|
| 153 |
```
|
| 154 |
+
</td></tr>
|
| 155 |
+
<tr><td colspan="2" style="padding:8px;">
|
| 156 |
+
<b style="color:#f43f5e;">🔥 Unhinged</b>
|
| 157 |
+
<p><i>No restraint, no pulling punches. The darkest fiction, visceral prose, raw character truth.</i></p>
|
| 158 |
|
| 159 |
```
|
| 160 |
<think>
|
| 161 |
+
Ok i need to think as an unhinged author - raw, explicit, intense,
|
| 162 |
+
fully in character with no holding back, so
|
| 163 |
```
|
| 164 |
+
</td></tr>
|
| 165 |
+
</table>
|
| 166 |
|
| 167 |
### Without thinking
|
| 168 |
|
| 169 |
+
Skip the prefill. It behaves close to pure Behemoth-X: standard RP, creative writing, whatever you'd use X for.
|
| 170 |
|
| 171 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
|
| 172 |
|
| 173 |
+
## Samplers
|
| 174 |
|
| 175 |
Start with **Behemoth-X's** recommended settings; the merge leans heavily on X's prose tuning.
|
| 176 |
|
| 177 |
For thinking mode, drop temperature to **0.6–0.8**. The `<think>` block benefits from more deterministic reasoning; high temperature scrambles the structure.
|
| 178 |
|
| 179 |
+
<table width="100%">
|
| 180 |
+
<tr>
|
| 181 |
+
<th>Setting</th><th>No thinking</th><th>With thinking</th>
|
| 182 |
+
</tr>
|
| 183 |
+
<tr><td><b>Temperature</b></td><td>1.0 – 1.25</td><td>0.6 – 0.8</td></tr>
|
| 184 |
+
<tr><td><b>Min-P</b></td><td>0.05</td><td>0.05</td></tr>
|
| 185 |
+
<tr><td><b>DRY</b></td><td>0.8 / 1.75 / 4</td><td>0.8 / 1.75 / 4</td></tr>
|
| 186 |
+
<tr><td><b>Smooth Sampling</b></td><td>Off</td><td>Off</td></tr>
|
| 187 |
+
</table>
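The table can be kept as data and turned into a concrete setting per mode. This is a sketch with two assumptions that are not stated in the table: the DRY triple is read as multiplier / base / allowed length, and the midpoint of each temperature range is used as the starting value. The names `SAMPLER_PRESETS` and `pick_sampler` are hypothetical, and the output keys may need renaming for a given frontend or backend.

```python
from typing import Any, Dict

# Values copied from the sampler table; temperature is a (low, high) range.
SAMPLER_PRESETS: Dict[str, Dict[str, Any]] = {
    "no_thinking": {"temperature": (1.0, 1.25), "min_p": 0.05,
                    "dry": (0.8, 1.75, 4), "smooth_sampling": False},
    "thinking": {"temperature": (0.6, 0.8), "min_p": 0.05,
                 "dry": (0.8, 1.75, 4), "smooth_sampling": False},
}


def pick_sampler(thinking: bool) -> Dict[str, Any]:
    """Return one concrete sampler setting for the chosen mode."""
    preset = SAMPLER_PRESETS["thinking" if thinking else "no_thinking"]
    low, high = preset["temperature"]
    # Assumed ordering of the DRY triple: multiplier, base, allowed length.
    multiplier, base, allowed_length = preset["dry"]
    return {
        "temperature": round((low + high) / 2, 3),  # midpoint as a starting point
        "min_p": preset["min_p"],
        "dry_multiplier": multiplier,
        "dry_base": base,
        "dry_allowed_length": allowed_length,
        "smooth_sampling": preset["smooth_sampling"],
    }
```

Only temperature differs between the two modes, so switching the `thinking` flag is enough when toggling the prefill.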
|
| 188 |
|
| 189 |
## Usage
|
| 190 |
|
|
|
|
| 199 |
--trust-remote-code
|
| 200 |
```
|
| 201 |
|
| 202 |
+
### Single-GPU
|
| 203 |
|
| 204 |
+
Grab one of the quantized variants (coming soon):
|
| 205 |
+
- **FP8**: ~123 GB, fits on 1× H200, near-lossless
|
| 206 |
+
- **AWQ / GPTQ W4A16**: ~65 GB, fits on 1× H100, small quality tradeoff
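The sizes above follow from simple arithmetic on the parameter count. The sketch below estimates weight memory only, assuming decimal gigabytes and the usual 141 GB (H200) and 80 GB (H100) capacities; KV cache and activations come on top. The helper names are hypothetical.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB, ignoring quantization metadata."""
    return params_billion * bits_per_weight / 8


def fits(weights_gb: float, gpu_gb: float) -> bool:
    """True if the weights alone fit; KV cache still needs the remaining headroom."""
    return weights_gb < gpu_gb


if __name__ == "__main__":
    for name, bits, gpu_gb in [("bf16", 16, 141), ("fp8", 8, 141), ("w4a16", 4, 80)]:
        size = weight_gb(123, bits)
        print(f"{name}: {size:.1f} GB weights, fits on {gpu_gb} GB GPU: {fits(size, gpu_gb)}")
```

The raw 4-bit figure comes out a little under the ~65 GB quoted above; the gap is plausibly scales, zero points, and layers kept in higher precision, which this estimate does not model.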
|
| 207 |
|
| 208 |
+
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_main.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
|
| 209 |
|
| 210 |
## 🧱 Lineage
|
| 211 |
|
|
|
|
| 216 |
└─ Behemoth-X-R1-123B ← the merge
|
| 217 |
```
|
| 218 |
|
|
|
|
|
|
|
| 219 |
## Known behaviors
|
| 220 |
|
| 221 |
- **`<think>` triggers on prefill, not spontaneously.** Inherited from R1. Seed the tag.
|
| 222 |
+
- **Thinking style is R1-derived**: structured, character-aware, analytical. Not Opus-style floaty literary prose. If you want that, it's a follow-up fine-tune target.
|
| 223 |
- **Prose voice is mostly X.** Most generations are indistinguishable from pure X on writing quality.
|
| 224 |
+
- **Long character cards work natively.** No fine-tuning means no overfitting on context length. 4k+ token system prompts handled without degradation.
|
| 225 |
+
- **NSFW-capable.** Both parents are unrestricted; the merge preserves that.
|
|
|
|
| 226 |
|
| 227 |
## Credits
|
| 228 |
|
| 229 |
+
<table width="100%">
|
| 230 |
+
<tr><td width="33%" align="center"><b><a href="https://huggingface.co/TheDrummer">TheDrummer</a></b><br/><sub>For Behemoth-X and Behemoth-R1, the two best Mistral Large fine-tunes in the creative space.</sub></td>
|
| 231 |
+
<td width="33%" align="center"><b><a href="https://huggingface.co/mistralai">Mistral AI</a></b><br/><sub>For the foundation both parents are built on.</sub></td>
|
| 232 |
+
<td width="33%" align="center"><b><a href="https://github.com/arcee-ai/mergekit">Arcee AI</a></b><br/><sub>For mergekit and the SCE implementation.</sub></td></tr>
|
| 233 |
+
<tr><td colspan="3" align="center" style="padding-top:12px;"><b><a href="https://huggingface.co/FuseAI">FuseAI</a></b>, for proving SCE preserves reasoning.</td></tr>
|
| 234 |
+
</table>
|
| 235 |
|
| 236 |
## License
|
| 237 |
|
| 238 |
Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)**, non-commercial use only.
|
| 239 |
+
|
| 240 |
+
<div align="center" style="margin-top:40px; padding:20px; background:linear-gradient(135deg, rgba(139,92,246,0.08), rgba(236,72,153,0.08)); border-radius:16px; border:1px solid rgba(139,92,246,0.2);">
|
| 241 |
+
<p style="font-size:1em; color:#c084fc; margin:0;">Merged with ♥ by <a href="https://huggingface.co/tacodevs">tacodevs</a></p>
|
| 242 |
+
</div>
|