Commit 7933192 (verified) by tacodevs
1 parent: 8e57427

sexy README with itasha aesthetic + inline CSS

Files changed (1)
  1. README.md +109 -58
README.md CHANGED
@@ -22,43 +22,63 @@ pipeline_tag: text-generation
22
  ---
23
 
24
  <div align="center">
25
-
26
- # 🧠 Behemoth-X-R1-123B
27
-
28
- ### *A thinking beast that writes like a poet.*
29
-
30
- **An SCE merge of Behemoth-X and Behemoth-R1: prose voice meets reasoning mind in one 123B-parameter model.**
31
-
32
- [![Base](https://img.shields.io/badge/base-Mistral_Large_2411-orange)](https://huggingface.co/mistralai/Mistral-Large-Instruct-2411)
33
- [![Method](https://img.shields.io/badge/merge-SCE-purple)](https://arxiv.org/abs/2408.07990)
34
- [![Size](https://img.shields.io/badge/params-123B-red)]()
35
- [![Context](https://img.shields.io/badge/ctx-131k-blue)]()
36
-
37
  </div>
38
 
39
- ---
40
 
41
- ## ⚑ What makes this different
42
 
43
- Most "thinking" models sacrifice prose for reasoning. Most creative models can't think their way out of a scene. **Behemoth-X-R1 doesn't compromise**: it carries the distinctive voice and character depth of Behemoth-X into a model that can open a `<think>` tag and actually use it.
44
 
45
- No LoRA. No retraining. Just **principled weight arithmetic** using the same SCE merge recipe that FuseAI used to preserve reasoning in [FuseO1](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview).
 
 
46
 
47
- **The parents:**
48
- - **[Behemoth-X-123B-v2](https://huggingface.co/TheDrummer/Behemoth-X-123B-v2)**: the top-rated creative writer on the [UGI Leaderboard](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard). Character voice, prose density, the reason people run 123B at home.
49
- - **[Behemoth-R1-123B-v2](https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2)**: Behemoth's reasoning sibling. Knows when to open `<think>`, knows when to close it.
 
 
 
50
 
51
- **The child:** both, at once.
52
 
53
- ---
54
 
55
  ## 🧬 How it was made
56
 
57
- **Method:** [SCE: Select, Calculate, Erase](https://arxiv.org/abs/2408.07990)
58
 
59
  Unlike TIES or DARE, SCE doesn't prune deltas by density. It uses **variance-aware matrix-level selection with sign consensus**, meaning capability-bearing weight updates survive the merge even when they're small and diffuse. That matters here because reasoning is a *behavioral* trait encoded across many tiny parameter shifts, not a knowledge trait concentrated in a few big ones.
60
 
61
- **The recipe:**
 
 
62
 
63
  ```yaml
64
  models:
@@ -75,14 +95,18 @@ parameters:
75
  dtype: bfloat16
76
  ```
77
 
78
- **Why these numbers?**
 
79
 
80
  - **55/45**: Slight lean toward X for prose quality while giving R1 enough mass to keep its thinking circuit intact. Both parents share the same base, same tokenizer (verified identical SHA256), and the same training lineage, which makes for ideal merge conditions.
81
- - **`select_topk: 1.0`**: Keep all the deltas. Let variance + sign consensus do the work. This is the FuseO1 setting, validated empirically on reasoning merges.
 
82
 
83
- ---
 
 
84
 
85
- ## 📜 Prompt Format
86
 
87
  Standard **Mistral v7**, same as both parents:
88
 
@@ -90,7 +114,7 @@ Standard **Mistral v7**, same as both parents:
90
  [SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}</s>
91
  ```
92
 
93
- ### To trigger thinking
94
 
95
  Prefill the assistant turn with a `<think>` block. The model will continue your prefill, close the tag, and drop into the narrative:
96
 
@@ -99,42 +123,68 @@ Prefill the assistant turn with a `<think>` block. The model will continue your
99
  {seed phrase}
100
  ```
101
 
102
- **Prefill examples that work well:**
103
 
104
  ```
105
  <think>
106
- Ok i need to think about how to respond: what does the character feel right now,
107
- what from their experience is relevant, what do they value, and what are they
 
 
108
  trying to achieve, so
109
  ```
 
 
 
 
110
 
111
  ```
112
  <think>
113
- Ok i need to think as a creative writer: what twist would surprise here?
114
- Let me find an engaging new direction nobody saw coming, so
 
 
115
  ```
 
 
 
 
116
 
117
  ```
118
  <think>
119
- Ok i need to think as an unhinged author: raw, explicit, intense, fully in
120
- character with no holding back, so
121
  ```
122
-
123
- The model inherits R1's thinking circuit but shares R1's preference for being prefilled rather than self-triggering. Seed the tag, let it cook.
124
 
125
  ### Without thinking
126
 
127
- Skip the prefill. It behaves close to pure Behemoth-X.
128
 
129
- ---
130
 
131
- ## 🎚️ Recommended Samplers
132
 
133
  Start with **Behemoth-X's** recommended settings, since the merge leans heavily on X's prose tuning.
134
 
135
  For thinking mode, drop temperature to **0.6–0.8**. The `<think>` block benefits from more deterministic reasoning; high temperature scrambles the structure.
136
 
137
- ---
138
 
139
  ## 🚀 Usage
140
 
@@ -149,13 +199,13 @@ python -m vllm.entrypoints.openai.api_server \
149
  --trust-remote-code
150
  ```
151
 
152
- ### Single-GPU inference
153
 
154
- Grab one of the quantized variants:
155
- - **FP8**: ~123 GB, fits on 1x H200, near-lossless quality
156
- - **AWQ / GPTQ W4A16**: ~65 GB, fits on 1x H100, slight quality tradeoff
157
 
158
- ---
159
 
160
  ## 🧱 Lineage
161
 
@@ -166,26 +216,27 @@ Mistral-Large-Instruct-2411 (Mistral AI)
166
  └─ Behemoth-X-R1-123B ← the merge
167
  ```
168
 
169
- ---
170
-
171
  ## 🔍 Known behaviors
172
 
173
  - **`<think>` triggers on prefill, not spontaneously.** Inherited from R1. Seed the tag.
174
- - **Thinking style is R1-derived**: structured, character-aware, useful but not floaty. If you want Opus-style literary pre-writing, that's a follow-up fine-tune target, not something this merge gives you for free.
175
  - **Prose voice is mostly X.** Most generations are indistinguishable from pure X on writing quality.
176
- - **Long character cards work natively.** No fine-tuning means no overfitting on context length.
177
-
178
- ---
179
 
180
  ## 🙏 Credits
181
 
182
- - **[TheDrummer](https://huggingface.co/TheDrummer)**: for Behemoth-X and Behemoth-R1, the two best Mistral Large fine-tunes in the creative space.
183
- - **[Mistral AI](https://huggingface.co/mistralai)**: for the foundation both parents are built on.
184
- - **[Arcee AI](https://github.com/arcee-ai/mergekit)**: for mergekit and the SCE implementation.
185
- - **[FuseAI](https://huggingface.co/FuseAI)**: for proving SCE preserves reasoning.
186
-
187
- ---
188
 
189
  ## 📄 License
190
 
191
  Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)** (non-commercial use only).
 
 
 
 
 
22
  ---
23
 
24
  <div align="center">
25
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/hero.png" alt="Behemoth-X-R1-123B" style="width:100%; max-width:960px; border-radius:16px; box-shadow:0 0 60px rgba(236,72,153,0.35), 0 0 100px rgba(139,92,246,0.25);"/>
26
  </div>
27
 
28
+ <div align="center" style="margin-top:24px;">
29
 
30
+ <h1 style="font-size:3.2em; font-weight:900; background:linear-gradient(90deg,#ec4899 0%,#a855f7 50%,#06b6d4 100%); -webkit-background-clip:text; -webkit-text-fill-color:transparent; background-clip:text; margin:0; letter-spacing:-0.02em;">Behemoth-X-R1-123B</h1>
31
 
32
+ <p style="font-size:1.3em; color:#a855f7; font-style:italic; font-weight:500; margin-top:8px;">A thinking beast that writes like a poet.</p>
33
 
34
+ <p style="font-size:1em; color:#6b7280; max-width:680px; margin:16px auto;">
35
+ An SCE merge of <b>Behemoth-X</b> and <b>Behemoth-R1</b>: 123B parameters where prose voice meets reasoning mind in a single model. No retraining. No LoRA. Just principled weight arithmetic.
36
+ </p>
37
 
38
+ <p>
39
+ <img src="https://img.shields.io/badge/base-Mistral_Large_2411-FF6B35?style=for-the-badge&logo=mistralai&logoColor=white" alt="base"/>
40
+ <img src="https://img.shields.io/badge/merge-SCE-8B5CF6?style=for-the-badge" alt="method"/>
41
+ <img src="https://img.shields.io/badge/params-123B-EC4899?style=for-the-badge" alt="size"/>
42
+ <img src="https://img.shields.io/badge/context-131k-06B6D4?style=for-the-badge" alt="context"/>
43
+ </p>
44
 
45
+ </div>
46
 
47
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_main.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
48
+
49
+ ## ⚑ Two souls, one beast
50
+
51
+ <table width="100%" style="border:none;">
52
+ <tr>
53
+ <td width="50%" align="center" style="padding:16px; vertical-align:top;">
54
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/mind.png" alt="The Mind" style="width:100%; max-width:360px; border-radius:16px; box-shadow:0 0 40px rgba(139,92,246,0.4);"/>
55
+ <h3 style="color:#a855f7; margin-top:12px;">🧠 The Mind</h3>
56
+ <p style="font-size:0.95em; color:#9ca3af;">From <a href="https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2"><b>Behemoth-R1-123B-v2</b></a>: the reasoning sibling that knows when to open <code>&lt;think&gt;</code> and when to close it. Character-aware analytical reasoning baked into the weights.</p>
57
+ </td>
58
+ <td width="50%" align="center" style="padding:16px; vertical-align:top;">
59
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/voice.png" alt="The Voice" style="width:100%; max-width:360px; border-radius:16px; box-shadow:0 0 40px rgba(236,72,153,0.4);"/>
60
+ <h3 style="color:#ec4899; margin-top:12px;">🎭 The Voice</h3>
61
+ <p style="font-size:0.95em; color:#9ca3af;">From <a href="https://huggingface.co/TheDrummer/Behemoth-X-123B-v2"><b>Behemoth-X-123B-v2</b></a>: the top-rated creative writer on the <a href="https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard">UGI Leaderboard</a>. Distinctive prose, deep character work, the reason people run 123B at home.</p>
62
+ </td>
63
+ </tr>
64
+ </table>
65
+
66
+ <div align="center" style="margin:32px 0; padding:20px; background:linear-gradient(135deg, rgba(139,92,246,0.08), rgba(236,72,153,0.08)); border-radius:16px; border:1px solid rgba(139,92,246,0.2);">
67
+ <p style="font-size:1.1em; color:#c084fc; margin:0;">Most "thinking" models sacrifice prose for reasoning. Most creative models can't reason their way out of a scene.</p>
68
+ <p style="font-size:1.25em; font-weight:700; color:#f472b6; margin:12px 0 0 0;">Behemoth-X-R1 refuses to compromise.</p>
69
+ </div>
70
+
71
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
72
 
73
  ## 🧬 How it was made
74
 
75
+ <p><b>Method:</b> <a href="https://arxiv.org/abs/2408.07990">SCE: Select, Calculate, Erase</a></p>
76
 
77
  Unlike TIES or DARE, SCE doesn't prune deltas by density. It uses **variance-aware matrix-level selection with sign consensus**, meaning capability-bearing weight updates survive the merge even when they're small and diffuse. That matters here because reasoning is a *behavioral* trait encoded across many tiny parameter shifts, not a knowledge trait concentrated in a few big ones.
78
 
79
+ This is the same recipe FuseAI used to preserve reasoning in [FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview).
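
The select / calculate / erase idea described above can be illustrated with a toy sketch on flat lists of weights. This is not mergekit's implementation and not the recipe used for this model: the function name `sce_merge_sketch`, the per-position variance ranking, and the squared-magnitude coefficients are simplifying assumptions chosen only to show how small deltas survive when their signs agree.

```python
from statistics import pvariance


def sce_merge_sketch(base, finetuned, select_topk=1.0):
    """Toy SCE-style merge of flat weight lists (hypothetical helper, illustrative only)."""
    # Task vectors: how far each fine-tune moved away from the shared base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]
    n = len(base)

    # Select: rank positions by variance across models and keep the top fraction.
    # With select_topk=1.0 nothing is pruned at this step.
    variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
    keep_count = max(1, round(select_topk * n))
    ranked = sorted(range(n), key=lambda i: variances[i], reverse=True)
    keep = set(ranked[:keep_count])
    deltas = [[v if i in keep else 0.0 for i, v in enumerate(d)] for d in deltas]

    # Calculate: weight each model by its share of the total squared delta magnitude.
    energy = [sum(v * v for v in d) for d in deltas]
    total = sum(energy) or 1.0
    coeffs = [e / total for e in energy]

    # Erase: at each position, drop deltas whose sign opposes the summed direction,
    # then average the survivors with renormalised coefficients.
    merged = []
    for i in range(n):
        direction = sum(d[i] for d in deltas)
        agreeing = [(c, d[i]) for c, d in zip(coeffs, deltas) if d[i] * direction > 0]
        norm = sum(c for c, _ in agreeing)
        delta = sum(c * v for c, v in agreeing) / norm if norm else 0.0
        merged.append(base[i] + delta)
    return merged
```

In this toy version a position where both parents moved the same way keeps a blend of both deltas regardless of how small they are, while a position where they disagree keeps only the side that matches the summed direction.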
80
+
81
+ ### The recipe
82
 
83
  ```yaml
84
  models:
 
95
  dtype: bfloat16
96
  ```
97
 
98
+ <details>
99
+ <summary><b>Why these numbers?</b></summary>
100
 
101
  - **55/45**: Slight lean toward X for prose quality while giving R1 enough mass to keep its thinking circuit intact. Both parents share the same base, same tokenizer (verified identical SHA256), and the same training lineage, which makes for ideal merge conditions.
102
+ - **`select_topk: 1.0`**: Keep all deltas. Let variance + sign consensus do the work. This is the FuseO1 setting, validated empirically on reasoning merges.
103
+ - **bfloat16**: Native precision of both parents, no conversion losses.
104
 
105
+ </details>
106
+
107
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
108
 
109
+ ## 📜 Prompt format
110
 
111
  Standard **Mistral v7**, same as both parents:
112
 
 
114
  [SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]{assistant}</s>
115
  ```
116
 
117
+ ### 🎯 Trigger thinking
118
 
119
  Prefill the assistant turn with a `<think>` block. The model will continue your prefill, close the tag, and drop into the narrative:
120
 
 
123
  {seed phrase}
124
  ```
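
As a rough sketch, the prefill can be assembled as a plain string before it is sent to a raw completion endpoint. The helper name `build_prompt` is hypothetical, and the newline after `<think>` is an assumption based on the prefill examples in this README; BOS handling is left to the tokenizer or server.

```python
from typing import Optional


def build_prompt(system: str, user: str, think_seed: Optional[str] = None) -> str:
    """Render one Mistral v7 turn, optionally ending in an open <think> prefill."""
    prompt = f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]"
    if think_seed is not None:
        # Leave the tag open: the model is expected to continue the seed,
        # close the tag itself, and then write the reply.
        prompt += f"<think>\n{think_seed}"
    return prompt


# Example: a thinking turn seeded with a short phrase.
example = build_prompt(
    system="You are Mira, a ship's navigator.",
    user="The storm is getting worse. What do we do?",
    think_seed="Ok i need to think about how to respond, so",
)
```

Passing no `think_seed` yields the plain format, which corresponds to the non-thinking mode.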
125
 
126
+ ### 🎨 Prefills that work
127
+
128
+ <table width="100%" style="border:none;">
129
+ <tr><td style="padding:8px; vertical-align:top; width:50%;">
130
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/sticker_think.png" align="right" width="72" style="margin-left:12px;"/>
131
+ <b style="color:#a855f7;">🔍 Analytical</b>
132
+ <p><i>For character-driven scenes where you need the model to reason about motivations, values, and goals.</i></p>
133
 
134
  ```
135
  <think>
136
+ Ok i need to think about how to respond:
137
+ what does the character feel right now,
138
+ what from their experience is relevant,
139
+ what do they value, and what are they
140
  trying to achieve, so
141
  ```
142
+ </td><td style="padding:8px; vertical-align:top; width:50%;">
143
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/sticker_write.png" align="right" width="72" style="margin-left:12px;"/>
144
+ <b style="color:#ec4899;">✨ Creative</b>
145
+ <p><i>For when you want the model to surprise you. Narrative twists, unexpected beats, story craft.</i></p>
146
 
147
  ```
148
  <think>
149
+ Ok i need to think as a creative writer:
150
+ what twist would surprise here? Let me
151
+ find an engaging new direction nobody
152
+ saw coming, so
153
  ```
154
+ </td></tr>
155
+ <tr><td colspan="2" style="padding:8px;">
156
+ <b style="color:#f43f5e;">🔥 Unhinged</b>
157
+ <p><i>No restraint, no pulling punches. The darkest fiction, visceral prose, raw character truth.</i></p>
158
 
159
  ```
160
  <think>
161
+ Ok i need to think as an unhinged author: raw, explicit, intense,
162
+ fully in character with no holding back, so
163
  ```
164
+ </td></tr>
165
+ </table>
166
 
167
  ### Without thinking
168
 
169
+ Skip the prefill. It behaves close to pure Behemoth-X: standard RP, creative writing, whatever you'd use X for.
170
 
171
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_config.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
172
 
173
+ ## 🎚️ Samplers
174
 
175
  Start with **Behemoth-X's** recommended settings, since the merge leans heavily on X's prose tuning.
176
 
177
  For thinking mode, drop temperature to **0.6–0.8**. The `<think>` block benefits from more deterministic reasoning; high temperature scrambles the structure.
178
 
179
+ <table width="100%">
180
+ <tr>
181
+ <th>Setting</th><th>No thinking</th><th>With thinking</th>
182
+ </tr>
183
+ <tr><td><b>Temperature</b></td><td>1.0 – 1.25</td><td>0.6 – 0.8</td></tr>
184
+ <tr><td><b>Min-P</b></td><td>0.05</td><td>0.05</td></tr>
185
+ <tr><td><b>DRY</b></td><td>0.8 / 1.75 / 4</td><td>0.8 / 1.75 / 4</td></tr>
186
+ <tr><td><b>Smooth Sampling</b></td><td>Off</td><td>Off</td></tr>
187
+ </table>
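
One way to keep the two columns of the table in code is a small preset lookup. This is only a sketch: the names `SAMPLER_PRESETS` and `pick_sampler` are hypothetical, the DRY triple is stored exactly as listed (its order is presumably multiplier / base / allowed length, which is an assumption), and taking the midpoint of the temperature range is an arbitrary default rather than a recommendation from the README.

```python
# Values copied from the sampler table; key names are illustrative assumptions.
SAMPLER_PRESETS = {
    "no_thinking": {
        "temperature": (1.0, 1.25),  # recommended range
        "min_p": 0.05,
        "dry": (0.8, 1.75, 4),       # as listed; field order is an assumption
        "smooth_sampling": False,
    },
    "thinking": {
        "temperature": (0.6, 0.8),   # lower range for <think> prefills
        "min_p": 0.05,
        "dry": (0.8, 1.75, 4),
        "smooth_sampling": False,
    },
}


def pick_sampler(uses_think_prefill: bool) -> dict:
    """Return a copy of the matching preset with a single temperature value."""
    preset = SAMPLER_PRESETS["thinking" if uses_think_prefill else "no_thinking"]
    low, high = preset["temperature"]
    settings = dict(preset)
    # Midpoint of the recommended range, used here only as a starting point.
    settings["temperature"] = round((low + high) / 2, 3)
    return settings
```

The parameter names a given backend accepts for Min-P and DRY differ between servers, so the returned dictionary would still need mapping onto the actual request fields.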
188
 
189
  ## 🚀 Usage
190
 
 
199
  --trust-remote-code
200
  ```
201
 
202
+ ### Single-GPU
203
 
204
+ Grab one of the quantized variants (coming soon):
205
+ - **FP8**: ~123 GB, fits on 1× H200, near-lossless
206
+ - **AWQ / GPTQ W4A16**: ~65 GB, fits on 1× H100, small quality tradeoff
207
 
208
+ <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/divider_main.png" alt="" style="width:100%; margin:32px 0; border-radius:12px;"/>
209
 
210
  ## 🧱 Lineage
211
 
 
216
  └─ Behemoth-X-R1-123B ← the merge
217
  ```
218
 
 
 
219
  ## 🔍 Known behaviors
220
 
221
  - **`<think>` triggers on prefill, not spontaneously.** Inherited from R1. Seed the tag.
222
+ - **Thinking style is R1-derived**: structured, character-aware, analytical. Not Opus-style floaty literary prose. If you want that, it's a follow-up fine-tune target.
223
  - **Prose voice is mostly X.** Most generations are indistinguishable from pure X on writing quality.
224
+ - **Long character cards work natively.** No fine-tuning means no overfitting on context length. 4k+ token system prompts handled without degradation.
225
+ - **NSFW-capable.** Both parents are unrestricted; the merge preserves that.
 
226
 
227
  ## 🙏 Credits
228
 
229
+ <table width="100%">
230
+ <tr><td width="33%" align="center"><b><a href="https://huggingface.co/TheDrummer">TheDrummer</a></b><br/><sub>For Behemoth-X and Behemoth-R1, the two best Mistral Large fine-tunes in the creative space.</sub></td>
231
+ <td width="33%" align="center"><b><a href="https://huggingface.co/mistralai">Mistral AI</a></b><br/><sub>For the foundation both parents are built on.</sub></td>
232
+ <td width="33%" align="center"><b><a href="https://github.com/arcee-ai/mergekit">Arcee AI</a></b><br/><sub>For mergekit and the SCE implementation.</sub></td></tr>
233
+ <tr><td colspan="3" align="center" style="padding-top:12px;"><b><a href="https://huggingface.co/FuseAI">FuseAI</a></b>: for proving SCE preserves reasoning.</td></tr>
234
+ </table>
235
 
236
  ## 📄 License
237
 
238
  Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)** (non-commercial use only).
239
+
240
+ <div align="center" style="margin-top:40px; padding:20px; background:linear-gradient(135deg, rgba(139,92,246,0.08), rgba(236,72,153,0.08)); border-radius:16px; border:1px solid rgba(139,92,246,0.2);">
241
+ <p style="font-size:1em; color:#c084fc; margin:0;">Merged with 💜 by <a href="https://huggingface.co/tacodevs">tacodevs</a></p>
242
+ </div>