tacodevs committed on Apr 6

Commit bbdd938 · verified · 1 parent: 3127199

Initial upload: SCE merge of Behemoth-X-v2 and Behemoth-R1-v2


This view is limited to 50 files because it contains too many changes.

Files changed (50)

README.md +150 -0
config.json +26 -0
mergekit_config.yml +12 -0
model-00001-of-00051.safetensors +3 -0
model-00002-of-00051.safetensors +3 -0
model-00003-of-00051.safetensors +3 -0
model-00004-of-00051.safetensors +3 -0
model-00005-of-00051.safetensors +3 -0
model-00006-of-00051.safetensors +3 -0
model-00007-of-00051.safetensors +3 -0
model-00008-of-00051.safetensors +3 -0
model-00009-of-00051.safetensors +3 -0
model-00010-of-00051.safetensors +3 -0
model-00011-of-00051.safetensors +3 -0
model-00012-of-00051.safetensors +3 -0
model-00013-of-00051.safetensors +3 -0
model-00014-of-00051.safetensors +3 -0
model-00015-of-00051.safetensors +3 -0
model-00016-of-00051.safetensors +3 -0
model-00017-of-00051.safetensors +3 -0
model-00018-of-00051.safetensors +3 -0
model-00019-of-00051.safetensors +3 -0
model-00020-of-00051.safetensors +3 -0
model-00021-of-00051.safetensors +3 -0
model-00022-of-00051.safetensors +3 -0
model-00023-of-00051.safetensors +3 -0
model-00024-of-00051.safetensors +3 -0
model-00025-of-00051.safetensors +3 -0
model-00026-of-00051.safetensors +3 -0
model-00027-of-00051.safetensors +3 -0
model-00028-of-00051.safetensors +3 -0
model-00029-of-00051.safetensors +3 -0
model-00030-of-00051.safetensors +3 -0
model-00031-of-00051.safetensors +3 -0
model-00032-of-00051.safetensors +3 -0
model-00033-of-00051.safetensors +3 -0
model-00034-of-00051.safetensors +3 -0
model-00035-of-00051.safetensors +3 -0
model-00036-of-00051.safetensors +3 -0
model-00037-of-00051.safetensors +3 -0
model-00038-of-00051.safetensors +3 -0
model-00039-of-00051.safetensors +3 -0
model-00040-of-00051.safetensors +3 -0
model-00041-of-00051.safetensors +3 -0
model-00042-of-00051.safetensors +3 -0
model-00043-of-00051.safetensors +3 -0
model-00044-of-00051.safetensors +3 -0
model-00045-of-00051.safetensors +3 -0
model-00046-of-00051.safetensors +3 -0
model-00047-of-00051.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,150 @@

+---
+license: other
+license_name: mistral-research-license
+license_link: https://mistral.ai/licenses/MRL-0.1.md
+base_model:
+  - TheDrummer/Behemoth-X-123B-v2
+  - TheDrummer/Behemoth-R1-123B-v2
+base_model_relation: merge
+tags:
+  - mergekit
+  - merge
+  - sce
+  - mistral
+  - mistral-large
+  - thinking
+  - reasoning
+  - roleplay
+  - creative-writing
+language:
+  - en
+pipeline_tag: text-generation
+---
+<div align="center">
+# Behemoth-X-R1-123B
+### Behemoth-X's prose voice meets Behemoth-R1's thinking mind.
+*An SCE merge of TheDrummer's two flagship 123B Mistral Large fine-tunes.*
+</div>
+---
+## What is this?
+Behemoth-X-R1-123B is a 55/45 SCE merge of:
+- **[TheDrummer/Behemoth-X-123B-v2](https://huggingface.co/TheDrummer/Behemoth-X-123B-v2)** — the top-rated creative writing model on the [UGI Leaderboard](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard), known for distinctive prose voice and deep character work.
+- **[TheDrummer/Behemoth-R1-123B-v2](https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2)** — Behemoth-X's reasoning sibling, trained to emit structured `<think>` blocks before responding.
+The goal: a single model that writes like X and thinks like R1. No additional training, no LoRA — just principled weight arithmetic using the SCE merge method that FuseAI used to preserve reasoning in their [FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview).
+## How it was made
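+**Method:** [SCE (Select, Calculate, Erase)](https://arxiv.org/abs/2408.07990), a variance-aware merge: it selects delta elements by their variance across the input models, calculates a fusion coefficient for each weight matrix, and erases elements whose sign disagrees with the majority. Unlike TIES, which trims each model's deltas by magnitude according to `density`, SCE selects by cross-model variance, and at `select_topk: 1.0` (used here) it prunes nothing; this tends to preserve fragile behavioral traits like structured thinking.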
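To make the three steps concrete, here is a minimal pure-Python sketch that merges a single weight matrix, flattened to a list. It follows the SCE paper's description; the function name `sce_merge_matrix`, the renormalization over surviving entries, and the handling of ties and all-zero columns are assumptions of this sketch, not mergekit's exact code. As in the paper, the Calculate step derives each model's coefficient from the size of its deltas, so the per-model `weight` values from the config below are not used here.

```python
from statistics import pvariance
from typing import List, Sequence


def sce_merge_matrix(
    base: Sequence[float],
    models: Sequence[Sequence[float]],
    select_topk: float = 1.0,
) -> List[float]:
    """Merge one flattened weight matrix from several fine-tunes of `base` (sketch)."""
    n = len(base)
    # Task vectors: how far each fine-tune moved from the shared base.
    deltas = [[w - b for w, b in zip(m, base)] for m in models]

    # Select: keep only the top `select_topk` fraction of positions, ranked by the
    # variance of their deltas across models. At 1.0 every position is kept.
    if select_topk < 1.0:
        variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
        k = int(n * select_topk)
        ranked = sorted(range(n), key=lambda i: variances[i], reverse=True)
        keep = set(ranked[:k])
        deltas = [[x if i in keep else 0.0 for i, x in enumerate(d)] for d in deltas]

    # Calculate: one coefficient per model for this matrix, proportional to the
    # sum of squares of its (selected) deltas.
    energy = [sum(x * x for x in d) for d in deltas]
    total = sum(energy)
    if total > 0:
        coeffs = [e / total for e in energy]
    else:
        coeffs = [1.0 / len(deltas)] * len(deltas)

    # Erase: at each position the majority direction is the sign of the summed
    # deltas; entries pointing the other way are dropped before averaging.
    merged = []
    for i in range(n):
        column = [d[i] for d in deltas]
        direction = sum(column)
        num = sum(c * x for c, x in zip(coeffs, column) if x * direction > 0)
        den = sum(c for c, x in zip(coeffs, column) if x * direction > 0)
        merged.append(base[i] + (num / den if den > 0 else 0.0))
    return merged
```

With `select_topk=1.0` the Select branch is skipped entirely, so only the Calculate and Erase steps change the result; for example, a position where one model moves up by 0.4 and the other down by 0.2 keeps only the upward 0.4.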
+**Config:**
+```yaml
+models:
+  - model: TheDrummer/Behemoth-X-123B-v2
+    parameters:
+      weight: 0.55
+  - model: TheDrummer/Behemoth-R1-123B-v2
+    parameters:
+      weight: 0.45
+merge_method: sce
+base_model: mistralai/Mistral-Large-Instruct-2411
+parameters:
+  select_topk: 1.0
+dtype: bfloat16
+```
+**Why 55/45?** Slight lean toward X for prose quality while giving R1 enough weight to carry its thinking behavior across. Both models share the same base (`mistralai/Mistral-Large-Instruct-2411`), the same tokenizer (verified identical SHA256), and the same training lineage — ideal conditions for a merge.
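The tokenizer check mentioned above (identical SHA256) can be reproduced with a short script. This is a sketch, not the author's procedure: it assumes both parents have already been downloaded to local directories of your choosing and that `tokenizer.json` is the file worth comparing; `tokenizer_mismatches` is a hypothetical helper, and you can pass additional filenames if needed.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tokenizer_mismatches(
    dir_a: str,
    dir_b: str,
    filenames: Iterable[str] = ("tokenizer.json",),
) -> List[str]:
    """Return the files whose SHA256 differs between two local model directories."""
    return [
        name
        for name in filenames
        if sha256_file(Path(dir_a, name)) != sha256_file(Path(dir_b, name))
    ]
```

An empty list means every compared file is byte-for-byte identical in both directories.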
+**Why `select_topk: 1.0`?** Keep all deltas, following the FuseO1 precedent. At 1.0, SCE's variance-based Select step retains every element, so the merge is shaped only by the per-matrix coefficients and the sign-consensus Erase step. Reasoning behavior is encoded in many small parameter shifts, and aggressive pruning (density < 0.8) tends to dilute it.
+## Prompt Format
+Uses the Mistral v7 template (same as both parents):
+```
+[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT][INST]{user_message}[/INST]{assistant_response}</s>
+```
+### To trigger thinking
+Prefill the assistant turn with an opening `<think>` tag. The model will continue the thinking, close the block with `</think>`, and then produce its response:
+```
+[INST]your message[/INST]<think>
+{optional seed phrase}
+```
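The template and prefill above can be assembled with a small helper. This is a sketch under assumptions: `build_prompt` is a hypothetical name, it reproduces the templates exactly as written here (no extra spaces), and it leaves the `<s>` BOS token to tokenization, as Hugging Face tokenizers normally add it by default. Send the result to a raw text-completion endpoint rather than a chat endpoint, so the template is not applied twice.

```python
from typing import List, Optional, Sequence, Tuple


def build_prompt(
    user_message: str,
    system_prompt: Optional[str] = None,
    history: Sequence[Tuple[str, str]] = (),
    think: bool = False,
    think_seed: str = "",
) -> str:
    """Render a Mistral v7 prompt, optionally prefilled with an open <think> block."""
    parts: List[str] = []
    if system_prompt:
        parts.append(f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]")
    # Earlier turns: each completed assistant reply is closed with </s>.
    for past_user, past_assistant in history:
        parts.append(f"[INST]{past_user}[/INST]{past_assistant}</s>")
    parts.append(f"[INST]{user_message}[/INST]")
    if think or think_seed:
        # Prefill: open the thinking block and let the model continue from here.
        parts.append("<think>\n" + think_seed)
    return "".join(parts)
```

For example, `build_prompt("Hi", think_seed="Ok i need to think, so")` yields `[INST]Hi[/INST]<think>\nOk i need to think, so`, matching the prefill shape shown above.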
+Example prefills from the [Telegai](https://telegai.com) edge function:
+```
+<think>
+Ok i need to think about how to respond — what does the character feel right now,
+what from their experience is relevant, what do they value, and what are they
+trying to achieve, so
+```
+```
+<think>
+Ok i need to think as a creative writer — what twist would surprise here?
+Let me find an engaging new direction nobody saw coming, so
+```
+The model reads the prefill, continues in the same stream-of-consciousness style, closes `</think>`, and writes the narrative.
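Because the model closes the block itself, client code usually has to separate the reasoning from the narrative. A minimal sketch, assuming a prefilled prompt was sent to a raw completion endpoint, which returns only the continuation (without your `<think>` tag or seed phrase). `split_thinking` is a hypothetical helper; if the model stops before writing `</think>` (for example at the token limit), everything is treated as thinking.

```python
from typing import Tuple


def split_thinking(completion: str, seed: str = "", prefilled: bool = True) -> Tuple[str, str]:
    """Split generated text into (thinking, response) at the first </think>."""
    head, found, tail = completion.partition("</think>")
    if not found:
        # No closing tag: with a prefill this is unfinished thinking,
        # otherwise the model answered without thinking at all.
        if prefilled:
            return (seed + completion).strip(), ""
        return "", completion.strip()
    # Re-attach the seed phrase that was sent as part of the prompt.
    thinking = (seed + head).strip()
    # Drop an opening tag if the caller passed text that still contains it.
    if thinking.startswith("<think>"):
        thinking = thinking[len("<think>"):].strip()
    return thinking, tail.strip()
```

Keeping the seed separate from the completion lets you log the full reasoning while showing users only the response.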
+### Without thinking
+Skip the prefill and use it like any other Mistral-v7 model. It behaves close to pure Behemoth-X.
+## Recommended Samplers
+Start with Behemoth-X's recommended settings, since the merge inherits most of X's prose tuning. A lower temperature (0.6-0.8) works better when thinking is enabled, because the thinking block benefits from more deterministic reasoning.
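A sketch of a request body that applies this advice, assuming an OpenAI-compatible vLLM server such as the one in the vLLM section, listening on vLLM's default port 8000. The 0.7 default is simply the midpoint of the 0.6-0.8 range; `max_tokens=1024` and the helper names are illustrative assumptions. When thinking is off, temperature is left unset so the server default (or your Behemoth-X settings) applies.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

MODEL_ID = "tacodevs/Behemoth-X-R1-123B"


def completion_payload(
    prompt: str,
    thinking: bool,
    temperature: Optional[float] = None,
    max_tokens: int = 1024,
) -> Dict[str, Any]:
    """Build an OpenAI-style /v1/completions request body."""
    payload: Dict[str, Any] = {"model": MODEL_ID, "prompt": prompt, "max_tokens": max_tokens}
    if temperature is None and thinking:
        temperature = 0.7  # midpoint of the 0.6-0.8 range suggested for thinking
    if temperature is not None:
        payload["temperature"] = temperature
    return payload


def post_completion(payload: Dict[str, Any], base_url: str = "http://localhost:8000") -> str:
    """POST the payload to an OpenAI-compatible server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

The raw completions endpoint is used rather than chat completions because the prompt already contains the Mistral v7 template and the `<think>` prefill.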
+## Usage with vLLM
+```bash
+python -m vllm.entrypoints.openai.api_server \
+  --model tacodevs/Behemoth-X-R1-123B \
+  --dtype bfloat16 \
+  --tensor-parallel-size 4 \
+  --max-model-len 16384 \
+  --trust-remote-code
+```
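+For single-GPU inference, use one of the quantized variants (FP8 / AWQ / GPTQ); see the collection. FP8 weights alone take roughly 123 GB, so on an 80 GB GPU you will need a 4-bit variant (AWQ / GPTQ, roughly 62 GB of weights).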
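A back-of-the-envelope memory estimate helps when picking a variant and `--max-model-len`. This sketch assumes the nominal 123B parameter count, ignores quantization scales, activations and runtime overhead, and takes the KV-cache shape (88 layers, 8 KV heads, head_dim 128, bf16 cache) from config.json.

```python
GIB = 1024 ** 3


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1e9 bytes), ignoring quantization scales."""
    return n_params * bits_per_param / 8 / 1e9


def kv_cache_bytes_per_token(
    num_layers: int = 88, num_kv_heads: int = 8, head_dim: int = 128, bytes_per_value: int = 2
) -> int:
    """Bytes of K and V cache per token across all layers (defaults from config.json)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


N_PARAMS = 123e9  # nominal size from the model name
for label, bits in [("bf16", 16), ("fp8", 8), ("4-bit", 4)]:
    print(f"{label:>6}: ~{weight_gb(N_PARAMS, bits):.1f} GB of weights")

per_token = kv_cache_bytes_per_token()
print(f"KV cache: {per_token} bytes/token, {per_token * 16384 / GIB:.2f} GiB per 16384-token sequence")
```

This prints about 246 GB (bf16), 123 GB (FP8) and 61.5 GB (4-bit) of weights, plus 360448 bytes of KV cache per token, i.e. 5.5 GiB for one full 16384-token sequence. Grouped-query attention (8 KV heads instead of 96) is what keeps the cache this small.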
+## Lineage
+```
+Mistral-Large-Instruct-2411 (123B, Mistral AI)
+  ├─ TheDrummer/Behemoth-X-123B-v2      (creative writing)
+  └─ TheDrummer/Behemoth-R1-123B-v2     (reasoning)
+       └─ tacodevs/Behemoth-X-R1-123B   (SCE merge, this model)
+```
+## Known Behaviors
+- **`<think>` block triggers on prefill.** The merge inherits R1's thinking circuit, but like R1 it doesn't reliably self-inject the tag — you need to prefill it.
+- **Thinking style is R1-derived.** Structured, bullet-ish, character-aware. Not the flowing pre-writing style of Opus or Grok. If you want literary author-planning thinking, that's a follow-up fine-tune target.
+- **Prose voice leans X.** The 55% X weight dominates prose style; most generations are indistinguishable from pure X on writing quality.
+- **Long character cards work.** Unlike `Behemoth-OpusX-123B` (our earlier LoRA experiment, which broke on 4k+ token system prompts), the merge handles long prompts natively since no new behavior was taught via fine-tuning.
+## Credits
+- **[TheDrummer](https://huggingface.co/TheDrummer)** — for Behemoth-X and Behemoth-R1, the two best Mistral Large fine-tunes in the creative/RP space.
+- **[Mistral AI](https://huggingface.co/mistralai)** — for Mistral-Large-Instruct-2411, the foundation both parents are built on.
+- **[Arcee AI / mergekit team](https://github.com/arcee-ai/mergekit)** — for the SCE implementation.
+- **[FuseAI](https://huggingface.co/FuseAI)** — for validating the SCE-reasoning-merge approach with FuseO1.
+- Merged by [tacodevs](https://huggingface.co/tacodevs) / [Telegai](https://telegai.com).
+## License
+Inherited from base model: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)** — non-commercial use only.

config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "MistralForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "dtype": "bfloat16",
+  "eos_token_id": 2,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 12288,
+  "initializer_range": 0.02,
+  "intermediate_size": 28672,
+  "max_position_embeddings": 131072,
+  "model_type": "mistral",
+  "num_attention_heads": 96,
+  "num_hidden_layers": 88,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 1000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.57.6",
+  "use_cache": true,
+  "vocab_size": 32768
+}
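As a sanity check on the "123B" label, the shapes in this config.json determine the parameter count. A sketch, assuming the standard Hugging Face Mistral layout (no bias terms, one RMSNorm before attention and one before the MLP, and separate input and output embeddings as `tie_word_embeddings: false` states); `count_params` is a hypothetical helper.

```python
from typing import Any, Dict

# Values copied from config.json above.
CONFIG: Dict[str, Any] = {
    "hidden_size": 12288,
    "intermediate_size": 28672,
    "num_hidden_layers": 88,
    "num_attention_heads": 96,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 32768,
    "tie_word_embeddings": False,
}


def count_params(cfg: Dict[str, Any]) -> int:
    """Parameter count of a bias-free Mistral decoder described by `cfg`."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]  # gate, up, down projections
    norms = 2 * h  # the two per-layer RMSNorm weight vectors
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total = count_params(CONFIG)
print(f"{total:,} parameters, ~{total * 2 / 1e9:.1f} GB in bfloat16")
```

This gives 122,610,069,504 parameters, about 245.2 GB at two bytes each, consistent with the 51 safetensors shards of mostly 4.8 to 4.9 GB listed in this commit.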

mergekit_config.yml ADDED

	@@ -0,0 +1,12 @@

+models:
+  - model: TheDrummer/Behemoth-X-123B-v2
+    parameters:
+      weight: 0.55
+  - model: TheDrummer/Behemoth-R1-123B-v2
+    parameters:
+      weight: 0.45
+merge_method: sce
+base_model: mistralai/Mistral-Large-Instruct-2411
+parameters:
+  select_topk: 1.0
+dtype: bfloat16

model-00001-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dc8b2a1d936b48378754c2eb746a9787c1ee77b41b8c6ca09a7345094f6b9f11
+size 4378928504

model-00002-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ea86b505819df539385b05102ecabc38c8956ffaff22acd35d9b6b2f22cec836
+size 4907411088

model-00003-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1822d89f412f08c908fc601925121eb805a18c523b6fe5517d24f6442618c13c
+size 4806747904

model-00004-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b892e802088b45ffd89a591544f88cd00eb14c14903bdc4688a0ca5209782090
+size 4831938544

model-00005-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a061cee47fa52ad6b94a2d1bdc6a4db7c24cd8a0d0860b4b078e49290eedf593
+size 4831938552

model-00006-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9ef10c2f82c56f2216162a071183bacec57d8de26ae4a30a970f3c98cd14c8dd
+size 4907411096

model-00007-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f66ae4e2494dd2b0e01cdca2deb02b84a36de9e11ddf2521b59fd8a49065d8e
+size 4806747904

model-00008-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:68954fa20a69b7b96c0bb790e015b27edcbe9c5a4d1abf1e41aff079ce6c9cea
+size 4831938536

model-00009-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4aa10feb3d0bc7d7c3b50dbc33ad50cc634705b467be8658deb88eacac5883ac
+size 4831938552

model-00010-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9eb5c4e0629be77262710f37010c50ca35e20b309e3019574a862d4578d823ad
+size 4907411096

model-00011-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a2b00e2e18607f1f6ff7adb45de20c1b7fde090bd9c5bcf3edf97dfcec577e40
+size 4806747904

model-00012-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d160df5906e8fce5203800329affe5a8d59ba4309c05664728670742baa0ab5a
+size 4831938544

model-00013-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d529f74b137de1af73b8f5913b660b7849160e1a63e8796da2eddcf706f04ad6
+size 4831938552

model-00014-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:db7ae6c65ce957c3f0ab2df4d71c82fd36d73dc409f2ab75e8baf6c92953cf5f
+size 4907411088

model-00015-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cdbfe98458da9f0cbf5037f58aa76c00d0afad706bd08c2dbd06d755e47b9ac2
+size 4806747904

model-00016-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:118a2c9908016f935b2f7cc3fed83e0dd17d5b680749035856afb82e217886b2
+size 4831938544

model-00017-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4359df2e736603f90e2d4687f33c924a36a78787dd4729d87bc1599a9443246a
+size 4831938552

model-00018-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f57705bd14f5befdd93d8f1e1a3d0188a0942a77a8add68214bd67334780e1ea
+size 4907411096

model-00019-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0882ea4c4f3ce658cf256ef2a38834a6a7bc994b24c2ff9fd9cf631684078894
+size 4806747904

model-00020-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47812241eec483a879cb2e511f2f50f533179403cb50359fb591302c2d7f4d0c
+size 4831938544

model-00021-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a4f967846aace4fc9f32a50f4a28e4d2389b56614994a773226f32dc07ec29e
+size 4831938544

model-00022-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2df5dd3d36d882e3e22c17f9d670098123ae5fd29ca8020330cecf4ad5690c6a
+size 4907411096

model-00023-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7087dfce83b7b7df70324c818d4b7ad62e05feaecc751d2d3037d32db7f7109a
+size 4806747904

model-00024-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8eef997620d80acf35f2ea4349c267b60d916ebb1f3df674eba9d08304453f38
+size 4831938544

model-00025-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5482e5d09d5ff12805f65ca965ce6f0d38d2b56eab26dd5301308d5efad2194d
+size 4831938552

model-00026-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0483f5666739b95da6b462358a5341889ff6c4562c4b7d3be09175ccdb849278
+size 4907411096

model-00027-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cecd7f52f3854acd5bab208a73530c7cdd8fefabf4bdfe0eb8b65101ffb02129
+size 4806747896

model-00028-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66600a343dda728195c5d8c85af3d8ccb3215171138106ef6c148c512889821d
+size 4831938544

model-00029-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c8e9094709c1e02d90aa8a7902644d4379c2058ea0e57d3ddbb8a30b6f2a43b6
+size 4831938552

model-00030-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f1b25d71910cc7f1dfe453a1d6673c196c75b944337f7d68e98373d6934d5d64
+size 4907411096

model-00031-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb69ddf8c689610de483347e0882ef60c4418e9e52705dbed5a802cf602d194c
+size 4806747904

model-00032-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f2954711ab70ee738153c22bfba873d2fea6f1d8292c0a4d5af3431ba7d1652
+size 4831938544

model-00033-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c163e7e180ff05784158efb0a85a7a80e1d683d7e55ef4f3d6ac80bb6bcb68af
+size 4831938544

model-00034-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f076c82df28267787b9dc87fee229b9750a2a47019ea7ad45edb46bfc2bdc20
+size 4907411096

model-00035-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0b751a72acc1aa1693e35772f0c596f0e8ad57be356d0212b0131a647a5f1941
+size 4806747904

model-00036-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f44b2b371e2718b5b09738e25666fa00d4b1dcd985717bfca01faeb48d38a83e
+size 4831938544

model-00037-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05ac0fe9b5ebfb21841dbe70c3bf441f156d39d064530bb4ebffcc82ded216af
+size 4831938552

model-00038-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:899ef153b231dd71b546721aee77d17e4394a16bc2292b6b4f8865438ebebf9c
+size 4907411096

model-00039-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5faf18423505d4f132eb30981dc06021b0281b68f395fae0184ce5d6173b7145
+size 4806747904

model-00040-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6c253ba4daacada7309646cd8f7fa8e774504c919f8e73fa4d4e30bcc7bec41d
+size 4831938544

model-00041-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c4d8d97722182a750cbd0b2c638264129adc4fe8c9ca01205786da7d627b0791
+size 4831938552

model-00042-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:19aad80805ef49dadbb63bc360ae44afa2fea8794ff40f04dbcccdd1be9ed8de
+size 4907411096

model-00043-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:931616e8ba0a7bd3db50c188c2faa70f1adc09a9754e051ad8803dbc77309ee2
+size 4806747904

model-00044-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6a790b13362c357fa67c87c6dddb937e3c33287bb04105c51de694877af46415
+size 4831938544

model-00045-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:901386ca1fbd3fad2520f54af919f491de5fa752df1d34c10de933e591f294da
+size 4831938552

model-00046-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e491b89cbb163fbbd302c7d1ebf1dc5d002a8d288c10ede3669fc65024e5c76
+size 4907411088

model-00047-of-00051.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2160134dd994d173886cf3496354c2aa4ab0a10fdeca1f1f74cb7791b973faa5
+size 4806747904
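Each `.safetensors` entry in this commit is a Git LFS pointer: `oid` is the SHA256 of the real file and `size` is its length in bytes. A minimal sketch for checking a downloaded shard against its pointer, assuming you have the pointer text (for example, copied from the entries above) and a local path to the shard; `parse_lfs_pointer` and `verify_shard` are hypothetical helper names.

```python
import hashlib
import os
from typing import Tuple


def parse_lfs_pointer(text: str) -> Tuple[str, int]:
    """Return (sha256 hex digest, size in bytes) from Git LFS pointer text."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def verify_shard(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check size first (cheap), then stream the file through SHA256."""
    digest, size = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != size:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == digest
```

Checking the size before hashing catches truncated downloads instantly, without reading several gigabytes.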