This is an attempt to create a Mixture of Experts (MoE) model based on Mistral Nemo 12B.

NeMoE is a 74B-parameter MoE model focused primarily on roleplay, and it therefore uses prominent roleplay-oriented fine-tunes as its experts.

Due to hardware limitations, I was unable to test the model to its full potential; the best I could run was a Q4 GGUF quantization. Although it ran slowly on my machine, the tests I performed produced a good, coherent, and consistent narrative.
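
For reference, a local test like that could look roughly like the sketch below, using llama-cpp-python. The filename, context size, and prompt are hypothetical; only the general Mistral `[INST]` instruct format is assumed.

```python
# Minimal sketch of running a Q4 GGUF quant locally with llama-cpp-python.
# The model_path below is a hypothetical local filename, not an official artifact.
from llama_cpp import Llama

llm = Llama(
    model_path="NeMoE-8x12B.Q4_K_M.gguf",  # hypothetical path to the Q4 quant
    n_ctx=4096,      # context window; lower this if RAM is tight
    n_gpu_layers=0,  # CPU-only; raise if VRAM allows partial offload
)

out = llm(
    "[INST] Write the opening scene of a fantasy roleplay. [/INST]",
    max_tokens=256,
    temperature=0.8,
)
print(out["choices"][0]["text"])
```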

NeMoE uses 8 experts with randomly initialized gates and 2 active experts per token. The model was created using mergekit-moe with the following recipe (see the routing sketch after the recipe):

```yaml
base_model: natong19/Mistral-Nemo-Instruct-2407-abliterated
gate_mode: random
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: TroyDoesAI/CreativeWriter-Personality-12B
  - source_model: Retreatcost/KansenSakura-Erosion-RP-12b
  - source_model: TheDrummer/Rivermind-12B-v1
  - source_model: mpasila/Mistral-freeLiPPA-12B
  - source_model: anthracite-org/magnum-v2.5-12b-kto
  - source_model: allura-org/Bigger-Body-12b
  - source_model: NeverSleep/Lumimaid-v0.2-12B
  - source_model: PocketDoc/Dans-PersonalityEngine-V1.3.0-12b
```
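
For intuition on what `gate_mode: random` and `experts_per_token: 2` mean at inference time, here is a rough PyTorch sketch of top-2 routing with untrained gates. This is illustrative only, not mergekit's actual implementation; the hidden size matches Mistral Nemo 12B (5120).

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of top-2 MoE routing with random (untrained) gates,
# i.e. the mechanism the recipe above configures. Not mergekit's code.
hidden_size, num_experts, top_k = 5120, 8, 2

gate = torch.nn.Linear(hidden_size, num_experts, bias=False)  # gate_mode: random
torch.nn.init.normal_(gate.weight)                            # left untrained

def route(hidden_states: torch.Tensor) -> None:
    """Print which 2 of the 8 experts each token would be sent to."""
    logits = gate(hidden_states)                      # (tokens, num_experts)
    weights, chosen = torch.topk(logits, top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)              # renormalize over the chosen 2
    for t in range(hidden_states.shape[0]):
        print(f"token {t}: experts {chosen[t].tolist()}, weights {weights[t].tolist()}")

route(torch.randn(3, hidden_size))  # three dummy token embeddings
```

Because the gates are random rather than trained, routing is essentially arbitrary but deterministic per token, which is why random-gate merges like this one lean entirely on the quality of the underlying experts.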

The experts are in the original state of the source models; no further training has been performed after the merge.
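
If you want to run the full-precision merge instead of a GGUF quant, the sketch below uses the standard transformers auto classes. It assumes the Mixtral-style architecture that mergekit-moe typically emits for Mistral experts, and it is untested here; the full BF16 74B model needs roughly 150 GB of memory.

```python
# Hedged sketch of loading the merged model with transformers.
# Assumes a Mixtral-compatible config as produced by mergekit-moe (untested here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "dinhosms/NeMoE-8x12B-Random-Experts"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # matches the dtype in the recipe
    device_map="auto",           # shard across available GPUs (requires accelerate)
)

inputs = tokenizer("[INST] Describe your character. [/INST]", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```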
