This is an attempt to create a MoE (Mixture of Experts) model based on Mistral Nemo 12B.
NeMoE is a 74B-parameter MoE model focused primarily on roleplay, and it therefore uses prominent roleplay and creative-writing models as its experts.
Due to hardware limitations, I was unable to test the model to its full potential; the best I could run was a Q4 GGUF quantization. Although it ran slowly on my machine, the tests I performed produced a good, coherent, and consistent narrative.
NeMoE uses 8 experts with random gating and 2 active experts per token. The model was created with mergekit-moe using the following recipe:
```yaml
base_model: natong19/Mistral-Nemo-Instruct-2407-abliterated
gate_mode: random
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: TroyDoesAI/CreativeWriter-Personality-12B
  - source_model: Retreatcost/KansenSakura-Erosion-RP-12b
  - source_model: TheDrummer/Rivermind-12B-v1
  - source_model: mpasila/Mistral-freeLiPPA-12B
  - source_model: anthracite-org/magnum-v2.5-12b-kto
  - source_model: allura-org/Bigger-Body-12b
  - source_model: NeverSleep/Lumimaid-v0.2-12B
  - source_model: PocketDoc/Dans-PersonalityEngine-V1.3.0-12b
```
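Assuming the recipe above is saved as `nemoe.yml` (a hypothetical filename), the merge can be reproduced with mergekit's MoE script; exact CLI options may vary between mergekit versions:

```shell
# Install mergekit and run the MoE merge.
# Downloads all listed expert models, so this needs substantial disk space.
pip install mergekit
mergekit-moe nemoe.yml ./NeMoE
```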
The experts are used in the original state of the source models; no further training has been performed.
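The routing described above (2 of 8 experts active per token) can be sketched in plain NumPy. This is an illustrative top-2 gate, not mergekit's or the inference runtime's actual implementation; all names here are hypothetical:

```python
import numpy as np

def top2_route(gate_logits):
    """Pick the 2 highest-scoring experts per token and softmax their scores."""
    top2 = np.argsort(gate_logits, axis=-1)[:, -2:]         # indices of the 2 best experts
    scores = np.take_along_axis(gate_logits, top2, axis=-1)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return top2, weights

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))      # 4 tokens, 8 experts; random gate weights
experts, weights = top2_route(logits)
print(experts.shape, weights.shape)   # one (expert, weight) pair of shape (4, 2) each
```

Each token's output is then the weighted sum of its two selected experts' outputs, which is why only a fraction of the 74B parameters are active per token.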