🧪 FrankenMoE: Proof of Concept (NOT production)

This is a technical experiment, not a useful model.

⚠️ Important Warning

This repository documents a proof-of-concept MoE pipeline. The model quality is NOT good: it produces incoherent or random output because:

  1. The router is untrained (no training)
  2. Experts were fine-tuned with only ~5K samples each
  3. Base model is only Qwen2.5-1.5B-Instruct

Do NOT use this model for anything serious. It exists purely to demonstrate that the FrankenMoE pipeline can be built end-to-end.
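
To make the first cause concrete, here is a minimal sketch of top-k gating with randomly initialized gate weights. It is not the repository's router: the function names, dimensions, and initialization scale are assumptions chosen for illustration. The point is only that an untrained gate picks experts from noise rather than from the content of the prompt.

```python
import random


def make_random_gate(num_experts, hidden_dim, seed=0):
    """Build an untrained gate: one random weight row per expert (hypothetical init)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
            for _ in range(num_experts)]


def route(gate, hidden, top_k=1):
    """Return the indices of the top_k experts by gate logit for one token."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in gate]
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    gate = make_random_gate(num_experts=2, hidden_dim=8)
    token = [0.1] * 8  # stand-in for a token's hidden state
    print(route(gate, token))
```

With random weights the chosen expert has no relationship to whether the token belongs to a coding or a math prompt, so the coding and math experts are applied more or less arbitrarily.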

What We Actually Built

A working MoE pipeline from dense LoRA experts → GGUF:

Qwen2.5-1.5B-Instruct (base)
  ├── Expert 0: Coding (LoRA fine-tuned)
  ├── Expert 1: Math (LoRA fine-tuned)
  └── Shared Expert: Base model
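
Because each routed expert starts as a LoRA adapter on the base model, it has to be folded into dense weights before it can become an MoE expert. The sketch below shows the standard LoRA merge for one weight matrix, W' = W + (alpha / r) * B * A, in plain Python. The helper names and the tiny matrices are illustrative assumptions; a real pipeline would do this with a library on full tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w, lora_a, lora_b, alpha):
    """Fold a LoRA update into a dense weight matrix.

    w:      out x in base weight
    lora_a: r x in   down-projection
    lora_b: out x r  up-projection
    alpha:  LoRA scaling hyperparameter
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # out x in low-rank update
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 2.0]]    # rank 1
    b = [[1.0], [0.0]]
    print(merge_lora(base, a, b, alpha=2.0))
```

After merging, the expert is an ordinary dense checkpoint with the same shapes as the base model, which is what the MoE assembly step expects.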

Key Technical Discoveries

Discovery | Detail
mergekit 0.1.4 bug | param incompatible with transformers >= 4.40; must patch
QwenMoE requirements | Exactly 1 shared expert + 2^n routed experts (2, 4, 8)
Tied embeddings fix | Qwen2.5 uses tied embeddings → must clone → before GGUF conversion, set
LoRA must be merged | Adapters must be merged before MoE assembly
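
The QwenMoE requirement in the table is easy to check before spending time on assembly. The sketch below is a hypothetical pre-flight check, not part of mergekit or this repository; the function names are made up for illustration. It encodes only the rule stated above: exactly 1 shared expert and a power-of-two number of routed experts, starting at 2.

```python
def is_power_of_two(n):
    """True for 2, 4, 8, ... (1 is excluded: the table lists 2, 4, 8)."""
    return n >= 2 and (n & (n - 1)) == 0


def validate_layout(num_routed, num_shared):
    """Return a list of problems with an expert layout; empty means it passes."""
    problems = []
    if num_shared != 1:
        problems.append(f"expected exactly 1 shared expert, got {num_shared}")
    if not is_power_of_two(num_routed):
        problems.append(f"routed expert count must be 2^n (2, 4, 8), got {num_routed}")
    return problems


if __name__ == "__main__":
    # The layout in this repository: Coding + Math routed, base model shared.
    print(validate_layout(num_routed=2, num_shared=1))
    print(validate_layout(num_routed=3, num_shared=1))
```

Returning a list of problems instead of raising on the first one makes it possible to report every layout issue in a single pass.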

Repository Structure

📦 frankenmoe_moe_v2-F16.gguf  - MoE GGUF (fixed, has output.weight)
📁 moe_full/                   - Full safetensors model
📁 coding/ math/ chat/         - Individual dense experts (LoRA + GGUF)
📄 FrankenMoE_Academic_Paper.pdf - Research paper
🐍 simple_router.py            - Keyword-based router (functional alternative)
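
For readers wondering what a keyword-based router looks like, here is a minimal sketch of the idea. It is not the contents of simple_router.py: the keyword lists and the function name are assumptions for illustration. Only the expert names (coding, math, chat) come from the directory listing above.

```python
import re

# Hypothetical keyword lists; a real router would need a much larger vocabulary.
KEYWORDS = {
    "coding": {"python", "function", "code", "bug", "compile"},
    "math": {"solve", "equation", "integral", "prove", "sum"},
}


def pick_expert(prompt, default="chat"):
    """Pick the dense expert whose keywords appear most often in the prompt."""
    words = re.findall(r"[a-z]+", prompt.lower())
    scores = {name: sum(1 for w in words if w in kws)
              for name, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the general chat expert when no keyword matches.
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    for prompt in ("Write a Python function", "Solve this equation", "Hello there"):
        print(prompt, "->", pick_expert(prompt))
```

A router like this sends the whole prompt to one dense expert GGUF instead of mixing experts per token, which is why it can work even when the MoE gate is untrained.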

Quick Test

wget https://huggingface.co/hotdogs/frankenmoe/resolve/main/frankenmoe_moe_v2-F16.gguf
llama-cli -m frankenmoe_moe_v2-F16.gguf -p "Write a Python function"
# Output: random/incoherent. This is expected! See warning above.

Future: Real Model

The pipeline will be re-run with:

  • Larger base model (Qwen2.5-7B/14B)
  • Trained router (classification loss)
  • More training data per domain
  • 4 experts for proper 2^n routing
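
As a rough picture of what "trained router (classification loss)" could mean, the sketch below trains a tiny linear gate with softmax cross-entropy on labeled feature vectors. Everything here is an assumption for illustration: the toy features, the learning rate, and the function names are not from the planned pipeline, and a real router would train on model hidden states.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(weights, x):
    """One logit per expert: dot product of the expert's row with the input."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train_router(features, labels, num_experts, lr=0.5, epochs=200):
    """Fit a linear gate with per-sample gradient descent on cross-entropy."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_experts)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(gate_logits(weights, x))
            for e in range(num_experts):
                # d(loss)/d(logit_e) = p_e - 1[e == y] for cross-entropy
                grad = probs[e] - (1.0 if e == y else 0.0)
                for d in range(dim):
                    weights[e][d] -= lr * grad * x[d]
    return weights


def predict(weights, x):
    """Index of the expert with the highest gate logit."""
    logits = gate_logits(weights, x)
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    # Toy 2-D features: first axis ~ "coding-like", second ~ "math-like".
    xs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    ys = [0, 1, 0, 1]
    w = train_router(xs, ys, num_experts=2)
    print(predict(w, [1.0, 0.0]), predict(w, [0.0, 1.0]))
```

Unlike the random gate in the current model, a gate trained this way assigns inputs to experts according to the domain labels it has seen.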

Stay tuned: the real model is coming.


Built by UKA 🇹🇭 | May 2026

GGUF model size: 4B params. Architecture: qwen2moe.